en:pfw:sha-256
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
en:pfw:sha-256 [2024-10-14 12:55] – willem | en:pfw:sha-256 [2024-10-17 13:09] (current) – jeroenh | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ##SHA-256 HASH-generator | + | ##SHA-256 HASH-generator## |
Line 184: | Line 184: | ||
h4 .32hex | h4 .32hex | ||
- | create ABC 97 c, 98 c, 99 c, | + | \ ===== END of code |
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | **Testing the SHA-256 algorithm** | ||
+ | |||
+ | |||
+ | Testing the SHA-256 algorithm can be done with the following code. The ' | ||
+ | |||
+ | |||
+ | < | ||
+ | |||
+ | \ ===== TEST routines | ||
+ | |||
+ | create ABC 97 c, 98 c, 99 c, | ||
+ | : ldr cr ." must be: " ; | ||
: proof ( -- ) | : proof ( -- ) | ||
ABC 3 sha256 | ABC 3 sha256 | ||
- | cr 9 spaces .hahdr | + | cr 9 spaces .hahdr |
- | cr ." must be: " | + | |
." ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad" | ." ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad" | ||
cr ." SHA-256: " .hash ; | cr ." SHA-256: " .hash ; | ||
+ | : 65aaa 65 0 do [char] a c, loop ; create AAA 65aaa | ||
+ | : aaatest ( n -- ) | ||
+ | aaa swap 0 max 65 min sha256 cr ." SHA-256: " .hash ; | ||
+ | |||
+ | : fulltest \ tests all critical lengths for the function | ||
+ | cr 9 spaces .hahdr ldr | ||
+ | ." e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855" | ||
+ | 0 aaatest ldr | ||
+ | ." ca978112 ca1bbdca fac231b3 9a23dc4d a786eff8 147c4e72 b9807785 afee48bb" | ||
+ | 1 aaatest ldr | ||
+ | ." 9f4390f8 d30c2dd9 2ec9f095 b65e2b9a e9b0a925 a5258e24 1c9f1e91 0f734318" | ||
+ | 55 aaatest ldr | ||
+ | ." b35439a4 ac6f0948 b6d6f9e3 c6af0f5f 590ce20f 1bde7090 ef797068 6ec6738a" | ||
+ | 56 aaatest ldr | ||
+ | ." 7d3e74a0 5d7db15b ce4ad9ec 0658ea98 e3f06eee cf16b4c6 fff2da45 7ddc2f34" | ||
+ | 63 aaatest ldr | ||
+ | ." ffe054fe 7ae0cb6d c65c3af9 b61d5209 f439851d b43d0ba5 997337df 154668eb" | ||
+ | 64 aaatest ldr | ||
+ | ." 635361c4 8bb9eab1 4198e76e a8ab7f1a 41685d6a d62aa914 6d301d4f 17eb0ae0" | ||
+ | 65 aaatest cr ; | ||
- | \ ===== the END | ||
</ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | **Optimising performance of the SHA-256 hash generator - step 1 ** | ||
- | **Optimising performance of the SHA-256 hash generator** | + | Creating SHA-256 hashes is an area where time of execution is usually seen as relevant. The above version, using generic Forth, will run on most Forth systems but is rather slow. Running on wabiForth in tests it achieves a throughput of around 2.6 MB/s. |
+ | |||
+ | One way of optimising performance is to use Forth-Macros, | ||
+ | |||
+ | |||
+ | |||
+ | < | ||
+ | |||
+ | |||
+ | \ ======================================================== | ||
+ | \ ANS Forth code for Secure Hash Algorithms SHA-256 | ||
+ | \ This version: copyright (c) 2024 Jeroen Hoekstra | ||
+ | \ ======================================================== | ||
+ | |||
+ | \ ===== THIS WORK BASED ON: ============================ | ||
+ | \ FIPS 180-4 specs at: http:// | ||
+ | |||
+ | \ ...previous work by Jabari Zakiya | ||
+ | \ Copyright (c) 2001-2013 Jabari Zakiya - jzakiya@mail.com - 15jan2013 | ||
+ | \ https:// | ||
+ | |||
+ | \ ...previous work by Wil Baden | ||
+ | \ Forth Dimensions (FD) Vol. 19, No. 2, July/August 1997 | ||
+ | |||
+ | \ Use of this code is free, subject to acknowledgment of the copyrights, | ||
+ | \ especially those of Jabari Zakiya and Wil Baden | ||
+ | |||
+ | \ ===== DEPENDENCIES | ||
+ | \ This version for 32bit, little endian, byte-addressable CPUs, | ||
+ | \ and strings with a 32bit index | ||
+ | |||
+ | \ dependencies: | ||
+ | \ common usage: ROTR TUCK BYTES>< | ||
+ | \ and CASE-independency | ||
+ | |||
+ | \ ===== MACRO Wordset =================================== | ||
+ | \ This implementation depends heavily on MACROs to speed up | ||
+ | \ the performance. Macros are not very portable. If the code does | ||
+ | \ not run on your system, than use the reference of Wil Baden to | ||
+ | \ adapt the MACRO words to your system | ||
+ | |||
+ | \ This version of MACRO allows insertion of parameters following | ||
+ | \ the macro -- " | ||
+ | \ Example: | ||
+ | \ : FOO .. ?? exit .... ; ?? compiles to -- IF EXIT THEN | ||
+ | |||
+ | \ ===== USAGE ========================================== | ||
+ | \ To generate and show the hash of a string or counted message do: | ||
+ | \ Example: S" some message string" | ||
+ | |||
+ | \ ===== words not in CORE or CORE_EXT =================== | ||
+ | |||
+ | \ for optimal performance the following two words could be done in assembly | ||
+ | : ROTR ( n rot -- rotated_n ) | ||
+ | 32 mod 2dup 32 swap - lshift >r rshift r> or ; | ||
+ | hex : BYTES>< | ||
+ | dup >r 18 lshift r@ ff00 and 8 lshift or | ||
+ | r@ ff0000 and 8 rshift or r> 18 rshift or ; decimal | ||
+ | |||
+ | \ ======================================================== | ||
+ | \ unused \ 8296 bytes excluding BMs and BM-buffers | ||
+ | |||
+ | \ ===== MACRO definitions =============================== | ||
+ | decimal | ||
+ | : PLACE ( caddr n addr -- ) 2dup ! 4 + swap move ; \ pressumes a 32b string-index | ||
+ | : SSTRING ( char " | ||
+ | |||
+ | : SPLIT-AT-CHAR | ||
+ | >r 2dup begin | ||
+ | dup | ||
+ | while | ||
+ | over c@ r@ - | ||
+ | while | ||
+ | 1 /string | ||
+ | repeat then | ||
+ | r> drop tuck 2>r - 2r> ; | ||
+ | |||
+ | : [MACRO] | ||
+ | count begin | ||
+ | 92 ( =' | ||
+ | while | ||
+ | bl word count evaluate 2r> 1 /string | ||
+ | repeat | ||
+ | r> drop r> drop ; | ||
+ | |||
+ | \ Macro creation word - allows parameter insertion | ||
+ | : MACRO create immediate sstring does> [macro] ; | ||
+ | |||
+ | \ ===== Utility Words ================================== | ||
+ | |||
+ | : ]L postpone ] postpone literal ; immediate | ||
+ | |||
+ | \ ===== Start SHA-256 Code ============================= | ||
+ | 32 constant cellsize | ||
+ | |||
+ | 2variable shalen | ||
+ | create | ||
+ | create | ||
+ | create | ||
+ | |||
+ | hex create | ||
+ | 428a2f98 , 71374491 , b5c0fbcf , e9b5dba5 , 3956c25b , 59f111f1 , 923f82a4 , ab1c5ed5 , | ||
+ | d807aa98 , 12835b01 , 243185be , 550c7dc3 , 72be5d74 , 80deb1fe , 9bdc06a7 , c19bf174 , | ||
+ | e49b69c1 , efbe4786 , 0fc19dc6 , 240ca1cc , 2de92c6f , 4a7484aa , 5cb0a9dc , 76f988da , | ||
+ | 983e5152 , a831c66d , b00327c8 , bf597fc7 , c6e00bf3 , d5a79147 , 06ca6351 , 14292967 , | ||
+ | 27b70a85 , 2e1b2138 , 4d2c6dfc , 53380d13 , 650a7354 , 766a0abb , 81c2c92e , 92722c85 , | ||
+ | a2bfe8a1 , a81a664b , c24b8b70 , c76c51a3 , d192e819 , d6990624 , f40e3585 , 106aa070 , | ||
+ | 19a4c116 , 1e376c08 , 2748774c , 34b0bcb5 , 391c0cb3 , 4ed8aa4a , 5b9cca4f , 682e6ff3 , | ||
+ | 748f82ee , 78a5636f , 84c87814 , 8cc70208 , 90befffa , a4506ceb , bef9a3f7 , c67178f2 , | ||
+ | decimal | ||
+ | |||
+ | 0 value H[H] \ Pointer to addr of hash value H for each round | ||
+ | macro H[G] " h[h] [ 1 cells ]l +" | ||
+ | macro H[F] " h[h] [ 2 cells ]l +" | ||
+ | macro H[E] " h[h] [ 3 cells ]l +" | ||
+ | macro H[D] " h[h] [ 4 cells ]l +" | ||
+ | macro H[C] " h[h] [ 5 cells ]l +" | ||
+ | macro H[B] " h[h] [ 6 cells ]l +" | ||
+ | macro H[A] " h[h] [ 7 cells ]l +" | ||
+ | |||
+ | hex : SHAINIT ( -- ) \ load initial sha-256 hash values h0 - h7 | ||
+ | 6a09e667 ( h0) dup [ shaval 7 cells + ]l ! [ shash 7 cells + ]l ! | ||
+ | bb67ae85 ( h1) dup [ shaval 6 cells + ]l ! [ shash 6 cells + ]l ! | ||
+ | 3c6ef372 ( h2) dup [ shaval 5 cells + ]l ! [ shash 5 cells + ]l ! | ||
+ | a54ff53a ( h3) dup [ shaval 4 cells + ]l ! [ shash 4 cells + ]l ! | ||
+ | 510e527f ( h4) dup [ shaval 3 cells + ]l ! [ shash 3 cells + ]l ! | ||
+ | 9b05688c ( h5) dup [ shaval 2 cells + ]l ! [ shash 2 cells + ]l ! | ||
+ | 1f83d9ab ( h6) dup [ shaval 1 cells + ]l ! [ shash 1 cells + ]l ! | ||
+ | 5be0cd19 ( h7) dup | ||
+ | shash to h[h] ; decimal | ||
+ | |||
+ | : UPDATEHASH ( -- ) \ 156c -- update values: shash(i) = shaval(i) <= shaval(i-1) + h(i-1) | ||
+ | h[h] dup @ [ shaval 0 cells + ]l tuck +! @ [ shash 0 cells + ]l ! cell+ | ||
+ | dup @ [ shaval 1 cells + ]l tuck +! @ [ shash 1 cells + ]l ! cell+ | ||
+ | dup @ [ shaval 2 cells + ]l tuck +! @ [ shash 2 cells + ]l ! cell+ | ||
+ | dup @ [ shaval 3 cells + ]l tuck +! @ [ shash 3 cells + ]l ! cell+ | ||
+ | dup @ [ shaval 4 cells + ]l tuck +! @ [ shash 4 cells + ]l ! cell+ | ||
+ | dup @ [ shaval 5 cells + ]l tuck +! @ [ shash 5 cells + ]l ! cell+ | ||
+ | dup @ [ shaval 6 cells + ]l tuck +! @ [ shash 6 cells + ]l ! cell+ | ||
+ | @ [ shaval 7 cells + ]l tuck +! @ [ shash 7 cells + ]l ! | ||
+ | shash to h[h] ; \ init pointer to last hash value h7=h | ||
+ | |||
+ | \ ( -- n ) T1x = Ch(e,f,g) + SIGMA1(e) + h ( 74c ) | ||
+ | : T1X | ||
+ | h[f] 2@ over dup >r | ||
+ | and swap invert | ||
+ | h[g] @ and xor | ||
+ | r@ 6 rotr r@ 11 rotr xor r> 25 rotr xor + | ||
+ | h[h] @ + ; | ||
+ | |||
+ | \ ( -- n ) T2 = Maj(a,b,c) + SIGMA0(a) ( 87c ) | ||
+ | : T2 h[b] 2@ ( a b) >r dup >r 2 rotr r@ 13 rotr xor r@ 22 rotr xor | ||
+ | h[c] @ dup r@ and r> r@ and xor swap r> and xor + ; | ||
+ | |||
+ | : SIG0 ( x -- n ) dup >r 7 rotr r@ 18 rotr xor r> 3 rshift xor ; \ 30c | ||
+ | : SIG1 ( x -- n ) dup >r 17 rotr r@ 19 rotr xor r> 10 rshift xor ; \ 30c | ||
+ | |||
+ | \ Put two copies of original Wi on stack, keep its address | ||
+ | MACRO WI@ " dup @ tuck " ( [wi] -- wi [wi] wi) | ||
+ | |||
+ | \ Create 2 copies of new Wi' from Wi on stack ( ..Wi -- ..Wi' Wi') | ||
+ | MACRO WI " 15 pick 15 pick sig0 + 7 pick + 2 pick sig1 + dup " | ||
+ | |||
+ | \ Drop 64 Wi cells from stack ( W0..W63 -- ) | ||
+ | MACRO WIDROP " 4 0 do 2drop 2drop 2drop 2drop 2drop 2drop 2drop 2drop loop " | ||
+ | |||
+ | \ Add round constant Ki (to T1x to make T1) for each round | ||
+ | MACRO KI+ " | ||
+ | MACRO SUBRND | ||
+ | |||
+ | MACRO KI16 " [ ki 16 cells + ]l " | ||
+ | MACRO KI64 " [ ki 64 cells + ]l " | ||
+ | |||
+ | : SHA256 | ||
+ | ki16 ki do wi@ subrnd cell+ 4 +loop drop \ ( w0..w15 ) original block | ||
+ | ki64 ki16 do wi subrnd | ||
+ | updatehash ; | ||
+ | |||
+ | : SETLEN | ||
+ | shalen 2@ d2* d2* d2* ( bytes-> | ||
+ | |||
+ | : cellsreverse | ||
+ | 0 do dup @ bytes>< | ||
+ | |||
+ | MACRO ENDIAN16 " dup 16 cellsreverse " | ||
+ | MACRO ENDIAN14 " dup 14 cellsreverse " | ||
+ | |||
+ | : HASHFULLBLOCKS ( adr1 ud -- adr2 count ) \ hash all but last block | ||
+ | swap dup >r 6 rshift | ||
+ | over [ cellsize 6 - ]l lshift or >r \ return is now: :r lo lo' | ||
+ | 6 rshift 0 ?do \ do if hi'= hi/64 > 0 | ||
+ | 0 0 do dup endian16 sha256 64 + loop \ hash for 2^cellsize full blocks | ||
+ | loop \ hash for hi' | ||
+ | r> 0 ?do dup endian16 sha256 64 + loop \ hash for lo' count full 64 byte blocks | ||
+ | r> 63 and ; \ leave address and count for partial block | ||
+ | |||
+ | : HASHFINAL ( addr count -- ) \ hash partial and/or last block | ||
+ | dup >r w swap cmove \ move bytes into block w array | ||
+ | w r@ + 128 over c! \ put 80h after last message byte | ||
+ | char+ 55 r@ - \ compute tentative 0 byte fill count | ||
+ | r> 55 > \ is partial block byte count > 55 ? | ||
+ | if 8 + 0 fill \ if yes, fill rest of block w/zeroes | ||
+ | w endian16 sha256 | ||
+ | w 56 \ now setup last block containing bit count | ||
+ | then | ||
+ | 0 fill setlen w endian14 sha256 | ||
+ | ; \ endian adjust, except bit count, then hash | ||
+ | |||
+ | \ Generate SHA-256 from a counted buffer of text | ||
+ | : GENSHA256 ( addr n -- ) | ||
+ | 0 shainit | ||
+ | 2dup shalen 2! | ||
+ | hashfullblocks | ||
+ | hashfinal ; | ||
+ | |||
+ | \ print SHA-256 hash -- | ||
+ | : .SHA cr ." sha-256: " 0 7 do h[h] i 4* + @ .hex -1 +loop ; | ||
+ | |||
+ | \ unused - | ||
+ | |||
+ | |||
+ | |||
+ | \ ===== TESTING | ||
+ | : .HAHDR | ||
+ | ." ---h0--- ---h1--- ---h2--- ---h3--- ---h4--- ---h5--- ---h6--- ---h7---" | ||
+ | |||
+ | : proof ( -- ) \ ~11-12us for s" abc" | ||
+ | t[ s" abc" gensha256 ]t. | ||
+ | cr 9 spaces .hahdr cr ." must be: " | ||
+ | ." ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad" | ||
+ | .sha ; | ||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | **Optimising performance of the SHA-256 hash generator | ||
+ | |||
+ | A more thorough optimisation is possible by looking at often used parts of the code and optimising those. | ||
- | Creating SHA-256 hashes is an area where time of execution is usually seen as relevant. The above version, using generic Forth, will run on most Forth systems. But is not the fastest version possible. Running on wabiForth in tests it achieves a throughput of around 2.7 MB/s. | ||
- | The first step to make the algorithm faster is check if BYTES>< | + | A first step to make the algorithm faster is to check if BYTES>< |
- | The next step is to use a data-array for the hash-variables H0-H7 and the temp variables. On systems with a memory-cache, | + | The next step could be to use a data-array for the hash-variables H0-H7 and the temp variables. On systems with a memory-cache, |
- | A next step would be to program the six logical functions in assembly. Tested on wabiForth, combining these optimisation steps makes the program run about 5 times faster, with a throughput of around 12.9 MB/s | + | A next step would be to program the six logical functions in assembly. Tested on wabiForth, combining these optimisation steps makes the program run about 5 times faster, with a throughput of around 12.1 MB/s |
- | The nest step would be to program the subloop as a whole in assembly. This is a surprisingly short assembly routine of only 37 opcodes | + | The nest step would be to program the subloop as a whole in assembly. This is a surprisingly short assembly routine of only 37 opcodes, including |
- | The last step tested by the author is to also program the HASH1BLOCK word in assembly. The final throughput achieved is 48.3 MB/s. Around 17 times faster than using generic Forth. | + | The last step tested by the author is to also program the HASH1BLOCK word in assembly. The final throughput achieved is 45 MB/s. Around 17 times faster than using generic Forth. |
+ | The following is an example where ARM32 assembly is used for the subloop and the HASH1BLOCK word: | ||
+ | |||
+ | |||
< | < | ||
Line 492: | Line 776: | ||
</ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | Please note that this is all done using bog-standard ARM32 assembly, no SHA-256 specific opcodes are used. These SHA_256 specific opcodes would increase the performance even more. Another option would be to use the NEON coprocessor. This is available on Raspberry Pi2 and later and would allow some parallel processing of the subloop. If this really raises throughput is as yet unproven. | ||
- | + | \j2h | |
- | + | ||
- | + | ||
- | Please note that this is al done using normal ARM32 assembly, no SHA-256 specific opcodes were used. These would increase the performance even more. | + | |
en/pfw/sha-256.1728903354.txt.gz · Last modified: 2024-10-14 12:55 by willem