en:pfw:sha-256
Differences
This shows you the differences between two versions of the page.
Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
en:pfw:sha-256 [2024-10-17 12:34] – jeroenh | en:pfw:sha-256 [2024-10-17 13:09] (current) – jeroenh | ||
---|---|---|---|
Line 185: | Line 185: | ||
\ ===== END of code | \ ===== END of code | ||
+ | |||
</ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
**Testing the SHA-256 algorithm** | **Testing the SHA-256 algorithm** | ||
- | Testing the SHA-256 algorithm can be done with the following code. The ' | ||
+ | Testing the SHA-256 algorithm can be done with the following code. The ' | ||
+ | |||
+ | |||
< | < | ||
+ | |||
\ ===== TEST routines | \ ===== TEST routines | ||
Line 225: | Line 233: | ||
</ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | **Optimising performance of the SHA-256 hash generator - step 1 ** | ||
- | **Optimising performance of the SHA-256 | + | Creating |
- | Creating SHA-256 hashes is an area where time of execution | + | One way of optimising performance |
+ | |||
+ | |||
+ | |||
+ | < | ||
+ | |||
+ | |||
+ | \ ======================================================== | ||
+ | \ ANS Forth code for Secure Hash Algorithms SHA-256 | ||
+ | \ This version: copyright (c) 2024 Jeroen Hoekstra | ||
+ | \ ======================================================== | ||
- | One way of doing this is to use Forth-Macros as described by Wil Baden in Forth Dimensions (FD) Vol. 19, No. 2. There is a version floating around on the internet which does exactly that and here is a adapted version of that code. It is not totally in generic Forth as I wanted to keep te original as unchanged as possible. | + | \ ===== THIS WORK BASED ON: ============================ |
+ | \ FIPS 180-4 specs at: http://csrc.nist.gov/ | ||
- | < | + | \ ...previous work by Jabari Zakiya |
+ | \ Copyright (c) 2001-2013 Jabari Zakiya - jzakiya@mail.com - 15jan2013 | ||
+ | \ https:// | ||
+ | |||
+ | \ ...previous work by Wil Baden | ||
+ | \ Forth Dimensions (FD) Vol. 19, No. 2, July/August 1997 | ||
+ | |||
+ | \ Use of this code is free, subject to acknowledgment of the copyrights, | ||
+ | \ especially those of Jabari Zakiya and Wil Baden | ||
+ | |||
+ | \ ===== DEPENDENCIES | ||
+ | \ This version for 32bit, little endian, byte-addressable CPUs, | ||
+ | \ and strings with a 32bit index | ||
+ | |||
+ | \ dependencies: | ||
+ | \ common usage: ROTR TUCK BYTES>< d2* | ||
+ | \ and CASE-independency | ||
+ | |||
+ | \ ===== MACRO Wordset =================================== | ||
+ | \ This implementation depends heavily on MACROs to speed up | ||
+ | \ the performance. Macros are not very portable. If the code does | ||
+ | \ not run on your system, than use the reference of Wil Baden to | ||
+ | \ adapt the MACRO words to your system | ||
+ | |||
+ | \ This version of MACRO allows insertion of parameters following | ||
+ | \ the macro -- " | ||
+ | \ Example: | ||
+ | \ : FOO .. ?? exit .... ; ?? compiles to -- IF EXIT THEN | ||
+ | |||
+ | \ ===== USAGE ========================================== | ||
+ | \ To generate and show the hash of a string or counted message do: | ||
+ | \ Example: S" some message string" | ||
+ | |||
+ | \ ===== words not in CORE or CORE_EXT =================== | ||
+ | |||
+ | \ for optimal performance the following two words could be done in assembly | ||
+ | : ROTR ( n rot -- rotated_n ) | ||
+ | 32 mod 2dup 32 swap - lshift >r rshift r> or ; | ||
+ | hex : BYTES>< | ||
+ | dup >r 18 lshift r@ ff00 and 8 lshift or | ||
+ | r@ ff0000 and 8 rshift or r> 18 rshift or ; decimal | ||
+ | |||
+ | \ ======================================================== | ||
+ | \ unused \ 8296 bytes excluding BMs and BM-buffers | ||
+ | |||
+ | \ ===== MACRO definitions =============================== | ||
+ | decimal | ||
+ | : PLACE ( caddr n addr -- ) 2dup ! 4 + swap move ; \ pressumes a 32b string-index | ||
+ | : SSTRING ( char " | ||
+ | |||
+ | : SPLIT-AT-CHAR | ||
+ | >r 2dup begin | ||
+ | dup | ||
+ | while | ||
+ | over c@ r@ - | ||
+ | while | ||
+ | 1 /string | ||
+ | repeat then | ||
+ | r> drop tuck 2>r - 2r> ; | ||
+ | |||
+ | : [MACRO] | ||
+ | count begin | ||
+ | 92 ( =' | ||
+ | while | ||
+ | bl word count evaluate 2r> 1 /string | ||
+ | repeat | ||
+ | r> drop r> drop ; | ||
+ | |||
+ | \ Macro creation word - allows parameter insertion | ||
+ | : MACRO create immediate sstring does> [macro] ; | ||
+ | |||
+ | \ ===== Utility Words ================================== | ||
+ | |||
+ | : ]L postpone ] postpone literal ; immediate | ||
+ | |||
+ | \ ===== Start SHA-256 Code ============================= | ||
+ | 32 constant cellsize | ||
+ | |||
+ | 2variable shalen | ||
+ | create | ||
+ | create | ||
+ | create | ||
+ | |||
+ | hex create | ||
+ | 428a2f98 , 71374491 , b5c0fbcf , e9b5dba5 , 3956c25b , 59f111f1 , 923f82a4 , ab1c5ed5 , | ||
+ | d807aa98 , 12835b01 , 243185be , 550c7dc3 , 72be5d74 , 80deb1fe , 9bdc06a7 , c19bf174 , | ||
+ | e49b69c1 , efbe4786 , 0fc19dc6 , 240ca1cc , 2de92c6f , 4a7484aa , 5cb0a9dc , 76f988da , | ||
+ | 983e5152 , a831c66d , b00327c8 , bf597fc7 , c6e00bf3 , d5a79147 , 06ca6351 , 14292967 , | ||
+ | 27b70a85 , 2e1b2138 , 4d2c6dfc , 53380d13 , 650a7354 , 766a0abb , 81c2c92e , 92722c85 , | ||
+ | a2bfe8a1 , a81a664b , c24b8b70 , c76c51a3 , d192e819 , d6990624 , f40e3585 , 106aa070 , | ||
+ | 19a4c116 , 1e376c08 , 2748774c , 34b0bcb5 , 391c0cb3 , 4ed8aa4a , 5b9cca4f , 682e6ff3 , | ||
+ | 748f82ee , 78a5636f , 84c87814 , 8cc70208 , 90befffa , a4506ceb , bef9a3f7 , c67178f2 , | ||
+ | decimal | ||
+ | |||
+ | 0 value H[H] \ Pointer to addr of hash value H for each round | ||
+ | macro H[G] " h[h] [ 1 cells ]l +" | ||
+ | macro H[F] " h[h] [ 2 cells ]l +" | ||
+ | macro H[E] " h[h] [ 3 cells ]l +" | ||
+ | macro H[D] " h[h] [ 4 cells ]l +" | ||
+ | macro H[C] " h[h] [ 5 cells ]l +" | ||
+ | macro H[B] " h[h] [ 6 cells ]l +" | ||
+ | macro H[A] " h[h] [ 7 cells ]l +" | ||
+ | |||
+ | hex : SHAINIT ( -- ) \ load initial sha-256 hash values h0 - h7 | ||
+ | 6a09e667 ( h0) dup [ shaval 7 cells + ]l ! [ shash 7 cells + ]l ! | ||
+ | bb67ae85 ( h1) dup [ shaval 6 cells + ]l ! [ shash 6 cells + ]l ! | ||
+ | 3c6ef372 ( h2) dup [ shaval 5 cells + ]l ! [ shash 5 cells + ]l ! | ||
+ | a54ff53a ( h3) dup [ shaval 4 cells + ]l ! [ shash 4 cells + ]l ! | ||
+ | 510e527f ( h4) dup [ shaval 3 cells + ]l ! [ shash 3 cells + ]l ! | ||
+ | 9b05688c ( h5) dup [ shaval 2 cells + ]l ! [ shash 2 cells + ]l ! | ||
+ | 1f83d9ab ( h6) dup [ shaval 1 cells + ]l ! [ shash 1 cells + ]l ! | ||
+ | 5be0cd19 ( h7) dup | ||
+ | shash to h[h] ; decimal | ||
+ | : UPDATEHASH ( -- ) \ 156c -- update values: shash(i) = shaval(i) <= shaval(i-1) + h(i-1) | ||
+ | h[h] dup @ [ shaval 0 cells + ]l tuck +! @ [ shash 0 cells + ]l ! cell+ | ||
+ | dup @ [ shaval 1 cells + ]l tuck +! @ [ shash 1 cells + ]l ! cell+ | ||
+ | dup @ [ shaval 2 cells + ]l tuck +! @ [ shash 2 cells + ]l ! cell+ | ||
+ | dup @ [ shaval 3 cells + ]l tuck +! @ [ shash 3 cells + ]l ! cell+ | ||
dup @ [ shaval 4 cells + ]l tuck +! @ [ shash 4 cells + ]l ! cell+ | dup @ [ shaval 4 cells + ]l tuck +! @ [ shash 4 cells + ]l ! cell+ | ||
dup @ [ shaval 5 cells + ]l tuck +! @ [ shash 5 cells + ]l ! cell+ | dup @ [ shaval 5 cells + ]l tuck +! @ [ shash 5 cells + ]l ! cell+ | ||
Line 321: | Line 463: | ||
- | \ ===== | + | \ ===== |
: .HAHDR | : .HAHDR | ||
." ---h0--- ---h1--- ---h2--- ---h3--- ---h4--- ---h5--- ---h6--- ---h7---" | ." ---h0--- ---h1--- ---h2--- ---h3--- ---h4--- ---h5--- ---h6--- ---h7---" | ||
Line 332: | Line 474: | ||
- | </code\ | + | </code> |
+ | |||
+ | |||
+ | |||
+ | |||
+ | **Optimising performance of the SHA-256 hash generator - step 2 ** | ||
+ | |||
+ | A more thorough optimisation is possible by looking at often used parts of the code and optimising those. | ||
- | The first step to make the algorithm faster is check if BYTES>< | + | A first step to make the algorithm faster is to check if BYTES>< |
- | The next step is to use a data-array for the hash-variables H0-H7 and the temp variables. On systems with a memory-cache, | + | The next step could be to use a data-array for the hash-variables H0-H7 and the temp variables. On systems with a memory-cache, |
A next step would be to program the six logical functions in assembly. Tested on wabiForth, combining these optimisation steps makes the program run about 5 times faster, with a throughput of around 12.1 MB/s | A next step would be to program the six logical functions in assembly. Tested on wabiForth, combining these optimisation steps makes the program run about 5 times faster, with a throughput of around 12.1 MB/s | ||
- | The nest step would be to program the subloop as a whole in assembly. This is a surprisingly short assembly routine of only 37 opcodes | + | The nest step would be to program the subloop as a whole in assembly. This is a surprisingly short assembly routine of only 37 opcodes, including |
The last step tested by the author is to also program the HASH1BLOCK word in assembly. The final throughput achieved is 45 MB/s. Around 17 times faster than using generic Forth. | The last step tested by the author is to also program the HASH1BLOCK word in assembly. The final throughput achieved is 45 MB/s. Around 17 times faster than using generic Forth. | ||
+ | The following is an example where ARM32 assembly is used for the subloop and the HASH1BLOCK word: | ||
+ | |||
+ | |||
< | < | ||
Line 623: | Line 776: | ||
</ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | Please note that this is all done using bog-standard ARM32 assembly, no SHA-256 specific opcodes are used. These SHA_256 specific opcodes would increase the performance even more. Another option would be to use the NEON coprocessor. This is available on Raspberry Pi2 and later and would allow some parallel processing of the subloop. If this really raises throughput is as yet unproven. | ||
- | + | \j2h | |
- | + | ||
- | + | ||
- | Please note that this is al done using normal ARM32 assembly, no SHA-256 specific opcodes were used. These would increase the performance even more. | + | |
en/pfw/sha-256.1729161249.txt.gz · Last modified: 2024-10-17 12:34 by jeroenh