en:pfw:sha-256
Differences
This shows you the differences between two versions of the page.
en:pfw:sha-256 [2024-10-17 12:40] – jeroenh | en:pfw:sha-256 [2024-10-17 13:09] (current) – jeroenh | ||
---|---|---|---|
Line 185: | Line 185: | ||
\ ===== END of code | \ ===== END of code | ||
- | </ | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
**Testing the SHA-256 algorithm** | **Testing the SHA-256 algorithm** | ||
+ | |||
Testing the SHA-256 algorithm can be done with the following code. The ' | Testing the SHA-256 algorithm can be done with the following code. The ' | ||
+ | |||
+ | |||
< | < | ||
+ | |||
\ ===== TEST routines | \ ===== TEST routines | ||
Line 225: | Line 233: | ||
</ | </ | ||
- | + | ||
+ | |||
+ | |||
+ | |||
+ | |||
**Optimising performance of the SHA-256 hash generator - step 1** | **Optimising performance of the SHA-256 hash generator - step 1** | ||
- | Creating SHA-256 hashes is an area where time of execution is usually seen as relevant. The above version, using generic Forth, will run on most Forth systems. But is not the fastest version possible. Running on wabiForth in tests it achieves a throughput of around 2.6 MB/s. | ||
- | One way of doing this is to use Forth-Macros | + | Creating SHA-256 hashes is an area where time of execution |
+ | One way of optimising performance is to use Forth-Macros, | ||
+ | |||
+ | |||
+ | |||
< | < | ||
+ | |||
+ | |||
\ ======================================================== | \ ======================================================== | ||
\ ANS Forth code for Secure Hash Algorithms SHA-256 | \ ANS Forth code for Secure Hash Algorithms SHA-256 | ||
Line 448: | Line 463: | ||
- | \ ===== | + | \ ===== |
: .HAHDR | : .HAHDR | ||
." ---h0--- ---h1--- ---h2--- ---h3--- ---h4--- ---h5--- ---h6--- ---h7---" | ." ---h0--- ---h1--- ---h2--- ---h3--- ---h4--- ---h5--- ---h6--- ---h7---" | ||
Line 459: | Line 474: | ||
- | </code\ | + | </code> |
+ | |||
+ | |||
+ | |||
+ | |||
**Optimising performance of the SHA-256 hash generator - step 2** | **Optimising performance of the SHA-256 hash generator - step 2** | ||
- | A more thorough optimisation | + | A more thorough optimisation |
- | A first step to make the algorithm faster is check if BYTES>< | + | A first step to make the algorithm faster is to check if BYTES>< |
The next step could be to use a data-array for the hash-variables H0-H7 and the temp variables. On systems with a memory-cache, | The next step could be to use a data-array for the hash-variables H0-H7 and the temp variables. On systems with a memory-cache, | ||
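The data-array idea can be illustrated with a minimal ANS Forth sketch. This is not taken from the page's listing: the names HASHVARS, H[] and INIT-HASH are hypothetical, and the sketch assumes a Forth with cells of at least 32 bits. It only shows how H0-H7 could live in one contiguous block instead of eight separate variables; the constants are the standard SHA-256 initial hash values.

```forth
\ Hypothetical sketch: HASHVARS, H[] and INIT-HASH are illustrative names,
\ not words from the original listing. Assumes cells of at least 32 bits.

CREATE HASHVARS  8 CELLS ALLOT      \ one contiguous block for H0..H7

: H[]  ( n -- addr )                \ address of hash word n (0..7)
   CELLS HASHVARS + ;

HEX
: INIT-HASH  ( -- )                 \ store the SHA-256 initial hash values
   6A09E667 0 H[] !   BB67AE85 1 H[] !
   3C6EF372 2 H[] !   A54FF53A 3 H[] !
   510E527F 4 H[] !   9B05688C 5 H[] !
   1F83D9AB 6 H[] !   5BE0CD19 7 H[] ! ;
DECIMAL
```

With this layout, `INIT-HASH 0 H[] @` would fetch H0, and an assembly routine could be handed the single base address HASHVARS rather than eight variable addresses.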
Line 474: | Line 492: | ||
The next step would be to program the subloop as a whole in ARM32 assembly. This is a surprisingly short routine of only 37 opcodes, including the 4 logical functions. The throughput is now around 25 MB/s. | The next step would be to program the subloop as a whole in ARM32 assembly. This is a surprisingly short routine of only 37 opcodes, including the 4 logical functions. The throughput is now around 25 MB/s. | ||
- | The last step tested by the author is to also program the HASH1BLOCK word in assembly. The final throughput achieved is 45 MB/s. Around 17 times faster than using generic Forth. | + | The last step tested by the author is to also program the HASH1BLOCK word in assembly. The final throughput achieved is 45 MB/s. Around 17 times faster than using generic Forth. |
+ | The following is an example where ARM32 assembly is used for the subloop and the HASH1BLOCK word: | ||
+ | |||
+ | |||
< | < | ||
Line 755: | Line 776: | ||
</ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | Please note that this is all done using bog-standard ARM32 assembly; no SHA-256-specific opcodes are used. Such SHA-256-specific opcodes would increase the performance even more. Another option would be to use the NEON coprocessor, which is available on the Raspberry Pi 2 and later and would allow some parallel processing of the subloop. Whether this really raises throughput is as yet unproven. | ||
- | + | \j2h | |
- | + | ||
- | Please note that this is al done using normal ARM32 assembly, no SHA-256 specific opcodes were used. These SHA_256 specific opcodes would increase the performance even more. Another option would be to use the NEON coprocessor. This is available on Raspberry Pi2 and later and would allow some parallel processing of the subloop. If this really enhances performance is as yet unproven. | + | |
en/pfw/sha-256.1729161646.txt.gz · Last modified: 2024-10-17 12:40 by jeroenh