User Tools

Site Tools


en:pfw:sha-256

**This is an old revision of the document!**

SHA-256 HASH-generator in generic Forth - for Project Forth Works

The SHA-256 hash is widely used as the basis for computer security. This version closely follows the original description by using and naming all the variables and functions as put down in the documentation. This helps in understanding the actual SHA-hash algorithm. More efficient implementation are easily possible, two of those are presented below.

This program generates SHA-256 hashes on files with a maximum length of 2^32 bytes. The hash is created in the 8 values called H0-H7, and can be printed using the .HASH word.

This version is runs on 32 bit Forths with little-endian storage in memory

In this setup it is suitable for messages upto 4GB (ie 32Gb)

\ ===== support words - use primitives if available in your system
: D8* ( n -- d_n*8 ) \ takes a single as input, multiplies with 8 into a double
    dup 29 rshift swap 3 lshift swap ;

: ROTR ( n rot -- rotated_n ) \ rotate right n with rot bits
    31 and 2dup 32 swap - lshift >r rshift r> or ;

hex : BYTES>< ( m -- w ) \ reverse cell bytes: 0x12345678 <-> 0x78563412
    dup >r 18 lshift r@ ff00 and 8 lshift or
    r@ ff0000 and 8 rshift or r> 18 rshift or
 ; decimal


\ ===== values, arrays and related helper-functions
\ h0-h7 form the actual hash
    0 value h0  0 value h1  0 value h2  0 value h3
    0 value h4  0 value h5  0 value h6  0 value h7

\ wa-wg are working variables
    0 value wa  0 value wb  0 value wc  0 value wd
    0 value we  0 value wf  0 value wg  0 value wh

: HASH->WVAR
    h0 to wa  h1 to wb  h2 to wc  h3 to wd
    h4 to we  h5 to wf  h6 to wg  h7 to wh ;

: WVAR->HASH
    wa +to h0 wb +to h1 wc +to h2 wd +to h3
    we +to h4 wf +to h5 wg +to h6 wh +to h7 ;


\ temp values
    0 value t1  0 value t2  0 value lenmess  0 value addrmess

\ message block = 16 words for message and...
\ ...48 words for temp storage of Wi for last 48 rounds
    create  MBLCK 64 cells allot

: MESS@ ( n -- mess ) 4* mblck + @ ;
: MESS! ( m n -- )    4* mblck + ! ;
: CLEARMESS ( -- ) 64 0 do 0 i mess! loop ;     \ clear message-block

hex create  KI[]  \ 64 SHA-256 constants, 1 for each subloop - checked
428a2f98 , 71374491 , b5c0fbcf , e9b5dba5 , 3956c25b , 59f111f1 , 923f82a4 , ab1c5ed5 ,
d807aa98 , 12835b01 , 243185be , 550c7dc3 , 72be5d74 , 80deb1fe , 9bdc06a7 , c19bf174 ,
e49b69c1 , efbe4786 , 0fc19dc6 , 240ca1cc , 2de92c6f , 4a7484aa , 5cb0a9dc , 76f988da ,
983e5152 , a831c66d , b00327c8 , bf597fc7 , c6e00bf3 , d5a79147 , 06ca6351 , 14292967 ,
27b70a85 , 2e1b2138 , 4d2c6dfc , 53380d13 , 650a7354 , 766a0abb , 81c2c92e , 92722c85 ,
a2bfe8a1 , a81a664b , c24b8b70 , c76c51a3 , d192e819 , d6990624 , f40e3585 , 106aa070 ,
19a4c116 , 1e376c08 , 2748774c , 34b0bcb5 , 391c0cb3 , 4ed8aa4a , 5b9cca4f , 682e6ff3 ,
748f82ee , 78a5636f , 84c87814 , 8cc70208 , 90befffa , a4506ceb , bef9a3f7 , c67178f2 ,
decimal

: KI@ ( i -- KI ) 4* ki[] + @ ;


\ ===== the 6 basic logic functions of SHA-256 - all 6 functions mix and combine data
: SIGMA0 ( x -- xn )
    >r r@  2 rotr  r@ 13 rotr xor  r> 22 rotr   xor ;
: SIGMA1 ( x -- xn )
    >r r@  6 rotr  r@ 11 rotr xor  r> 25 rotr   xor ;

: SIG0  ( x -- n )
    >r r@  7 rotr  r@ 18 rotr xor  r>  3 rshift xor ;
: SIG1 ( x -- n )
     >r r@ 17 rotr  r@ 19 rotr xor  r> 10 rshift xor ;

: CH ( -- an )
    we wf and we invert wg and xor ;
: MAJ ( -- an )
    wa wb and  wa wc and xor  wb wc and xor ;


\ ===== the HASH functions
hex
: INITHASH \ fill first hash with intial values
  6a09e667 to h0  bb67ae85 to h1  3c6ef372 to h2  a54ff53a to h3
  510e527f to h4  9b05688c to h5  1f83d9ab to h6  5be0cd19 to h7
; decimal

\ do 1 of the 64 rounds per block
: SUBLOOP ( message+ki -- )
    wh +  CH +  we sigma1 + to t1   \ t1 filled with temp value
    MAJ  wa sigma0 + to t2          \ t2 filled with temp value
    wg to wh
    wf to wg
    we to wf
    wd t1 + to we
    wc to wd
    wb to wc
    wa to wb
    t1 t2 + to wa ;

: HASH1BLOCK \ 27us -> MACRO based=12us
    hash->wvar                      \ copy hash-variables h0-h7 to temp-variables wa-wh

    \ first 16 rounds digest message-data from message-block & does initial scrambling
    16 0 do
        i  mess@                    \ get data from message
        i ki@ + subloop             \ message+Ki to subloop
    loop

    \ the next 48 rounds scramble the data from the first 16 rounds by combining
    \ data from 4 earlier messages into a new 'message', as input to the subloop
    64 16 do
        i  2 - mess@ sig1
        i  7 - mess@      +
        i 15 - mess@ sig0 +
        i 16 - mess@      +         \ ( Wi )

        dup  i mess!                \ store Wi for following rounds
        i ki@ + subloop             \ message+Ki to subloop
    loop

    wvar->hash ;                    \ this is where the hash is actually created

: FILLMESS ( addr -- ) \ fills message block from message with reversed words
    16 0 do
        dup i 4* + @  bytes><  i mess!
    loop drop ;

: MBLCKREVERSE  ( -- )  \ reverse bytes words in message-block - only for last block!
    16 0 do  i mess@  bytes><  i mess!  loop ;

: PUTLEN ( len -- ) d8*  14 mess!  15 mess! ;

: GENSHA256 ( addr len -- )
    to lenmess  to addrmess         \ addr en len point to message-array

    \ first hash all full blocks
    lenmess 6 rshift dup >r         \ number of full blocks - also in r:
    0 ?do
        addrmess i 64 * + fillmess  \ i = blocknumber
        hash1block
    loop

    \ than hash the last block << including 1 extra block if needed
    r> 64 * addrmess + to addrmess  \ addrmess now address of last block of data
    lenmess 63 and >r               \ bytes_left in r:
    clearmess                       \ clear mblck

    addrmess mblck r@ cmove         \ move last bytes to mblck
    mblck r@ + 128 swap c!          \ add bit after last byte
    mblckreverse                    \ block now definitely filled with message

    r> 55 > if
        hash1block                  \ mblck is already filled and zeroed till the end
        clearmess                   \ clear mblck again
    then

    lenmess putlen                  \ put len at correct place
    hash1block ;                    \ and last block

: sha256 ( addr len -- )            \ create a hash in hash-variables H0-H7 for the data starting at addr
                                    \ with length len
    inithash gensha256 ;


\ ===== printing 

: .HAHDR
    ." ---h0--- ---h1--- ---h2--- ---h3--- ---h4--- ---h5--- ---h6--- ---h7---" ;
: .32HEX base @ hex >r 0 <# # # # # # # # # #> type r> base ! ;
: .HASH h0 .32hex  h1 .32hex  h2 .32hex  h3 .32hex
        h4 .32hex  h5 .32hex  h6 .32hex  h7 .32hex ;

create ABC 97 c, 98 c, 99 c,

: proof ( -- )
    ABC 3 sha256
    cr 9 spaces .hahdr cr ." must be: "
    ." ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c b410ff61 f20015ad"
    cr ." SHA-256: "
	.hash ;


\ ===== the END

Creating SHA-256 hashes is an area where time of execution is usually seen as relevant. The above version, using generic Forth, will run on most Forth systems. But is not the fastest version possible. Running on wabiForth in tests it achieves a throughput of around 2.7 MB/s.

The first step to make the algorithm faster is check if BYTES>< and ROTR are available as faster primitives on your forth-system. Especially the ROTR is common as most processors have a specific opcode for that function.

The next step is to use a data-array for the hash-variables H0-H7 and the temp variables. On systems with a memory-cache, this saves time by reducing writes to and reads from memory.

A next step would be to program the six logical functions in assembly. Tested on wabiForth, combining these optimisation steps makes the program run about 5 times faster, with a throughput of around 12.9 MB/s

The nest step would be to program the subloop as a whole in assembly. This is a surprisingly short assembly routine of only 37 opcodes for the whole in ARM32 assembly. The throughput is now around 26.4 MB/s

The last step tested by the author is to also program the HASH1BLOCK word in assembly. The final throughput achieved is 48.3 MB/s. Around 17 times faster than using generic Forth.

Please note that this is al done using normal assembly, no SHA-256 specific opcodes where used. These ould increase the performance even more.

en/pfw/sha-256.1728744396.txt.gz · Last modified: 2024-10-12 16:46 by jeroenh