Last modified: 2024-10-08 23:03 by jeroenh

====== Marsaglia's XORshift routine on the ARM processor ======

===== Raspberry Pi 3B+ with wabiForth =====

The Forth version of the randomisation routines is the same on any processor, as only standard Forth words are used. But the ARM processor can do a neat
trick: it can perform one complete step (dup, shift and xor) in a single opcode, because the second operand of a data-processing
instruction such as EOR is passed through the barrel shifter. As most Forths include an assembler, it is an interesting exercise
to see how much faster the routine is when coded in assembly.
This example is coded using wabiForth on a Raspberry Pi 3B+, but the principle is the same for any ARMv8 processor running in AArch32 mode.
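
To make the trick concrete, here is a small C sketch (an illustration only, not wabiForth code) that models one xorshift step in two ways: as the three separate Forth operations dup, shift and xor, and as a single ARM instruction such as ''EOR Rd, Rn, Rm, LSL #13'', where the barrel shifter shifts the second operand before the xor. The function names are hypothetical.

<code c>
#include <stdint.h>

/* Forth-style step, e.g. "dup 13 lshift xor": three separate operations.
   k must be < 32. */
static uint32_t forth_step_lshift(uint32_t x, unsigned k)
{
    uint32_t copy = x;      /* dup        */
    copy <<= k;             /* k lshift   */
    return x ^ copy;        /* xor        */
}

/* Same idea with a right shift, e.g. "dup 17 rshift xor". */
static uint32_t forth_step_rshift(uint32_t x, unsigned k)
{
    uint32_t copy = x;      /* dup        */
    copy >>= k;             /* k rshift   */
    return x ^ copy;        /* xor        */
}

/* Model of EOR Rd, Rn, Rm, LSL #k: the barrel shifter shifts Rm on its
   way into the ALU, so shift and xor happen in one instruction. */
static uint32_t arm_eor_lsl(uint32_t rn, uint32_t rm, unsigned k)
{
    return rn ^ (rm << k);
}

/* Model of EOR Rd, Rn, Rm, LSR #k (logical shift right). */
static uint32_t arm_eor_lsr(uint32_t rn, uint32_t rm, unsigned k)
{
    return rn ^ (rm >> k);
}
</code>

With Rd, Rn and Rm all set to the same register, the single EOR gives exactly the result of the three-word Forth sequence.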

The routine uses two registers, named top and w. Top holds the top of the stack (on entry the address of the seed, on exit the random number); w is a scratch register.


==== XORshift in ARM AArch32 assembly ====
<code>
variable seed
2345 seed !            \ any non-zero start value; 0 would stay 0 forever

code: ASMRANDOM ( address_seed -- rndm_val )
  [ w, top, ldr,       \ load the current seed value into w

  w, w, w, 13 lsl#, eor,   \ w := w xor (w << 13)
  w, w, w, 17 lsr#, eor,   \ w := w xor (w >> 17)
  w, w, w,  5 lsl#, eor,   \ w := w xor (w << 5)

  w, top, str,         \ store the new value as the next seed
  top, w, mov,         \ return the random number in top

  ] ; 6 inlinable

seed ASMRANDOM u.      \ example: print one random number
</code>
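
A C reference model of ASMRANDOM is handy for checking the assembly word: run both from the same seed and compare the results. This sketch assumes the standard 32-bit xorshift with the shift triple 13, 17, 5 used above, and that the new value is written back as the next seed. The name ''asm_random_model'' is hypothetical.

<code c>
#include <stdint.h>

/* C model of ASMRANDOM ( address_seed -- rndm_val ).
   seed_addr plays the role of the address held in register top. */
static uint32_t asm_random_model(uint32_t *seed_addr)
{
    uint32_t w = *seed_addr;   /* w, top, ldr,            */
    w ^= w << 13;              /* w, w, w, 13 lsl#, eor,  */
    w ^= w >> 17;              /* w, w, w, 17 lsr#, eor,  */
    w ^= w << 5;               /* w, w, w,  5 lsl#, eor,  */
    *seed_addr = w;            /* save new value as next seed */
    return w;                  /* top, w, mov,            */
}
</code>

Starting from the seed 2345, the first value returned is 629153499 (hex 25801EDB). A non-zero seed never produces 0, because each xorshift step is invertible and maps 0 only to 0.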

===== Comparison of Forth vs. assembly =====

Tested with wabiForth on a Raspberry Pi 3B+ at 1.5 GHz.
Below are some simple benchmarks comparing the 1-seed and 2-seed
versions coded in Forth with the 1-seed version in assembly, just
to get an idea of execution speeds.

<code>
    ------------------------------------
    Method                    CPU cycles
    ------------------------------------
    1 seed 32-bit Forth               40
    2 seed 32-bit Forth               60
    1 seed 32-bit assembly            13
    ------------------------------------
</code>

Time measured is the number of CPU cycles required to put one
random number on the stack with a given method. The assembly routine
is about three times as fast as the corresponding 1-seed routine in Forth
(13 versus 40 cycles), which is a decent speed-up.
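
The page does not say how the cycle counts were measured. As a rough analogue on a hosted system, the C sketch below times many calls of a plain C xorshift with ''clock_gettime'' and converts the elapsed time to cycles by assuming a fixed 1.5 GHz clock. The constant ''ASSUMED_CPU_HZ'' and the function names are hypothetical, and the result includes loop overhead, so it will not match the wabiForth figures exactly.

<code c>
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime / CLOCK_MONOTONIC */
#endif
#include <stdint.h>
#include <time.h>

/* Assumption: fixed core clock of 1.5 GHz, as in the test setup above. */
#define ASSUMED_CPU_HZ 1.5e9

/* Plain 32-bit xorshift with the 13, 17, 5 shift triple. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Estimated CPU cycles per call, averaged over n calls. */
static double cycles_per_call(uint32_t n)
{
    struct timespec t0, t1;
    uint32_t seed = 2345u;
    volatile uint32_t sink = 0;   /* keeps the compiler from removing the loop */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (uint32_t i = 0; i < n; i++)
        sink ^= xorshift32(&seed);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double seconds = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    (void)sink;
    return seconds * ASSUMED_CPU_HZ / (double)n;
}
</code>

Calling ''cycles_per_call(1000000)'' averages over a million calls, which smooths out timer resolution; for exact counts on the Pi itself, the ARM cycle counter would be the better tool.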