en:pfw:marsaglia_s_xorshift_for_arm [2023-09-04 18:16] – deleted - external edit (unknown date) 127.0.0.1 | en:pfw:marsaglia_s_xorshift_for_arm [2024-10-08 23:03] (current) – [Raspberry Pi 3b+ with wabiForth] jeroenh
====== Marsaglia's XORshift for ARM ======

===== Raspberry Pi 3b+ with wabiForth =====

The Forth version of the randomisation routines is the same on any processor, as only standard Forth words are used. But the ARM processor can do a neat trick: it can perform one xorshift step (dup, shift and xor) in a single opcode. And as most Forths include an assembler, it is an interesting exercise to see how much faster the routine is when coded in assembly.
This example is coded using wabiForth on a Raspberry Pi 3b+, but the principle is the same for any ARMv8 AArch32 processor.

The routine uses three registers named top, v and w: top contains the top of the stack, while v and w are scratch registers.

==== XORshift in ARM AArch32 assembly ====
<code>
variable seed
2345 seed !

code: ASMRANDOM ( address_seed -- rndm_val )
[ w, top, ldr,             \ get value in seed in w

  w, w, w, 13 lsl#, eor,   \ w = w xor (w << 13)
  w, w, w, 17 lsr#, eor,   \ w = w xor (w >> 17)
  w, w, w, 5 lsl#, eor,    \ w = w xor (w << 5)

  v, v, w, eor,            \ xor old seed value with generated random number
                           \ (v is not loaded in this listing, so it must already hold the old seed)
  v, top, str,             \ save xor'd value in seed
  top, w, mov,             \ leave the generated number on top of the stack

] ; 7 inlinable

</code>
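
For comparison, the three shift-and-xor steps can also be written with standard Forth words only. The following is a minimal sketch and not necessarily the Forth routine that was benchmarked: the names ''xorshift32'', ''seed32'' and ''mask32'' are hypothetical, the ''$'' hex prefix assumes a Forth-2012 system, and the final step of ASMRANDOM that xors the old seed into the stored value is left out. Each ''dup ... shift xor'' line corresponds to one ''eor'' with a shifted operand in the assembly version.

```forth
\ Sketch only: xorshift32, seed32 and mask32 are hypothetical names.
variable seed32
2345 seed32 !

\ Keep the low 32 bits; this is a no-op on a 32-bit Forth such as wabiForth.
: mask32 ( x -- u32 )  $FFFFFFFF and ;

: xorshift32 ( addr -- u )
  dup >r @                     \ fetch the current state, keep the address
  dup 13 lshift mask32 xor     \ x = x xor (x << 13)
  dup 17 rshift xor            \ x = x xor (x >> 17)
  dup  5 lshift mask32 xor     \ x = x xor (x << 5)
  dup r> ! ;                   \ store the new state and leave it as result

seed32 xorshift32 hex u. decimal   \ prints 25801EDB
```

Like ASMRANDOM, the word takes the address of the seed variable, so several independent generators can share the same code.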

===== Comparison of Forth vs assembly =====

Tested with wabiForth on a Raspberry Pi 3b+ @ 1.5 GHz.
Here are some simple benchmarks comparing the 1-seed and 2-seed versions coded in Forth with the 1-seed version in assembly, just to get an idea of the execution speeds.

<code>
---------------------------
1 seed 32bit Forth:
2 seed 32bit Forth:
1 seed 32bit assembly:
---------------------------
</code>

The time measured is the number of CPU cycles required to put a random number on the stack with a given method. The routine in assembly is 3 times as fast as the corresponding routine in Forth, which is a decent speed-up.