en:pfw:marsaglia_s_xorshift_for_arm
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| en:pfw:marsaglia_s_xorshift_for_arm [2023-09-04 18:16] – deleted - external edit (unknown date) 127.0.0.1 | en:pfw:marsaglia_s_xorshift_for_arm [2024-10-08 23:03] (current) – [Raspberry Pi 3b+ with wabiForth] jeroenh | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | ====== Marsaglia' | ||
| + | |||
| + | ===== Raspberry Pi 3b+ with wabiForth ===== | ||
| + | |||
| + | |||
| + | The Forth version of the randomisation routines is the same on any processor, as only standard Forth words are used. But the ARM processor can do a neat | ||
| + | trick: it can perform one xorshift step (dup, shift and xor) in a single opcode. And as | ||
| + | most Forths include an assembler, it is an interesting exercise to see how | ||
| + | much faster the routine is when coded in assembly. | ||
| + | This example is coded using wabiForth on a Raspberry Pi 3b+, but the principle is the same for any ARMv8 AArch32 processor. | ||
| + | |||
| + | The routine uses three registers named top, v and w: top holds the top of the stack, while v and w are scratch registers. | ||
| + | |||
| + | |||
| + | ==== XORshift in ARM Aarch32 assembly ==== | ||
| + | < | ||
| + | variable seed | ||
| + | 2345 seed ! | ||
| + | |||
| + | code: ASMRANDOM ( address_seed -- rndm_val ) | ||
| + | [ w, top, ldr, \ get value in seed in w | ||
| + | | ||
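| + | w, w, w, 13 lsl#, eor, \ w = w xor (w shifted left by 13) | ||
| + | w, w, w, 17 lsr#, eor, \ w = w xor (w shifted right by 17) | ||
| + | w, w, w, 5 lsl#, eor, \ w = w xor (w shifted left by 5) | ||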
| + | | ||
| + | v, v, w, eor, \ xor old seed value with generated random number | ||
| + | v, top, str, \ save xor'd value in seed | ||
| + | top, w, mov, | ||
| + | | ||
| + | ] ; 7 inlinable | ||
| + | |||
| + | </ | ||
| + | |||
| + | ===== Comparison of Forth vs assembly ===== | ||
| + | |||
| + | Tested with wabiForth on a Raspberry Pi 3b+ @ 1.5 GHz. | ||
| + | Here are some simple benchmarks that compare the 1-seed and 2-seed | ||
| + | versions coded in Forth with the 1-seed version in assembly, just | ||
| + | to get an idea of the execution speeds. | ||
| + | |||
| + | < | ||
| + | --------------------------- | ||
| + | 1 seed 32bit Forth: | ||
| + | 2 seed 32bit Forth: | ||
| + | 1 seed 32bit assembly: | ||
| + | --------------------------- | ||
| + | </ | ||
| + | |||
| + | The time measured is the number of CPU cycles required to put a | ||
| + | random number on the stack with a given method. The routine in assembly | ||
| + | is 3 times as fast as the corresponding routine in Forth, which is a | ||
| + | decent speed-up. | ||
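
For comparison, the three shifted `eor,` instructions in ASMRANDOM each replace a dup, shift and xor sequence in Forth. A minimal sketch of that step in standard Forth words could look like the following. It is not the page's original Forth routine: the names `mask32` and `xorshift-step` are made up for illustration, and the masking is an assumption added so that the sketch also behaves as a 32-bit generator on a 64-bit Forth (on a 32-bit Forth such as the one used here it changes nothing).

```forth
\ Sketch of one 32-bit xorshift step (shifts 13, 17, 5) in standard Forth.
\ mask32 and xorshift-step are hypothetical names, not taken from wabiForth.

hex FFFFFFFF constant mask32 decimal   \ assumption: trim results to 32 bits

: xorshift-step ( u1 -- u2 )
   dup 13 lshift xor mask32 and        \ u = u xor (u << 13), kept to 32 bits
   dup 17 rshift xor                   \ u = u xor (u >> 17)
   dup  5 lshift xor mask32 and ;      \ u = u xor (u << 5), kept to 32 bits
```

As a quick check, `1 xorshift-step u.` prints 270369. Each line of the definition corresponds to one `eor,` with a shifted operand in the assembly version, which is where the saving in opcodes comes from.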