How to calculate cycles/limb in assembly routines

Albin Ahlbäck albin.ahlback at gmail.com
Thu Apr 4 18:43:52 CEST 2024


Hello,

I am looking at Torbjörn's `aorsmul_1.asm' for Apple M1, and I am having 
trouble understanding how the cycles per limb number was calculated.

As I understand it, the cycles-per-limb number describes the 
steady-state cost of the loop(s) in a routine. Looking at the main loop, 
it seems like it should take 10 cycles per iteration (of which 2 cycles 
are lost due to latency from loading x4, I believe), and each iteration 
processes four limbs from `up'. However, the stated number is 1.25, 
which is half of my calculated 10 / 4 = 2.5.
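
To spell out the arithmetic, here is a small sketch. The function name 
is made up for illustration, and the 10 cycles per iteration is only my 
own estimate from reading the code, not a measurement:

```python
def cycles_per_limb(cycles_per_iteration, limbs_per_iteration):
    """Steady-state cost: cycles for one loop iteration divided by
    the number of limbs that iteration processes."""
    return cycles_per_iteration / limbs_per_iteration


# My estimate: 10 cycles per iteration, 4 limbs from `up' per iteration.
my_estimate = cycles_per_limb(10, 4)

# The number stated in the file header.
stated = 1.25

# Cycles per iteration that the stated number would imply
# if the loop really handles 4 limbs per iteration.
implied_cycles_per_iteration = stated * 4

print(my_estimate)                   # 2.5
print(implied_cycles_per_iteration)  # 5.0
```

So the stated 1.25 would correspond to 5 cycles per iteration at four 
limbs per iteration, or equivalently to 10 cycles for eight limbs.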

Do you count the limbs from both `rp' and `up' in this calculation to 
obtain this number, or are my calculations wrong, either through a 
miscalculation or because I am overlooking some clever trick that the 
CPU employs?

Best,
Albin

