How to calculate cycles/limb in assembly routines
Albin Ahlbäck
albin.ahlback at gmail.com
Thu Apr 4 18:43:52 CEST 2024
Hello,
I am looking at Torbjörn's `aorsmul_1.asm' for Apple M1, and I am having
trouble understanding how the cycles per limb number was calculated.
As I understand it, the cycles per limb number describes the cost of the
loop(s) in a routine. Looking at the main loop, it seems like it should
take 10 cycles per iteration (of which 2 cycles are lost due to latency
from loading x4, I believe), and each iteration treats four limbs from
`up'. However, the given number is 1.25, which is half of my calculated
10 / 4 = 2.5.
Do you count the limbs from both `rp' and `up' in this calculation to
obtain this number, or are my calculations wrong, either through a
miscalculation or through overlooking some clever trick that the CPU
employs?
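To make the arithmetic concrete, here is a minimal sketch of the numbers I am comparing. The function name `cycles_per_limb` is hypothetical, and the 10-cycle figure is my own estimate from reading the loop, not a measurement.

```python
def cycles_per_limb(cycles_per_iteration, limbs_per_iteration):
    """Steady-state loop cost in cycles per limb processed."""
    return cycles_per_iteration / limbs_per_iteration

# My estimate: 10 cycles per iteration, four limbs of `up` per iteration.
estimate = cycles_per_limb(10, 4)

# Hypothesis: limbs of both `rp` and `up` are counted (4 + 4 = 8).
both_operands = cycles_per_limb(10, 8)

# Cycles per iteration that the documented 1.25 would imply
# if only the four `up` limbs are counted.
implied_cycles = 1.25 * 4

print(estimate, both_operands, implied_cycles)  # 2.5 1.25 5.0
```

So either eight limbs are counted per iteration, or the loop really runs at about 5 cycles per iteration and my estimate of 10 is off by a factor of two.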
Best,
Albin
More information about the gmp-devel mailing list