GMP on Pentium 2
Torbjorn Granlund
tg at swox.com
Sat Nov 8 02:26:57 CET 2003
I found the reason for the 3.7 vs 3.2 cycles/limb performance
difference for mpn/x86/aors_n.asm on p6: alignment. If the loop start
is at an address that is 8 mod 16, the loop needs 3.7 cycles/limb, but
if it is at 0 mod 16, it needs only 3.2 cycles/limb. Since the code
forces only 0 mod 8 alignment, either timing can occur, depending on
where the linker ends up placing the code.
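
To make the arithmetic concrete, here is a minimal sketch in Python (not
part of GMP). It only enumerates which residues mod 16 a loop start can
have under a given forced alignment and looks up the cycles/limb figures
quoted above. The function name and the table are hypothetical, made up
for illustration.

```python
def loop_start_residues(forced_alignment, period=16):
    """Possible values of (loop start address mod period) when the
    loop start is only guaranteed to be a multiple of forced_alignment."""
    # Walk a few multiples of forced_alignment and collect the residues.
    return sorted({addr % period
                   for addr in range(0, 4 * period, forced_alignment)})

# Figures from this message (p6, mpn/x86/aors_n.asm), keyed by the
# loop start address mod 16. Nothing here is measured by the script.
REPORTED_CYCLES_PER_LIMB = {0: 3.2, 8: 3.7}

for alignment in (8, 16):
    residues = loop_start_residues(alignment)
    timings = [REPORTED_CYCLES_PER_LIMB[r] for r in residues]
    print(f"align {alignment}: mod 16 residues {residues}, "
          f"cycles/limb {timings}")
```

With 8-byte alignment both residues 0 and 8 are reachable, so both
timings can show up. With 16-byte alignment only residue 0 remains.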
--
Torbjörn