Why is the assembler version of addmul_1 so fast?

Andy borucki.andrzej at gmail.com
Sat Feb 1 21:30:53 UTC 2020

C version: https://github.com/wbhart/mpir/blob/master/mpn/generic/addmul_1.c
versus asm:
On my computer the assembler version is twice as fast as the best optimized
C version, and also twice as fast as my own assembler attempts. What is the
secret of its speed? The loops are partially unrolled, but that alone is not
enough. This code is specific to Haswell, but how was this speedup obtained?
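
For reference, here is a minimal sketch of what addmul_1 computes, written in
portable C. It is not the MPIR source: the names limb_t and addmul_1_sketch
are hypothetical, limbs are assumed to be 64 bits wide, and the code relies on
unsigned __int128, which is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* hypothetical stand-in for a 64-bit mp_limb_t */

/*
 * Sketch: rp[0..n-1] += up[0..n-1] * v, returning the carry-out limb.
 * Each step needs a 64x64 -> 128 bit product plus two additions.
 * The sum cannot overflow 128 bits, because
 * (2^64 - 1)^2 + 2 * (2^64 - 1) = 2^128 - 1.
 */
static limb_t addmul_1_sketch(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    assert(n == 0 || (rp != NULL && up != NULL));

    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;          /* low 64 bits go back into rp */
        carry = (limb_t)(t >> 64);  /* high 64 bits feed the next limb */
    }
    return carry;
}
```

Every iteration depends on the carry from the previous one, so the speed of
the loop is largely decided by how the multiply and the add-with-carry
instructions are chosen and scheduled. Haswell supports the BMI2 MULX
instruction, which produces the full 128-bit product without modifying the
flags; whether that is what the assembler file in question relies on is an
assumption here, not something checked against the source.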

More information about the gmp-discuss mailing list