Why is the assembler version of addmul_1 so fast?
borucki.andrzej at gmail.com
Sat Feb 1 21:30:53 UTC 2020
C version: https://github.com/wbhart/mpir/blob/master/mpn/generic/addmul_1.c
On my computer, the assembler version is twice as fast as the best-optimized C version
and as my own assembler attempts. What is the secret of its speed?
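For reference, addmul_1 multiplies an n-limb number {up, n} by a single limb v, adds the product into {rp, n}, and returns the carry-out limb. The following is a minimal portable C sketch of those semantics, not the MPIR source. It assumes 64-bit limbs and the GCC/Clang `unsigned __int128` extension; the names `ref_addmul_1` and `limb_t` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: 64-bit limbs */

/* Hypothetical reference version: {rp, n} += {up, n} * v.
   Returns the limb that carries out of the top position. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 64x64 -> 128-bit product of one limb with v */
        unsigned __int128 t = (unsigned __int128)up[i] * v;
        t += rp[i];   /* add the existing limb; 128 bits cannot overflow here */
        t += carry;   /* add the carry from the previous limb; still no overflow */
        rp[i] = (limb_t)t;             /* low half stays in place */
        carry = (limb_t)(t >> 64);     /* high half feeds the next limb */
    }
    return carry;
}
```

The 128-bit accumulator cannot overflow, because (2^64 - 1)^2 + 2(2^64 - 1) = 2^128 - 1.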
The loop is partially unrolled, but that alone does not explain it. This code is specific
to Haswell, but how does it achieve this speedup?
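Here is a sketch of what "partially unrolled" means for the portable version. It assumes 64-bit limbs and `unsigned __int128` (GCC/Clang), and the names `unrolled_addmul_1`, `addmul_step`, and `limb_t` are hypothetical. Unrolling by 4 removes some branch and index overhead. In this C form, however, every step still waits on the carry produced by the step before it, so unrolling by itself does not shorten that chain.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: 64-bit limbs */

/* One limb of addmul: *r = low(u*v + *r + carry); returns the high half. */
static inline limb_t addmul_step(limb_t *r, limb_t u, limb_t v, limb_t carry)
{
    unsigned __int128 t = (unsigned __int128)u * v + *r + carry;
    *r = (limb_t)t;
    return (limb_t)(t >> 64);
}

/* Hypothetical 4-way unrolled variant: {rp, n} += {up, n} * v. */
limb_t unrolled_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    size_t i = 0;
    /* Main loop, four limbs per iteration; the carry is still serial. */
    for (; i + 4 <= n; i += 4) {
        carry = addmul_step(&rp[i],     up[i],     v, carry);
        carry = addmul_step(&rp[i + 1], up[i + 1], v, carry);
        carry = addmul_step(&rp[i + 2], up[i + 2], v, carry);
        carry = addmul_step(&rp[i + 3], up[i + 3], v, carry);
    }
    /* Remainder loop handles the 0 to 3 leftover limbs. */
    for (; i < n; i++)
        carry = addmul_step(&rp[i], up[i], v, carry);
    return carry;
}
```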