mul_fft

Niels Möller nisse at lysator.liu.se
Wed Jun 30 19:30:55 UTC 2021


Paul Zimmermann <Paul.Zimmermann at inria.fr> writes:

> 1) the use of mpn_add_n_sub_n is not activated by default in mul_fft.c.
>    It might give a small speedup in some cases.

I think add_n_sub_n was originally motivated by improved locality (could
apply at different levels of the memory hierarchy).

But maybe we could get close to twice the speed using newer instructions
with multiple carry flags (I guess that's what
powerpc64/mode64/p9/add_n_sub_n.asm is doing)? We could probably do
something similar on x86_64 with adox and adcx (if corresponding
subtract instructions are missing, on-the-fly negation should be fairly
efficient).
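
To make the negation idea concrete, here is a rough portable C sketch (not
GMP code; the name add_n_sub_n_sketch, the fixed 64-bit limb type, and the
2*carry + borrow return convention are assumptions made for illustration).
It computes a + b and a - b in one pass, loading each operand limb once, and
treats the subtraction as the addition a + ~b + 1, so that both results are
produced by plain add-with-carry chains that do not depend on each other.
Whether a compiler or hand-written assembly actually maps the two chains to
adcx and adox is a separate question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not GMP's API.  Computes sum = a + b and
   diff = a - b over n 64-bit limbs (least significant limb first)
   in a single pass.  Returns 2*carry + borrow.  Each output may
   alias an input, but the two outputs must be distinct. */
static unsigned
add_n_sub_n_sketch (uint64_t *sum, uint64_t *diff,
                    const uint64_t *a, const uint64_t *b, size_t n)
{
  unsigned add_cy = 0;  /* carry of the a + b chain */
  unsigned sub_cy = 1;  /* carry of the a + ~b + 1 chain; the initial 1
                           completes the two's complement of b */
  size_t i;

  assert (sum != diff);

  for (i = 0; i < n; i++)
    {
      /* Load each operand limb once, before any store. */
      uint64_t ai = a[i], bi = b[i];
      uint64_t t, r;
      unsigned c;

      /* Addition chain: ai + bi + add_cy. */
      t = ai + bi;
      c = t < ai;
      r = t + add_cy;
      c |= r < t;
      sum[i] = r;
      add_cy = c;

      /* Subtraction chain, done as an addition: ai + ~bi + sub_cy. */
      t = ai + ~bi;
      c = t < ai;
      r = t + sub_cy;
      c |= r < t;
      diff[i] = r;
      sub_cy = c;
    }

  /* A final carry of 1 out of a + ~b + 1 means a >= b, i.e. no borrow. */
  return 2 * add_cy + (1 - sub_cy);
}
```

The only cross-iteration dependencies are add_cy and sub_cy, one per chain,
which is the property that two separate carry flags would exploit.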

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.

