div_qr_1n_pi1

Niels Möller nisse at lysator.liu.se
Sun Jun 6 17:55:16 UTC 2021


Marco Bodrato <bodrato at mail.dm.unipi.it> writes:

> Using masks does not always give the fastest code. I tried the
> following variation on Niels' code, and, on my laptop with "g++-10 -O2
> -mtune=icelake-client -march=icelake-client", the resulting code is
> comparable (faster?) with the current asm.

Maybe we should have some macrology for that? Or do all relevant
processors and compilers support efficient cmov these days? I'm sticking
to masking expressions for now.
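
For illustration, the two styles under discussion might look roughly
like this. It is only a sketch: the helper names are hypothetical and
are not taken from GMP or from either message, and whether a compiler
turns the second form into cmov or into a branch depends on the
compiler and target.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of GMP. */

/* Masking style: add d to r when cond is nonzero, using no branch
   in the source. The mask is all ones if cond is nonzero, else 0. */
static inline uint64_t
cond_add_mask (uint64_t r, uint64_t d, int cond)
{
  uint64_t mask = -(uint64_t) (cond != 0);
  return r + (mask & d);
}

/* Conditional style: same result (mod 2^64), but the compiler is
   free to emit cmov, a branch, or something else. */
static inline uint64_t
cond_add_select (uint64_t r, uint64_t d, int cond)
{
  return cond ? r + d : r;
}
```

Both functions compute the same value, so a macro could in principle
select between them per platform.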

Worries about side-channel leakage from cmov aren't so relevant for
these particular functions, since MPN_INCR_U is a data-dependent loop
anyway.
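
To make the point concrete, here is a minimal sketch of a carry
propagation loop whose running time depends on the data. It is not
GMP's MPN_INCR_U macro; the function name, the fixed 64-bit limb type,
the explicit length argument and the return value are all assumptions
made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration; not GMP's MPN_INCR_U.
   Adds incr to the number stored least significant limb first in
   p[0..n-1], and returns how many limbs were touched. */
static size_t
incr_u_sketch (uint64_t *p, size_t n, uint64_t incr)
{
  size_t i;

  if (n == 0)
    return 0;

  p[0] += incr;
  if (p[0] >= incr)
    return 1;                   /* no carry out of the low limb */

  /* Propagate the carry; the loop stops at the first limb that does
     not wrap around, so the iteration count depends on the data. */
  for (i = 1; i < n; i++)
    {
      p[i] += 1;
      if (p[i] != 0)
        return i + 1;
    }
  return n;                     /* carry ran off the top limb */
}
```

The number of limbs touched varies with the operand values, which is
the kind of data dependence referred to above.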

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.

