div_qr_1n_pi1

Sun Jun 6 17:55:16 UTC 2021

Marco Bodrato <bodrato at mail.dm.unipi.it> writes:

> Using masks does not always give the fastest code. I tried the
> following variation on Niels' code, and, on my laptop with "g++-10 -O2
> -mtune=icelake-client -march=icelake-client", the resulting code is
> comparable (faster?) with the current asm.

Maybe we should have some macrology for that? Or do all relevant
processors and compilers support efficient cmov these days? I'm sticking
to masking expressions for now.
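
For concreteness, here is a minimal sketch of what such macrology could
look like for a conditional subtract. The names USE_SELECT_STYLE,
COND_SUB, cond_sub_mask and cond_sub_select are made up for this
illustration and are not existing GMP names. Whether the select form is
actually compiled to cmov depends on the compiler and flags.

```c
#include <stdint.h>

/* Masking form: mask is all ones when r >= d, otherwise zero. */
static uint64_t cond_sub_mask(uint64_t r, uint64_t d)
{
    uint64_t mask = -(uint64_t) (r >= d);
    return r - (mask & d);
}

/* Select form: a compiler may (or may not) turn this into cmov. */
static uint64_t cond_sub_select(uint64_t r, uint64_t d)
{
    return r >= d ? r - d : r;
}

/* Hypothetical build-time switch between the two forms. */
#ifndef USE_SELECT_STYLE
#define USE_SELECT_STYLE 0
#endif

#if USE_SELECT_STYLE
#define COND_SUB(r, d) cond_sub_select((r), (d))
#else
#define COND_SUB(r, d) cond_sub_mask((r), (d))
#endif
```

Both forms compute the same value, so the choice only affects the
generated code, which is what a per-platform macro would select.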

Worries about side-channel leakage from cmov aren't so relevant for
these particular functions, since the use of MPN_INCR_U is a
data-dependent loop anyway.
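
To illustrate the data dependence, here is a simplified stand-in for
that kind of carry propagation. It is not the actual MPN_INCR_U macro;
the function name incr_u_sketch and the return value are assumptions
made for this sketch. Like the real thing, it assumes the carry does
not run off the end of the array.

```c
#include <stddef.h>
#include <stdint.h>

/* Add incr to the limb array p and propagate the carry upwards.
   Returns the number of limbs touched, which depends on the data. */
static size_t incr_u_sketch(uint64_t *p, uint64_t incr)
{
    size_t touched = 1;

    p[0] += incr;
    if (p[0] < incr) {
        /* Carry out of the low limb: keep incrementing until a limb
           does not wrap around to zero. */
        size_t i = 1;
        while (++p[i] == 0)
            i++;
        touched = i + 1;
    }
    return touched;
}
```

The number of loop iterations, and hence the running time, depends on
how far the carry travels, regardless of whether the surrounding code
uses masks or cmov.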

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.