div_qr_1n_pi1

Niels Möller nisse at lysator.liu.se
Thu Jun 3 20:38:35 UTC 2021


Marco Bodrato <bodrato at mail.dm.unipi.it> writes:

> Using masks does not always give the fastest code. I tried the
> following variation on Niels' code, and, on my laptop with "g++-10 -O2
> -mtune=icelake-client -march=icelake-client", the resulting code is
> comparable (faster?) with the current asm.

Cool! 

For assembly, it looks like we currently only have code for x86_64/
and x86_64/k8/. I think it's possible to do something more clever on
more recent processors with mulx; e.g., it would be pretty neat to keep
the u1 recurrence variable in the special %rdx register.
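
To make the mulx remark concrete, here is a minimal C sketch of the plain
loop: one 2/1 division per limb, using a precomputed inverse of a
normalized divisor. It is an illustration only, not GMP's actual
div_qr_1n_pi1 code; the names sketch_invert_limb and sketch_div_qr_1n are
made up for this example, and it assumes 64-bit limbs and the
unsigned __int128 extension of GCC/Clang. The point is that u1 is
carried from one iteration to the next and is also an operand of the
only wide multiply, while mulx takes one source operand implicitly from
%rdx and leaves the flags alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension (assumption) */

/* Hypothetical helper: v = floor((B^2 - 1) / d) - B, with B = 2^64.
   Requires d normalized, i.e. its high bit set. */
static uint64_t sketch_invert_limb(uint64_t d)
{
  assert(d >> 63);
  return (uint64_t) (((((u128) ~d) << 64) | UINT64_MAX) / d);
}

/* Hypothetical helper: divide the number u1*B^n + {up, n} by d.
   Stores n quotient limbs at qp and returns the remainder.
   Requires u1 < d, d normalized, dinv = sketch_invert_limb(d). */
static uint64_t sketch_div_qr_1n(uint64_t *qp, const uint64_t *up, size_t n,
                                 uint64_t u1, uint64_t d, uint64_t dinv)
{
  while (n-- > 0)
    {
      uint64_t u0 = up[n];

      /* The only wide multiply: u1 * dinv. u1 is loop-carried. */
      u128 q = (u128) u1 * dinv + (((u128) u1 << 64) | u0);
      uint64_t q1 = (uint64_t) (q >> 64) + 1;   /* candidate quotient */
      uint64_t q0 = (uint64_t) q;

      uint64_t r = u0 - q1 * d;                 /* arithmetic mod B */
      if (r > q0)                               /* q1 was one too large */
        {
          q1--;
          r += d;
        }
      if (r >= d)                               /* rare second adjustment */
        {
          q1++;
          r -= d;
        }

      qp[n] = q1;
      u1 = r;                                   /* becomes next high limb */
    }
  return u1;
}
```

Whether a compiler actually keeps u1 in %rdx across iterations when
built with BMI2 enabled is not something this sketch guarantees; in
hand-written assembly one can arrange it explicitly.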

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.

