div_qr_1n_pi1
Niels Möller
nisse at lysator.liu.se
Thu Jun 3 20:38:35 UTC 2021
Marco Bodrato <bodrato at mail.dm.unipi.it> writes:
> Using masks does not always give the fastest code. I tried the
> following variation on Niels' code, and, on my laptop with "g++-10 -O2
> -mtune=icelake-client -march=icelake-client", the resulting code is
> comparable (faster?) with the current asm.
Cool!
For assembly, it looks like we currently only have code for x86_64/
and x86_64/k8/. I think it's possible to do something more clever on
more recent processors with mulx, e.g., it could get pretty neat if we
keep the u1 recurrence variable in the special %rdx register.
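
To make the mulx remark a bit more concrete, here is a rough C sketch of
why the recurrence variable is a natural fit for %rdx. It is not the
div_qr_1n_pi1 loop from GMP, only the simpler one-limb-per-step division
with a precomputed inverse, and it assumes 64-bit limbs and the
unsigned __int128 extension of GCC/Clang. The names mul_wide_sketch,
invert_limb_sketch and div_qr_1_sketch are made up for illustration. The
point is that u1, the value carried from one iteration to the next, is
also an operand of the widening multiply, which is the operand mulx
takes implicitly from %rdx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;  /* GCC/Clang extension (assumption) */

/* Full 64x64 -> 128 bit product. With BMI2 enabled, a compiler may
   implement this with mulx, which reads one operand from %rdx. */
static inline void mul_wide_sketch(uint64_t *hi, uint64_t *lo,
                                   uint64_t a, uint64_t b)
{
    u128 p = (u128)a * b;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
}

/* floor((B^2 - 1) / d) - B with B = 2^64; d must have its top bit set. */
static uint64_t invert_limb_sketch(uint64_t d)
{
    assert(d >> 63);
    return (uint64_t)(~(u128)0 / d);  /* truncation subtracts B */
}

/* Divide {up, n} (least significant limb first) by a normalized d.
   Stores n quotient limbs at qp and returns the remainder. */
static uint64_t div_qr_1_sketch(uint64_t *qp, const uint64_t *up,
                                size_t n, uint64_t d)
{
    uint64_t dinv = invert_limb_sketch(d);
    uint64_t u1 = 0;  /* recurrence variable: running remainder, < d */

    for (size_t i = n; i-- > 0; ) {
        uint64_t u0 = up[i];
        uint64_t qh, ql;

        /* u1 is the operand one would like to keep in %rdx. */
        mul_wide_sketch(&qh, &ql, u1, dinv);

        /* (qh, ql) += (u1 + 1, u0), wrapping modulo B^2. */
        u128 q = (((u128)qh << 64) | ql) + (((u128)(u1 + 1) << 64) | u0);
        qh = (uint64_t)(q >> 64);
        ql = (uint64_t)q;

        /* Candidate remainder, then two adjustment steps. */
        uint64_t r = u0 - qh * d;
        uint64_t mask = -(uint64_t)(r > ql);
        qh += mask;
        r += mask & d;
        if (r >= d) {  /* rarely taken */
            r -= d;
            qh++;
        }

        qp[i] = qh;
        u1 = r;  /* feeds the next iteration's multiply */
    }
    return u1;
}
```

For example, dividing the two-limb number 3*2^64 + 5 by d = 2^63 this way
gives the quotient limbs {6, 0} and remainder 5.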
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.