sqrt algorithm
Torbjörn Granlund
tg at gmplib.org
Mon Aug 3 10:02:36 UTC 2015
"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
> And for small sizes, one ought to sqrlo rather than mulmod_bnm1 to
> exploit the cancellation. So the tuning would be between sqrlo and
> sqrmod_bnm1. But it seems we only have mullo, not sqrlo?
Right, we currently don't have sqrlo...
I don't think I ever considered a sqrlo.
All the *_basecase assembly code is a major PITA to write for all
relevant CPUs. The effort behind what we have today is considerable,
much more work is needed. E.g., we only have 4 variant of
mullo_basecase, while we have around 20 variant of mul_basecase and
sqr_basecase.
To improve the situation, we ought to look into semi-automatic
*_basecase generation. The input would be annotated innerloops of
addmul_(k), mul_(k), and sqr_addlsh1. A special compiler would take
such pieces and generate outter loops, cleverly reusing pointers between
iterations.
This is not easy, but it would save a lot of time.
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list