GMP 4.4 remaining tasks
Niels Möller
nisse at lysator.liu.se
Tue Dec 8 12:20:48 CET 2009
Torbjorn Granlund <tg at gmplib.org> writes:
> We should realise of course that mpn_bc_mulmod_bnm1 will be called
> just once and that mpn_bc_mulmod_bnp1 will be called once for small n
> and zero times for large n.
I don't agree with the "zero times" part. mulmod_bnp1 is called
multiple times, for various sizes. E.g., with a basecase threshold
around 20 (for bnm1),
B^96 - 1 = (B^48 + 1) (B^24 + 1) (B^12 + 1) (B^12 - 1)
so this gives one bnm1 below the threshold (hence not split further as
(B^6 + 1)(B^6 - 1) ), one bnp1 of that same size, and two bnp1 of size
one half and one quarter of the original input size.
The basecase threshold really isn't relevant for bnp1, it's just a
choice between either plain mul + wraparound, or fft.
One may still want to write assembler for the case that the plain mul in
bnp1 will use schoolbook, I guess. I think assembler basecases for bnp1
and bnm1 will have a significant effect only on the smallest sizes; for
sizes above a hundred limbs or so most time will be spent in bnp1 for
sizes above schoolbook.
One might also consider writing wrapping Karatsuba multiplication, maybe
one can save one addition there. But I would expect a modest saving, at
best.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list