Two 10-lines functions
bodrato at mail.dm.unipi.it
bodrato at mail.dm.unipi.it
Wed Feb 3 13:25:39 CET 2010
Ciao,
>> IIRC Niels suggested some time ago that we could try to split
>> odd-limb-sized mulmod using bitwise shifts... Do I remember wrongly?
>
> Right, we discussed that. We can consider that if/when we find some
> application where the size for some can't be rounded up. For current
> applications of wraparound arithmetic, I think it will be slower than
> rounding the size up to a multiple of a suitable (small) power of two.
> Shifting will add linear (i.e., O(n)) cost, while the cost for
> increasing the product size a limb or two should be sublinear (above
> schoolbook range, and at least on average).
Currently we use a _bnm1_next_size function, based on _fft_next_size,
which in turn is based on tuned tables. On shell I can read the following
line from MUL_FFT_TABLE3:
{ 39, 9}, { 23, 8}, { 47, 9}, { 27,10}, \
This means, I think, that for large (not terribly large: 27 << 9 = 13824)
products we will chose a size of x*2^n limbs, with x ~= 30, possibly odd
and greater than the threshold:
#define MULMOD_BNM1_THRESHOLD 11
I mean: after many recursions, we can hit an odd number above the threshold.
I agree: speeding up that small product, buried in the 9th recursion level
of a large product, will probably not be very relevant; but it may be
cheaper than adding 2^9 limbs at the top level.
Of course, the expected value for bitwise wrap-mul threshold is greater
than the value above computed for limbs... but FFT_TABLE3 contains much
larger values...
Regards,
Marco
--
http://bodrato.it/
More information about the gmp-devel
mailing list