Two 10-lines functions

bodrato at bodrato at
Wed Feb 3 13:25:39 CET 2010


>> IIRC Niels suggested some time ago that we could try to split
>> odd-limb-sized mulmod using bitwise shifts... Do I remember wrongly?
> Right, we discussed that. We can consider that if/when we find some
> application where the size for some can't be rounded up. For current
> applications of wraparound arithmetic, I think it will be slower than
> rounding the size up to a multiple of a suitable (small) power of two.
> Shifting will add linear (i.e., O(n)) cost, while the cost for
> increasing the product size a limb or two should be sublinear (above
> schoolbook range, and at least on average).

Currently we use a _bnm1_next_size function, based on _fft_next_size,
which in turn is based on tuned tables. On shell I can read the following
line from MUL_FFT_TABLE3:
    {     39, 9}, {     23, 8}, {     47, 9}, {     27,10}, \
This means, I think, that for large (not terribly large: 27 << 9 = 13824)
products we will chose a size of x*2^n limbs, with x ~= 30, possibly odd
and greater than the threshold:
#define MULMOD_BNM1_THRESHOLD               11

I mean: after many recursions, we can hit an odd number above the threshold.
I agree: speeding up that small product, buried in the 9th recursion level
of a large product, will probably not be very relevant; but it may be
cheaper than adding 2^9 limbs at the top level.
Of course, the expected value for bitwise wrap-mul threshold is greater
than the value above computed for limbs... but FFT_TABLE3 contains much
larger values...



More information about the gmp-devel mailing list