ToomN2 with "factored" evaluation
zanoni at volterra.uniroma2.it
Wed Nov 18 10:36:53 CET 2009
I'm working on a "iterated" version of Toom-2.5 (toom32) to be applied to
_very_ unbalanced operands.
Torbjornd and I noticed that if a is much bigger than b, then one can use
iteratively the most unbalanced version of toomk2 methods (toom32, toom42,
toom52,...), splitting b in halves (say, b length is n + t, where t=n or
n-1), and considering "blocks" of a with length 3n, until the most meaningful
part is reached (which can be shorter: from 1 to 3n-1 limbs). Recomposing of
the result is in this case quite easy: just sum the various parts in the
right place of the result.
I pointed out some time ago that in this case there is one benefit to exploit:
one can evaluate b in the needed points just once (instead of about
an/(k*n)). Keeping these values apart, just the various blocks of a must be
evaluated each time (and multiplied and recomposed, of course).
With the help of Marco, I'm darely entering the mpn world and currently
working on a mpn version of a function implementing this idea. I've already a
working one, but I'm trying a different schema, that should exploit in all
cases the evaluation of b, keeping the code sufficiently compact. I hope to
be able to cope with carries and borrows as soon as possible.
If it will prove to be effective, one could think about possible
generalizations to the use of toom42, etc. to this case.
> Reduced storage for toom32
> Niels Möller nisse at lysator.liu.se
> Mon Nov 16 23:05:05 CET 2009
> The following version of toom32_mul uses 2n + 2 limbs of scratch
> space, rather than 4n + 2 (both figures not including any scratch
> needed by the pointwise multiplies).
> Almost a drop in replacement, but requires s + t >= n + 2, which is
> slightly stricter than the old code.
> The allocation is quite clever, if I may say so. It has the drawback
> of some more function call/loop setup overhead since some adds or subs
> on 2n limbs have to be done n limbs at time due to the allocation layout.
Centro Interdipartimentale "Vito Volterra"
Universita' degli Studi di Roma "Tor Vergata"
Via Columbia 2
00133 Roma, Italia
More information about the gmp-devel