ToomN2 with "factored" evaluation

Alberto Zanoni zanoni at volterra.uniroma2.it
Wed Nov 18 10:36:53 CET 2009


I'm working on a "iterated" version of Toom-2.5 (toom32) to be applied to 
_very_ unbalanced operands.

Torbjornd and I noticed that if a is much bigger than b, then one can use 
iteratively the most unbalanced version of toomk2 methods (toom32, toom42, 
toom52,...), splitting b in halves (say, b length is n + t, where t=n or 
n-1), and considering "blocks" of a with length 3n, until the most meaningful 
part is reached (which can be shorter: from 1 to 3n-1 limbs). Recomposing of 
the result is in this case quite easy: just sum the various parts in the 
right place of the result.

I pointed out some time ago that in this case there is one benefit to exploit: 
one can evaluate b in the needed points just once (instead of about 
an/(k*n)). Keeping these values apart, just the various blocks of a must be 
evaluated each time (and multiplied and recomposed, of course).

With the help of Marco, I'm darely entering the mpn world and currently 
working on a mpn version of a function implementing this idea. I've already a 
working one, but I'm trying a different schema, that should exploit in all 
cases the evaluation of b, keeping the code sufficiently compact. I hope to 
be able to cope with carries and borrows as soon as possible.

If it will prove to be effective, one could think about possible 
generalizations to the use of toom42, etc. to this case.

> Reduced storage for toom32
> Niels Möller nisse at lysator.liu.se
> Mon Nov 16 23:05:05 CET 2009
> 
> The following version of toom32_mul uses 2n + 2 limbs of scratch
> space, rather than 4n + 2 (both figures not including any scratch
> needed by the pointwise multiplies).
> 
> Almost a drop in replacement, but requires s + t >= n + 2, which is
> slightly stricter than the old code.
> 
> The allocation is quite clever, if I may say so. It has the drawback
> of some more function call/loop setup overhead since some adds or subs
> on 2n limbs have to be done n limbs at time due to the allocation layout.
> 
> Regards,
> /Niels

-- 
Alberto Zanoni
Centro Interdipartimentale "Vito Volterra"
Universita' degli Studi di Roma "Tor Vergata"
Via Columbia 2
00133 Roma, Italia



More information about the gmp-devel mailing list