fast arithmetic with little extra memory
Niels Möller
nisse at lysator.liu.se
Thu Oct 15 21:35:52 CEST 2009
Paul Zimmermann <Paul.Zimmermann at loria.fr> writes:
> Daniel S. Roche has designed a clever variant of Karatsuba's algorithm
> which requires only O(log n) extra memory (in addition to the read-only
> n+n input and to the read/write 2n output).
Fun! It seems new useful arithmetic building blocks are discovered all
the time. The building block here is a function that computes the 2n
limb value h + (f_0 + f_1) g, where h, f_0, f_1 and g are n limb
values, and h is passed in the result area. Fits quite well with mpn
primitives, basecase would be
mp_limb_t
add_mul_add_n (mp_ptr rp, mp_srcptr f0p, mp_srcptr f1p, mp_srcptr g, mp_size_t n)
{
mp_size_t i;
mp_limb_t c;
for (i = 0, c = 0; i < n; i++)
{
mp_limb_t f0, f1;
/* Is there any simpler way to do add with carry in and out? We compute
[c, f0] = f0p[i] + f1p[i] + c
*/
f0 = f0p[i] + c;
c &= (f0 == 0);
f1 = f1p[i];
f0 += f1;
c += (f0 < f1);
rp[n+i] = mpn_addmul_1 (rp + i, g, n, f0);
}
if (c)
c = mpn_add_n (rp + n, rp + n, g, n);
return c;
}
Questions:
1. How to handle carries and other details for the integer case,
unbalanced inputs, etc. We may want to compute h + (f0 - f1) * g
instead, which most likely leads to some new complications.
2. Is memory usage for Karatsuba important for GMP performance? The
paper quotes Brent and Zimmermann, "The efficiency of an
implementation of Karatsuba's algorithm depends heavily on memory
usage".
3. How many recursive Karatsuba calls do we typically have between
toom3 and basecase? The new algorithm costs additional additions
for deep recursion; the first two levels need 7 additions of size
n/2, just like plain Karatsuba, but some of the calls deeper down
needs 9 additions.
4. What about the basecase, can the above primitive be competitive
with plain mul_basecase? I wouldn't be too surprised if a
well-written assembler implementation can run at the same speed,
even though it does additional work. (There will also be a need
for a simpler basecase function that computes h + f * g, with h
located in the result area, i.e. an addmul_n, and of course for
plain multiplication).
5. Can the tricks be extended to higher toom functions?
Regards,
/Niels
More information about the gmp-devel
mailing list