dmharvey at cims.nyu.edu
Tue Jun 9 15:36:31 CEST 2009
On Jun 9, 2009, at 4:02 AM, Niels Möller wrote:
> David Harvey <dmharvey at cims.nyu.edu> writes:
>> I have written up some work I've been doing on middle products,
>> please see
> After a first quick reading, it seems that the executive summary is
> that middle-product is a more efficient primitive than low-product (at
> least in Schoolbook and Karatsuba range, the ratio compared to a full
> product is constant). While for mullo, the work saved is less
> significant the larger n is.
Yes. <plug>If you can build an algorithm out of middle products
rather than truncated products, you'll probably win.</plug>
Your comment reminds me that I forgot to put in a graphical
comparison between the classical and karatsuba middle products! I
have uploaded a new version with this.
> You say that for larger sizes, the middle product can be computed
> using fft-multiplication with wraparound. Can you say something more
> about how this is done? My understanding so far is that wraparound is
> useful only when one part of the product is known, and you you want to
> know the rest, which is the case for applications to newton-iteration
> and the like. But for a general mulmid, no part of the product is
> known a-priori.
Suppose you want the middle product of X and Y, where len(X) = 2n, len
(Y) = n. Let Z = X * Y = Z0 + Z1*B^n + Z2*B^(2n), where len(Zi) = n.
Then Z mod (B^(2n) + 1) = (Z0 - Z2) + Z1*B^n. Then neither Z0 nor Z2
overlap Z1 in the final 2n-limb answer. So you just ignore the first
half, and take the second half as the middle product.
Or you can do it with B^(2n) - 1 just as well.
This works beautifully with polynomials; with integers I guess there
is some carry fiddling, and correcting for the missing diagonals.
> In the FFT range, will it still hold that the time for MP_n will be
> almost the same as for a full n × n multiply, or what's the expected
Yes, and in fact I expect the ratio is much closer to 1 than in the
There are two ways to do it. One is exactly as in your previous
question: GMP's FFT multiplication is really doing multiplication
modulo B^(2n) - 1. So here you can see it's exactly the same --- you
just need to subtract off the diagonals from the final result, if you
really want the middle product as I've defined it.
Alternatively you can think at the polynomial level. The Schonhage--
Strassen algorithm rewrites the integer multiplication problem as a
polynomial multiplication problem in R[x]/(x^N - 1), where R =
something like Z/(2^M + 1)Z. It is possible to perform a middle
product of the *polynomials*, by applying the transposition principle
at the level of polynomials. This amounts to doing funny things with
FFTs like reversing the coefficients or using twisted roots of unity
or doing the inverse FFT first (I can't remember exactly which trick
you use). Sorry I'm being very vague here, I haven't thought this
through very carefully. What you end up with here is the sum over a
"blocked" middle product region, like this:
the block size corresponding to the fourier coefficient size (or
maybe half the fourier coefficient size). Then I guess you need to,
again, correct for diagonals, either by adding in the missing
triangular pieces, or stripping off the excess triangular pieces (in
the latter case, I suppose you just subtract off the single digit
products along the diagonal).
More information about the gmp-devel