middle products

Tue Jun 9 15:36:31 CEST 2009

On Jun 9, 2009, at 4:02 AM, Niels Möller wrote:

> David Harvey <dmharvey at cims.nyu.edu> writes:
>
>> I have written up some work I've been doing on middle products,   
>> please see
>>
>> http://cims.nyu.edu/~harvey/mulmid/
>
> After a first quick reading, it seems that the executive summary is
> that middle-product is a more efficient primitive than low-product (at
> least in Schoolbook and Karatsuba range, the ratio compared to a full
> product is constant). While for mullo, the work saved is less
> significant the larger n is.

Yes. <plug>If you can build an algorithm out of middle products  
rather than truncated products, you'll probably win.</plug>

Your comment reminds me that I forgot to put in a graphical  
comparison between the classical and karatsuba middle products! I  
have uploaded a new version with this.

> You say that for larger sizes, the middle product can be computed
> using fft-multiplication with wraparound. Can you say something more
> about how this is done? My understanding so far is that wraparound is
> useful only when one part of the product is known, and you you want to
> know the rest, which is the case for applications to newton-iteration
> and the like. But for a general mulmid, no part of the product is
> known a-priori.

Suppose you want the middle product of X and Y, where len(X) = 2n, len 
(Y) = n. Let Z = X * Y = Z0 + Z1*B^n + Z2*B^(2n), where len(Zi) = n.  
Then Z mod (B^(2n) + 1) = (Z0 - Z2) + Z1*B^n. Then neither Z0 nor Z2  
overlap Z1 in the final 2n-limb answer. So you just ignore the first  
half, and take the second half as the middle product.

Or you can do it with B^(2n) - 1 just as well.

This works beautifully with polynomials; with integers I guess there  
is some carry fiddling, and correcting for the missing diagonals.

> In the FFT range, will it still hold that the time for MP_n will be
> almost the same as for a full n × n multiply, or what's the expected
> ratio?

Yes, and in fact I expect the ratio is much closer to 1 than in the  
karatsuba case.

There are two ways to do it. One is exactly as in your previous  
question: GMP's FFT multiplication is really doing multiplication  
modulo B^(2n) - 1. So here you can see it's exactly the same --- you  
just need to subtract off the diagonals from the final result, if you  
really want the middle product as I've defined it.

Alternatively you can think at the polynomial level. The Schonhage-- 
Strassen algorithm rewrites the integer multiplication problem as a  
polynomial multiplication problem in R[x]/(x^N - 1), where R =  
something like Z/(2^M + 1)Z. It is possible to perform a middle  
product of the *polynomials*, by applying the transposition principle  
at the level of polynomials. This amounts to doing funny things with  
FFTs like reversing the coefficients or using twisted roots of unity  
or doing the inverse FFT first (I can't remember exactly which trick  
you use). Sorry I'm being very vague here, I haven't thought this  
through very carefully. What you end up with here is the sum over a  
"blocked" middle product region, like this:

XXXXXXXXXXXXXXX
XXXXXXXXXXXXXXX
XXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXX
       XXXXXXXXXXXXXXX
       XXXXXXXXXXXXXXX
       XXXXXXXXXXXXXXX

the block size corresponding to the fourier coefficient size (or  
maybe half the fourier coefficient size). Then I guess you need to,  
again, correct for diagonals, either by adding in the missing  
triangular pieces, or stripping off the excess triangular pieces (in  
the latter case, I suppose you just subtract off the single digit  
products along the diagonal).

david