mpn_mulmod_bnm1

Fri Apr 4 00:36:08 UTC 2014

On 04/04/2014, at 4:28 AM, Niels Möller <nisse at lysator.liu.se> wrote:

>> (1) the bit bounds for the coefficients get worse. For example if u =
>> 5 then f(x) = x^5 + x^4 + x + 1, so when you compute say a(x) b(x) mod
>> f(x) (in Z[x]), every coefficient in the result is a sum of up to
>> *five* terms from a(x) b(x). If say u = 63, you need an extra 6 bits.
>> When you only work with x^n + 1, you never need extra bits like this.
> 
> I guess you could mitigate this a bit by rounding u up to get a
> polynomial with reasonably low weight.

Yes, but doesn't this miss the whole point???

>> (3) You probably need more specialised assembly code to handle the
>> unusual low-level TFT primitives. (I haven't thought through this
>> carefully.)
> 
> My code implements TFT using calls to assembly routines doing full-size
> FFT:s. And I round the size up to a multiple of some reasonable power of
> two, so that the calls don't do very small transforms. I'm not *sure* this
> is the best way, but at least I think it's what Joris suggested when we
> discussed this at ICMS some years ago.

My understanding is that O(n log n) of the work gets passed down to full-size FFTs as you state above, but I'm talking here about the O(n) work, at the top layer, including e.g. all the cross-butterflies.

david