Niels Möller nisse at
Sun Apr 6 13:44:11 UTC 2014

David Harvey <d.harvey at> writes:

>> I guess you could mitigate this a bit by rounding u up to get a
>> polynomial with reasonably low weight.
> Yes, but doesn't this miss the whole point???

You could round up so you get at most, say, three factors (x^{2^k_j} -
1). That should reduce the performance steps quite a lot compared to
using a single factor x^{2^k} - 1.

Anyway, doing this TFT-like processing on the input integer instead
seems preferable, as you have explained.

For wraparound applications, I guess it would be nice if one could give
the CRT reconstruction an additional input. So it gets x mod (2^u_j + 1)
for a couple of j:s from the core FFT multiply routines, and x mod 2^u
from the application, and then it reconstructs x.

> My understanding is that O(n log n) of the work gets passed down to
> full-size FFTs as you state above, but I'm talking here about the O(n)
> work, at the top layer, including e.g. all the cross-butterflies.

I see. I think I never got to even consider assembly optimzation for


Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

More information about the gmp-devel mailing list