mpn_mulmod_bnm1

Thu Apr 3 17:28:21 UTC 2014

David Harvey <d.harvey at unsw.edu.au> writes:

> It is possible to do everything using the polynomial f(x) you suggest
> above (in this case the factorisation does lift to Z[x]), and I think
> this may have been what I tried first.
>
> But it has a few disadvantages:
>
> (1) the bit bounds for the coefficients get worse. For example if u =
> 5 then f(x) = x^5 + x^4 + x + 1, so when you compute say a(x) b(x) mod
> f(x) (in Z[x]), every coefficient in the result is a sum of up to
> *five* terms from a(x) b(x). If say u = 63, you need an extra 6 bits.
> When you only work with x^n + 1, you never need extra bits like this.

I guess you could mitigate this a bit by rounding u up to get a
polynomial with reasonably low weight.

> (2) I haven't checked this carefully, but I believe my method is
> faster even in theory, mainly because some of the work is transferred
> out of the TFT into plain integer arithmetic. 

Good point. If you convert to polynomial coefficients first, you'd have
to do this folding multiple times, in the initial TFT rounds for each
prime. Doing it up front, working on the integer, sounds like less work.
If nothing else, fewer passes over the data.

> (3) You probably need more specialised assembly code to handle the
> unusual low-level TFT primitives. (I haven't thought through this
> carefully.)

My code implements TFT using calls to assembly routines doing full-size
FFT:s. And I round the size up to a multiple of some reasonable power of
two, so that the calls don't do very small transforms. I'm not *sure* this
is the best way, but at least I think it's what Joris suggested when we
discussed this at ICMS some years ago.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.