# mpn_mulmod_bnm1

Niels Möller nisse at lysator.liu.se
Thu Apr 3 05:53:04 UTC 2014

```David Harvey <d.harvey at unsw.edu.au> writes:

> The main problem with this approach is that there is not much
> granularity in the choice of r.

In my code, I used some huristics like this: Start with r = 5, and
choose the appropriate size k for the splitting into pieces, and
transform size 2^n. Then redo this choice with one prime less (which
requires a smaller k, and gives a higher degree polynomials), and use as
few primes as you can get away with, without doubling the transform
size.

This eliminates the large bumps. I probably don't get the details right
here, but you get steps like switching from 2^n and r = 5 to 2^{n+1} and r
= 3.

> You don't want r to get too large, because then the reduction/CRT steps
> can become quite expensive.

I looked into that some years ago. I just put my notes from back then at
http://www.lysator.liu.se/~nisse/mist/crc-complexity.pdf. For small r,
the cost appears to be sub-squadratic (which was a bit surprising to me,
since all multiplications are schoolbook; for large r, one would need to
use divide-and-conquer, building on larger multiplies). For CRT with r
limb sized moduli, 2 <= r <= 10, you need approximately 3.86
(r-1)^{1.57} limb multiplies.

> Another solution, which doesn't work at all, is to use TFTs. One
> cannot use TFTs directly for multiplication mod x^n - 1, because they
> evaluate at the wrong roots of unity.

I implemented TFT, but I think you're right that it's directly
applicable only in the no-wraparound case.

> I propose using a modulus of the form
>
> M = (B^{k u_1} + 1) (B^{k u_2 + 1) ... (B^{k u_s} + 1),
>
> where
>
> B = 2^64
> k is some integer
> u_1 > u_2 > ... > u_s is a decreasing sequence of powers of 2.

Seems closely related to variants of the TFT which use polynomial

f(x) = (x^{u_1}+1) (x^{u_2} + 1) ... (x^{u_s} + 1)

where u_j = 2^{k_j} are distinct powers of two and f(x) is a factor of
the larger x^{2^{2u_1} - 1. Which is a TFT, since we evaluate the input
polynomial (or reduced variants thereof) at the roots of f, and the
roots of f are a subset of the roots of x^{2^n} - 1 (and if needed, one
can always go between x^{2^k} - 1 and x^{2^k} + 1 by a simple change of
variable).

I've seen the details spelled out in some paper on in-place TFT
transforms.

> REDUCE: takes as input an arbitrary integer, and computes its residues
> modulo B^{k u_1} + 1, ..., B^{k u_s} + 1. [...] It ends up looking
> very similar to the top layers of the TFT algorithm, working on the
> input in chunks of k limbs.

I'd guess it really *is* a TFT algorithm...

And the cool thing, compared to the plain TFT, is that the subset of
roots used really correspond to a useful integer modulo. (*Maybe* that's
in some way the case also with the plain TFT, but I'm afraid that the
subset of roots used in my code corresponds to a factorization of
x^{2^n}-1 which can't be lifted up from Z_p[x] to Z[x]. I haven't
thought this through carefully).

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
```