# Wraparound multiplicaton vs mullo

David Harvey dmharvey at cims.nyu.edu
Sun Oct 18 18:08:43 CEST 2009

```On Oct 18, 2009, at 10:30 AM, Niels Möller wrote:

> David Harvey <dmharvey at cims.nyu.edu> writes:
>
>> Then of course x^3 - x, or more generally (x - a) ... (x - a[n]).
>> So e.g. instead of x^4 - 1, you could do x^4 - 5*x^2 + 4 = (x^2 - 1)
>> * (x^2 - 4). I bet that could be made faster than using x^4 - 1.
>
> Do you think that more complicated "wraparound" would be useful, e.g.,
> for Newton iteration? Its usefulness is at least not obvious to me.

For Newton iteration I'm not sure, probably not. I had in mind
applications to e.g. modular arithmetic, like mpz_pow, generalising
Montgomery's algorithm, along the lines of McLaughlin's paper "New
frameworks for Montgomery's modular multiplication method" (Math Comp
vol 73 no 246 pp 899-906, 2003).

The simplest version runs as follows. Suppose you want to do
arithmetic modulo N. Select a "working modulus" M, where M > N and
(M, N) = 1. Precompute I = -1/N mod M. Given residues 0 <= A < N and
0 <= B < N, do the following:

C = A*B
P = C*I mod M
Q = C + P*N
R = Q/M

By construction P*N = -C mod M, so Q is divisible by M, i.e. R is an
integer. Moreover Q = C mod N, so R = A*B/M mod N. And 0 <= Q < N^2 +
M*N < 2*M*N, so 0 <= R < 2*N, so one can reduce R to the canonical
interval [0, N) by a single comparison with N and possibly a
subtraction.

The idea is to choose M so that multiplication mod M is efficient,
and so that division by M is efficient (i.e. I suppose linear time).
For example Montgomery's algorithm is the special case where M = 2^n.
I'm proposing choosing M to be something like B^n - 1, or B^(4n) -
5*B^(2n) + 4, etc.

There are some even better ideas in McLaughlin's paper that might
kick in for somewhat larger n.

david

```