Looking for advice on writing a high-speed, 64-bit, 2^n modulo q
Torbjorn Granlund
tege at swox.com
Sun Mar 13 00:32:51 CET 2005
Will Galway <galway at math.uiuc.edu> writes:
- - - I'm preparing to do a very large
computation where I expect most of the time to be spent in computing
2^n modulo q for various odd q, where both "n" and "q" are unsigned
64-bit integers. - - -
In any case, I'm now considering rewriting the code in assembly
language, since I suspect that there's still a fair amount of overhead
in calling the mpn code (and in the mpn code having to count the
limbs, etc.)
All of which leads to my question, which is a request for advice,
comments, and perhaps an offer to correspond with me outside this
mailing list to cover the many detailed questions that I have about
assembly language programming for Pentium-or-Athlon computers.
You might want to use some undocumented parts of GMP for such a
computation, since forwards compatibility should matter. I think
you could reach close to optimal code that way, without writing
directly in assembly.
If you build in the source directory, the include files
longlong.h and gmp-impl.h are available. Therein, umul_ppmm,
add_ssaaaa, sub_ddmmss, and perhaps udiv_qrnnd_preinv should be
useful for what you are doing.
You can then either use Montgomery mulmod, or udiv_qrnnd_preinv,
or if 63-bit arithmetic is enough and both one of the multiply
operands and your ring are invariant, a shorter sequence such as
this one (suggested to me by Victor Shoup) should work:
/* r = a*b mod p, with bpinv = floor(b*2^B/p). */
umul_ppmm (q, dummy, a, bpinv);
ab = a * b;
qp = q * p;
r = ab - qp;
if (r >= p)
r -= p;
You should only use 64-bit machines, I think, e.g., Athlon64.
One such machine will be at least 4 times faster than a 32-bit
machine. And programming your problem using longlong.h for
64-bit machines will be a reasonable task, much easier than to
the same thing for 32-bit machines.
--
Torbjörn
More information about the gmp-discuss
mailing list