perfect power / nth root improvements
arthur at ilvsun3.com
Wed May 21 04:56:51 CEST 2008
(continued from the gmp-discuss board)
> > I got busy and implemented a padic root routine for perfect power
> > testing that runs 20% faster than the current code up to inputs
> > around 10^100, then it really improves. I put the code and
> > instructions here:
> > www.ilvsun3.com/files/artroot.tar
> Does the code avoid calls to mpz_root/mpn_root?
Yes. The mpz_perfect_power_z routine is basically the same - I just
replaced its calls to mpz_root with calls to a new root extraction
routine. (I actually modified mpz_perfect_power_z slightly to
additionally pass in the number of bits in the number to be tested).
> Could you define, in mathematical terms, exactly what you've
Given integers N and R, the algorithm uses Newton iteration in a modular
fashion to to calculate an X for which X^R=N mod 2^(2^K) for smallest
2^(2^K) > N^(1/R). After calculating X, return true if X^R = N.
Benefits of p-adic vs. floating point iterations:
a) the exact number of iterations needed can be calculated precisely
b) the number of p-adic digits of accuracy exactly doubles with each
c) no divisions
d) power of 2 mods can be implemented efficiently by bit manipulations
Let me explain how I got here. I'm doing some number theory work with
many perfect power tests. Just a fun hobby. GMP seemed like a great
"bignum" package that was easy to use, and it had a good perfect power
routine. I started looking into faster methods of perfect power
testing. In my research, I found this maple worksheet with sample code:
Integer root extraction and perfect-power detection via p-adic
It was posted in 2003 by a graduate student, Carl Devore, with a message
"If you use this worksheet, please email me...to discuss it, or just to
say "hi"." I certainly need to give him a shout to "say hi", especially
if its important for licensing reasons as I implemented the GMP routine
after reading his paper.
Carl talks about how division can be done with multiplication and
subtraction, how Newton iteration works with modular arithmetic, and
Newton-Hensel lifting for modular polynomial roots - and gives
references for all these topics.
He provides a nth-root routine, and a perfect-power routine (which calls
the nth-root routine), all written in maple calls. As he points out,
his code runs horribly slowly compared to the built in maple calls.
But I liked that straightforward p-adic loop. I implemented in GMP
calls, and it ran much slower than GMPs mpz_perfect_power_p. I found
that the majority of the slowness was simply that GMPs perfect power
routine was better. So I just had GMPs perfect power routine call the
new p-adic nth root routine. It was still too slow.
I found that mpz_mod for mod powers of 2 was much slower than using
mpz_add and masking the desired bits. I found mpz_tdiv_r_2exp was
faster than mpz_add, but couldn't use it for negative numbers, because
the p-adic routine requires a "signed" mod instead of regular mod.
I then wrote a modular powering routine, taking advantage of the new mod
routine - as mpz_powm was another slow point for powers of 2.
At this point the new code was slightly faster than GMPs code across all
input sizes. But then I started thinking about how the code, after
finding a possible X base solution, actually fully computed X^R and
compared to N to see if it was an actual solution. I remembered
"But I can eliminate most (x,n) more efficiently, by checking whether n=x^k
is consistent with the first few digits of n and x." [Daniel J.
Bernstein "Detecting Perfect Powers in Essentially Linear Time", part
III, section 9. "How to test if n = x^k"].
Keeping in mind that the algorithm doesn't calculate X for which X is
closest to N^(1/R) but instead calculates X such that X^R is congruent
to N, it would be quite likely that if X was not a valid solution that X
would not be close to N^(1/R). Bernstein provides an algorithm: "How to
test if n = x^k" using truncated accuracy floating point. But before
implementing that, almost in passing, I added a simpler test.
I ensured that the number of bits in N was compatible with the number of
bits in X^R. If the number of bits was incompatible, simply return 0.
As an example of how useful this check is:
Time to perfect power test 10,000 numbers starting at 10^10,000
original code: 386 seconds
new code (without bit-length test): 45 seconds
new code (with bit-length test): 8 seconds
This simple bit test eliminated 97% of the computations X^R for possible
X values (8,700 computations of X^R that didn't equal N vs. 332,500
computations). We might be able to speed things up even more with a
better test (presuming that the better test didn't take too long).
Suggestions, regardless of what is done with this p-adic routine:
1) provide efficient mpz_mod and mpz_powm for powers of 2, similar to
the bit shifting calls mpz_tdiv_r_2exp.
2) provide signed mod routines as maple's "mods". It obviously has a
use, as this routine needs it, and I had to cobble together a crude
solution. Here's a definition of mods:
modp(a,p) returns the unique integer congruent to a modulo p in the
range 0..p-1, whereas mods(a,p) returns the image in the range
-(p-1)/2..(p+1)/2 (here I'm assuming p is an odd prime).
Suggestions for this routine:
1) check the p-adic math
2) fix some of the choices I made to quickly get something to work
(using current sqrt routine for r=2, method of testing negative numbers
- simply exiting if the number is too large)
3) check the bit-test and loops to ensure there's no "off-by-one" error
4) see about replacing bit-test with a more precise test, looking at the
trade-off in time of how long the test takes vs. the computation time
required for the candidates it excludes over and above the bit-test
> (2) assign the code to the FSF and get your employer to sign a disclaimer
heheehee - I don't think my employer would understand. As a funny
side-note, I mentioned to my girlfriend that I was doing some
"Newton-Hensel lifting" and she looked shocked and wanted to know why I
had any interest in nude pencil lifting.
Thanks. I just whipped this up at home to help in my perfect power
testing. Its certainly been fun and exciting, and it seems to be
working quickly and correctly. Hopefully its a good "proof of concept"
that can be implemented in GMP. I'll do what I can, but I've only been
using GMP for a few days and am still getting familiar with it.
More information about the gmp-devel