perfect power / nth root improvements

Wed May 21 04:56:51 CEST 2008

(continued from the gmp-discuss board)

 > > I got busy and implemented a padic root routine for perfect power
 > > testing that runs 20% faster than the current code up to inputs
 > > around 10^100, then it really improves.  I put the code and
 > > instructions here:
 > > www.ilvsun3.com/files/artroot.tar

 > Does the code avoid calls to mpz_root/mpn_root?

Yes.  The mpz_perfect_power_z routine is basically the same - I just 
replaced its calls to mpz_root with calls to a new root extraction 
routine.  (I actually modified mpz_perfect_power_z slightly to 
additionally pass in the number of bits in the number to be tested).

 > Could you define, in mathematical terms, exactly what you've
implemented?

Given integers N and R, the algorithm uses Newton iteration in a modular 
fashion to to calculate an X for which X^R=N mod 2^(2^K) for smallest 
2^(2^K) > N^(1/R).  After calculating X, return true if X^R = N.

Benefits of p-adic vs. floating point iterations:
a) the exact number of iterations needed can be calculated precisely
b) the number of p-adic digits of accuracy exactly doubles with each 
iteration
c) no divisions
d) power of 2 mods can be implemented efficiently by bit manipulations

Let me explain how I got here.  I'm doing some number theory work with 
many perfect power tests.  Just a fun hobby.  GMP seemed like a great 
"bignum" package that was easy to use, and it had a good perfect power 
routine.  I started looking into faster methods of perfect power 
testing.  In my research, I found this maple worksheet with sample code:

Integer root extraction and perfect-power detection via p-adic 
Newton-Hensel lifting
http://www.maplesoft.com/applications/app_center_view.aspx?AID=1138

It was posted in 2003 by a graduate student, Carl Devore, with a message 
"If you use this worksheet, please email me...to discuss it, or just to 
say "hi"."  I certainly need to give him a shout to "say hi", especially 
if its important for licensing reasons as I implemented the GMP routine 
after reading his paper.

Carl talks about how division can be done with multiplication and 
subtraction, how Newton iteration works with modular arithmetic, and 
Newton-Hensel lifting for modular polynomial roots - and gives 
references for all these topics.

He provides a nth-root routine, and a perfect-power routine (which calls 
the nth-root routine), all written in maple calls.  As he points out, 
his code runs horribly slowly compared to the built in maple calls.

But I liked that straightforward p-adic loop.  I implemented in GMP 
calls, and it ran much slower than GMPs mpz_perfect_power_p.  I found 
that the majority of the slowness was simply that GMPs perfect power 
routine was better.  So I just had GMPs perfect power routine call the 
new p-adic nth root routine.  It was still too slow.

I found that mpz_mod for mod powers of 2 was much slower than using 
mpz_add and masking the desired bits.  I found mpz_tdiv_r_2exp was 
faster than mpz_add, but couldn't use it for negative numbers, because 
the p-adic routine requires a "signed" mod instead of regular mod.

I then wrote a modular powering routine, taking advantage of the new mod 
routine - as mpz_powm was another slow point for powers of 2.

At this point the new code was slightly faster than GMPs code across all 
input sizes.  But then I started thinking about how the code, after 
finding a possible X base solution, actually fully computed X^R and 
compared to N to see if it was an actual solution.  I remembered 
Bernstein's comment
"But I can eliminate most (x,n) more efficiently, by checking whether n=x^k
  is consistent with the first few digits of n and x."  [Daniel J. 
Bernstein "Detecting Perfect Powers in Essentially Linear Time", part 
III, section 9. "How to test if n = x^k"].

Keeping in mind that the algorithm doesn't calculate X for which X is 
closest to N^(1/R) but instead calculates X such that X^R is congruent 
to N, it would be quite likely that if X was not a valid solution that X 
would not be close to N^(1/R).  Bernstein provides an algorithm: "How to 
test if n = x^k" using truncated accuracy floating point.  But before 
implementing that, almost in passing, I added a simpler test.

I ensured that the number of bits in N was compatible with the number of 
bits in X^R.  If the number of bits was incompatible, simply return 0. 
As an example of how useful this check is:

Time to perfect power test 10,000 numbers starting at 10^10,000

original code: 386 seconds
new code (without bit-length test): 45 seconds
new code (with bit-length test): 8 seconds

This simple bit test eliminated 97% of the computations X^R for possible 
X values (8,700 computations of X^R that didn't equal N vs. 332,500 
computations).  We might be able to speed things up even more with a 
better test (presuming that the better test didn't take too long).

Suggestions, regardless of what is done with this p-adic routine:

1) provide efficient mpz_mod and mpz_powm for powers of 2, similar to 
the bit shifting calls mpz_tdiv_r_2exp.

2) provide signed mod routines as maple's "mods".  It obviously has a 
use, as this routine needs it, and I had to cobble together a crude 
solution.  Here's a definition of mods:

http://www.student.cs.uwaterloo.ca/~cs487/maple.shtml
modp(a,p) returns the unique integer congruent to a modulo p  in the 
range 0..p-1, whereas mods(a,p) returns the image in the range 
-(p-1)/2..(p+1)/2  (here I'm assuming p is an odd prime).

Suggestions for this routine:

1) check the p-adic math
2) fix some of the choices I made to quickly get something to work 
(using current sqrt routine for r=2, method of testing negative numbers 
- simply exiting if the number is too large)
3) check the bit-test and loops to ensure there's no "off-by-one" error
4) see about replacing bit-test with a more precise test, looking at the 
trade-off in time of how long the test takes vs. the computation time 
required for the candidates it excludes over and above the bit-test

 > (2) assign the code to the FSF and get your employer to sign a disclaimer

heheehee - I don't think my employer would understand.  As a funny 
side-note, I mentioned to my girlfriend that I was doing some 
"Newton-Hensel lifting" and she looked shocked and wanted to know why I 
had any interest in nude pencil lifting.

 > Cool!

Thanks.  I just whipped this up at home to help in my perfect power 
testing.  Its certainly been fun and exciting, and it seems to be 
working quickly and correctly.  Hopefully its a good "proof of concept" 
that can be implemented in GMP.  I'll do what I can, but I've only been 
using GMP for a few days and am still getting familiar with it.

/arthur