# perfect power / nth root improvements

arthur arthur at ilvsun3.com
Wed May 21 04:56:51 CEST 2008

```
(continued from the gmp-discuss board)

> > I got busy and implemented a padic root routine for perfect power
> > testing that runs 20% faster than the current code up to inputs
> > around 10^100, then it really improves.  I put the code and
> > instructions here:
> > www.ilvsun3.com/files/artroot.tar

> Does the code avoid calls to mpz_root/mpn_root?

Yes.  The mpz_perfect_power_p routine is basically the same - I just
replaced its calls to mpz_root with calls to a new root-extraction
routine.  (I actually modified mpz_perfect_power_p slightly so that it
also passes in the number of bits in the number to be tested.)

> Could you define, in mathematical terms, exactly what you've
> implemented?

Given integers N and R, the algorithm uses Newton iteration in a modular
fashion to calculate an X for which X^R = N mod 2^(2^K), for the
smallest K such that 2^(2^K) > N^(1/R).  After calculating X, return
true if X^R = N exactly.

Benefits of p-adic vs. floating point iterations:
a) the exact number of iterations needed can be calculated precisely
b) the number of p-adic digits of accuracy exactly doubles with each
iteration
c) no divisions
d) power-of-2 mods can be implemented efficiently by bit manipulation

Let me explain how I got here.  I'm doing some number theory work with
many perfect power tests.  Just a fun hobby.  GMP seemed like a great
"bignum" package that was easy to use, and it had a good perfect power
routine.  I started looking into faster methods of perfect power
testing.  In my research, I found this Maple worksheet with sample code:

Integer root extraction and perfect-power detection via p-adic
Newton-Hensel lifting
http://www.maplesoft.com/applications/app_center_view.aspx?AID=1138

It was posted in 2003 by a graduate student, Carl Devore, with a message
"If you use this worksheet, please email me...to discuss it, or just to
say "hi"."  I certainly need to give him a shout to "say hi", especially
if it's important for licensing reasons, since I implemented the GMP
routine after reading his paper.

Carl talks about how division can be done with multiplication and
subtraction, how Newton iteration works with modular arithmetic, and
Newton-Hensel lifting for modular polynomial roots - and gives
references for all these topics.

He provides an nth-root routine and a perfect-power routine (which calls
the nth-root routine), all written in Maple calls.  As he points out,
his code runs horribly slowly compared to the built-in Maple calls.

But I liked that straightforward p-adic loop.  I implemented it in GMP
calls, and it ran much slower than GMP's mpz_perfect_power_p.  I found
that the majority of the slowness was simply that GMP's perfect power
routine was better.  So I just had GMP's perfect power routine call the
new p-adic nth-root routine.  It was still too slow.

I found that mpz_mod for power-of-2 moduli was much slower than using
mpz_add and masking off the desired bits.  I found mpz_tdiv_r_2exp was
faster still, but couldn't use it for negative numbers, because the
p-adic routine requires a "signed" mod instead of the regular mod.

I then wrote a modular powering routine that takes advantage of the new
mod routine, since mpz_powm was another slow point for power-of-2
moduli.

At this point the new code was slightly faster than GMP's code across
all input sizes.  But then I started thinking about how the code, after
finding a candidate base X, fully computed X^R and compared it to N to
see whether it was an actual solution.  I remembered Bernstein's
comment:
"But I can eliminate most (x,n) more efficiently, by checking whether n=x^k
is consistent with the first few digits of n and x."  [Daniel J.
Bernstein, "Detecting Perfect Powers in Essentially Linear Time", part
III, section 9, "How to test if n = x^k"]

Keep in mind that the algorithm doesn't calculate the X closest to
N^(1/R); it calculates the X such that X^R is congruent to N.  So if X
is not a valid solution, it is quite unlikely to be close to N^(1/R).
Bernstein provides an algorithm, "How to test if n = x^k", using
truncated-accuracy floating point.  But before implementing that, almost
in passing, I added a simpler test.

I ensured that the bit length of N was compatible with the bit length of
X^R.  If the bit lengths were incompatible, simply return 0.  As an
example of how useful this check is:

Time to perfect power test 10,000 numbers starting at 10^10,000

original code: 386 seconds
new code (without bit-length test): 45 seconds
new code (with bit-length test): 8 seconds

This simple bit test eliminated 97% of the X^R computations for
candidate X values (8,700 computations of X^R that didn't equal N,
versus 332,500 before).  We might be able to speed things up even more
with a better test (presuming the better test didn't take too long).

Suggestions, regardless of what is done with this p-adic routine:

1) provide efficient mpz_mod and mpz_powm variants for power-of-2
moduli, similar to the bit-shifting call mpz_tdiv_r_2exp.

2) provide signed mod routines like Maple's "mods".  They obviously have
a use, as this routine needs one, and I had to cobble together a crude
solution.  Here's a definition of mods:

http://www.student.cs.uwaterloo.ca/~cs487/maple.shtml
modp(a,p) returns the unique integer congruent to a modulo p  in the
range 0..p-1, whereas mods(a,p) returns the image in the range
-(p-1)/2..(p+1)/2  (here I'm assuming p is an odd prime).

Suggestions for this routine:

1) check the p-adic math
2) fix some of the choices I made just to get something working quickly
(using the current sqrt routine for r=2; the method of handling negative
numbers - simply exiting if the number is too large)
3) check the bit test and loops to ensure there's no "off-by-one" error
4) see about replacing the bit test with a more precise test, weighing
how long the test takes against the computation time it saves on the
candidates it excludes over and above the bit test

> (2) assign the code to the FSF and get your employer to sign a disclaimer

heheehee - I don't think my employer would understand.  As a funny
side-note, I mentioned to my girlfriend that I was doing some
"Newton-Hensel lifting" and she looked shocked and wanted to know why I
had any interest in nude pencil lifting.

> Cool!

Thanks.  I just whipped this up at home to help in my perfect power
testing.  It's certainly been fun and exciting, and it seems to be
working quickly and correctly.  Hopefully it's a good "proof of concept"
that can be implemented in GMP.  I'll do what I can, but I've only been
using GMP for a few days and am still getting familiar with it.

/arthur
```
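The modular Newton iteration described in the message (find X with X^R = N mod 2^(2^K), accuracy doubling with each lift) can be sketched in Python, whose native big integers stand in for mpz_t. This is an illustration of the technique, not the author's code: it restricts to odd N and odd R >= 3 (the message notes r = 2 falls back to the ordinary sqrt routine, and an even N would first have its factors of 2 stripped), and it uses Python's built-in modular inverse where a division-free GMP version would lift the inverse of f'(x) alongside x.

```python
def padic_nth_root(n, r):
    """2-adic Newton (Hensel) lifting for f(x) = x**r - n, odd n, odd r >= 3.
    Returns x with x**r == n, or None if n is not a perfect r-th power."""
    assert n > 0 and n % 2 == 1 and r >= 3 and r % 2 == 1
    # Lift until the modulus 2**prec exceeds the true root n**(1/r):
    target = (n.bit_length() + r - 1) // r + 1
    x, prec = 1, 1              # x = 1 satisfies x**r = n (mod 2), n odd
    while prec < target:
        prec = min(2 * prec, target)   # p-adic accuracy doubles per step
        m = 1 << prec                  # in GMP these reductions are bit masks
        fx = (pow(x, r, m) - n) % m
        # Newton step x <- x - f(x)/f'(x) mod 2**prec; a GMP version would
        # lift the inverse of f'(x) = r*x**(r-1) too, avoiding division.
        x = (x - fx * pow(r * pow(x, r - 1, m) % m, -1, m)) % m
    # Cheap bit-length pre-test before the expensive exact check:
    if not (r * (x.bit_length() - 1) < n.bit_length() <= r * x.bit_length()):
        return None
    return x if x ** r == n else None
```

Because x -> x^r is a bijection on the odd residues mod 2^prec when r is odd, the lifted x is unique, and the choice of `target` guarantees the true root (if any) fits below the modulus, so no signed-range adjustment is needed at the end.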
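The "signed" mod the message asks for - Maple's mods restricted to power-of-2 moduli - needs no division at all: mask off the low bits, then shift the result into the symmetric range. A sketch; the name `mods_2exp` and the choice of range [-2^(k-1), 2^(k-1)) are assumptions of this illustration, not taken from the message.

```python
def mods_2exp(a, k):
    """Signed residue of a modulo 2**k, in [-2**(k-1), 2**(k-1))."""
    m = 1 << k
    r = a & (m - 1)          # plain mod 2**k via bit mask
    # Fold the upper half of [0, 2**k) down to the negative half:
    return r - m if r >= (m >> 1) else r
```

Python's `&` on negative integers behaves two's-complement-style, so the mask alone already handles negative `a`; a GMP version would branch on the sign instead, which is what mpz_tdiv_r_2exp cannot do for this purpose.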
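The replacement for mpz_powm at power-of-2 moduli presumably amounts to ordinary square-and-multiply with every reduction done by a bit mask instead of a division. A hedged sketch of that idea (the function name is this illustration's, not the message's):

```python
def powm_2exp(base, exp, k):
    """base**exp mod 2**k, with all reductions done by masking."""
    mask = (1 << k) - 1
    result = 1
    b = base & mask
    while exp:                          # right-to-left binary exponentiation
        if exp & 1:
            result = (result * b) & mask
        b = (b * b) & mask
        exp >>= 1
    return result
```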
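The bit-length test described in the message is just an interval check: an X of b bits satisfies 2^(b-1) <= X < 2^b, so X^R lies in [2^(R(b-1)), 2^(Rb)) and has between R(b-1)+1 and Rb bits. A sketch of that pre-test, which lets the full computation of X^R be skipped for most candidates:

```python
def bits_compatible(x, r, n):
    """True iff x**r could have the same bit length as n."""
    b = x.bit_length()
    # x**r has between r*(b-1)+1 and r*b bits, inclusive.
    return r * (b - 1) < n.bit_length() <= r * b
```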