[Paul Zimmermann <Paul.Zimmermann@loria.fr>] mpz_cbrtrem

Jason Moxham J.L.Moxham@maths.soton.ac.uk
Mon, 4 Nov 2002 13:16:11 +0000


On Monday 04 Nov 2002 12:54 pm, Paul Zimmermann wrote:
>    > [zimmerma@ecrouves ~/gmp]$ ./cbrtrem 50000 1
>    > mpz_root took 29670ms
>    > mpz_sqrtrem took 2270ms
>    > mpz_cbrtrem took 1950ms
>
>    strange , On my athlon mpz_cbrtrem is 50% slower than mpz_sqrtrem ?
>
> For the same number of limbs?

I didn't change you code , so it should be , used same compiler options a=
s gmp=20
lib was compiled with , although I replaced cputime() with clock() =20

>
>    > while being reasonably fast for one-limb operands:
>    >
>    > [zimmerma@ecrouves ~/gmp]$ ./cbrtrem 1 100000
>    > mpz_root took 1830ms
>    > mpz_sqrtrem took 110ms
>    > mpz_cbrtrem took 2400ms
>    >
>    > The key function is mpz_add_bits: set the low bits of the destinat=
ion
>    > to bits n0 to n1-1 from the operand.
>    >
>    > Paul
>
>    I assume this is the karatsuba cube root , does it generilize to k-t=
h
> root ?
>
> Yes this is a recursive algorithm, with a direct computation of the
> remainder together with the cube root. It generalizes to k-th root, but
> when k increases the efficiency decreases with respect to first computi=
ng
> the k-th root, then the remainder by n - s^k.
>
>    If this is the karatsuba cbrt-root then I was under the impression t=
hat
> it was not asymptotically optimal , although faster for pratical ranges=
 .
>
> In fact my "karatsuba square root" is more generally a divide & conquer
> algorithm. Unlike divide & conquer division, it is asymptotically optim=
al
> (except perhaps for the multiplicative constant), since it is just a Ne=
wton
> iteration with incremental remainder computation.
>
>    So Paul , if it does generalize to k-th root , do you plan to implem=
ent
> it for GMP ? , or does anyone else ?
>
> I first want to see if the cube-root implementation is fast enough.
> How does it compare to your generic implementation for k=3D3?
>

It faster by a couple of percent , so when optimized should be much faste=
r and=20
simpler than varible precision newton

I wish you had emailed me , that you were working on it , I would put off=
 the=20
mpn-ification of my code until I had compaired it with you code to see if=
 it=20
was worth the effort.


> Paul