division

Jason Moxham J.L.Moxham@maths.soton.ac.uk
Fri, 8 Nov 2002 18:15:01 +0000
Previous message: division
Next message: mpz_powm memory usage vs speed
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Friday 08 Nov 2002 8:12 am, Torbjorn Granlund wrote:
> Jason Moxham <J.L.Moxham@maths.soton.ac.uk> writes:
>
>   Attached are some times for varible precision newton
>   iteration for calculation quotient and/or remainder
>
>   the numbers are speedup's over GMP4.1 and GMP4.1+div-patch
>   the left hand column is the number of limbs in the
>   numerator and the top row is the number of limbs in the
>   denominator as a fraction of the number of limbs in the
>   numerator
>
> You cannot expect the readers of this list to know what
> GMP4.1+div-patch could be.

GMP4.1+div-patch is GMP4.1 plus a patch by Torbjorn to correct some speed=
=20
problems with mpn_tdiv_qr in gmp4.1 , as you posted a mention of this pat=
ch=20
in the last week I assumed that everyone would know I ment this patch . I=
f=20
was something else or older I would of said so.

>
>   It's still generic inverse k'th root preliminary code, but I doubt
>   that the speedup's will change much.
>
> Are you dividing with a kth root computation algorithm??  Or
> are you comparing division of GMP with your kth root code?
>

I compairing the existing gmp mpz_tdiv_q with a new fn mpz_tdiv_q_newton.
So if you replaced the existing mpz_tdiv_q with this new fn , which has t=
he=20
same inputs and outputs , you would get the corrisponding speedups/slowdo=
wns.

To implement this newton method division I use a fn which calculates inve=
rse=20
k'th roots with k=3D1

>   The runtime is proportional to the quotient, so for large
>   n and small q there is a large speed up (upto
>   5000x), though I thought gmp already had code for this
>   special case?
>
> I assume the runtime when generating just the quotient is in
> most cases a function of just the quotient size, but when
> the remainder is close to 0, I think it will need more time.
>

No , I don't use that method.I use something like this

To calculate the quotient of n/d . First choose j>=3D2 ( I take j=3D8)
The underlying inverse fn , returns an inverse of d which is guarenteed t=
o be=20
within certain limits, we multiply this by the numerator to get a approx=20
quotient q. This quotient is within 2^-j (either way) of the exact quotie=
nt.
Next set low=3Dfloor(q-2^-j) and set high=3Dfloor(q+2^-j) .
If low=3Dhigh then the exact quotient is q , otherwise we have to calcula=
te the=20
remainder and adjust q. The probability that low!=3Dhigh is about 1 in 2^=
j , so=20
only rairly do we have to calculate the remainder. And in the current cod=
e=20
when I calculate the remainder , I always calculate the full remainder.

I'll do some runs to confirm this.

So for some combinations of n and d  , it will run slower because we have=
 to=20
calculate the full remainder , although I not sure if these cases are eas=
ily=20
identifible. But by choosing j to be larger we can reduce the probability=
 of=20
these cases to a very small percentage.
Eg if j=3D32 then these slow cases would be just 1 in 2^32 of all possibl=
e=20
inputs of n and d , so in practise you would never come across this case.=
=20
But you can not take j to be too big because of (roughly) the runtime of=20
calculating the quotient depends on j+other_stuff.

Think of j as some excess bits that we calculate the quotient to .

>   I havent yet testing using this for barrett type division,
>   but it should be fairly good.
>
> I am confused, are you dividing or not?  Are you using a
> pre-inverse, but not Barrett's algorithm?
>

To summerise.

This is a preliminary replacement for mpz_tdiv_q and mpz_tdiv_qr .

The pre-inverse and/or Barretts algorithm are future possibilitys.


> Please try to work more with your postings to gmp-devel.  They
> come through as rather sloppy now.

Do you mean badly explained ?

I posted this so that I could see if this idea and code  is worth continu=
ing=20
with. is it ?

If this is worth continuing with , then of course I provide full details =
of=20
the exact algorithm used etc.
Previous message: division
Next message: mpz_powm memory usage vs speed
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]