division
Jason Moxham
J.L.Moxham@maths.soton.ac.uk
Fri, 8 Nov 2002 18:15:01 +0000
On Friday 08 Nov 2002 8:12 am, Torbjorn Granlund wrote:
> Jason Moxham <J.L.Moxham@maths.soton.ac.uk> writes:
>
> Attached are some times for varible precision newton
> iteration for calculation quotient and/or remainder
>
> the numbers are speedup's over GMP4.1 and GMP4.1+div-patch
> the left hand column is the number of limbs in the
> numerator and the top row is the number of limbs in the
> denominator as a fraction of the number of limbs in the
> numerator
>
> You cannot expect the readers of this list to know what
> GMP4.1+div-patch could be.
GMP4.1+div-patch is GMP4.1 plus a patch by Torbjorn to correct some speed=
=20
problems with mpn_tdiv_qr in gmp4.1 , as you posted a mention of this pat=
ch=20
in the last week I assumed that everyone would know I ment this patch . I=
f=20
was something else or older I would of said so.
>
> It's still generic inverse k'th root preliminary code, but I doubt
> that the speedup's will change much.
>
> Are you dividing with a kth root computation algorithm?? Or
> are you comparing division of GMP with your kth root code?
>
I compairing the existing gmp mpz_tdiv_q with a new fn mpz_tdiv_q_newton.
So if you replaced the existing mpz_tdiv_q with this new fn , which has t=
he=20
same inputs and outputs , you would get the corrisponding speedups/slowdo=
wns.
To implement this newton method division I use a fn which calculates inve=
rse=20
k'th roots with k=3D1
> The runtime is proportional to the quotient, so for large
> n and small q there is a large speed up (upto
> 5000x), though I thought gmp already had code for this
> special case?
>
> I assume the runtime when generating just the quotient is in
> most cases a function of just the quotient size, but when
> the remainder is close to 0, I think it will need more time.
>
No , I don't use that method.I use something like this
To calculate the quotient of n/d . First choose j>=3D2 ( I take j=3D8)
The underlying inverse fn , returns an inverse of d which is guarenteed t=
o be=20
within certain limits, we multiply this by the numerator to get a approx=20
quotient q. This quotient is within 2^-j (either way) of the exact quotie=
nt.
Next set low=3Dfloor(q-2^-j) and set high=3Dfloor(q+2^-j) .
If low=3Dhigh then the exact quotient is q , otherwise we have to calcula=
te the=20
remainder and adjust q. The probability that low!=3Dhigh is about 1 in 2^=
j , so=20
only rairly do we have to calculate the remainder. And in the current cod=
e=20
when I calculate the remainder , I always calculate the full remainder.
I'll do some runs to confirm this.
So for some combinations of n and d , it will run slower because we have=
to=20
calculate the full remainder , although I not sure if these cases are eas=
ily=20
identifible. But by choosing j to be larger we can reduce the probability=
of=20
these cases to a very small percentage.
Eg if j=3D32 then these slow cases would be just 1 in 2^32 of all possibl=
e=20
inputs of n and d , so in practise you would never come across this case.=
=20
But you can not take j to be too big because of (roughly) the runtime of=20
calculating the quotient depends on j+other_stuff.
Think of j as some excess bits that we calculate the quotient to .
> I havent yet testing using this for barrett type division,
> but it should be fairly good.
>
> I am confused, are you dividing or not? Are you using a
> pre-inverse, but not Barrett's algorithm?
>
To summerise.
This is a preliminary replacement for mpz_tdiv_q and mpz_tdiv_qr .
The pre-inverse and/or Barretts algorithm are future possibilitys.
> Please try to work more with your postings to gmp-devel. They
> come through as rather sloppy now.
Do you mean badly explained ?
I posted this so that I could see if this idea and code is worth continu=
ing=20
with. is it ?
If this is worth continuing with , then of course I provide full details =
of=20
the exact algorithm used etc.