Newton's method for integer division

Torbjorn Granlund tege at swox.com
Mon Nov 27 11:22:42 CET 2006


Craig Schroeder <schroeder at santel.net> writes:

  * Order should be one of 2, 3, 4, or 5.  These indicate that the 
  quadratic, cubic, quartic, or quintic convergent iterations should be 
  used.  Currently, the cubic version is fastest on my machine.
  
For computing the inverse, I assume?
  
  Benchmark data is after the code below.  It is a comparison of GMP's 
  division (without remainder) against this version at a variety of 
  numerator and denominator sizes.

It looks like you compare to gmp 4.2.1, without the Newton code from
the GMP development tidbits?  Could you please instead compare it to
our Newton code?  See http://swox.com/gmp/devel/.  The inversion code
is in invert.c, the truncating division Barrett code is in files with
"_div_mu_" in their name.

  For 2n / n division, Newton's method became faster (on this machine) 
  with n = 98,304 bits.
  
That is very good!  For a division with remainder, including the time
for computing the inverse, our 2n/n cutoff is roughly 150,000 on a
64-bit machine.

  The method does worse for smaller denominators, and much better for 
  larger denominators:
  
  For 1.6n / n division, Newton's method became faster (on this machien) 
  with n = 5,120 bits.  (Unless I am mistaken, this is even below the FFT 
  multiplication cutoff.)  Below is the complete set of cutoffs, by size 
  ratio:
  
  8/1: 1,048,576 bits / 131,072 bits
  8/2: 786,432 bits / 196,608 bits
  8/3: 393,216 bits / 147,456 bits
  8/4: 196,608 bits / 98,304 bits
  8/5: 8,192 bits / 5,120 bits
  8/6: 4,096 bits / 3,072 bits
  8/7: 3,072 bits / 2688 bits
  
This seems backwards to me!  The smaller the operand size difference,
the larger should be the cutoff point, since the overhead of computing
the inverse becomes propotionally larger.

This is indeed what we observe with our code.

When the dividend is many times larger than the divisor, our code
beats divide-and-conquer division at roughly 20000 bits (again on a
64-bit machine).

-- 
Torbjörn


More information about the gmp-discuss mailing list