New mpn_divexact code

Tue Aug 1 11:22:46 CEST 2006

Paul.Zimmermann at loria.fr (Paul Zimmermann) writes:

  > I've put improved mpn_divexact code on the the tidbits page
  > (http://swox.com/gmp/development.html).

  Great! The new timings are very impressive!

One can see that the operand invariance helps the new code if one
compares e.g. the 2,000,000 / 1,000,000 and 11,000,000 / 1,000,000
times.  Using mpz_tdiv_qr, the times for the latter is about 10 times
higher, which is expected for the 10 times larger quotient.  Using the
new code, one critical call to the FFT multiplication will take
advantage of operand invariance:

      mpn_mul (tp, dp, dn, qp, in);

Here, {qp,in} is much smaller than {dp,dn} and your FFT code cleverly
transforms {qp,in} only once.

Further speedups should be possible if {ip,in} where pre-transformed,
and then used in the mpn_mullow_n calls.  Perhaps pre-transformation
would be possible also for the comparatively large {dp,dn}?

  When the quotient size (qn) is smaller than the divisor size (dn),
  it suffices to consider dn limbs of the dividend and divisor (either
  low or high bits, or dn/2 low bits and dn/2 high bits), thus a cutoff
  point is enough.

I think you mean qn instead of dn here the last three times...

The present code uses the low qn bits of the dividend and divisor
when qn < dn.

  The only real problem is when qn > dn.

That's the case I am losing sleep over.  :-)

This first showed up when Karl Hasselström worked on truncating
division for GMP, where the situation is very similar.

  > The next thing to do is implementing Jebelean's bidirectional
  > modular inversion algorithm.  That should save a good deal of the
  > inversion time.

  The tricky thing here is to glue the high and low parts of the quotient,
  and ensure the middle bits are correct.

I can imagine.  Are you aware of any free implementation, or perhaps
pseudo code, that one could start from?

How much time does Jebelean's method need in terms of M(n),
compared to the traditional Newton inversion?

My Newton inversion code could surely be improved without involving
Jebelean's method.  There are comments in the file (see
http://swox.com/gmp/devel/mpn/generic/invert_2exp.c) from you ("Paul
Zimmermann spoke"), and comments by me about using short
multiplication (mpn_sqrhigh_mn, mpn_mulmid_mn).

As we all know, short multiplication saves less for large operands,
thus your suggestions will be more important than mine for the operand
ranges where we currently use this code.  But it is interesting to
note that by using short multiplication, Barrett style division might
become useful also in the "schoolbook" operand range, since that's
where short multiplication saves the most.

  By the way, I was at the ANTS VII conference last week in Berlin.
  There were interesting talks about pseudosquares by Sorenson and
  Wooding/Williams. This enables to write a rigorous primality test
  up to a given bound (currently about 30 digits), with the same cost
  of O(log(n)) modular exponentiations for a n-digit number. This might
  be something to consider for future revisions of primality testing
  functions.

Improved primality testing functions is difficult but important.  I
might try to get a clever student to do it as a master thesis project.

-- 
Torbjörn