divrem_1 and mod_1 (Was: Re: gmp-devel list)

Torbjorn Granlund tg at gmplib.org
Wed Mar 6 11:13:15 CET 2013

Zimmermann Paul <Paul.Zimmermann at loria.fr> writes:

  Finally to increase the technical content on this list Alexander Kruppa
  told me about the following preprint of Ernst Mayer: arxiv.org/abs/1303.0328.
  On a Core 2, he claims a mpn_mod_1 running at 6 cycles per limb, and a
  mpn_divrem_1 at 12.5 cycles per limb.
  On my computer GMP 5.1.1 runs at about 12 and 24 cycles per limb respectively:
  tarte% ./speed -s 1000,2000,5000,10000 -c mpn_mod_1.16031552068505892269 mpn_divrem_1.16031552068505892269
  overhead 6.13 cycles, precision 10000 units of 5.00e-10 secs, CPU freq 2000.00 MHz
          mpn_mod_1.16031552068505892269 mpn_divrem_1.16031552068505892269
  1000        #12682.00      24064.00
  2000        #25024.00      48187.00
  5000        #62084.00     119298.00
  10000      #123786.00     238280.00
  tarte% cat /proc/cpuinfo | grep "model name" | head -1
  model name      : Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz
I assume that what he is suggesting is what we do for mod_1 already do
(since GMP 5.0) and for divrem_1 have publicly proposed to do, and for
which we publish cycle numbers at https://gmplib.org/devel/asm.html.

You have chosen very specific mod_1 operands above, which triggers
slower behaviour.  Does Mayer claim to handle those faster than us, or
why else did you choose 16031552068505892269?

Our mod_1 on (the 5 generations old) Conroe CPU runs at 5 cycles/limb
with GMP since several years, except for operands > 2^62.

Our fastest divrem_1 (see asm.html) indeed runs at 12.5 c/l.  This is
code which we did not optimise for this very obsolete processor, but for
less obsolete processors...  The paper he refers ("Improved invariant
division") would describe what we do.


More information about the gmp-devel mailing list