divrem_1 and mod_1 (Was: Re: gmp-devel list)
Torbjorn Granlund
tg at gmplib.org
Wed Mar 6 11:13:15 CET 2013
Zimmermann Paul <Paul.Zimmermann at loria.fr> writes:
Finally to increase the technical content on this list Alexander Kruppa
told me about the following preprint of Ernst Mayer: arxiv.org/abs/1303.0328.
On a Core 2, he claims a mpn_mod_1 running at 6 cycles per limb, and a
mpn_divrem_1 at 12.5 cycles per limb.
On my computer GMP 5.1.1 runs at about 12 and 24 cycles per limb respectively:
tarte% ./speed -s 1000,2000,5000,10000 -c mpn_mod_1.16031552068505892269 mpn_divrem_1.16031552068505892269
overhead 6.13 cycles, precision 10000 units of 5.00e-10 secs, CPU freq 2000.00 MHz
mpn_mod_1.16031552068505892269 mpn_divrem_1.16031552068505892269
1000 #12682.00 24064.00
2000 #25024.00 48187.00
5000 #62084.00 119298.00
10000 #123786.00 238280.00
tarte% cat /proc/cpuinfo | grep "model name" | head -1
model name : Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
I assume that what he is suggesting is what we do for mod_1 already do
(since GMP 5.0) and for divrem_1 have publicly proposed to do, and for
which we publish cycle numbers at https://gmplib.org/devel/asm.html.
You have chosen very specific mod_1 operands above, which triggers
slower behaviour. Does Mayer claim to handle those faster than us, or
why else did you choose 16031552068505892269?
Our mod_1 on (the 5 generations old) Conroe CPU runs at 5 cycles/limb
with GMP since several years, except for operands > 2^62.
Our fastest divrem_1 (see asm.html) indeed runs at 12.5 c/l. This is
code which we did not optimise for this very obsolete processor, but for
less obsolete processors... The paper he refers ("Improved invariant
division") would describe what we do.
--
Torbjörn
More information about the gmp-devel
mailing list