divrem_1 and mod_1 (Was: Re: gmp-devel list)
Zimmermann Paul
Paul.Zimmermann at loria.fr
Wed Mar 6 11:37:40 CET 2013
Torbjörn,
> From: Torbjorn Granlund <tg at gmplib.org>
> Date: Wed, 06 Mar 2013 11:13:15 +0100
>
> Zimmermann Paul <Paul.Zimmermann at loria.fr> writes:
>
> Finally to increase the technical content on this list Alexander Kruppa
> told me about the following preprint of Ernst Mayer: arxiv.org/abs/1303.0328.
> On a Core 2, he claims a mpn_mod_1 running at 6 cycles per limb, and a
> mpn_divrem_1 at 12.5 cycles per limb.
>
> On my computer GMP 5.1.1 runs at about 12 and 24 cycles per limb respectively:
>
> tarte% ./speed -s 1000,2000,5000,10000 -c mpn_mod_1.16031552068505892269 mpn_divrem_1.16031552068505892269
> overhead 6.13 cycles, precision 10000 units of 5.00e-10 secs, CPU freq 2000.00 MHz
> mpn_mod_1.16031552068505892269 mpn_divrem_1.16031552068505892269
> 1000 #12682.00 24064.00
> 2000 #25024.00 48187.00
> 5000 #62084.00 119298.00
> 10000 #123786.00 238280.00
>
> tarte% cat /proc/cpuinfo | grep "model name" | head -1
> model name : Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz
>
> I assume that what he is suggesting is what we do for mod_1 already do
> (since GMP 5.0) and for divrem_1 have publicly proposed to do, and for
> which we publish cycle numbers at https://gmplib.org/devel/asm.html.
ok fine.
> You have chosen very specific mod_1 operands above, which triggers
> slower behaviour. Does Mayer claim to handle those faster than us, or
> why else did you choose 16031552068505892269?
I didn't yet read the details of the paper, but the abstract says
"64-bit-word divisor" without any restriction.
I had first chosen a smaller divisor, and noticed mod_1 was faster (around
5 cycles/limb). Then I tried 2^64-1 and it was slower. Then I generated a
random odd divisor in [2^63, 2^64].
> Our mod_1 on (the 5 generations old) Conroe CPU runs at 5 cycles/limb
> with GMP since several years, except for operands > 2^62.
is divrem_1 also faster for divisors of 62 bits or less?
> Our fastest divrem_1 (see asm.html) indeed runs at 12.5 c/l. This is
> code which we did not optimise for this very obsolete processor, but for
> less obsolete processors... The paper he refers ("Improved invariant
> division") would describe what we do.
thank you. Good to see GMP is up-to-date with the state-of-art!
Paul
PS: I wonder why you write "https" above, anyway the certificate expired in
2010.
More information about the gmp-devel
mailing list