Cofactor canonicalisation of mpn_gcdext
tg at gmplib.org
Thu Dec 3 19:10:23 CET 2009
nisse at lysator.liu.se (Niels Möller) writes:
I gave div1 a try, and it actually seemed to make gcdext slower (./speed
-c -s1 mpn_gcdext on my x86_32 laptop).
32-bit machines typically perform division faster than 64-bit machines.
And in the case of x86, there is a more severe shortage of registers in
the 32-bit instruction set than in the 64-bit instruction set; that will
> If you think this code is right, don't be shy: please push it into the
> main repo and check http://gmplib.org/devel/testmachines.shtml tomorrow.
Pushed in now.
Benchmarking on shell, I get (for the old gcdext code):
$ ./speed -c -s1-5 mpn_gcdext
overhead 6.06 cycles, precision 10000 units of 4.66e-11 secs, CPU freq 21438.42 MHz
and for the just pushed in code, with one /-division per iteration in gcdext_1, I
So, on shell we get a speedup. Cute.
I bet that we do not get a speedup on repentium or panther or tiger.
Only 45nm Core 2 and Core i have early-out dividers, I think. An
early-out divider will help here.
Small-operand gcd and gcdext are interesting and important things to
optimise. We might define different approaches, and let tuneup choose
the best one.
More information about the gmp-devel