C implementation of mod_1_1
nisse at lysator.liu.se
Mon Feb 28 22:38:10 CET 2011
I committed an alternative implementation for generic/mod_1_1.c, using
fewer multiplies and a bit more carry propagation logic (same idea as
used in the x86_64 code). The method is selected via MOD_1_1P_METHOD.
Also added the corresponding speed and tuneup code.
On my core2 laptop, I get:
$ ./speed -o cycles-broken -s2000 -C mpn_mod_1_1_1.0x8765432187654321 mpn_mod_1_1_2.0x8765432187654321 mpn_mod_1_1.0x8765432187654321
clock_gettime is 1.000ns accurate
overhead 6.38 cycles, precision 10000 units of 1.00e-09 secs, CPU freq 1200.00 MHz
mpn_mod_1_1_1.0x8765432187654321 mpn_mod_1_1_2.0x8765432187654321 mpn_mod_1_1.0x8765432187654321
2000 17.3778 14.8116 #12.7650
which means that the new C implementation beats the old one, but (as
expected) not the assembler version.
I'm very curious about tuneup results on other machines (and, of course,
about whether I have broken anything...).
It would most likely help a lot to add native definitions of add_mssaaaa
for more non-x86 machines in generic/mod_1_1.c.
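
For anyone who wants to try a native version, here is a rough portable
sketch of what such a primitive has to compute. It assumes that
add_mssaaaa (m, s1, s0, a1, a0, b1, b0) forms the two-limb sum
(s1,s0) = (a1,a0) + (b1,b0) and sets m to an all-ones mask when the
addition carries out, and to zero otherwise. The function name
add_mssaaaa_sketch and the fixed 64-bit limb type are illustrative
assumptions, not the code in generic/mod_1_1.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for add_mssaaaa, assuming 64-bit limbs.
   Computes (s1,s0) = (a1,a0) + (b1,b0) mod 2^128 and stores in *m
   an all-ones mask if the addition carried out of the high limb,
   or 0 if it did not. */
static void
add_mssaaaa_sketch (uint64_t *m, uint64_t *s1, uint64_t *s0,
                    uint64_t a1, uint64_t a0,
                    uint64_t b1, uint64_t b0)
{
  uint64_t lo = a0 + b0;
  uint64_t c0 = lo < a0;        /* carry out of the low limb */
  uint64_t hi = a1 + b1;
  uint64_t c1 = hi < a1;        /* carry out of the high limb add */

  hi += c0;                     /* propagate the low carry */
  c1 += hi < c0;                /* carry caused by that propagation */

  /* The two high-limb carries are mutually exclusive. */
  assert (c1 <= 1);

  *s0 = lo;
  *s1 = hi;
  *m = -c1;                     /* 0 -> 0, 1 -> all ones */
}
```

A native definition would typically get the same result from an
add / add-with-carry pair followed by a subtract-with-borrow of a
register from itself, which is where the saving over the comparisons
above comes from.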
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.