C implementation of mod_1_1

Niels Möller nisse at lysator.liu.se
Mon Feb 28 22:38:10 CET 2011


I committed an alternative implementation for generic/mod_1_1.c, using
fewer multiplies and a bit more carry propagation logic (same idea as
used in the x86_64 code). The method is selected via MOD_1_1P_METHOD.

Also added the corresponding speed and tuneup code.

On my core2 laptop, I get

  $ ./speed -o cycles-broken -s2000 -C mpn_mod_1_1_1.0x8765432187654321 mpn_mod_1_1_2.0x8765432187654321 mpn_mod_1_1.0x8765432187654321
  clock_gettime is 1.000ns accurate
  overhead 6.38 cycles, precision 10000 units of 1.00e-09 secs, CPU freq 1200.00 MHz
          mpn_mod_1_1_1.0x8765432187654321 mpn_mod_1_1_2.0x8765432187654321 mpn_mod_1_1.0x8765432187654321
  2000          17.3778       14.8116      #12.7650

which means that the new C implementation beats the old one, but (as
expected) not the assembler version.

I'm very curious about tuneup results on other machines (and, of course,
about whether I have broken anything...).

It would most likely help a lot to add native definitions of add_mssaaaa
for more non-x86 machines in generic/mod_1_1.c.
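
For readers unfamiliar with the macro, here is a rough portable sketch of
what such a definition has to compute. It assumes add_mssaaaa forms the
two-limb sum (s1,s0) = (a1,a0) + (b1,b0) and sets m to an all-ones mask
when the sum carries out of the high limb (0 otherwise). The names limb_t,
add_mssaaaa_sketch and add_mssaaaa_selfcheck are made up for illustration
and are not part of GMP; the real macro works on mp_limb_t and may differ
in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a GMP limb; real code would use mp_limb_t. */
typedef uint64_t limb_t;

/* Assumed semantics: (s1,s0) = (a1,a0) + (b1,b0) modulo 2^128, and
   *m = all ones if the addition carried out of the high limb, else 0. */
static void
add_mssaaaa_sketch (limb_t *m, limb_t *s1, limb_t *s0,
                    limb_t a1, limb_t a0, limb_t b1, limb_t b0)
{
  limb_t lo = a0 + b0;
  limb_t c0 = lo < a0;          /* carry out of the low limb */
  limb_t hi = a1 + b1;
  limb_t c1 = hi < a1;          /* carry out of the high-limb add */

  hi += c0;                     /* propagate the low carry */
  c1 += hi < c0;                /* at most one of the two carries is set */

  *s0 = lo;
  *s1 = hi;
  *m = -c1;                     /* 0 or all ones (unsigned negation) */
}

/* A few hand-checked cases for the sketch above. */
static void
add_mssaaaa_selfcheck (void)
{
  limb_t m, s1, s0;

  /* Carry from low to high limb only: no mask. */
  add_mssaaaa_sketch (&m, &s1, &s0, 0, UINT64_MAX, 0, 1);
  assert (m == 0 && s1 == 1 && s0 == 0);

  /* Low carry ripples all the way out of the high limb. */
  add_mssaaaa_sketch (&m, &s1, &s0, UINT64_MAX, UINT64_MAX, 0, 1);
  assert (m == UINT64_MAX && s1 == 0 && s0 == 0);

  /* High limbs overflow on their own. */
  add_mssaaaa_sketch (&m, &s1, &s0, UINT64_MAX, 0, UINT64_MAX, 0);
  assert (m == UINT64_MAX && s1 == UINT64_MAX - 1 && s0 == 0);
}
```

The comparisons used to recover the carries are what a native definition
would presumably avoid, by using the machine's add-with-carry
instructions and deriving the mask directly from the carry flag.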

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.


