Shared toom evaluation functions

Torbjorn Granlund tg at gmplib.org
Sun Nov 15 23:03:43 CET 2009


bodrato at mail.dm.unipi.it writes:

  Yes, we do. I mean, where we don't we should avoid addlsh1 anyway.
  
I don't follow.  Why should we "avoid" addlsh1?

  > And if we want to continue adding assembly primitives, we could of
  > course do the above much faster using one single loop, doing two reads
  
  I'm not able to write in assembler... but I can propose to use an assembly
  function already written and never used :-D

Cool, mpn_rsblsh1_n came for free with just some m4 tinkering.

  --------8<------------8<------------8<------------8<------------8<----
  diff -r 9cae3e117fbb mpn/generic/toom33_mul.c
  --- a/mpn/generic/toom33_mul.c	Sun Nov 08 11:45:49 2009 +0100
  +++ b/mpn/generic/toom33_mul.c	Sun Nov 15 22:42:38 2009 +0100
  @@ -140,6 +140,13 @@
   #endif
  
     /* Compute as2.  */
  +#if HAVE_NATIVE_mpn_rsblsh1_n
  +  cy = mpn_add_n (as2, a2, as1, s);
  +  if (s != n)
  +    cy = mpn_add_1 (as2 + s, as1 + s, n - s, cy);
  +  cy += as1[n];
  +  cy = 2 * cy + mpn_rsblsh1_n (as2, a0, as2, n);
  +#else
   #if HAVE_NATIVE_mpn_addlsh1_n
     cy  = mpn_addlsh1_n (as2, a1, a2, s);
     if (s != n)
  @@ -153,6 +160,7 @@
     cy = 2 * cy + mpn_lshift (as2, as2, n, 1);
     cy -= mpn_sub_n (as2, as2, a0, n);
   #endif
  +#endif
     as2[n] = cy;
  
     /* Compute bs1 and bsm1.  */
  @@ -187,6 +195,13 @@
   #endif
  
     /* Compute bs2.  */
  +#if HAVE_NATIVE_mpn_rsblsh1_n
  +  cy = mpn_add_n (bs2, b2, bs1, t);
  +  if (t != n)
  +    cy = mpn_add_1 (bs2 + t, bs1 + t, n - t, cy);
  +  cy += bs1[n];
  +  cy = 2 * cy + mpn_rsblsh1_n (bs2, b0, bs2, n);
  +#else
   #if HAVE_NATIVE_mpn_addlsh1_n
     cy  = mpn_addlsh1_n (bs2, b1, b2, t);
     if (t != n)
  @@ -200,6 +215,7 @@
     cy = 2 * cy + mpn_lshift (bs2, bs2, n, 1);
     cy -= mpn_sub_n (bs2, bs2, b0, n);
   #endif
  +#endif
     bs2[n] = cy;
  
     ASSERT (as1[n] <= 2);
  --------8<------------8<------------8<------------8<------------8<----
  It should be a good idea to trade two addlsh1 calls for an add and an
  rsblsh1 everywhere, except maybe on the P4:
  $ grep P4 mpn/x86_64/aorrlsh1_n.asm
  C P4:		13
  $ grep P4 mpn/x86_64/pentium4/aors_n.asm
  C P4:		 4
  $ grep P4 mpn/x86_64/pentium4/aorslsh1_n.asm
  C P4:		 5.8
  
I am sure the P4 could be fixed with its own rsblsh1.  It should not
pick up that aorrlsh1_n.asm, I suppose...

-- 
Torbjörn
