Shared toom evaluation functions

bodrato at mail.dm.unipi.it
Sun Nov 15 22:52:53 CET 2009


Ciao,

>   > We should use mpn_addlsh1_n in more places I think, even for the s,t
>   > related computations, such as pm2 in toom72.  That will be a bit

>   All evaluations in \pm2 and \pm1/2 for operands split in two should
> better use the simple trick:
>   b0 + 2*b1 = (b0 + b1) + b1

> I think we do that already, at least on some of the toom files.

Yes, we do. What I mean is: where we don't do it yet, we should avoid
addlsh1 anyway.
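
To spell the trick out, here is a minimal sketch in plain C. The 32-bit
limb_t type and the toy_add_n / toy_eval_2 helpers are made up for this
illustration (they are not GMP's mp_limb_t or mpn functions); the sketch
only assumes that bs1 = b0 + b1 has already been computed for the
evaluation in 1, with its carry stored in bs1[n].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* toy limb type, NOT GMP's mp_limb_t */

/* rp = up + vp over n limbs; returns the carry out (0 or 1).
   Hypothetical stand-in for an mpn_add_n-like primitive. */
static limb_t toy_add_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                        size_t n)
{
  limb_t cy = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t s = (uint64_t)up[i] + vp[i] + cy;
    rp[i] = (limb_t)s;
    cy = (limb_t)(s >> 32);
  }
  return cy;
}

/* Given bs1 = b0 + b1 (n limbs plus the carry limb bs1[n]), compute
   bs2 = b0 + 2*b1 as bs1 + b1.  One plain addition, no shift needed. */
static void toy_eval_2(limb_t *bs2, const limb_t *bs1, const limb_t *b1,
                       size_t n)
{
  assert(n > 0);
  bs2[n] = bs1[n] + toy_add_n(bs2, bs1, b1, n);
}
```

The same shape should work for the evaluation in 1/2 (scaled to
2*b0 + b1), adding b0 instead of b1.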

> And if we want to continue adding assembly primitives, we could of
> course do the above much faster using one single loop, doing two reads

I'm not able to write assembler... but I can propose using an assembly
function that is already written and never used :-D

--------8<------------8<------------8<------------8<------------8<----
diff -r 9cae3e117fbb mpn/generic/toom33_mul.c
--- a/mpn/generic/toom33_mul.c	Sun Nov 08 11:45:49 2009 +0100
+++ b/mpn/generic/toom33_mul.c	Sun Nov 15 22:42:38 2009 +0100
@@ -140,6 +140,13 @@
 #endif

   /* Compute as2.  */
+#if HAVE_NATIVE_mpn_rsblsh1_n
+  cy = mpn_add_n (as2, a2, as1, s);
+  if (s != n)
+    cy = mpn_add_1 (as2 + s, as1 + s, n - s, cy);
+  cy += as1[n];
+  cy = 2 * cy + mpn_rsblsh1_n (as2, a0, as2, n);
+#else
 #if HAVE_NATIVE_mpn_addlsh1_n
   cy  = mpn_addlsh1_n (as2, a1, a2, s);
   if (s != n)
@@ -153,6 +160,7 @@
   cy = 2 * cy + mpn_lshift (as2, as2, n, 1);
   cy -= mpn_sub_n (as2, as2, a0, n);
 #endif
+#endif
   as2[n] = cy;

   /* Compute bs1 and bsm1.  */
@@ -187,6 +195,13 @@
 #endif

   /* Compute bs2.  */
+#if HAVE_NATIVE_mpn_rsblsh1_n
+  cy = mpn_add_n (bs2, b2, bs1, t);
+  if (t != n)
+    cy = mpn_add_1 (bs2 + t, bs1 + t, n - t, cy);
+  cy += bs1[n];
+  cy = 2 * cy + mpn_rsblsh1_n (bs2, b0, bs2, n);
+#else
 #if HAVE_NATIVE_mpn_addlsh1_n
   cy  = mpn_addlsh1_n (bs2, b1, b2, t);
   if (t != n)
@@ -200,6 +215,7 @@
   cy = 2 * cy + mpn_lshift (bs2, bs2, n, 1);
   cy -= mpn_sub_n (bs2, bs2, b0, n);
 #endif
+#endif
   bs2[n] = cy;

   ASSERT (as1[n] <= 2);
--------8<------------8<------------8<------------8<------------8<----
Trading two addlsh1 calls for one add plus one rsblsh1 should be a good
idea everywhere, except maybe on P4, going by the timings listed in the
asm files:
$ grep P4 mpn/x86_64/aorrlsh1_n.asm
C P4:		13
$ grep P4 mpn/x86_64/pentium4/aors_n.asm
C P4:		 4
$ grep P4 mpn/x86_64/pentium4/aorslsh1_n.asm
C P4:		 5.8
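
For reference, the patch relies on the identity
  a0 + 2*a1 + 4*a2 = 2*((a0 + a1 + a2) + a2) - a0
where a0 + a1 + a2 is as1, already available from the evaluation in 1.
Below is a minimal self-checking sketch in plain C. All toy_* and eval2_*
names are hypothetical, the limb type is a toy 32-bit one, and all three
pieces are assumed to have n limbs (the s != n case of the patch is left
out). It also assumes that rsblsh1 (rp, up, vp, n) computes 2*vp - up and
returns a signed carry, which is how the patch uses it, and that the
existing addlsh1 path has the Horner shape a0 + 2*(a1 + 2*a2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* toy limb type, NOT GMP's mp_limb_t */

/* rp = up + vp over n limbs; returns the carry out (0 or 1). */
static limb_t toy_add_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                        size_t n)
{
  limb_t cy = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t s = (uint64_t)up[i] + vp[i] + cy;
    rp[i] = (limb_t)s;
    cy = (limb_t)(s >> 32);
  }
  return cy;
}

/* rp = up + 2*vp; returns the carry out (0..2).  rp may alias vp. */
static limb_t toy_addlsh1_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                            size_t n)
{
  uint64_t cy = 0;
  for (size_t i = 0; i < n; i++) {
    uint64_t s = (uint64_t)up[i] + 2 * (uint64_t)vp[i] + cy;
    rp[i] = (limb_t)s;
    cy = s >> 32;
  }
  return (limb_t)cy;
}

/* rp = 2*vp - up; returns the signed carry out (-1..1).  rp may alias vp. */
static int toy_rsblsh1_n(limb_t *rp, const limb_t *up, const limb_t *vp,
                         size_t n)
{
  limb_t hi = 0, bw = 0;  /* bit shifted out of the previous limb, borrow */
  for (size_t i = 0; i < n; i++) {
    limb_t d = (limb_t)(vp[i] << 1) | hi;    /* doubled limb */
    hi = vp[i] >> 31;
    uint64_t t = (uint64_t)d - up[i] - bw;   /* wraps if negative */
    rp[i] = (limb_t)t;
    bw = (limb_t)(t >> 63);                  /* 1 when a borrow occurred */
  }
  return (int)hi - (int)bw;
}

/* Assumed old path: Horner form with two addlsh1 calls. */
static void eval2_two_addlsh1(limb_t *as2, const limb_t *a0,
                              const limb_t *a1, const limb_t *a2, size_t n)
{
  limb_t cy = toy_addlsh1_n(as2, a1, a2, n);         /* a1 + 2*a2 */
  as2[n] = 2 * cy + toy_addlsh1_n(as2, a0, as2, n);  /* a0 + 2*(...) */
}

/* Patched path: one add and one rsblsh1, reusing as1 = a0 + a1 + a2. */
static void eval2_add_rsblsh1(limb_t *as2, const limb_t *as1,
                              const limb_t *a0, const limb_t *a2, size_t n)
{
  assert(n > 0);
  int cy = (int)toy_add_n(as2, a2, as1, n) + (int)as1[n];  /* as1 + a2 */
  cy = 2 * cy + toy_rsblsh1_n(as2, a0, as2, n);  /* 2*(as1 + a2) - a0 */
  as2[n] = (limb_t)cy;
}
```

Both eval2_* functions should leave the same n + 1 limbs in as2; the
second one reads a1 only indirectly, through as1.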

Regards,
Marco

-- 
http://bodrato.it/papers/


