# Shared toom evaluation functions

Sun Nov 15 22:52:53 CET 2009

>   > We should use mpn_addlsh1_n in more places I think, even for the s,t
>   > related computations, such as pm2 in toom72.  That will be a bit

>   All evaluations in \pm2 and \pm1/2 for operands split in two should
> better use the simple trick:
>   b0 + 2*b1 = (b0 + b1) + b1

> I think we do that already, at least on some of the toom files.

Yes, we do. I mean that where we don't, we should avoid addlsh1 anyway.
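
To make the trick concrete, here is a minimal sketch in plain C. It does not use the GMP internals: `limb_t`, `limb_add_n` and `eval_p1_p2` are hypothetical names invented for this illustration, and `limb_add_n` is only a portable stand-in with the same meaning as mpn_add_n (add n limbs, return the carry). The point is that once b0 + b1 is available, b0 + 2*b1 costs one more plain addition and no shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: stand-in for mp_limb_t */

/* Hypothetical portable stand-in for mpn_add_n:
   {rp,n} = {up,n} + {vp,n}, returns the carry out (0 or 1). */
static limb_t limb_add_n(limb_t *rp, const limb_t *up,
                         const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + vp[i];
        limb_t c1 = s < up[i];      /* carry from up[i] + vp[i] */
        limb_t r = s + cy;
        limb_t c2 = r < s;          /* carry from adding the incoming carry */
        rp[i] = r;
        cy = c1 | c2;
    }
    return cy;
}

/* Evaluate an operand split in two pieces at 1 and at 2:
   bs1 = b0 + b1, then bs2 = b0 + 2*b1 = (b0 + b1) + b1.
   Both results take n limbs plus one high limb for the carry. */
static void eval_p1_p2(limb_t *bs1, limb_t *bs2,
                       const limb_t *b0, const limb_t *b1, size_t n)
{
    bs1[n] = limb_add_n(bs1, b0, b1, n);
    bs2[n] = bs1[n] + limb_add_n(bs2, bs1, b1, n);
}
```

In the real toom code the value at 1 is needed anyway, so the value at 2 comes out of a second add_n pass.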

> And if we want to continue adding assembly primitives, we could of
> course do the above much faster using one single loop, doing two reads

I can't write assembler... but I can propose using an assembly function
that is already written and never used :-D

diff -r 9cae3e117fbb mpn/generic/toom33_mul.c
--- a/mpn/generic/toom33_mul.c	Sun Nov 08 11:45:49 2009 +0100
+++ b/mpn/generic/toom33_mul.c	Sun Nov 15 22:42:38 2009 +0100
@@ -140,6 +140,13 @@
#endif

/* Compute as2.  */
+#if HAVE_NATIVE_mpn_rsblsh1_n
+  cy = mpn_add_n (as2, a2, as1, s);
+  if (s != n)
+    cy = mpn_add_1 (as2 + s, as1 + s, n - s, cy);
+  cy += as1[n];
+  cy = 2 * cy + mpn_rsblsh1_n (as2, a0, as2, n);
+#else
cy  = mpn_addlsh1_n (as2, a1, a2, s);
if (s != n)
@@ -153,6 +160,7 @@
cy = 2 * cy + mpn_lshift (as2, as2, n, 1);
cy -= mpn_sub_n (as2, as2, a0, n);
#endif
+#endif
as2[n] = cy;

/* Compute bs1 and bsm1.  */
@@ -187,6 +195,13 @@
#endif

/* Compute bs2.  */
+#if HAVE_NATIVE_mpn_rsblsh1_n
+  cy = mpn_add_n (bs2, b2, bs1, t);
+  if (t != n)
+    cy = mpn_add_1 (bs2 + t, bs1 + t, n - t, cy);
+  cy += bs1[n];
+  cy = 2 * cy + mpn_rsblsh1_n (bs2, b0, bs2, n);
+#else
cy  = mpn_addlsh1_n (bs2, b1, b2, t);
if (t != n)
@@ -200,6 +215,7 @@
cy = 2 * cy + mpn_lshift (bs2, bs2, n, 1);
cy -= mpn_sub_n (bs2, bs2, b0, n);
#endif
+#endif
bs2[n] = cy;

ASSERT (as1[n] <= 2);

It should be a good idea to trade two addlsh1 for an add and an rsblsh1,
everywhere, except maybe on P4:

$ grep P4 mpn/x86_64/aorrlsh1_n.asm
C P4:		 13
$ grep P4 mpn/x86_64/pentium4/aors_n.asm
C P4:		 4
$ grep P4 mpn/x86_64/pentium4/aorslsh1_n.asm
C P4:		 5.8
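
For reference, the identity behind the rsblsh1 branch of the patch is as2 = a0 + 2*a1 + 4*a2 = 2*(as1 + a2) - a0, with as1 = a0 + a1 + a2 already computed. Below is a minimal sketch of that sequence in plain C, under some loud assumptions: `limb_t`, `limb_add_n`, `limb_rsblsh1_n` and `eval_as2` are hypothetical names made up for this illustration, written as portable stand-ins rather than the GMP primitives, and the carry propagation that the patch does with mpn_add_1 is an open-coded loop here. The stand-in for rsblsh1 computes 2*v - u and returns the signed high part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: stand-in for mp_limb_t */

/* Hypothetical stand-in for mpn_add_n: {rp,n} = {up,n} + {vp,n}, returns carry. */
static limb_t limb_add_n(limb_t *rp, const limb_t *up,
                         const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + vp[i];
        limb_t c1 = s < up[i];
        limb_t r = s + cy;
        rp[i] = r;
        cy = c1 | (r < s);
    }
    return cy;
}

/* Hypothetical stand-in for rsblsh1: {rp,n} = 2*{vp,n} - {up,n}.
   Returns the high part, shifted-out bit minus borrow (-1, 0 or 1).
   rp may alias vp. */
static int64_t limb_rsblsh1_n(limb_t *rp, const limb_t *up,
                              const limb_t *vp, size_t n)
{
    limb_t shift_out = 0, borrow = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t v = vp[i], u = up[i];
        limb_t d = (v << 1) | shift_out;   /* limb i of 2*v */
        shift_out = v >> 63;
        limb_t r1 = d - u;
        limb_t b1 = d < u;                 /* borrow from d - u */
        rp[i] = r1 - borrow;
        borrow = b1 | (r1 < borrow);
    }
    return (int64_t)shift_out - (int64_t)borrow;
}

/* as2 = a0 + 2*a1 + 4*a2 computed as 2*(as1 + a2) - a0.
   a0 has n limbs, a2 has s <= n limbs, as1 and as2 have n + 1 limbs. */
static void eval_as2(limb_t *as2, const limb_t *a0, const limb_t *a2,
                     size_t s, const limb_t *as1, size_t n)
{
    limb_t cy = limb_add_n(as2, a2, as1, s);   /* low s limbs of as1 + a2 */
    for (size_t i = s; i < n; i++) {           /* carry into the rest of as1 */
        limb_t r = as1[i] + cy;
        cy = r < cy;
        as2[i] = r;
    }
    cy += as1[n];                              /* high limb of as1 + a2 */
    /* Doubling doubles the high limb too; the total is nonnegative. */
    as2[n] = 2 * cy + (limb_t)limb_rsblsh1_n(as2, a0, as2, n);
}
```

This mirrors the patch line by line: one add pass over a2 and as1, then a single combined shift-and-subtract pass, instead of two addlsh1 passes.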

