Timing Toom-8.5 (mpn level)

bodrato at mail.dm.unipi.it bodrato at mail.dm.unipi.it
Mon Oct 26 12:48:17 CET 2009


>> In short terms (on my old laptop):
>> - GMP: Toom-8.5 is faster than Toom-4 from ~300 limbs ... and keeps on
>> being faster than FFT up to 12,000+ limbs
> Cool that it starts winning so early. I wonder what the thresholds
> would be if one also had the intermediate toom5, 6 and 7... It would
> push up the threshold for toom8 and toom8.5 quite a bit, I guess?

I do not know...

I _have_ Toom-6 (the truncated version of Toom-6.5), but its range is so
narrow that I decided to ignore it... Look at the zoomed graph! In the
range [250:380] there are _some_ sizes where T6 is faster than both T4 and
T8, it somehow depends on the congruence Mod 6, Mod 4, Mod 8... but, does
it worth distinguishing?
In MPIR there are Toom-4 and Toom-7, but my Toom-8 gets faster than both
_before_ they get faster than Toom-3!

The old mpz implementation I tested for Toom-4,5,6,7... suggested that
high degree Toom does not have a huge threshold... that's why I'm

I have to complete my current goal... but the next can be Toom-10.5 or
higher, I doubt I'll never try intermediate ones...

One question arises:
someone (Torbjorn?) suggested that on some architectures addmul_n can be
slower than add + addlsh.
Are there macros tuned for this purpose? How far can this happen?
Should one consider trading addmul_n(6) with addlsh(1) + addlsh(2)?
Or addmul_n(7) with add+addlsh(1) + addlsh(2)? Or ...

In Toom-8.5 I added things like the following:
  /* FIXME: tuneup should decide the best variant */
  ASSERT_NOCARRY(mpn_submul_1 (r3, r1, n3p1, 3969));
  ASSERT_NOCARRY(mpn_sub_n (r3, r3, r1, n3p1));
  ASSERT_NOCARRY(DO_mpn_addlsh_n (r3, r1, n3p1, 7, wsi));
  ASSERT_NOCARRY(DO_mpn_sublsh_n (r3, r1, n3p1, 12, wsi));

I tested both branches. If we name the macros... the code will be ready :-D



More information about the gmp-devel mailing list