Timing Toom-8.5 (mpn level)
nisse at lysator.liu.se
Mon Oct 26 14:17:30 CET 2009
bodrato at mail.dm.unipi.it writes:
> I _have_ Toom-6 (the truncated version of Toom-6.5), but its range is so
> narrow that I decided to ignore it... Look at the zoomed graph! In the
> range [250:380] there are _some_ sizes where T6 is faster than both T4 and
> T8, it somehow depends on the congruence Mod 6, Mod 4, Mod 8... but, is
> it worth distinguishing?
I see, it looks quite narrow.
If toom4, toom5, toom6 and toom7 each get a range of fewer than 100
limbs or so (and with only a small difference in slope at the crossover
points), maybe it's not worth the effort and code size.
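
To make the code-size concern concrete, here is a minimal sketch of size-based selection between three variants. All names are hypothetical, and the bounds 250 and 380 merely echo the range quoted above; real thresholds would have to come from tuning on each machine, and the sketch ignores the dependence on the size's congruence class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the candidate algorithms. */
enum sketch_mul_alg { SKETCH_TOOM4, SKETCH_TOOM6, SKETCH_TOOM8 };

/* Illustrative bounds only: they echo the [250:380] range quoted above
   and are not measured thresholds. */
#define SKETCH_TOOM6_THRESHOLD 250
#define SKETCH_TOOM8_THRESHOLD 380

/* Pick an algorithm from the operand size n, in limbs. */
enum sketch_mul_alg sketch_choose_mul(size_t n)
{
    assert(n > 0);
    if (n < SKETCH_TOOM6_THRESHOLD)
        return SKETCH_TOOM4;
    if (n < SKETCH_TOOM8_THRESHOLD)
        return SKETCH_TOOM6;
    return SKETCH_TOOM8;
}
```

Every extra variant adds one more threshold to tune and one more function to carry around, which is the cost being weighed against a window of roughly a hundred limbs.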
> someone (Torbjorn?) suggested that on some architectures addmul_n can be
> slower than add + addlsh.
If the difference is large, I guess one might consider addmul_by3 and
the like... I guess that can be faster only where addmul_1 is limited
by multiplier throughput.
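
A rough sketch of the idea, assuming 32-bit limbs and a 64-bit intermediate so that it stays portable C. The function names are made up for illustration and are not GMP entry points. The first loop uses a generic multiplier; the second computes the same r += 3*u with only a shift and adds, so it never touches the multiplier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;   /* stand-in for a limb type (assumption) */
typedef uint64_t wide_t;   /* holds a limb product plus carries */

/* r[0..n-1] += u[0..n-1] * v, generic multiplier; returns the carry-out. */
limb_t sketch_addmul_1(limb_t *r, const limb_t *u, size_t n, limb_t v)
{
    limb_t carry = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        wide_t t = (wide_t)u[i] * v + r[i] + carry;
        r[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* r[0..n-1] += 3 * u[0..n-1], computed as r + u + (u << 1);
   returns the carry-out, which is at most 3. */
limb_t sketch_addmul_by3(limb_t *r, const limb_t *u, size_t n)
{
    limb_t carry = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        wide_t t = (wide_t)r[i] + u[i] + ((wide_t)u[i] << 1) + carry;
        r[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}
```

Both functions give identical results for v = 3; which one is faster depends entirely on the machine, which is the point of the question.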
> I tested both branches. If we name the macros... the code will be ready :-D
We'd also need code in tune/tuneup to select the best strategy.
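
As a very rough sketch of what such selection amounts to (this is not how tune/tuneup is actually structured, and the function name is invented): given timings of two strategies at a series of increasing sizes, find the first index from which the second strategy stays faster.

```c
#include <assert.h>
#include <stddef.h>

/* t_a[i] and t_b[i] are measured times of strategies A and B at the
   i-th test size (sizes increasing).  Returns the smallest index i such
   that B is strictly faster than A at every index >= i, or count if B
   is not faster at the largest size. */
size_t sketch_find_crossover(const double *t_a, const double *t_b,
                             size_t count)
{
    size_t i = count;
    assert(t_a != NULL && t_b != NULL);
    /* Walk down from the largest size while B keeps winning. */
    while (i > 0 && t_b[i - 1] < t_a[i - 1])
        i--;
    return i;
}
```

Scanning from the top avoids picking a threshold inside a region where the two curves cross back and forth, which is what small slope differences at the crossover tend to produce.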