Releasing Toom-8.5 (mpn level)

bodrato at mail.dm.unipi.it bodrato at mail.dm.unipi.it
Thu Oct 29 13:46:54 CET 2009


Hi!

>> I _have_ Toom-6 (the truncated version of Toom-6.5), but its range is so
>> narrow that I decided to ignore it... Look at the zoomed graph! In the
>> range [250:380] there are _some_ sizes where T6 is faster than both T4

There was an incorrect tuning in Toom-6.5. I corrected it and now, in the
not so narrow range [250:380], Toom-6h basically _is_ the fastest
algorithm.

Then Toom-8.5 wins on current mul_n in the range [380:12300] and is
sometimes faster than fft up to 15300...

>> someone (Torbjorn?) suggested that on some architectures addmul_n can be
>> slower than add + addlsh.
>
> If the difference is large, I guess one might consider addmul_by3 and
> the like... I guess that can be faster only where addmul_1 is limited
> by multiplier throughput.

There will be a lot of addmul_by**! Maybe it is better to have some generic
addaddlsh(a,b,c,s) giving a+b+(c<<s).
Then addmul_by3 is add_addlsh(a,b,b,1); addmul_by9 is add_addlsh(a,b,b,3);
A subsublsh giving a-b-(c<<s) would be useful also in toom4 interpolation,
I guess...

>> I tested both branches. If we name the macros... the code will be ready
>
> We'd also need code in tune/tuneup to select the best strategy.

Right, but if one already have some code ready for the enhancement, the
effort is better paid...
I foresee a kind of a problem for this kind of tuning... One should at
first decide the faster way to compute e.g. addmul_by9, then compile Toom
and detect thresholds... it's a two-steps tuning.


At last, I had the time to test the code on a couple of 64-bits machines.
Tests where good, so I release the code. You can find it, on the web:
http://bodrato.it/software/toom.html#TC6.5

It is not perfect, yet! There are some _itch functions, but I'd not trust
them... both code works smoothly for balanced operands and for unbalanced
up to 2:1; wider differences in length should be tested more carefully.
For Toom-8.5 (and Toom-6.5) I wrote a single function mpn_toom8h_mul
(resp. toom6h), but it is very easy, if needed, to write all the
toomMN_mul interfaces (possibly extending to N<4, unsupported by the
general code).
Both share the toom-tools.h file with Toom-4.5.

If someone can test it, please let me know.

Regards,
Marco

-- 
http://bodrato.it/papers/



More information about the gmp-devel mailing list