"balanced" unbalanced toom22
David Harvey
dmharvey at nyu.edu
Wed Aug 6 21:14:32 CEST 2008
Hi,
I tried writing another version of mpn_mul_toom22(), based on TG's
code from http://gmplib.org/devel/mpn/generic/mul_toom22.c.
The idea is to break up the inputs as e.g.
AAAAAA|AAAAAA
==BBBB|BBBB==
instead of
AAAAAA|AAAAAA
====BB|BBBBBB
So for example, if the input lengths are 2n >= 2m, then instead of
splitting into n*n and n*n and n*(2m-n) products, it gets split into
n*n and n*m and n*m.
I was hoping that the "better balancing" of the subproducts would
make things faster.
Well, it turns out to be slower, for reasons I only partially
understand. TG suggested it might be useful for me to post the code
here anyway. (Warning: the code is a bit messy.)
I don't think it's just because the algorithm is more complicated and
the overhead larger.
Take for example the fairly extreme case 200x120. (This would
probably be handled by toom32, but just as an example.)
The existing toom22 breaks it up as 100x100 + 100x100 + 100x20. My
experimental code does it as 100x100 + 100x60 + 100x60.
The problem is that both 100x60's get only a little benefit from
karatsuba (threshold is 46 on this machine), whereas 100x100 gets
quite a lot, more than the two 100x60's put together. So the first
method wins even though 100x20 doesn't get to take advantage of
karatsuba at all. I suspect the same thing happens at any size.
david
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul_toom22.c
Type: application/octet-stream
Size: 10779 bytes
Desc: not available
Url : http://gmplib.org/list-archives/gmp-devel/attachments/20080806/3a0d7472/attachment.obj
-------------- next part --------------
More information about the gmp-devel
mailing list