"balanced" unbalanced toom22

Wed Aug 6 21:14:32 CEST 2008

Hi,

I tried writing another version of mpn_mul_toom22(), based on TG's  
code from http://gmplib.org/devel/mpn/generic/mul_toom22.c.

The idea is to break up the inputs as e.g.

AAAAAA|AAAAAA
==BBBB|BBBB==

instead of

AAAAAA|AAAAAA
====BB|BBBBBB

So for example, if the input lengths are 2n >= 2m, then instead of  
splitting into n*n and n*n and n*(2m-n) products, it gets split into  
n*n and n*m and n*m.

I was hoping that the "better balancing" of the subproducts would  
make things faster.

Well, it turns out to be slower, for reasons I only partially  
understand. TG suggested it might be useful for me to post the code  
here anyway. (Warning: the code is a bit messy.)

I don't think it's just because the algorithm is more complicated and  
the overhead larger.

Take for example the fairly extreme case 200x120. (This would  
probably be handled by toom32, but just as an example.)

The existing toom22 breaks it up as 100x100 + 100x100 + 100x20. My  
experimental code does it as 100x100 + 100x60 + 100x60.

The problem is that both 100x60's get only a little benefit from  
karatsuba (threshold is 46 on this machine), whereas 100x100 gets  
quite a lot, more than the two 100x60's put together. So the first  
method wins even though 100x20 doesn't get to take advantage of  
karatsuba at all. I suspect the same thing happens at any size.

david

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mul_toom22.c
Type: application/octet-stream
Size: 10779 bytes
Desc: not available
Url : http://gmplib.org/list-archives/gmp-devel/attachments/20080806/3a0d7472/attachment.obj 
-------------- next part --------------