Improved Toom3 square

Jaewook Chung j4chung at uwaterloo.ca
Thu Aug 31 06:07:26 CEST 2006


Hi,

I have written an improved routine for mpn_toom3_sqr_n() (source code is 
attached to this email). It uses the same algorithm as Zimmerman's 
implementation currently used in GMP 4.2.x. Zimmerman does not use separate 
variables for storing carries, but in my testing it turned out that it is 
better to store carries in separate variables as Harley did in the previous 
version of GMP v4.1.x.

I do not know whether "make check" thoroughly checks the correctness of toom3 
square routine, but it passed the test. I have also done some correctness 
check myself and have found no error.

I have also tried to see whether using shift&add is better than using 
mpn_addmul(), but in my testing there wasn't enough difference to conclude 
that which one is better. In the code, both two methods are given, but 
currently the method using mpn_addmul() is enabled by #if directive.

Depending on the architectures, the result may be different. Using shift&add 
may work better on processors having slow multiplication but fast add and 
shift. Using mpn_addmul() may be better on processors having similar 
performance for add/sub/shift and mul. I have done all timing analyses on 
Pentium 4 3.2GHz + HT with 2M L2 cache.

I am currently working on some potentially useful squaring routines. The 
details are shown in my technical report: 
http://www.cacr.math.uwaterloo.ca/techreports/2006/cacr2006-24.pdf (this is 
in fact draft...expect some minor errors.)

Currently, 3-way asymmetric squaring has been implemented and its correctness 
has been thoroughly checked. It shows better timing than mpz_mul() from 
2300-7000 bit range on P4 (the technical report says it performs better for 
2100-6912 range, but it seems that there was a slight error in my measurement 
at that time. (so the claim that KA is not needed in GMP may not be true) It 
may be because of some recent changes in my testing environment (compiler 
upgrade, kernel upgrade...etc). I am currently trying to get correct timing 
and will revise the report in very near future. I'm hoping to implement 4-way 
and 5-way multiplication and squaring (symmetric & asymmetric) routines as 
soon as my upcoming papers are completed.

Thanks,
J. Chung
-------------- next part --------------
A non-text attachment was scrubbed...
Name: new_toom3_sqr.c
Type: text/x-csrc
Size: 4400 bytes
Desc: not available
Url : http://gmplib.org/list-archives/gmp-devel/attachments/20060831/e9105bb3/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: not available
Url : http://gmplib.org/list-archives/gmp-devel/attachments/20060831/e9105bb3/attachment-0001.bin 


More information about the gmp-devel mailing list