Improved Toom3 square
Jaewook Chung
j4chung at uwaterloo.ca
Thu Aug 31 06:07:26 CEST 2006
Hi,
I have written an improved routine for mpn_toom3_sqr_n() (source code is
attached to this email). It uses the same algorithm as Zimmerman's
implementation currently used in GMP 4.2.x. Zimmerman does not use separate
variables for storing carries, but in my testing it turned out that it is
better to store carries in separate variables as Harley did in the previous
version of GMP v4.1.x.
I do not know whether "make check" thoroughly checks the correctness of toom3
square routine, but it passed the test. I have also done some correctness
check myself and have found no error.
I have also tried to see whether using shift&add is better than using
mpn_addmul(), but in my testing there wasn't enough difference to conclude
that which one is better. In the code, both two methods are given, but
currently the method using mpn_addmul() is enabled by #if directive.
Depending on the architectures, the result may be different. Using shift&add
may work better on processors having slow multiplication but fast add and
shift. Using mpn_addmul() may be better on processors having similar
performance for add/sub/shift and mul. I have done all timing analyses on
Pentium 4 3.2GHz + HT with 2M L2 cache.
I am currently working on some potentially useful squaring routines. The
details are shown in my technical report:
http://www.cacr.math.uwaterloo.ca/techreports/2006/cacr2006-24.pdf (this is
in fact draft...expect some minor errors.)
Currently, 3-way asymmetric squaring has been implemented and its correctness
has been thoroughly checked. It shows better timing than mpz_mul() from
2300-7000 bit range on P4 (the technical report says it performs better for
2100-6912 range, but it seems that there was a slight error in my measurement
at that time. (so the claim that KA is not needed in GMP may not be true) It
may be because of some recent changes in my testing environment (compiler
upgrade, kernel upgrade...etc). I am currently trying to get correct timing
and will revise the report in very near future. I'm hoping to implement 4-way
and 5-way multiplication and squaring (symmetric & asymmetric) routines as
soon as my upcoming papers are completed.
Thanks,
J. Chung
-------------- next part --------------
A non-text attachment was scrubbed...
Name: new_toom3_sqr.c
Type: text/x-csrc
Size: 4400 bytes
Desc: not available
Url : http://gmplib.org/list-archives/gmp-devel/attachments/20060831/e9105bb3/attachment.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 191 bytes
Desc: not available
Url : http://gmplib.org/list-archives/gmp-devel/attachments/20060831/e9105bb3/attachment-0001.bin
More information about the gmp-devel
mailing list