Pentium 4 SSE2 challenge
Torbjorn Granlund
tg at swox.com
Fri Mar 18 02:51:16 CET 2005
GMP doesn't run as fast as it could on Pentium 4. The main
problem is that mpn_addmul_1 runs relatively slowly, at 5.5 or
6.0 cycles/limb (depending on P4 model).
It seems possible to shave off a cycle from these numbers,
reaching 4.5 and 5 cycles/limb, respectively. But that's still
not great. We should go for mpn_addmul_2 instead as the main
multiplication function.
mpn_addmul_2 is defined like this:
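  mp_limb_t
  mpn_addmul_2 (mp_ptr rp, mp_srcptr up, mp_size_t n, mp_srcptr vp)
  {
    /* Add {up,n} * vp[0] into {rp,n}; store the carry limb in rp[n].  */
    rp[n] = mpn_addmul_1 (rp, up, n, vp[0]);
    /* Add {up,n} * vp[1] one limb higher; return the final carry limb.  */
    return mpn_addmul_1 (rp + 1, up, n, vp[1]);
  }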
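
For anyone who wants to check an assembly attempt against something
simple, here is a portable C sketch. It is not GMP code. It assumes
32-bit limbs (as on Pentium 4), and the names limb_t, ref_addmul_1,
ref_addmul_2 and fused_addmul_2 are made up for illustration. The
first two functions follow the two-pass definition, with the second
pass applied one limb higher. The third is one possible single-pass
formulation that reads each up[i] once and updates each rp[i] once;
it is only meant to show the data flow an assembly loop might follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for mp_limb_t, assuming 32-bit limbs. */
typedef uint32_t limb_t;

/* {rp,n} += {up,n} * v; returns the carry-out limb. */
static limb_t
ref_addmul_1 (limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
  limb_t carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      /* (B-1)^2 + 2(B-1) = B^2 - 1, so this fits in 64 bits. */
      uint64_t t = (uint64_t) up[i] * v + rp[i] + carry;
      rp[i] = (limb_t) t;
      carry = (limb_t) (t >> 32);
    }
  return carry;
}

/* Two-pass reference: rp[n] is overwritten, the top limb is returned. */
static limb_t
ref_addmul_2 (limb_t *rp, const limb_t *up, size_t n, const limb_t *vp)
{
  assert (n >= 1);
  rp[n] = ref_addmul_1 (rp, up, n, vp[0]);
  return ref_addmul_1 (rp + 1, up, n, vp[1]);
}

/* Single-pass sketch: a two-limb carry (c0, c1) travels along rp. */
static limb_t
fused_addmul_2 (limb_t *rp, const limb_t *up, size_t n, const limb_t *vp)
{
  limb_t c0 = 0, c1 = 0;
  assert (n >= 1);
  for (size_t i = 0; i < n; i++)
    {
      /* Low product plus the old limb plus the low carry limb. */
      uint64_t t0 = (uint64_t) up[i] * vp[0] + rp[i] + c0;
      /* High product plus the high carry limb plus the overflow of t0. */
      uint64_t t1 = (uint64_t) up[i] * vp[1] + c1 + (limb_t) (t0 >> 32);
      rp[i] = (limb_t) t0;
      c0 = (limb_t) t1;
      c1 = (limb_t) (t1 >> 32);
    }
  rp[n] = c0;
  return c1;
}
```

Both functions take an rp area of n+1 limbs and ignore the incoming
value of rp[n]. Running them on the same inputs and comparing the n+1
result limbs plus the return value is a cheap sanity check.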
Are there any Pentium 4 assembly hackers out there? Do you have
some spare time to play with this over Easter?
Below is what I came up with. It runs at 7.5 cycles/limb, which
is of course slower than mpn_addmul_1. It surely could be
improved to the point where it vastly outperforms mpn_addmul_1.
That is my challenge to the list: Write an mpn_addmul_2 that runs
at 4 cycles/limb or better. I suspect 3 cycles/limb might be
possible. Your tools: (1) An MMX/SSE2 assembly manual (from
www.amd.com or www.intel.com), (2) Loop unrolling, (3) Software
pipelining.
The reward is that lots of people with these roaster chips will
become grateful to you. :-)
If there are any takers, I will include the result in GMP 4.2.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p4a-addmul_2.asm
Type: application/octet-stream
Size: 2828 bytes
Desc: not available
Url : http://gmplib.org/list-archives/gmp-devel/attachments/20050318/414789ed/p4a-addmul_2.obj
-------------- next part --------------
Test code. Compile with -DN=2 to test mpn_addmul_2, pass the CPU
clock frequency in Hertz as -DCLOCK=XXX, and pass the operand size
as, e.g., -DSIZE=50. Sample compilation command line (assuming the
current directory is the top of a gmp-4.1.4 source tree where a
fresh build has been made, and that the two attached files are
there too):
(cd mpn; m4 ../addmul_2.asm) >x.s && gcc -O -I. x.s addmul_N.c -DN=2 -DCLOCK=XXX tests/.libs/libtests.a .libs/libgmp.a -DSIZE=50 && ./a.out 2
-------------- next part --------------
A non-text attachment was scrubbed...
Name: addmul_N.c
Type: application/octet-stream
Size: 6077 bytes
Desc: not available
Url : http://gmplib.org/list-archives/gmp-devel/attachments/20050318/414789ed/addmul_N.obj
-------------- next part --------------
--
Torbjörn