The single-instruction multiple-data support in current microprocessors is aimed at signal processing algorithms where each data point can be treated more or less independently. There’s generally not much support for propagating the sort of carries that arise in GMP.
SIMD multiplications of, say, four 16x16 bit multiplies only do as much work as one 32x32 from GMP’s point of view, and need some shifts and adds besides. But of course if, say, the SIMD form is fully pipelined and uses less instruction decoding then it may still be worthwhile.
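The arithmetic behind that comparison can be sketched in plain scalar C. This is only an illustration, not GMP code and not actual SIMD: the function name mul32_via_16x16 is hypothetical, and the four 16x16 products it computes stand in for what a single SIMD multiply instruction might deliver in one step. The shifts and adds that recombine them are the extra work mentioned above.

```c
#include <stdint.h>

/* Hypothetical helper (not part of GMP): form a 32x32 -> 64 bit product
   from four 16x16 -> 32 bit partial products. */
uint64_t mul32_via_16x16(uint32_t a, uint32_t b)
{
    /* Split each operand into 16-bit halves. */
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* The four independent 16x16 multiplies; each result fits in 32 bits. */
    uint32_t ll = a_lo * b_lo;
    uint32_t lh = a_lo * b_hi;
    uint32_t hl = a_hi * b_lo;
    uint32_t hh = a_hi * b_hi;

    /* Recombine with shifts and adds. The 64-bit additions absorb the
       carries that SIMD lanes would not propagate between themselves. */
    return ((uint64_t)hh << 32)
         + ((uint64_t)lh << 16)
         + ((uint64_t)hl << 16)
         + (uint64_t)ll;
}
```

Four multiplies, two shifts and three additions yield the same 64-bit result as a single 32x32 multiply, which is why the SIMD form only pays off if it is cheap in other respects.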
On the x86 chips, MMX has so far found a use in mpn_rshift and mpn_lshift, and is used in a special case for 16-bit multipliers in the P55 mpn_mul_1. SSE2 is used for Pentium 4 mpn_mul_1, mpn_addmul_1, and mpn_submul_1.