Several Points

Kevin Ryde user42 at
Sun May 2 00:09:17 CEST 2004

Josh Liu <zliu2 at> writes:
> I believe that using my single-precision Montgomery
> multiplication actually makes the algorithm slower than using a
> multiplication and a division, at least on the Pentium 4.

Incidentally, the p4 is not a high performance chip for integer
multiplies, and its carry flag handling is poor too.  But p4 division
is slower still than multiplication, so if the multiply-and-divide
version comes out faster than the Montgomery one then you're probably
doing something wrong :-).
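
For reference, the two approaches being compared can be sketched roughly
as below.  This is only an illustration, not the code under discussion:
it assumes a 32-bit limb with 64-bit intermediates, an odd modulus, and
R = 2^32, and all the function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Plain version: one widening multiply followed by one division. */
static uint32_t mulmod_div(uint32_t a, uint32_t b, uint32_t n)
{
    return (uint32_t)(((uint64_t)a * b) % n);
}

/* -n^-1 mod 2^32 for odd n, by Newton iteration (hypothetical helper). */
static uint32_t mont_neg_inverse(uint32_t n)
{
    assert(n & 1);                /* Montgomery needs gcd(n, 2^32) = 1 */
    uint32_t inv = n;             /* n*n == 1 mod 8, so 3 bits are right */
    for (int i = 0; i < 4; i++)   /* 3 -> 6 -> 12 -> 24 -> 48 bits */
        inv *= 2u - n * inv;
    return 0u - inv;
}

/* REDC: for t < n * 2^32, return t * 2^-32 mod n using two multiplies
   and no division. */
static uint32_t mont_redc(uint64_t t, uint32_t n, uint32_t nneg)
{
    uint32_t m = (uint32_t)t * nneg;      /* makes low word of t + m*n zero */
    uint64_t mn = (uint64_t)m * n;
    /* The low words of t and mn sum to 0 mod 2^32; that sum carries
       exactly when the low word of t is nonzero. */
    uint64_t r = (t >> 32) + (mn >> 32) + ((uint32_t)t != 0);
    if (r >= n)                           /* r < 2n, one subtract suffices */
        r -= n;
    return (uint32_t)r;
}

/* Convert into Montgomery form, a * 2^32 mod n.  Note this uses a
   division, so it only pays off when amortised over many multiplies. */
static uint32_t to_mont(uint32_t a, uint32_t n)
{
    return (uint32_t)(((uint64_t)a << 32) % n);
}

/* Round trip: convert, multiply in Montgomery form, convert back. */
static uint32_t mulmod_mont(uint32_t a, uint32_t b, uint32_t n)
{
    uint32_t nneg = mont_neg_inverse(n);
    uint32_t am = to_mont(a, n);
    uint32_t bm = to_mont(b, n);
    uint32_t pm = mont_redc((uint64_t)am * bm, n, nneg);  /* a*b*R mod n */
    return mont_redc(pm, n, nneg);                        /* a*b mod n */
}
```

In a real inner loop (a powm, say) the conversions and the inverse would
be done once outside the loop, leaving just mont_redc per multiply; a
timing comparison that includes the conversions each time would be
measuring the divisions again.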

The p4 sse2 mmx instructions are better, but athlon or opteron are
friendlier, and opteron will certainly give greater total performance
(clock for clock).

More information about the gmp-devel mailing list