Interesting anomaly of INV_NEWTON_THRESHOLD

Torbjorn Granlund tg at
Wed Dec 16 15:16:22 CET 2009

At we see that
ultrasparc machines get a very low INV_NEWTON_THRESHOLD.

What is going on, why does mpn_invert become faster than plain division
for such small operands, i.e., why does mpn_sbpi1_divappr_q run slower
than the newton loop?

It turns out that this is due to a know "flaw" in schoolbook division:
It uses mpn_submul_1 for updating the partial remainder.  Schoolbook
multiplication on the other hand uses mpn_addmul_2.  On some machines,
and ultrasparc in particular, the latter is much faster per cross
product than the former.

The solution is to base schoolbook division on a multi-limb inverse, and
develop multiple quotient limbs at a time.


More information about the gmp-devel mailing list