Torbjörn Granlund tg at
Tue Sep 30 06:59:45 UTC 2014

Zimmermann Paul <Paul.Zimmermann at> writes:

  > see Fig. 1 page 17.

I took a quick glance.

They compare against "GMP" and "GMP Optimised".  Note that "GMP" here is
some unspecified precompiled variant, perhaps 32-bit and surely not the
correct compile for their processor.

"GMP Optimised" presumably is just a proper compile, not a variant
improved by the authors.  Here 2048-bit mod 64-bit runs quite similarly
to their new code, although their diagram is not zero-based, which makes
a small difference look large.
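
To make the axis point concrete, here is a small sketch of how a
non-zero-based axis inflates the visible ratio between two bars.  The
timings are hypothetical and NOT taken from the paper, and the helper
name apparent_ratio is made up for this illustration.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn bar heights for values a and b when the
    vertical axis begins at axis_start instead of at zero."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must lie below both values")
    return (a - axis_start) / (b - axis_start)


# Hypothetical timings in arbitrary units, chosen only for illustration.
t_reference, t_new = 105.0, 100.0

# With a zero-based axis the bars differ by 5%.
true_ratio = apparent_ratio(t_reference, t_new)

# With the axis starting at 95, one bar is drawn twice as tall as the other.
drawn_ratio = apparent_ratio(t_reference, t_new, axis_start=95.0)
```

A 5% gap drawn on an axis that starts just below the smaller value can
thus look like a factor of two.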

I am not sure one can draw any conclusions about the relative
performance of the current GMP code and their suggested new method
considering how they present the performance results.

Niels and I have published improved algorithms for GMP's mpn_divrem_1
operation, but we have yet to finish implementing them.
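
For readers unfamiliar with the operation, the following is a minimal
sketch of what a many-limb by single-limb division computes, using the
plain schoolbook method.  It is not GMP's implementation, it is not the
improved algorithms mentioned above, and it does not mirror the
mpn_divrem_1 calling convention; the names divrem_1, to_limbs and
from_limbs are hypothetical helpers, and 64-bit limbs are an assumption.

```python
import random

LIMB_BITS = 64                      # assumed limb size
LIMB_MASK = (1 << LIMB_BITS) - 1


def divrem_1(limbs, d):
    """Divide a little-endian list of limbs by the single limb d.

    Returns (quotient_limbs, remainder).  Schoolbook method: walk from
    the most significant limb down, carrying the remainder along.
    """
    if not 0 < d <= LIMB_MASK:
        raise ValueError("divisor must be a nonzero single limb")
    quotient = [0] * len(limbs)
    r = 0
    for i in reversed(range(len(limbs))):
        # r < d, so this two-limb value divided by d fits in one limb.
        cur = (r << LIMB_BITS) | limbs[i]
        quotient[i], r = divmod(cur, d)
    return quotient, r


def to_limbs(n, count):
    """Split a non-negative integer into count little-endian limbs."""
    return [(n >> (LIMB_BITS * i)) & LIMB_MASK for i in range(count)]


def from_limbs(limbs):
    """Reassemble a little-endian limb list into an integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


# The 2048-bit by 64-bit case discussed above: 32 limbs, one-limb divisor.
random.seed(1)
n = random.getrandbits(2048)
d = random.getrandbits(64) | 1      # force a nonzero divisor
q_limbs, r = divrem_1(to_limbs(n, 32), d)
```

The remainder r is the "2048-bit mod 64-bit" value; the quotient limbs
are the extra output a divrem-style routine produces.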

