Adrien Prost-Boucle adrien.prost-boucle at
Thu Nov 3 21:30:45 UTC 2016


This is a follow-up for previous discussion about improvement of
mpn_sqrtrem1() and maybe mpn_sqrtrem2():

Getting as much performance as possible while using clear algorithms is
something I am appreciate very much, so I did a bit of work about that.

In order to infer the algorithm that is currently used in current GMP
implementation, and understand where the magic constants come from, I
implemented several well-known algorithms and compared their
performance on my laptop.

I have a rather clear implementation for 32-bit values that is 20%
faster than current GMP mpn_sqrtrem().
The corresponding 64-bit version is 12% faster than GMP algorithm.
Note that I only return the root, whereas mpn_sqrtrem() also returns
the remainder. But that's no big deal, I think.

My program can be found there:
Compile it with standard gcc -O3 or -Ofast.

The interesting commands are:
./sqrt bench32 1000000
./sqrt bench32 1000000
These commands launch one million random sqrt calculations with all
algorithms and measure execution time.
The fastest algorithm is "inv", named after the inverse square root
that is used in the first steps of the calculation.

You can exhaustively check the results of the 32-bit version with the
following command:
./sqrt bench32-inv 0

Of course I have not tested the 64-bit version. And I know nothing
about proving, so the 64-bit version is still proposal to ve verified.

Finally, I know a few ways to simplify some parts of my implementation.
But for now, I would simply like to get some comments.


More information about the gmp-devel mailing list