Adrien Prost-Boucle adrien.prost-boucle at
Sun Jan 29 11:21:26 UTC 2017


There is a potential speedup of 3x on high-end CPUs for the functions sqrtrem1 and sqrtrem2.
This is exactly the case for workstations used for intensive computing.
But I don't know the actual impact on the mpz_sqrt() function and similar functions.
Maybe we should study that first.

Maybe the availability of the SSE / AVX / NEON etc. instruction sets can be checked at compile time?
These will always be faster than integer code, and smaller when compiled.
Then an ASM implementation can be used for the appropriate architectures.
I've seen that a lot of code in GMP is handled that way, in the source directory mpn/.
The ASM version would be very easy to obtain:
compile sqrtrem1 and sqrtrem2 (an FP implementation) on the right machine and keep the generated ASM.
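A minimal sketch of what a one-limb FP square root with remainder could look like. It assumes 64-bit limbs, GCC/Clang-style predefined macros, and the SSE2 intrinsics from <emmintrin.h>. The name sqrtrem1_sketch is hypothetical and is not GMP's internal interface; only the one-limb case (the sqrtrem1 analogue) is shown. The compile-time check is simply `#if defined(__SSE2__)`. The sqrtsd instruction is reached through an intrinsic, so no libm call is involved. Because both the integer-to-double conversion and the square root may round, an integer fix-up step corrects the truncated estimate.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: _mm_set_sd, _mm_sqrt_sd, _mm_cvtsd_f64 */
#endif

/* Hypothetical name, not GMP's internal API: returns floor(sqrt(a)) for a
   64-bit limb and stores the remainder a - s*s in *rem. */
static uint64_t sqrtrem1_sketch(uint64_t a, uint64_t *rem)
{
    uint64_t s;
#if defined(__SSE2__)
    /* FP path: one sqrtsd through an intrinsic, no libm call.
       (double)a and the square root may both round, so the truncated
       estimate can be off by one; the fix-up below corrects it. */
    __m128d v = _mm_set_sd((double)a);
    double est = _mm_cvtsd_f64(_mm_sqrt_sd(v, v));
    /* Clamp: the root of a 64-bit value is at most 2^32 - 1. */
    s = (est >= 4294967295.0) ? UINT64_C(0xFFFFFFFF) : (uint64_t)est;
#else
    /* Integer fallback (digit-by-digit method) for targets without SSE2. */
    uint64_t r = a, bit = UINT64_C(1) << 62;
    s = 0;
    while (bit > r)
        bit >>= 2;
    while (bit != 0) {
        if (r >= s + bit) {
            r -= s + bit;
            s = (s >> 1) + bit;
        } else {
            s >>= 1;
        }
        bit >>= 2;
    }
#endif
    /* Integer fix-up: exact result whatever rounding the FP path did.
       s stays <= 2^32 - 1, so none of these products overflow. */
    while (s * s > a)
        s--;
    while (s < UINT64_C(0xFFFFFFFF) && (s + 1) * (s + 1) <= a)
        s++;
    *rem = a - s * s;
    assert(*rem <= 2 * s); /* a < (s+1)^2 implies a - s^2 <= 2s */
    return s;
}
```

For instance, a = 999999 gives s = 999 with remainder 1998, and a = 2^64 - 1 gives s = 4294967295 with remainder 8589934590. On x86-64, SSE2 is part of the baseline instruction set, so GCC and Clang normally predefine `__SSE2__` there; other targets take the integer loop.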

There would be no dependency on libm.
How difficult would it be to add such checks in GMP code?


On Sun, 2017-01-29 at 11:25 +0100, Torbjörn Granlund wrote:
> > Adrien Prost-Boucle <adrien.prost-boucle at> writes:
>   So first I'd like to know,
>   what do GMP developers think about using FP there?
> Making GMP dependent on libm is not OK.
> Using time-critical floating-point features on a CPU-by-CPU basis is ok,
> but needs to be done with care.  Are these instructions unconditionally
> available?  Are they affected by a hardware state "rounding mode" which
> could make the operation incorrect?
> Using floating-point in C for something time-critical is not good since
> GMP should run well on a broad set of CPUs, some of which will run such
> code poorly.
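On the rounding-mode question: on x86, SSE2 sqrtsd and the integer-to-double conversion both honor the rounding-control bits in MXCSR, so a truncated FP estimate is not guaranteed to equal floor(sqrt(a)) on its own. One way to test an FP estimate plus integer fix-up is to compare it with a pure-integer reference under all four MXCSR rounding modes. This sketch assumes x86 with SSE2 and uses the `_MM_GET_ROUNDING_MODE` / `_MM_SET_ROUNDING_MODE` helpers from <xmmintrin.h>; the function names are hypothetical. On other targets it reports zero mismatches without testing anything.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in the MXCSR helpers */

/* Reference floor(sqrt(a)) using integer operations only. */
static uint64_t isqrt_ref(uint64_t a)
{
    uint64_t r = a, s = 0, bit = UINT64_C(1) << 62;
    while (bit > r)
        bit >>= 2;
    while (bit != 0) {
        if (r >= s + bit) {
            r -= s + bit;
            s = (s >> 1) + bit;
        } else {
            s >>= 1;
        }
        bit >>= 2;
    }
    return s;
}

/* Candidate under test: sqrtsd estimate followed by an integer fix-up. */
static uint64_t isqrt_sse2(uint64_t a)
{
    __m128d v = _mm_set_sd((double)a);
    double est = _mm_cvtsd_f64(_mm_sqrt_sd(v, v));
    uint64_t s = (est >= 4294967295.0) ? UINT64_C(0xFFFFFFFF) : (uint64_t)est;
    while (s * s > a)
        s--;
    while (s < UINT64_C(0xFFFFFFFF) && (s + 1) * (s + 1) <= a)
        s++;
    return s;
}

/* Counts disagreements with the reference around perfect squares at both
   ends of the 64-bit range, under each of the four MXCSR rounding modes. */
static unsigned count_mismatches_all_modes(void)
{
    static const unsigned modes[4] = {_MM_ROUND_NEAREST, _MM_ROUND_DOWN,
                                      _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO};
    const unsigned saved = _MM_GET_ROUNDING_MODE();
    unsigned bad = 0;
    for (size_t m = 0; m < 4; m++) {
        _MM_SET_ROUNDING_MODE(modes[m]);
        for (uint64_t j = 0; j < 1000; j++) {
            const uint64_t ks[2] = {j + 1, UINT64_C(0xFFFFFFFF) - j};
            for (size_t t = 0; t < 2; t++) {
                const uint64_t sq = ks[t] * ks[t]; /* no overflow: k < 2^32 */
                for (uint64_t a = sq - 1; a <= sq + 1; a++)
                    bad += isqrt_sse2(a) != isqrt_ref(a);
            }
        }
        bad += isqrt_sse2(UINT64_MAX) != isqrt_ref(UINT64_MAX);
    }
    _MM_SET_ROUNDING_MODE(saved); /* restore the caller's rounding mode */
    return bad;
}
#else
/* Not x86 with SSE2: nothing to check in this sketch. */
static unsigned count_mismatches_all_modes(void) { return 0; }
#endif
```

The fix-up loops are what make the result independent of the rounding mode; the harness only confirms that the estimate never lands far enough off to matter for these inputs.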
