adrien.prost-boucle at laposte.net
Thu Mar 23 19:46:38 UTC 2017
> About the pure C code, integer version that I was working on,
> I now have an exhaustively validated version, with only one table of invsqrt shared between the 2 versions (32b, 64b, 2x64b).
> Previously I observed a moderate but interesting speedup compared to GMP.
> But... when I put that code in GMP code, that resulted in a noticeable slowdown /o\
> So, not yet ready...
It was just my benchmark code reusing the same input values too many times.
Branch prediction made GMP's sqrtrem1 appear faster than it actually is on normal, pseudo-random workloads.
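To illustrate the pitfall: when a benchmark cycles over a handful of inputs, the branch predictor learns the data-dependent branches and the measured time no longer reflects a real workload. Below is a minimal sketch of input generation that avoids this. The xorshift64 generator and the name `fill_bench_inputs` are my own illustrative choices, not anything taken from GMP's benchmark code, and the ">= 2^62" normalization is an assumption about what a single-limb sqrtrem routine expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marsaglia's xorshift64 generator; the state must never be zero,
   and a non-zero state never produces a zero output. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Hypothetical helper: fill buf[0..n-1] with pseudo-random single-limb
   operands. Values below 2^62 are rejected and redrawn, so every operand
   has at least one of its two top bits set (assumed normalization). */
static void fill_bench_inputs(uint64_t *buf, size_t n, uint64_t seed)
{
    /* Replace a zero seed, since the all-zero state would stay stuck. */
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ULL;
    for (size_t i = 0; i < n; i++) {
        uint64_t x;
        do {
            x = xorshift64(&state);
        } while (x < ((uint64_t)1 << 62));
        buf[i] = x;
        assert(buf[i] >> 62 != 0);
    }
}
```

The idea is to fill a buffer much larger than the predictor's history once, outside the timed region, and then walk it a single time per measurement, accumulating the results into a `volatile` sink so the compiler cannot drop the calls.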
So I now have a working, exhaustively tested C version of sqrtrem1 that is slightly faster than GMP's.
I tested both the 32b and 64b versions by checking every root value within each of the 384 precalculated invsqrt segments:
- the sqrt result at both bounds of each segment, inclusive
- the sqrt result for every value N*N and N*N-1 inside each segment, to ensure the transitions happen at the right place
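The two checks above can be sketched for the 32b case as follows. Everything here is illustrative: `sqrtrem32_fn` is a hypothetical signature for the routine under test, `ref_sqrtrem32` is a slow bit-by-bit reference I chose for comparison, and `check_segment` takes the segment bounds as parameters because how the 384 invsqrt segments map to input ranges is not spelled out here. The 64b case would follow the same pattern with wider intermediates.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical signature of the routine under test: returns floor(sqrt(a))
   and stores the remainder a - root*root in *rem. */
typedef uint32_t (*sqrtrem32_fn)(uint32_t a, uint32_t *rem);

/* Slow but simple reference: bit-by-bit integer square root. */
static uint32_t ref_sqrtrem32(uint32_t a, uint32_t *rem)
{
    uint32_t x = a, root = 0;
    uint32_t bit = (uint32_t)1 << 30;  /* largest power of 4 in 32 bits */
    while (bit > x)
        bit >>= 2;
    while (bit != 0) {
        if (x >= root + bit) {
            x -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    assert((uint64_t)root * root + x == a);  /* sanity: a = root^2 + rem */
    *rem = x;
    return root;
}

/* Validate f over one segment of inputs [lo, hi], inclusive.
   Returns the number of mismatches found. */
static unsigned check_segment(sqrtrem32_fn f, uint32_t lo, uint32_t hi)
{
    unsigned bad = 0;
    uint32_t r, rr;
    const uint32_t bounds[2] = { lo, hi };

    /* 1. Both segment bounds, compared against the reference. */
    for (int i = 0; i < 2; i++) {
        uint32_t s = f(bounds[i], &r);
        uint32_t rs = ref_sqrtrem32(bounds[i], &rr);
        if (s != rs || r != rr)
            bad++;
    }

    /* 2. Every perfect square N*N in [lo, hi], plus N*N-1 when it is also
       inside the segment: expect (N, 0) and (N-1, 2N-2) respectively,
       since (N-1)^2 + 2N-2 = N*N-1. */
    uint64_t n = ref_sqrtrem32(lo, &r);  /* floor(sqrt(lo)) */
    if (n * n < lo)
        n++;                             /* first N with N*N >= lo */
    for (; n * n <= hi; n++) {
        uint32_t sq = (uint32_t)(n * n);
        uint32_t s = f(sq, &r);
        if (s != n || r != 0)
            bad++;
        if (sq > lo) {
            s = f(sq - 1, &r);
            if (s != n - 1 || r != 2 * n - 2)
                bad++;
        }
    }
    return bad;
}
```

Checking only N*N and N*N-1 relies on the root being monotonic in the input, so a wrong result can only appear as a misplaced transition; that is exactly what these two points pin down.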
I think that's sufficient; please comment if it is not.
Patch coming soon.
Should I base it on rev 17327 or on my previous patch that uses FP instructions on x86-64?