bodrato at mail.dm.unipi.it
Sat Mar 25 14:38:43 UTC 2017
On Thu, 23 March 2017 8:46 pm, Adrien Prost-Boucle wrote:
>> About the pure C code, integer version that I was working on,
>> But... when I put that code in GMP code, that resulted in
>> a noticeable slowdown /o\
> Problem solved.
> Branch prediction made GMP's sqrtrem1 appear faster than it actually
> is on normal pseudo-random workloads.
> So, I have a working and exhaustively tested C version for sqrtrem1,
> that is slightly faster than GMP's.
> Patch coming soon.
I'll be happy to examine it.
Your observations call for some investigation... Is your version faster
because of a faster core sequence or thanks to improved handling of the
possible branches? Can the proposed branch structure also be applied to
the current code, or is it strictly linked to the new core?
We shall probably move to SQRTREM1_NEEDNORM=0 (notation from Adrien's
patch) as soon as sqrtrem2 does not need rem1 any more, to avoid, at least,
the double computation of the remainder for non-normalised inputs.
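
To make the double-computation point concrete, here is a minimal sketch in
plain C. It is not GMP's sqrtrem1 and not Adrien's patch: every function
name below (isqrt_rem, core_normalized, sqrtrem_neednorm, sqrtrem_anyinput)
is hypothetical, the core is a simple bitwise square root standing in for
the real one, and a 64-bit limb is assumed. It only illustrates why a core
that needs normalised input ends up computing the remainder twice.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise integer square root (stand-in core, not GMP's algorithm).
   Returns floor(sqrt(a)) and stores a - root^2 in *rem. */
static uint64_t isqrt_rem(uint64_t a, uint64_t *rem)
{
    uint64_t root = 0;
    uint64_t bit = (uint64_t)1 << 62;   /* highest power of 4 in 64 bits */

    while (bit > a)
        bit >>= 2;
    while (bit != 0) {
        if (a >= root + bit) {
            a -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    *rem = a;
    return root;
}

/* Hypothetical core that only accepts normalised input,
   i.e. at least one of the two top bits is set. */
static uint64_t core_normalized(uint64_t a, uint64_t *rem)
{
    assert((a >> 62) != 0);
    return isqrt_rem(a, rem);
}

/* Even shift count that normalises a nonzero value. */
static int even_leading_zeros(uint64_t a)
{
    int n = 0;
    while ((a >> 62) == 0) {
        a <<= 2;
        n += 2;
    }
    return n;
}

/* "NEEDNORM=1" style: normalise, call the core, scale the root back.
   The remainder returned by the core belongs to the shifted input, so it
   is thrown away and the remainder is computed a second time. */
static uint64_t sqrtrem_neednorm(uint64_t a, uint64_t *rem)
{
    uint64_t discarded, root;
    int shift;

    if (a == 0) {
        *rem = 0;
        return 0;
    }
    shift = even_leading_zeros(a);
    root = core_normalized(a << shift, &discarded) >> (shift / 2);
    *rem = a - root * root;             /* second remainder computation */
    return root;
}

/* "NEEDNORM=0" style: the core takes any input, so its remainder
   can be used directly. */
static uint64_t sqrtrem_anyinput(uint64_t a, uint64_t *rem)
{
    return isqrt_rem(a, rem);
}
```

Both wrappers return the same root and remainder; the difference is only
in how much work is done for inputs that are not already normalised.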
> Do I do it based on rev 17327 or on my previous patch that uses
> FP instructions on x86-64?
The two patches are orthogonal, I suggest not to mix them.