udiv_qrnnd_preinv and udiv_rnnd_preinv macros
Torbjorn Granlund
tg at gmplib.org
Sun Feb 27 22:02:25 CET 2011
nisse at lysator.liu.se (Niels Möller) writes:
I just committed a new macro, udiv_rnnd_preinv, which is
udiv_qrnnd_preinv with the q argument and the q adjustments omitted.
There are at least half a dozen potential users (but I haven't yet
commited changes to any of them).
I have relied on compilers supressing the q adustment, by means of dead
code elimination.
Both udiv_qrnnd_preinv and udiv_rnnd_preinv have a special case for
if (__builtin_constant_p (nl) && (nl) == 0)
The macro udiv_rnd_preinv (introduced by Torbjörn, I think) handles only
this special case and goes a bit further in optimizing it. Compared to
udiv_qrnnd_preinv, it does
_r = ~(_qh + (nh)) * d
rather than
_qh += (nh) + 1;
_r = (nl) - _qh * (d);
and it also omits the final adjustment, if (UNLIKELY (_r >= (d))), since
that can't happen. Should these optimizations be copied to the nl ==
constant 0 special cases of udiv_qrnnd_preinv and udiv_rnnd_preinv? Then
udiv_rnd_preinv gets redundant, at least when compiling with gcc which
understands __builtin_constant_p, and maybe it could be removed?
Sure, this make sense.
Alternatively, one could remove these optimizations from
udiv_rnnd_preinv and let callers use udiv_rnd_preinv instead, where
appropriate. And similarly, if needed, introduce an udiv_qrnd and use
that at call sites where nl == constant 0.
I think it is easy enough to pass a 0 there. (The same argument could
perhaps be used against your new udiv_rnnd_preinv, but dummy destination
parameters are harder to deal with.)
Next issue: Masking logic for the hard-to-predict adjustment. For
udiv_rnnd_preinv it's simple to replace
if (_r > _ql)
_r -= d;
by
mask = - (_r > _ql)
_r -= mask & d;
I've done similar things before with good results, so I'd be tempted to
do that change on faith, without even benchmarking ;-)
Be my guest...
GCC is not very clever about optimizing these into cmov, last time I
checked. But keeping a branch is pure performance disaster.
It would be nice to have something analogous to the gcc mechanism
LIKELY/UNLIKELY uses, for saying "unpredictable". I am not aware of any
such GCC extension.
And a similar change could work also for udiv_qrnnd_preinv, where one
would need
mask = - (_r > _ql)
_qh += mask;
_r -= mask & d;
(in the assembler version, that sequence corresponds to lea, cmp, cmovc,
sbb $0). Unless compilers are able to compile the currrent version to
branch free code (which I doubt, in particular for udiv_qrnnd_preinv), I
think this change ought to help? The mpn_divrem_1 or the (obsolete)
mpn_preinv_mod_1 are likely the simplest choices for benchmarking.
I agree. Benchmarking this will take lots of time; eliminating
unpredictable branches is a good thing, period.
--
Torbjörn
More information about the gmp-devel
mailing list