udiv_qrnnd_preinv and udiv_rnnd_preinv macros

Sun Feb 27 22:02:25 CET 2011

nisse at lysator.liu.se (Niels Möller) writes:

  I just committed a new macro, udiv_rnnd_preinv, which is
  udiv_qrnnd_preinv with the q argument and the q adjustments omitted.
  There are at least half a dozen potential users (but I haven't yet
  commited changes to any of them).

I have relied on compilers supressing the q adustment, by means of dead
code elimination.

  Both udiv_qrnnd_preinv and udiv_rnnd_preinv have a special case for

    if (__builtin_constant_p (nl) && (nl) == 0)

  The macro udiv_rnd_preinv (introduced by Torbjörn, I think) handles only
  this special case and goes a bit further in optimizing it. Compared to
  udiv_qrnnd_preinv, it does

    _r = ~(_qh + (nh)) * d

  rather than

    _qh += (nh) + 1;
    _r = (nl) - _qh * (d);

  and it also omits the final adjustment, if (UNLIKELY (_r >= (d))), since
  that can't happen. Should these optimizations be copied to the nl ==
  constant 0 special cases of udiv_qrnnd_preinv and udiv_rnnd_preinv? Then
  udiv_rnd_preinv gets redundant, at least when compiling with gcc which
  understands __builtin_constant_p, and maybe it could be removed?

Sure, this make sense.

  Alternatively, one could remove these optimizations from
  udiv_rnnd_preinv and let callers use udiv_rnd_preinv instead, where
  appropriate. And similarly, if needed, introduce an udiv_qrnd and use
  that at call sites where nl == constant 0.

I think it is easy enough to pass a 0 there.  (The same argument could
perhaps be used against your new udiv_rnnd_preinv, but dummy destination
parameters are harder to deal with.)

  Next issue: Masking logic for the hard-to-predict adjustment. For
  udiv_rnnd_preinv it's simple to replace

    if (_r >  _ql)
      _r -= d;

  by

    mask = - (_r > _ql)
    _r -= mask & d;

  I've done similar things before with good results, so I'd be tempted to
  do that change on faith, without even benchmarking ;-)

Be my guest...

GCC is not very clever about optimizing these into cmov, last time I
checked.  But keeping a branch is pure performance disaster.

It would be nice to have something analogous to the gcc mechanism
LIKELY/UNLIKELY uses, for saying "unpredictable".  I am not aware of any
such GCC extension.

  And a similar change could work also for udiv_qrnnd_preinv, where one
  would need

    mask = - (_r > _ql)
    _qh += mask;
    _r -= mask & d;

  (in the assembler version, that sequence corresponds to lea, cmp, cmovc,
  sbb $0). Unless compilers are able to compile the currrent version to
  branch free code (which I doubt, in particular for udiv_qrnnd_preinv), I
  think this change ought to help? The mpn_divrem_1 or the (obsolete)
  mpn_preinv_mod_1 are likely the simplest choices for benchmarking.

I agree.  Benchmarking this will take lots of time; eliminating
unpredictable branches is a good thing, period.

-- 
Torbjörn