Generic get_d_2exp failures

Sat Oct 28 21:42:57 UTC 2017

Marc Glisse <marc.glisse at inria.fr> writes:

> On Sat, 28 Oct 2017, Niels Möller wrote:
>
>> It would be nice if we could find a portable way to add two floating
>> point values without rounding up. Would something like this work?
>>
>>  s = a + b; /* Assume a > b */
>>  r = (s - a) - b;  /* No rounding expected here. */
>>  if (r > 0)
>>    s -= 2*r;
>
> Not sure where the 2*r is coming from. And I am not very optimistic
> that there is such a simple formula that works for any rounding mode,
> though I could easily be wrong.

Sorry for the confusion, I mistakenly thought that since the rounding
error affects the lsb, it would be of the same order. But I now realize
that r may be much smaller, and affect the lsb only after carry
propagation through low (otherwise ignored) mantissa of b.

So on second thought, I don't think we can fix the rounding by only
examining a, b, and s. We could compute rounded_b = b + r, and then we
know that s == a + b_rounded, with no rounding, but that still doesn't
tell us the position of the lsb, only that it's somewhere between the
least signficant 1-bit of b_rounded and the most significant 1-bit of r.

> There is also the question of how portable exactly it needs to be. C99
> nextafter can be helpful.

So 

  if (r > 0)
    s = nextafter(s, -infinity)

might work. Except that (i) c99 might not be available on obscure
systems where this code is used, and (ii) we try to avoid depending on
linking with -lm.

> It may be easier to assert that FLT_RADIX==2 and use DBL_MANT_DIG to
> avoid any rounding.

That's probably the sane way to do it.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.