Generic get_d_2exp failures
nisse at lysator.liu.se
Sat Oct 28 21:42:57 UTC 2017
Marc Glisse <marc.glisse at inria.fr> writes:
> On Sat, 28 Oct 2017, Niels Möller wrote:
>> It would be nice if we could find a portable way to add two floating
>> point values without rounding up. Would something like this work?
>> s = a + b; /* Assume a > b */
>> r = (s - a) - b; /* No rounding expected here. */
>> if (r > 0)
>> s -= 2*r;
> Not sure where the 2*r is coming from. And I am not very optimistic
> that there is such a simple formula that works for any rounding mode,
> though I could easily be wrong.
Sorry for the confusion, I mistakenly thought that since the rounding
error affects the lsb, it would be of the same order. But I now realize
that r may be much smaller, and affect the lsb only after carry
propagation through low (otherwise ignored) mantissa of b.
So on second thought, I don't think we can fix the rounding by only
examining a, b, and s. We could compute rounded_b = b + r, and then we
know that s == a + b_rounded, with no rounding, but that still doesn't
tell us the position of the lsb, only that it's somewhere between the
least signficant 1-bit of b_rounded and the most significant 1-bit of r.
> There is also the question of how portable exactly it needs to be. C99
> nextafter can be helpful.
if (r > 0)
s = nextafter(s, -infinity)
might work. Except that (i) c99 might not be available on obscure
systems where this code is used, and (ii) we try to avoid depending on
linking with -lm.
> It may be easier to assert that FLT_RADIX==2 and use DBL_MANT_DIG to
> avoid any rounding.
That's probably the sane way to do it.
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel