inf & nan for mpq

Wed Nov 17 06:24:22 CET 2010

Hello,

I've now cleaned up the implementation of inf & nan handling for mpq that I first posted back in May; attached is the patch against gmp-5.0.1. I ask that this be folded into the next gmp release. (If so, I'd be happy to write the necessary patches to the mpq documentation.)

The implementation is now complete (including the C++ wrapper) and mirrors the behavior of IEEE floating-point inf and nan. It uses the internal representations -1/0, 0/0, and 1/0, and thus does not require mpz to handle inf or nan. In my benchmarks it does not lead to any noticeable slowdown even when (ab)used on short numbers.

The present (vanilla gmp-5.0.1) treatment of zero mpq denominators is highly inconsistent: in some places they raise a DIVIDE_BY_ZERO exception, but in others they're silently ignored (often resulting in incorrect behavior), and in many places they lead to memory access errors that produce obscure segfaults and bus errors. Zero denominators do routinely arise in many rational computations; correct, predictable and documented behavior on these is thus highly desirable. My patch achieves this by consistently treating zero denominators as +/-inf or nan, depending on the sign of the numerator.

I append an updated response to the issues Marc raised in regard to the patch I posted back in May. The only open issue remaining is the behavior of the 3-valued sgn and cmp functions for nan arguments.  

Best regards,

- nic

-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpq_inf_nan.patch
Type: application/octet-stream
Size: 19602 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-discuss/attachments/20101117/2ecd62f6/attachment.obj>
-------------- next part --------------

On 31/05/2010, at 6:24 AM, Marc Glisse wrote:

> You say mpq_equal(nan,nan) returns true, but for type double, nan==nan returns false.

Fixed - mpq_equal(nan, x) now returns 0 for any x, including nan. In the C++ wrapper, (nan != x) returns true while all other comparisons involving nan return false, as per IEEE floating-point specs.

> sgn and cmp don't make much sense for nan, so a convention like 0 sounds ok to me. There is code out there that relies on sgn(x)==0 being equivalent to x==0, but then that code also relies on x being finite... Not sure how much introducing inf and nan may confuse existing code.
> By the way, with your patch, how does one test whether x==0? Creating a mpq to store 0 sounds like a waste, but sgn(x)==0 must now come with !isnan(x) (yes, 3-valued comparison doesn't match well with nan).

Yes, sgn & cmp are the one place where existing correct code might get confused. They have no IEEE floating-point equivalent, so we can freely choose what convention to adopt here. Perhaps mpq_sgn(nan) should return 1, or even better: some constant GMP_MPQ_NAN_COMPARISON > 1? The alternative, as you write, is to require those who care to protect their sgn(x)==0 tests with is_finite(x). Both options are equally fast (just a test for non-zero denominator) - the question is how much to nanny the user.

Keep in mind that in vanilla gmp 5.0.1, mpq_sgn(0/0) quietly returns 0, so code that relies on this shortcut is already broken in that it will claim that 0/0 == 0. Explicit inf/nan handling doesn't create these issues, it just brings them out into the open.

> You seem to have forgotten at least cmp_ui (I wouldn't have commented, but you are calling the implementation "complete").

Fixed - mpq_cmp_ui() and mpq_cmp_si() were missing but have been added now.

> The C++ wrapper would need modifications so that any comparison involving NaN returns false.

Done. Note though that (nan != x) returns true, as it should per IEEE specs.

> You would probably want extra functions like mpq_isfinite, in a second step.

Done - added macros mpq_isfinite(), mpq_isinf(), and mpq_isnan(). 

> I guess a relevant benchmark would involve only small numbers (say one or two limbs each), since that's where the overhead might be noticable. Yes, that is a common use for mpq (though possibly not what it is best suited for).

Done - I cannot detect any significant performance difference on int-sized (i.e., single limb) values either.