s390 debug question

Fri Apr 29 05:17:25 CEST 2005

Hello,

I am a developer for an s390/s390x (aka IBM mainframe) emulator.

A user who is attempting to build the CentOS Linux distribution for the
s390/s390x architecture has reported a problem building gmp-4.1.4 from
the available RHEL rpm.  The rpm enables the mpfr portion of gmp.

The error I am working on comes from mpfr/tests/tadd.c.  It seems the
problem stems from adding/subtracting a very small number (vsn) to/from
a very big number (vbn) using 53 bit precision.

For the testcase below, we are adding (in big-endian representation)
  0e99ee563b83c5a0 + f0d038ed4fb527de
which in decimal is
  2.4888731542366126e-238 + -2.5789977198441930e+235
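
To make the size gap between the two operands concrete, here is a
minimal sketch that pulls the sign, exponent and fraction fields out of
the raw bit patterns.  It assumes IEEE 754 binary64 doubles; the helper
names are hypothetical and are not part of gmp or mpfr.

```c
#include <stdint.h>

/* Sign bit of an IEEE 754 binary64 value given as raw bits: 1 if negative. */
int sign_bit(uint64_t bits)
{
    return (int)(bits >> 63);
}

/* Unbiased exponent: the 11-bit exponent field minus the bias of 1023.
   Only meaningful for normal numbers (field neither 0 nor 0x7ff). */
int unbiased_exponent(uint64_t bits)
{
    return (int)((bits >> 52) & 0x7ff) - 1023;
}

/* 52-bit fraction field (the implicit leading 1 is not stored). */
uint64_t fraction_field(uint64_t bits)
{
    return bits & 0xfffffffffffffULL;
}
```

For the operands above this gives an exponent of -790 for the first
value and 782 for the second, so they sit 1572 binary places apart, far
more than the 53 bits of a double's significand.  The exact sum cannot
be represented; the small operand can only influence which way the
result is rounded.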

On both Intel and the s390 emulation, the answer is
  f0d038ed4fb527dd
that is, the last bit of the vbn changes.  I confess I do not know
enough to say why adding such a nit (vsn) to a vbn should cause a bit to
change, but it does *seem* silly.

The problem occurs (on s390 emulation) because
 z1.f != z2.f         but
 mpfr_cmp(zz,yy)      is true, ummm, I mean 0.

That is, on both Intel and s390 (emulation), x+y is not the same as
mpfr_add(x,y); on s390, however, mpfr_cmp(zz,yy) returns 0 (equal),
whereas on Intel it does not.

That the emulator has a bug is my foremost suspicion.  Good Lord knows
we've had more than one, esp. in fp arithmetic.  I can try to replicate
the problem on a real s390, but, unfortunately, this is going to require
a bit of, umm, negotiation.

What I'm looking for is some pointers or insight on how to debug the
problem... if anyone can be so kind.  I do have a rather lengthy s390
instruction trace (~50000 lines) that I will be digging into.

Thanks,

Greg Smith

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include "gmp.h"
#include "mpfr.h"

void _check (double, double, double, mp_rnd_t,
             unsigned int, unsigned int, unsigned int);

typedef union {
 double f;
 unsigned long long i;
} fi;

int main()
{
 fi x, y, z1, z2;
 unsigned int p = 53;
 mp_rnd_t r = GMP_RNDZ;
 mpfr_t xx, yy, zz;
 int t;

 if (sizeof(double) != sizeof(unsigned long long)) exit(1);

 x.i = 0x0e99ee563b83c5a0ull;
 y.i = 0xf0d038ed4fb527deull;

 mpfr_init2 (xx, p);
 mpfr_init2 (yy, p);
 mpfr_init2 (zz, p);

 mpfr_set_d(xx, x.f, r);
 mpfr_set_d(yy, y.f, r);

 z1.f = x.f + y.f;
 mpfr_add(zz, xx, yy, r);
 mpfr_set_machine_rnd_mode(r);
 z2.f = mpfr_get_d1 (zz);
 mpfr_set_d (yy, z2.f, GMP_RNDN);
 t = mpfr_cmp(zz,yy);

 printf("%1.16e %1.16e = \n%1.16e + %1.16e\n",z1.f,z2.f,x.f,y.f);
 printf("%16.16llx %16.16llx = %16.16llx + %16.16llx\n",
z1.i,z2.i,x.i,y.i);
 printf("mpfr_cmp %d  z1==z2 %d\n",t,z1.f==z2.f);
 return 0;
}