Tricky bug with mpq_div2_exp and mpq_div_2mul on some architectures

Sylvain CHEVILLARD sylvain.chevillard at inria.fr
Mon Nov 12 14:03:52 CET 2012


Dear developers of GMP,

we found a bug in GMP. The mistake seems to be in the code for while 
(and is still present in the mercurial repository), but in fact the bug 
is only visible on a few number of architectures and with particular 
arguments, which probably explains that it has not been discovered sooner.

Consider the following code:

=====================================================================
#include <stdio.h>
#include <gmp.h>

int main(void) {
   mpq_t a;
   mpq_init(a);

   /* a <-- 1*B + 3*B^2 where B=2^32 */
   mpq_set_str(a, "55340232225423622144 / 1", 10);
   mpq_div_2exp(a,a,32);
   gmp_printf("%Qd\n", a);
   return 0;
}
=====================================================================

The expected result is of course 1+3*B=12884901889.
However, on a pentium 4 you get 3+3*B=12884901891 instead.

A similar bug can be achieved on an arm, but the constant 'a' has to be 
changed, e.g., to a = 237684487579686500932345921536 / 1
(i.e. a = 1*B + 2*B^2 + 3*B^3).
In that case, one expects 1 + 2*B + 3*B^2 = 55340232229718589441
and an arm gives 1 + 3*B + 3*B^2 = 55340232234013556737 instead.

Indeed, the bug appears when the following conjunction of events is met 
when calling mpq_div_2exp(a,b,c):
  * c is an exact multiple of the size of the limbs.
  * a == b as pointers.
  * The value of the numerator of a is a sufficiently big odd number 
multiplied by 2^(c)

We believe that the bug actually lies at line 58 of md_2exp.c where 
MPN_COPY_DECR is called with overlapping numbers. When performing the 
copy, src is actually overwritten by dst. A patch would probably be to 
call MPN_COPY_INCR instead (as far as we understand this code).

Of course, since MPN_COPY_DECR actually calls assembly code, things can 
begin to become architecture dependent. And it turns out that on modern 
x86, the assembly code auto-corrects the issue because MPN_COPY_INCR is 
used instead of MPN_COPY_DECR when possible (because it is more 
efficient). So, the bug is hidden on modern processors (but just by 
chance): the bug is not in the implementation of MPN_COPY_DECR on these 
architectures but really in the fact that MPN_COPY_DECR is called where 
it should not be.

Probably, the bug can also be exhibited for mpq_mul_2exp since it calls 
the same function mord_2exp.

Best regards,

Sylvain Chevillard and Christoph Lauter.





More information about the gmp-bugs mailing list