Tricky bug with mpq_div2_exp and mpq_div_2mul on some architectures
Sylvain CHEVILLARD
sylvain.chevillard at inria.fr
Mon Nov 12 14:03:52 CET 2012
Dear developers of GMP,
we found a bug in GMP. The mistake seems to be in the code for while
(and is still present in the mercurial repository), but in fact the bug
is only visible on a few number of architectures and with particular
arguments, which probably explains that it has not been discovered sooner.
Consider the following code:
=====================================================================
#include <stdio.h>
#include <gmp.h>
int main(void) {
mpq_t a;
mpq_init(a);
/* a <-- 1*B + 3*B^2 where B=2^32 */
mpq_set_str(a, "55340232225423622144 / 1", 10);
mpq_div_2exp(a,a,32);
gmp_printf("%Qd\n", a);
return 0;
}
=====================================================================
The expected result is of course 1+3*B=12884901889.
However, on a pentium 4 you get 3+3*B=12884901891 instead.
A similar bug can be achieved on an arm, but the constant 'a' has to be
changed, e.g., to a = 237684487579686500932345921536 / 1
(i.e. a = 1*B + 2*B^2 + 3*B^3).
In that case, one expects 1 + 2*B + 3*B^2 = 55340232229718589441
and an arm gives 1 + 3*B + 3*B^2 = 55340232234013556737 instead.
Indeed, the bug appears when the following conjunction of events is met
when calling mpq_div_2exp(a,b,c):
* c is an exact multiple of the size of the limbs.
* a == b as pointers.
* The value of the numerator of a is a sufficiently big odd number
multiplied by 2^(c)
We believe that the bug actually lies at line 58 of md_2exp.c where
MPN_COPY_DECR is called with overlapping numbers. When performing the
copy, src is actually overwritten by dst. A patch would probably be to
call MPN_COPY_INCR instead (as far as we understand this code).
Of course, since MPN_COPY_DECR actually calls assembly code, things can
begin to become architecture dependent. And it turns out that on modern
x86, the assembly code auto-corrects the issue because MPN_COPY_INCR is
used instead of MPN_COPY_DECR when possible (because it is more
efficient). So, the bug is hidden on modern processors (but just by
chance): the bug is not in the implementation of MPN_COPY_DECR on these
architectures but really in the fact that MPN_COPY_DECR is called where
it should not be.
Probably, the bug can also be exhibited for mpq_mul_2exp since it calls
the same function mord_2exp.
Best regards,
Sylvain Chevillard and Christoph Lauter.
More information about the gmp-bugs
mailing list