GMP failures due to Linux bugs

Torbjörn Granlund tg at gmplib.org
Thu Jul 26 15:35:34 UTC 2018


You might have seen that there are many "Unexpected failures" reported
at <https://gmplib.org/devel/tm/gmp/date.html>.  These started about a
month ago and are not the result of any changes in GMP or in GMP's test
machinery; the cause is changes to the Linux kernel.

We first saw it in the yet-to-be-released Debian 10 (when moving from
Debian's kernel 4.14.0-3 to 4.16.0-2), but then Debian 9 (when moving
Debian's kernel from 4.9.0-6 to 4.9.0-7), Ubuntu 18.04, and finally
Gentoo (when moving from 4.9.95 to 4.14.52) also broke.

The failures apparently happen on all 64-bit Intel and AMD CPUs.

Failure types:

* 32-bit processes running under a 64-bit kernel are spuriously
  terminated with SIGKILL.

* 64-bit processes fail to allocate reasonable amounts of memory.
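
For illustration, here is a minimal sketch of how a test harness might
tell a SIGKILL termination apart from an ordinary test failure.  This
is not GMP's test machinery; the function name `classify` and its
return labels are hypothetical.  It relies on the documented POSIX
behavior of Python's `subprocess`, where a negative return code -N
means the child was terminated by signal N.

```python
import signal
import subprocess
import sys

def classify(cmd):
    """Run cmd and report how it ended.

    Returns one of the (hypothetical) labels:
      "ok"      - exit status 0
      "sigkill" - terminated by SIGKILL
      "signal"  - terminated by some other signal
      "fail"    - exited normally with a nonzero status
    """
    rc = subprocess.run(cmd).returncode
    if rc == 0:
        return "ok"
    if rc == -signal.SIGKILL:
        # On POSIX, subprocess reports death by signal N as -N.
        return "sigkill"
    if rc < 0:
        return "signal"
    return "fail"

if __name__ == "__main__":
    # Example: classify the command given on the command line.
    print(classify(sys.argv[1:] or [sys.executable, "-c", "pass"]))
```

A "sigkill" result alone does not show who sent the signal; it only
separates this failure mode from a test that exited with an error.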

Both failure types happen when there are plenty of system resources
available.  Neither failure is reproducible.  We've seen more than 100
failures thus far; the count would be higher had we not (1) told GMP's
test machinery that these systems are inherently unstable, so that it
retries failing steps, and (2) reverted to known-good Linux kernels.
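
Retrying a failing step can be sketched roughly as follows.  This is
only an illustration under assumed names (`run_with_retries`,
`attempts`), not the code GMP's test machinery uses.  It reruns a
command until it succeeds or the attempt budget is used up, and keeps
every return code so that spurious failures remain visible in the
record.

```python
import subprocess
import sys

def run_with_retries(cmd, attempts=3):
    """Run cmd up to `attempts` times.

    Returns (succeeded, codes), where codes lists the return code of
    every attempt made, in order.
    """
    codes = []
    for _ in range(attempts):
        rc = subprocess.run(cmd).returncode
        codes.append(rc)
        if rc == 0:
            # Stop at the first success; earlier failures stay recorded.
            return True, codes
    return False, codes

if __name__ == "__main__":
    ok, codes = run_with_retries([sys.executable, "-c", "pass"])
    print(ok, codes)
```

Retries of this kind hide intermittent failures from the summary, which
is why the recorded return codes are worth keeping.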

I have made a futile attempt at reporting the first failure.  I don't
know which kernel change caused these problems, but an educated guess is
that some Meltdown or Spectre mitigation change is itself buggy.


PS. About 100% of my GMP hacking time is wasted on hardware bugs and
software bugs.  I would suggest that the computer industry is in a deep
crisis when we produce such awfully broken products as we currently do.

-- 
Torbjörn
Please encrypt, key id 0xC8601622

