reproducibility of GMP random functions vs limb size and GMP version

Thu Sep 16 20:17:34 CEST 2010

Paul Zimmermann wrote:

> as far as such a change is documented in the new version, it is fine
> for us. We were more concerned for a difference between the same GMP
> version on different hardware, but Torbjörn's answer clarifies this.

The current code should give the same values in all platforms. If that's
not the case then that's a bug that should be fixed as quickly as
possible. A pseudorandom function is not that different from the rest of
functions of GMP; the only difference is that it returns a sequence
instead of a value, for a given input value (seed). As such, it should
be consistent across architectures. That's my view on it, anyway. The
tests should include a check for a specific number of the sequence
against a constant; a quick look reveals that that's not the case and it
should be added. My bad.

As for the documenting, I don't think there would be a problem with
including a mention in the release notes, in case the seeding function
(or even the algorithm) changes.

> do you mean gmp_randseed_ui() would be 100x faster, or say mpz_urandomb()?

Both gmp_randseed and gmp_randseed_ui would, and only for the default MT
generator. No impact on mp*_*.

The current code uses powering modulo a prime; the new code would use a
cryptographic function (XXTEA) which has good randomness properties
according to my tests, and a much greater speed. It consists of a nested
loop applying 6 times a not very expensive formula (only sums, xors and
bit shifts on words) to 622 of the 623 elements of the state array. Much
faster than the powm used, which doesn't even take advantage of the
improved powm algorithm in GMP 5.

That would affect only the state initialization as a function of the
seed, with no effect in cross-platform compatibility itself (barring any
possible bugs).

> One difficulty for us is to check we get similar results on different
> hardware. Since we can run MPFR's "make check" on only one hardware at a time,
> our current approach is to hard-code the results of the random functions
> in our tests.

Well, AFAICS import/export would just be useful to not need a
modification of the tests if the seeding function changes. Instead of
initializing with a seed, the idea would be to initialize by importing a
state. Not a big gain though.

> (I wonder how GMP's "make check" does check reproducibility
> among different hardware.) If the formulas used by the GMP random functions
> were documented, we could write a reference implementation (possibly depending
> on the GMP version).

The seeding function is briefly described above; details are in the
code. Some extractions are performed to "warm up" the generator during
the seeding process.

The generating function is unlikely to change. It is a plain vanilla MT
generator. For mpz_urandomb, bits are extracted from the state words one
at a time, starting with the LSB. Each of the 623 words in the state
array except the last holds 32 valid pseudorandom bits; the last one is
special but I don't remember the details. If all 32 bits of a word are
exhausted, the next word is used; if all words are used, a new MT round
is performed. The extracted bits fill the result starting with the LSB.
I don't remember whether the last word is used, that should be looked up
in the code.

mpz_urandomm extracts the exact number of random bits necessary to
accomodate the given limit, exactly as mpz_urandomb does. If the result
is greater than or equal to the limit, all bits are discarded and a new
extraction is performed. There is a limit to the iteration count to
prevent ill generators from locking GMP.

Hope that helps. I'll be happy to address any doubts.

-- Pedro Gimeno