Random number generator in mpz_millerrabin

pggimeno@wanadoo.es pggimeno@wanadoo.es
Fri, 13 Dec 2002 01:18:40 +0100


Gary Wong wrote:

>In mpz_millerrabin(),
> there is a call to gmp_randinit_default (i.e. the Mersenne Twister init
> function), which is then used for only a handful of calls to =
mpz_urandom
> before the state is discarded again.  On my system, this means that a
> call to mpz_millerrabin() takes over 30ms, whereas 16 repititions on a
> 128-bit prime take only 1ms if a weaker 128-bit LCG is used instead.

The problem lies completely within Mersenne Twister's seeding function, =
which
currently performs a powering modulo 2^19937-20023. The powering function=
 has
been improved as of Dec 07 to avoid division, but it is still slow for =
many
purposes including this. There are two approaches to the problem: one is =
to
modify mpz_millerrabin and the other is to modify MT's seeding function. =
Taking
both would get best results, of course.

Modifying the API call of mpz_millerrabin is directly discarded. Using a =
global
variable for the random context would break reentrancy, which is not =
desirable
either. It's possible to implement a new function accepting a random =
context as
one of its arguments, which was already discussed some time ago, but =
that's not
very good as a solution to this concrete problem as I see it.

One possibility is to initialize the random context inside the function =
with a
known buffer. There are two drawbacks: one is that the numbers generated =
within
the function will be almost the same in each call, and the other is that
mpz_millerrabin should know about the structure of MT's context. The =
former is
not too big a trouble since that's already happening; the latter can be =
avoided
if a new MT initialization function is implemented that accepts a buffer =
as an
argument. This is the approach I personally like the most, but not =
forgetting
that mpz_millerrabin is making the primality test always with basically =
the same
numbers (bases?). This can't be solved (except by breaking reentrancy) =
using
just the current functions; a new one is needed.

Another `quick and dirty' possibility is to make mpz_millerrabin use
gmp_randinit_lc_2exp instead of gmp_randinit_default.

Of course, the MT seeding function speed should be improved if possible, =
but
how? The current initialization function has two properties which are =
quite
desirable in this generator:

1. The mixing of values in the buffer make it unlikely to get two =
sequences
which seem correlated if they use different seeds. Correlation among =
sequences
obtained with different seeds was a problem in the previous versions of =
MT; see
the homepage <http://www.math.keio.ac.jp/~matumoto/emt.html> for details.

2. Every value between 0 and 2^19937-20028 inclusive is a valid seed, and=
 each
produces a different sequence. Since the state space of the generator is
2^19937-1 (all combinations of 19937 bits except all-zeros are valid), =
this
means that almost all (exactly all but 20026) possible sequences can be
generated.

These goals are obtained by calculating seed^1074888996 mod =
(2^19937-20023)
(with some further adjustments to ensure that the result won't be zero) =
and
using the result in chunks of 32 bits each to fill the buffer. But that =
powering
operation is too costly in terms of speed.

Initialization with a lower seed space (say seeds in range 0 to 2^54-1) =
would be
possible and perhaps desirable; some good and fast 32-bit PRNGs can be =
used. A
drawback is that only a subset of the possible buffer values would then =
be
available, and it would be harder to give guarantees that no two =
different seeds
produce the same sequence.

Based on this, my suggestion is to apply all of the following:

- Reduce the seed space in MT to, say, 2^54 seeds (the number is a =
consequence
of a concrete algorithm I have in mind able to handle 54 bits for the =
seed).
That's a big downgrading of the current seed space and is for this reason=
 the
most arguable point.

- Implement a new gmp_randinit_mt_buff (suggested name) function that =
accepts an
MT buffer as a parameter.

- Replace the gmp_randinit_default in mpz_millerrabin with a call to
gmp_randinit_mt_buff with a pre-defined buffer. Of course if the default
algorithm is ever changed this call should probably be replaced again.

There's another possibility. As mpz_millerrabin is not calling =
gmp_randseed, the
MT initialization function could initialize its state with a predefined =
buffer
instead of calling the seeding function internally. In this way the new
gmp_randinit_mt_buff function is avoided and mpz_millerrabin can keep =
using
gmp_randinit_default which is even better.

Of course I'm open to comments, suggestions, improvements etc.

  Pedro Gimeno