# Improvement to mpz_nextprime

Torbjorn Granlund tege at swox.com
Wed Jun 21 17:38:30 CEST 2006

```Décio Luiz Gazzoni Filho <decio at decpp.net> writes:

There are some storage tricks, of course. For instance, using a
bitmap and a wheel of 2*3*5*7 = 210, one could store primes up to
100k in less than 3 KB of memory. I have a gut feeling that's
enough  for the sieve, as long as people stick to sensible prime
sizes -- for  really huge stuff they're better off using
e.g. PFGW.

I don't think using a wheel is worth the effort.

each prime than just the fact that it is a prime.

For example:

* The prime
* An inverse of the prime and some shift constants for fast division
by the prime

A prime can be represented with a byte, representing the distance from
the previous prime.  (Since distances are even between odd primes, we
can represent a huge table that way, limited by 304599508537...)

> For small enough n, one would include an n dependent number of sieve
> primes, over a limit one need to cap the primes and then run a pseudo
> prime test for the remaining numbers.

Here's an idea for determining the cutoff point of the sieve:
about 1/ p of surviving composites are likely to be sieved out by
a given  prime p; multiply by the time it would take to perform
the sieve, and  whenever this is less than the time a pseudoprime
test in that range  would take, stop sieving and begin PrP
testing. Since sieve lengths  are unlkely to be larger than the
L2 cache, this time could be  estimated with decent precision.

Cache locality is critical these days.  Therefore, one will want to
keep the sieve much smaller than the L2 cache.  If larger number areas
need to be checked, one needs to run the sieve multiple times.

The only remaining question is how to obtain the starting
residues  for the sieve (e.g. in a call to
mpz_nextprime(10^1000), one would  have to compute 10^1000 mod 2,
3, 5, 7, 11, etc.) If subquadratic  division is implemented, then
the best strategy is, I believe:

-Compute m = 2*3*5*7*11*... up until the number is about half the
size of 10^1000;
-Compute r = 10^1000 mod m;
-Break up m in two similarly sized halves m_1 and m_2;
-Compute r_1 = r mod m_1 and r_2 = r mod m_2;
-Proceed recursively until the modulos are prime.

Something like that.  Note that designing and hacking that part of the
code should be almost independent of the rest of the code, and that
for the first version, basic code could be used for computing starting
points.  (The code I sent you has basic code you should reuse.)

The only problem I see with this scheme is memory usage when
accumulating products. Would that be a concern?

I doubt that.  The pseudo prime tests are slow enough to stop people
from computing billion digit pseudo primes with this code.  (Memory
locality is always critical, though.)

I think we want a function variant that can preserve a (residue) state
beteeen calls, since some applications will want to call a nextprime
function over and over.

Possible new interface:

mpz_nextprime -- for compatibility
mpz_pprime_p (n, howsure)
mpz_pprime_next (p, n, howsure)

Functions preserving a state in a struct "ps".  These functions should
be really fast also for small primes, since that should be no problem.
See the code I sent in a previous message.

mpz_pprime_ps_init (ps, n)           -- n is starting point
mpz_pprime_ps_next (p, ps, howsure)  -- next pseudo prime is returned in p
mpz_pprime_ps_clear (ps)

Most probably, mpz_pprime_next should be implemented in terms of the
ps functions.

--
Torbjörn
```