Improvement to mpz_nextprime

Torbjorn Granlund tege at
Wed Jun 21 17:38:30 CEST 2006

Décio Luiz Gazzoni Filho <decio at> writes:

  There are some storage tricks, of course. For instance, using a
  bitmap and a wheel of 2*3*5*7 = 210, one could store primes up to
  100k in less than 3 KB of memory. I have a gut feeling that's
  enough for the sieve, as long as people stick to sensible prime
  sizes -- for really huge stuff they're better off using e.g. PFGW.

I don't think using a wheel is worth the effort.

One will almost surely want to include more information about
each prime than just the fact that it is a prime.

For example:

* The prime
* An inverse of the prime and some shift constants for fast division
  by the prime

A prime can be represented with a byte holding half the distance from
the previous prime.  (Since distances between odd primes are even, we
can represent a huge table that way, limited only around 304599508537,
where the first gap exceeding 510 occurs.)

  > For small enough n, one would include an n dependent number of sieve
  > primes, over a limit one need to cap the primes and then run a pseudo
  > prime test for the remaining numbers.

  Here's an idea for determining the cutoff point of the sieve:
  about 1/p of surviving composites are likely to be sieved out by
  a given prime p; multiply by the time it would take to perform
  the sieve, and whenever this is less than the time a pseudoprime
  test in that range would take, stop sieving and begin PrP
  testing. Since sieve lengths are unlikely to be larger than the
  L2 cache, this time could be estimated with decent precision.

Cache locality is critical these days.  Therefore, one will want to
keep the sieve much smaller than the L2 cache.  If larger ranges need
to be checked, one needs to run the sieve multiple times.

  The only remaining question is how to obtain the starting
  residues for the sieve (e.g. in a call to
  mpz_nextprime(10^1000), one would have to compute 10^1000 mod 2,
  3, 5, 7, 11, etc.) If subquadratic division is implemented, then
  the best strategy is, I believe:

  -Compute m = 2*3*5*7*11*... up until the number is about half the
  size of 10^1000;
  -Compute r = 10^1000 mod m;
  -Break up m into two similarly sized halves m_1 and m_2;
  -Compute r_1 = r mod m_1 and r_2 = r mod m_2;
  -Proceed recursively until the moduli are prime.

Something like that.  Note that designing and hacking that part of the
code should be almost independent of the rest of the code, and that
for the first version, basic code could be used for computing starting
points.  (The code I sent you has basic code you should reuse.)

  The only problem I see with this scheme is memory usage when
  accumulating products. Would that be a concern?

I doubt that.  The pseudo prime tests are slow enough to stop people
from computing billion digit pseudo primes with this code.  (Memory
locality is always critical, though.)

I think we want a function variant that can preserve a (residue) state
between calls, since some applications will want to call a nextprime
function over and over.

Possible new interface:

mpz_nextprime -- for compatibility
mpz_pprime_p (n, howsure)
mpz_pprime_next (p, n, howsure)

Functions preserving a state in a struct "ps".  These functions should
be really fast also for small primes, since that should be no problem.
See the code I sent in a previous message.

mpz_pprime_ps_init (ps, n)           -- n is starting point
mpz_pprime_ps_next (p, ps, howsure)  -- next pseudo prime is returned in p
mpz_pprime_ps_clear (ps)

Most probably, mpz_pprime_next should be implemented in terms of the
ps functions.


More information about the gmp-discuss mailing list