# Improvement to mpz_nextprime

Décio Luiz Gazzoni Filho decio at decpp.net
Thu Jun 22 04:55:50 CEST 2006

```On Jun 21, 2006, at 12:38 PM, Torbjorn Granlund wrote:

> Décio Luiz Gazzoni Filho <decio at decpp.net> writes:
>
>   There are some storage tricks, of course. For instance, using a
>   bitmap and a wheel of 2*3*5*7 = 210, one could store primes up to
>   100k in less than 3 KB of memory. I have a gut feeling that's
>   enough  for the sieve, as long as people stick to sensible prime
>   sizes -- for  really huge stuff they're better off using
>   e.g. PFGW.
>
> I don't think using a wheel is worth the effort.

Now that the code is already written, I think it is (:

> each prime than just the fact that it is a prime.
>
> For example:
>
> * The prime
> * An inverse of the prime and some shift constants for fast division
>   by the prime

I assume you mean either something like Barrett division or the
algorithm of your paper with Peter Montgomery. In either case, the
values to be stored are precision-dependent, no? Unless one were
looking for a series of primes of the same size (and usually this
means in the same interval, so that one would be compelled to write
his own sieve in that case), it wouldn't make sense to store these
constants -- and, in particular, I understand it wouldn't make sense
to use the algorithm for division by constants.

> A prime can be represented with a byte, representing the distance from
> the previous prime.  (Since distances are even between odd primes, we
> can represent a huge table that way, limited by 304599508537...)

Indeed, and this representation is more efficient than a simple
bitmap (no wheel) once primes are larger than about 8k, if I'm not
mistaken (using the aproximation n/(log(n) - 1) for the prime number
theorem). But anyway, once a wheel is implemented, even a fairly
simple wheel, representation by differences would lose on all
practical ranges. Of course, if a sizable amount of information has
to be represented besides the prime itself, then it doesn't really
matter.

>
>> For small enough n, one would include an n dependent number of sieve
>> primes, over a limit one need to cap the primes and then run a pseudo
>> prime test for the remaining numbers.
>
>   Here's an idea for determining the cutoff point of the sieve:
>   about 1/ p of surviving composites are likely to be sieved out by
>   a given  prime p; multiply by the time it would take to perform
>   the sieve, and  whenever this is less than the time a pseudoprime
>   test in that range  would take, stop sieving and begin PrP
>   testing. Since sieve lengths  are unlkely to be larger than the
>   L2 cache, this time could be  estimated with decent precision.
>
> Cache locality is critical these days.  Therefore, one will want to
> keep the sieve much smaller than the L2 cache.  If larger number areas
> need to be checked, one needs to run the sieve multiple times.

Truth be told, even a meager 8 KB L1 D-cache such as the P4's would
be enough to sieve expected ranges for 28k-digit primes. Currently,
at this size one is much better off using PFGW, as I recall. Looking
forward, a 32 KB L1 D-cache is a reasonable assumption, enough for
111k-digit primes. If you think that's not enough, then the sieve
array itself could use a wheel, although things would complicate a
little bit; a 2*3*5*7 = 210 wheel combined with a 32 KB L1 D-cache
would be good for ~500k-digit primes.

>   The only remaining question is how to obtain the starting
>   residues  for the sieve (e.g. in a call to
>   mpz_nextprime(10^1000), one would  have to compute 10^1000 mod 2,
>   3, 5, 7, 11, etc.) If subquadratic  division is implemented, then
>   the best strategy is, I believe:
>
>   -Compute m = 2*3*5*7*11*... up until the number is about half the
>   size of 10^1000;
>   -Compute r = 10^1000 mod m;
>   -Break up m in two similarly sized halves m_1 and m_2;
>   -Compute r_1 = r mod m_1 and r_2 = r mod m_2;
>   -Proceed recursively until the modulos are prime.
>
> Something like that.  Note that designing and hacking that part of the
> code should be almost independent of the rest of the code, and that
> for the first version, basic code could be used for computing starting
> points.  (The code I sent you has basic code you should reuse.)
>

Actually, I rewrote everything (hey, I was bored...) and for now I'm
using the simplest code possible. I should do some profiling and
figure out where are the bottlenecks, since the improvement (even
with manual parameter tuning) isn't that great -- cutting time by
half was the best I could do.

Once I have something worth showing, I'll attach my code so others
can look at it, suggest improvements and perhaps include it in GMP.

Décio

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 186 bytes
Desc: This is a digitally signed message part