# Improvement to mpz_nextprime

Torbjorn Granlund tege at swox.com
Thu Jun 22 14:33:17 CEST 2006

```  > For example:
>
> * The prime
> * An inverse of the prime and some shift constants for fast division
>   by the prime

I assume you mean either something like Barrett division or the
algorithm of your paper with Peter Montgomery. In either case,
the  values to be stored are precision-dependent, no? Unless one
were  looking for a series of primes of the same size (and
usually this  means in the same interval, so that one would be
compelled to write  his own sieve in that case), it wouldn't make
sense to store these  constants -- and, in particular, I
understand it wouldn't make sense  to use the algorithm for
division by constants.

Why?

Here and there in GMP, we now divide by prime small numbers, an
operation that takes on the order of 200 plain operations on a modern
CPU.

I really think it makes sense to trade division for a multiply and
some plain operations.  Why do you say it doesn't make sense?

The code for initializing the residues will want to do a lot of
dividing, and here tabled preinverses will help a lot.

> A prime can be represented with a byte, representing the distance from
> the previous prime.  (Since distances are even between odd primes, we
> can represent a huge table that way, limited by 304599508537...)

Indeed, and this representation is more efficient than a simple
bitmap (no wheel) once primes are larger than about 8k, if I'm
not  mistaken (using the aproximation n/(log(n) - 1) for the
prime number  theorem). But anyway, once a wheel is implemented,
even a fairly  simple wheel, representation by differences would
lose on all  practical ranges. Of course, if a sizable amount of
information has  to be represented besides the prime itself, then
it doesn't really  matter.

What matters still is access times.  Both a bitmap and deltas have
their pros and cons.

Truth be told, even a meager 8 KB L1 D-cache such as the P4's
would  be enough to sieve expected ranges for 28k-digit
primes. Currently,  at this size one is much better off using
PFGW, as I recall. Looking  forward, a 32 KB L1 D-cache is a
reasonable assumption, enough for  111k-digit primes. If you
think that's not enough, then the sieve  array itself could use a
wheel, although things would complicate a  little bit; a 2*3*5*7
= 210 wheel combined with a 32 KB L1 D-cache  would be good for
~500k-digit primes.

I'd say to use a 1 Kibyte sieve, or smaller if the range suggests
that.  If one doesn't find a prime, one just needs to rerun the sieve
for the next block.  (To make sure that mechanism works, test it with
a one byte sieve.)

Actually, I rewrote everything (hey, I was bored...)

Rewrote as compared to what?

--
Torbjörn
```