Improvement to mpz_nextprime
Torbjorn Granlund
tege at swox.com
Thu Jun 22 14:33:17 CEST 2006
> For example:
>
> * The prime
> * An inverse of the prime and some shift constants for fast division
> by the prime
I assume you mean either something like Barrett division or the
algorithm of your paper with Peter Montgomery. In either case,
the values to be stored are precision-dependent, no? Unless one
were looking for a series of primes of the same size (and
usually this means in the same interval, so that one would be
compelled to write his own sieve in that case), it wouldn't make
sense to store these constants -- and, in particular, I
understand it wouldn't make sense to use the algorithm for
division by constants.
Why?
Here and there in GMP, we now divide by prime small numbers, an
operation that takes on the order of 200 plain operations on a modern
CPU.
I really think it makes sense to trade division for a multiply and
some plain operations. Why do you say it doesn't make sense?
The code for initializing the residues will want to do a lot of
dividing, and here tabled preinverses will help a lot.
> A prime can be represented with a byte, representing the distance from
> the previous prime. (Since distances are even between odd primes, we
> can represent a huge table that way, limited by 304599508537...)
Indeed, and this representation is more efficient than a simple
bitmap (no wheel) once primes are larger than about 8k, if I'm
not mistaken (using the aproximation n/(log(n) - 1) for the
prime number theorem). But anyway, once a wheel is implemented,
even a fairly simple wheel, representation by differences would
lose on all practical ranges. Of course, if a sizable amount of
information has to be represented besides the prime itself, then
it doesn't really matter.
What matters still is access times. Both a bitmap and deltas have
their pros and cons.
Truth be told, even a meager 8 KB L1 D-cache such as the P4's
would be enough to sieve expected ranges for 28k-digit
primes. Currently, at this size one is much better off using
PFGW, as I recall. Looking forward, a 32 KB L1 D-cache is a
reasonable assumption, enough for 111k-digit primes. If you
think that's not enough, then the sieve array itself could use a
wheel, although things would complicate a little bit; a 2*3*5*7
= 210 wheel combined with a 32 KB L1 D-cache would be good for
~500k-digit primes.
I'd say to use a 1 Kibyte sieve, or smaller if the range suggests
that. If one doesn't find a prime, one just needs to rerun the sieve
for the next block. (To make sure that mechanism works, test it with
a one byte sieve.)
Actually, I rewrote everything (hey, I was bored...)
Rewrote as compared to what?
--
Torbjörn
More information about the gmp-discuss
mailing list