Double factorial and primorial

Torbjorn Granlund tg at gmplib.org
Thu Dec 22 21:23:07 CET 2011

```nisse at lysator.liu.se (Niels Möller) writes:

Then I imagine it would be good to use Marco's layout, with one bit for
each number equal to ±1 (mod 6). (The Erathostenes code I wrote a while
ago uses the easier representation of one bit per odd number).

It turned out my profiling data was incorrect (due to incompatibility
between gcc version and the profiling tools on my system).

The correct number was that 53% of the time was spent in heap_remove,

Now, with improved code, only about 20% of the time is spend in
heap_remove, and almost 80% in the (still) rudimentary siever.

(I improved heap_remove itself, use bit sieving, and use some trickery
for avoiding a bit field extract in the heap_remove loop.)

I can now find all primes < 10^9 in 1.3 s on the Phenom system.

> One idea I had was to split the heap in to a bunch of heaps, say 64.
> The current sieving code would then use a heap at a time, separately.
> One may put a prime in any heap, no intelligent criteria are needed.

So you get lower depth, and better locality. The larger number the
better, until you have so many that a significant proportion of them will
have all elements beyond the end of the current sieving block.

Yep.  There seem to be no drawback.

I'll make the siever 2x faster first, though, since improving the 20%
will not impress much.  :-)

One trick: Use the least few bits of the prime to select which heap it
goes into, then you don't new to allocate any space for those bits in
the heap elements. Could even make a big difference for the 32-bit case,
where I think a 20 bit offset and 10 bit prime is a bit too small for
the range we want to be able to sieve.

Cute.  I'll keep this trick in mind.

There is another trick that can save offset bits: Don't square the prime
initially if it does not fit.  Instead find the largest odd number < p^
that is both divisible by p and fits the offset field.  For *correct*
operation, offsets need only one more bit than the prime field.  These
tricks will together save the day on 32-bit systems.  Using two separate
words <prime,offset> would of course work too, but that's too simple.
:-)

--
Torbjörn
```