Double factorial and primorial

Thu Dec 22 12:32:50 CET 2011

nisse at lysator.liu.se (Niels Möller) writes:

  > 95% of the time is spent in heap_remove, unsurprisingly.

  And to decrease the heap work, one could keep the small primes in an
  array outside of the heap (as you suggested). Or use a larger sieving
  table. Or use a more compact representation to fit more numbers in the
  same space (at the cost of some additional bit fiddling).

  Do you have any intuition on what additional speedup one could get with
  such tricks?

I already use a sieving table which is the size of the L1D cache on AMD,
16 times larger than that in the gmp_nextprime code.  Sieving bits (not
bytes) there would help, since the sieving accounts for 0% of the time.
Even if bit fiddling would make that part 10 times slower, it will still
be hard to measure.

I think my original idea about handling small primes separately, would
save a little time.  (I intended that idea for sieving efficiency, but
now I think it is good for reducing heap depth, directly affecting heap
removal speed.)  This will not save very much for large sieving tasks,
but I think it will speed up smaller ones quite well.

One idea I had was to split the heap in to a bunch of heaps, say 64.
The current sieving code would then use a heap at a time, separately.
One may put a prime in any heap, no intelligent criteria are needed.

With 64 heaps, the heap depth would be reduced 6 levels, of course.  We
could also use O(sqrt(n)) heaps each with O(sqrt(n)) elements, n =
Pi(sqrt(N)), N the [current] sieving limit.  This will halve the heap
depth.  The overhead of going through many heaps will be really very
small.

There are a bunch of different heap types, but I am not familiar with
them.  They are "faster" in some sense, but I don't know if they are
faster in teory or in practice.  The most prominent one is the Fibonacci
heap.

The offsets will be very uniformly distributed.  Perhaps a trie
structure would work well?

-- 
Torbjörn