Trial-division interface

Fri Apr 9 11:41:37 CEST 2010

nisse at lysator.liu.se (Niels Möller) writes:

  >   If f is non-NULL and n contains such a factor, a non-trivial factor of n
  >   is stored in *f.
  >   
  > Why not return a factor (or 0 if none is found)?

  Because if we require that it's a prime factor, it might be a little
  extra work, depending on how the trialdivision is done (say you compute
  gcd(n, ppp) where ppp is a product of some primes. Then when g > 1,
  there's some extra work needed to split out a prime factor of g. And
  even if we don't require that the produced factor is prime, we may use
  multiple limbs for ppp and then g may be larger than a limb so it has to
  be split for size reasons.

Your suggested interface is to store a factor through an (unsigned int
*).  I suggest that we inmstead return it.

I buy the argument that while the prime factors are small, a found
factor might be larger.  But if we are to return composite factors, then
we have use as mpz_t for the factor

  So *if* we can rule out interface and implementation variants where
  producing a correct single-limb factor is no significant extra work,
  then I agree it's simpler to just return it.

  >     * The implementation will use a compiled in list of primes and various
  >       useful related quantities. The size of this list implies a limit on
  >       how large b are supported. There must be some way for an application
  >       to find out how large b it can use.
  >
  > Hmm.  I don't like this.  I think we should allow for any divisor
  > range.

  > We need to understand (as do the users) that such a table becomes
  > forbiddingly large quote quickly.  Pi(10^9) = 50847534.  If the table
  > lives in main memory, it is almost always faster to recompute the magic
  > values than to retrieve them.

  My thinking was that

  (i)   A compiled in list should be limited to maybe 10 KB or so. (If even
        this is typically faster to recompute than to read from disk, I
        guess it should be even smaller).

Or actually computed?  (It is actually possible to compute a globally
visible table without thread synchronisation problems, since such writes
will occur in the same order each time.  If a second [third,...] thread
decides it needs to be computed while a first thread is in the middle of
computing it, it will fill out the same values.  One will need a flag
for "table initialised".)

  (ii)  It can make sense for some applications to spend several MB of
        memory on a prime list, and if such a list is computed, the
        application should be able to reuse it.

OK.  As long as we don't lure people into believeing that trial our
division is the new ECM.  :-)

  (iii) And I don't quite like having GMP compute the such a list
        automatically and then store it in some global variable (libraries
        should really avoid using global variables).

For any other reasns than thread safety?

  > When n is large enough to live in main memory, repeated use of mod_1_4
  > should be avoided, and instead one should accumulate a large number of
  > primes into a product P, for which gcd(n,P) is computed.  There is
  > nothing asymptotically clever about this, just locality issues.

  Are you sure? Say we have k limb-sized prime-products p_j, and an n-limb
  input, and k <= n. Computing mod_1 k times obviously is O(n k).
  Multiplying all the p_k together (using a balanced tree) is O(M(k)).
  Computing the gcd is one initial division with quotient of size n - k
  and remainder of size k, which takes O(M(n-k) + O(M(k))), and rest of
  the gcd computation is M(k) lg k. It seems like this should be faster
  for some choices of n and k (a necessary condition is that M(n-k) < c n
  k for some constant c, and for k = n we get O(n^2) vs O(M(n) lg n)).
  (Splitting the resulting gcd into prime factors implies some additional
  cost, but that's not always needed).

There is some confusion of n, the number which we are processing, and
its size, log(n) above.

My thinking is that there is room for a O(log(n) k) algorithm which
does not use repeated mod_1, I agree that it is straightforward to
design a o(log(n) k) algorithm.

My O(log(n) k) algorithm would naively multiply a few hundred primes
together and divide n (using Hensel division) by these products.  It is
only attractive for very large n.  It is only better than mod_1 because
it accesses n fewer times.

I think a prospective "mpnv_mod_1_4" would be even better, which takes a
vector of primes, and divides the same n by all these primes.  This
would just traverse n once, and as long as the primes list size is not
exaggerated, will have great cache utilisation.

-- 
Torbjörn