Trial-division interface

Thu Apr 8 11:47:16 CEST 2010

Torbjorn Granlund <tg at gmplib.org> writes:

> (Perhaps also make the smoothness bound of type "long unsigned int".)

Agreed.

>   If f is non-NULL and n contains such a factor, a non-trivial factor of n
>   is stored in *f.
>   
> Why not return a factor (or 0 if none is found)?

Because if we require that it's a prime factor, it might be a little
extra work, depending on how the trialdivision is done (say you compute
gcd(n, ppp) where ppp is a product of some primes. Then when g > 1,
there's some extra work needed to split out a prime factor of g. And
even if we don't require that the produced factor is prime, we may use
multiple limbs for ppp and then g may be larger than a limb so it has to
be split for size reasons.

So *if* we can rule out interface and implementation variants where
producing a correct single-limb factor is no significant extra work,
then I agree it's simpler to just return it.

>     * The implementation will use a compiled in list of primes and various
>       useful related quantities. The size of this list implies a limit on
>       how large b are supported. There must be some way for an application
>       to find out how large b it can use.
>
> Hmm.  I don't like this.  I think we should allow for any divisor
> range.

> We need to understand (as do the users) that such a table becomes
> forbiddingly large quote quickly.  Pi(10^9) = 50847534.  If the table
> lives in main memory, it is almost always faster to recompute the magic
> values than to retrieve them.

My thinking was that

(i)   A compiled in list should be limited to maybe 10 KB or so. (If even
      this is typically faster to recompute than to read from disk, I
      guess it should be even smaller).

(ii)  It can make sense for some applications to spend several MB of
      memory on a prime list, and if such a list is computed, the
      application should be able to reuse it.

(iii) And I don't quite like having GMP compute the such a list
      automatically and then store it in some global variable (libraries
      should really avoid using global variables).

But if it's cheap enough to recompute the primes as needed (including
various needed inverses), then this issue disappears. One can have a
compiled in list of primes up to, say, 2^12, and then use this for
sieving up to 2^24 if needed, or whatever limit is reasonable.

> Now, nobody in her/his right mind will ever want to do that much trial
> dividing...

I wonder if a large prime list is useful for other purposes else. Pari
uses a primelimit of 500000 by default, and it can be increased (but I
don't know what the limits are). The obvious application is the function
to return the j:th prime. Might also be useful for returning a random
prime of a given bitsize (basecase of Maurer's algorithm). But I don't
quite understand the involved complexities.

> When n is large enough to live in main memory, repeated use of mod_1_4
> should be avoided, and instead one should accumulate a large number of
> primes into a product P, for which gcd(n,P) is computed.  There is
> nothing asymptotically clever about this, just locality issues.

Are you sure? Say we have k limb-sized prime-products p_j, and an n-limb
input, and k <= n. Computing mod_1 k times obviously is O(n k).
Multiplying all the p_k together (using a balanced tree) is O(M(k)).
Computing the gcd is one initial division with quotient of size n - k
and remainder of size k, which takes O(M(n-k) + O(M(k))), and rest of
the gcd computation is M(k) lg k. It seems like this should be faster
for some choices of n and k (a necessary condition is that M(n-k) < c n
k for some constant c, and for k = n we get O(n^2) vs O(M(n) lg n)).
(Splitting the resulting gcd into prime factors implies some additional
cost, but that's not always needed).

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.