Trial-division interface
Torbjorn Granlund
tg at gmplib.org
Thu Apr 8 10:42:09 CEST 2010
nisse at lysator.liu.se (Niels Möller) writes:
I've been thinking a little about what an mpz-level interface for trial
division should look like. One fairly natural interface would be
something like
int
mpz_small_factor_p(unsigned int *f, const mpz_t n, unsigned int b);
This function should return 1 if n contains any prime factor p with p <=
B (or < B, I'm not sure which is the most common convention for the
definition of "B-smooth").
I think one wants a range, not 2..B. There are some algorithms where it
is good to do a little trial dividing, then runs some algorithm-specific
stuff, then do more trial dividing.
(Perhaps also make the smoothness bound of type "long unsigned int".)
If f is non-NULL and n contains such a factor, a non-trivial factor of n
is stored in *f.
Why not return a factor (or 0 if none is found)?
Now, there are a couple of subtleties.
* For a non-trivial factor f, should one require that f is prime, or
that f is the smallest prime factor?
Probably not. (But perhaps provide another function for that,)
* The implementation will use a compiled in list of primes and various
useful related quantities. The size of this list implies a limit on
how large b are supported. There must be some way for an application
to find out how large b it can use.
Hmm. I don't like this. I think we should allow for any divisor
range.
Note that it is not too terrible to compute things on-the-fly. See your
upcoming article in IEEE Computer :-).
* For complete factorization, one would need some way to iterate over all
small prime factors of n.
* For a large bound b, there should be a way to tell GMP precompute a
larger prime table, and then use that for trialdivision. Something
like
gmp_make_primetable(gmp_prime_table_t primes, unsigned int b);
where for small b, primes should bbe set to a reference to the small
compiled in table, while for larger b, a new table should be
allocated and filled out.
We need to understand (as do the users) that such a table becomes
forbiddingly large quote quickly. Pi(10^9) = 50847534. If the table
lives in main memory, it is almost always faster to recompute the magic
values than to retrieve them.
Now, nobody in her/his right mind will ever want to do that much trial
dividing...
* Efficient trial division typically lumps a couple of primes
together, to compute n mod ppp where ppp is a single-limb product
p_k p_{k+1} ... p_{k+j} and. Then for the final such product (say
p_k <= b < p_{k+1}), there's O(1) extra cost of checking
divisibility also by the other factors of ppp. And for typical
applications, it's useful to do that and return any discovered
factor. To allow this behaviour, the function would be specified to
allow some rounding up of the supplied b.
Currently, mod_1_4 is our fastest function for trial division by
single-limb numbers. (For 32bit machines, one quickly can only fit one
prime per limb, BTW.)
When n is large enough to live in main memory, repeated use of mod_1_4
should be avoided, and instead one should accumulate a large number of
primes into a product P, for which gcd(n,P) is computed. There is
nothing asymptotically clever about this, just locality issues.
--
Torbjörn
More information about the gmp-devel
mailing list