Trial-division interface

Fri Apr 9 14:17:33 CEST 2010

Torbjorn Granlund <tg at gmplib.org> writes:

> Your suggested interface is to store a factor through an (unsigned int
> *).  I suggest that we inmstead return it.

Fine with me, if you think the additional work (which is unnecessary for
callers that only care about existence of the factor, not the particular
value) is insignificant. With the current trialdiv implementation, the
additional work would be one binvert, which I guess may be a small price
to pay for a simpler interface.

BTW, I think the current mpz_gcd_ui interface is suboptimal in a similar
way, it would have been better to disallow op2 == 0, always return the
gcd, and get rid of the rop pointer argument.

> (libraries should really avoid using global variables).
>   
> For any other reasns than thread safety?

Libraries should not have global shared state, and this is relevant also
for single threaded applications, where different and otherewise
unrelated modules use the same library.

Shared read-only data is clearly ok, and I guess shared write-once data
should be ok as well if the initialization, as you outlined it, doesn't
cause any trouble. I imagine it gets a little trickier to make
initialization and allocation thread safe for a prime table of a size
which is determined (and might grow) during runtime.

>   Are you sure? Say we have k limb-sized prime-products p_j, and an n-limb
>   input, and k <= n. Computing mod_1 k times obviously is O(n k).
>   Multiplying all the p_k together (using a balanced tree) is O(M(k)).
>   Computing the gcd is one initial division with quotient of size n - k
>   and remainder of size k, which takes O(M(n-k) + O(M(k))), and rest of
>   the gcd computation is M(k) lg k. It seems like this should be faster
>   for some choices of n and k (a necessary condition is that M(n-k) < c n
>   k for some constant c, and for k = n we get O(n^2) vs O(M(n) lg n)).
>   (Splitting the resulting gcd into prime factors implies some additional
>   cost, but that's not always needed).
>   
> There is some confusion of n, the number which we are processing, and
> its size, log(n) above.

In my paragraph above, n denotes the size of the input, and as far as I
see it's consistently used.

> My O(log(n) k) algorithm would naively multiply a few hundred primes
> together and divide n (using Hensel division) by these products.

It's not clear to me how to use Hensel division here. If we extend the
same trick used to test divisibility of single limb numbers, we need to
compute a Hensel quotient of the same size as n, and then we will find
out fairly easily if n is divisible by the prime product, but not if
it's divisible by just one or a few of the primes. So I guess you're
thinking of something different.

> It is only attractive for very large n. It is only better than mod_1
> because it accesses n fewer times.

For my current application, n will be at most a few thousand bits.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.