mpz reuse test takes too much time

Sun Jan 1 20:03:27 UTC 2017

nisse at lysator.liu.se (Niels Möller) writes:

> In the case
>
>   mpn_gcdext(G, S, NULL, A, B)
>
> with size(A) < size(B), IIRC we currently handle that by swapping
> arguments, calling mpn_gcdext to compute T (even though the caller
> didn't need that, and produce S with an multiplication using multiply +
> exact division.

I've looked a bit more at this, and it's not at clear to me what
strategy one should use.

The cofactors satisfy

  G = S A + T B,

where S is the same order of magnitude as B/G, and T is of the same
order of magnitude as A/G. For simplicity, start with the case of small
G so we don't have to care about its effect on the sizes, and assume
that B is n limbs, A is m limbs, and n > m. Then we'd also expect S to
be n limbs and T to be m limbs.

Now we have two possible strategies: 

1. Call mpn_gcdext (B, A), which produces T. And compute S as

     S = (G - T B) / A

   that's one n x m multiply for T B, and one (n + m) / m exact division.

2. First do a division up front, 

     B = Q A + R

   and call mpn_gcdext(A, R). We then get a cofactor S', of m limbs. But
   we also need the other cofactor T', because the wanted cofactor S is

     S = S' - Q T'

   That's one (n - m) x m multiply. To get T', we have to use

     T' = (G - S' A) / R

   Which is a m x m multiply and an 2m / m exact division. I had a quick
   look at also substituting Q = (B - R)/A into the equations, but that
   didn't lead to any simplifications.

   (And we don't count the cost of the initial division, bacause it will
   be needed for strategy (1) too, done by mpn_gcdext).

So the question is, which cost is smallest,

   MUL(n, m) + DIVE(n + m, m), or 

   MUL(m, m) + MUL(n - m, m) + DIVE(2m, m) ?

With subquadratic multiplication, I'd expect 

   MUL(n, m) < MUL(m, m) + MUL(n-m, m)

or am I confused? But DIVE(n + m, m) > DIVE(2m, m), since the quotient
is larger.

And a third strategy could be to extend mpn_gcdext to support A < B, and
hence compute the larger cofactor directly. But I'd guess that would be
more work, since gcdext uses a quadratic algorithm for much larger sizes
than for multiplication and division.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.