mpn_mul is embarrassingly slow

Fri Apr 20 17:50:51 UTC 2018

Ciao,

Il Ven, 20 Aprile 2018 1:38 pm, Torbjörn Granlund ha scritto:
> nisse at lysator.liu.se (Niels Möller) writes:
>   Fredrik Johansson <fredrik.johansson at gmail.com> writes:
>
>   > It would be possible to have mpn_mul itself assembly-coded to do
> something
>   > like this:
>   >
>   > case 1x1: ...
>   > case 2x1: ...
>   > case 2x2: ...
>   > generic case, small n: (basecase loop)
>   > generic case, large n: (fall back to calling an mpn_mul_generic
> function
>   > that selects between different algorithms)
>
>   Interesting idea! It's backwards to how assembly code is currently
>   organized, as far as I'm aware we have no cases of assembly routines
>   calling (or jumping to, as would be the case here) complex C code.
>
> I like the idea two.  Cutting down constant time overhead in GMP would
> be a major speedup for most applications.

Interesting, but writing an assembly mpn_mul means writing manyversions...

> And I have never been happy with the mpn_mul, mpn_mul_n, mpn_sqr
> organisation nor the current implementation.  Having said that, I don't
> know how to fix it without new compromises.

On the generic (or no-asm) side, we could at least swap the first branches
in mpn_mul. Currently we have:

  if (un == vn)
    {
      if (up == vp)
	mpn_sqr (prodp, up, un);
      else
	mpn_mul_n (prodp, up, vp, un);
    }
  else if (vn < MUL_TOOM22_THRESHOLD)
    { /* plain schoolbook multiplication */

If we start with the if (vn < MUL_TOOM22_THRESHOLD) branch, we can
(almost) directly consider _mul_basecase for small operands. Of course
this is a slowdown for code that (should we keep on supporting this?)
calls mpn_mul(r,a,n,a,n) instead of mpn_sqr(r,a,n).

Ĝis,
m

-- 
http://bodrato.it/