mpn_mul is embarrassingly slow

Fri Apr 20 11:38:59 UTC 2018

nisse at lysator.liu.se (Niels Möller) writes:

  Fredrik Johansson <fredrik.johansson at gmail.com> writes:

  > It would be possible to have mpn_mul itself assembly-coded to do something
  > like this:
  >
  > case 1x1: ...
  > case 2x1: ...
  > case 2x2: ...
  > generic case, small n: (basecase loop)
  > generic case, large n: (fall back to calling an mpn_mul_generic function
  > that selects between different algorithms)

  Interesting idea! It's backwards to how assembly code is currently
  organized, as far as I'm aware we have no cases of assembly routines
  calling (or jumping to, as would be the case here) complex C code.

I like the idea two.  Cutting down constant time overhead in GMP would
be a major speedup for most applications.

And I have never been happy with the mpn_mul, mpn_mul_n, mpn_sqr
organisation nor the current implementation.  Having said that, I don't
know how to fix it without new compromises.

But I suppose nothing stops us from implementing a mdep variant of any
of these functions; it would be picked up automatically by GMP's
configure.  (What would be hard is having a mdep (say) mpn_mul which
would want to drop out to the generic mpn_mul; when an mdep mpn_mul
exists GMP's configure will drop the generic variant.  This is not too
hard to solve within the current configure system.)

We already make calls in asm (see the CALL macro definition and
invocations for x86_64 for an example).  Calling C from asm is no
different.  We do not have any jump-to-other routine things; that might
need special handling on certain platforms.

  Such an assembly routine would need access to the threshold between
  basecase and generic, which in the case of fat builds isn't a compile
  time constant. 

And for tuning...

-- 
Torbjörn
Please encrypt, key id 0xC8601622