mpn_mul is embarrassingly slow
Marco Bodrato
bodrato at mail.dm.unipi.it
Fri Apr 20 17:50:51 UTC 2018
Ciao,
Il Ven, 20 Aprile 2018 1:38 pm, Torbjörn Granlund ha scritto:
> nisse at lysator.liu.se (Niels Möller) writes:
> Fredrik Johansson <fredrik.johansson at gmail.com> writes:
>
> > It would be possible to have mpn_mul itself assembly-coded to do
> something
> > like this:
> >
> > case 1x1: ...
> > case 2x1: ...
> > case 2x2: ...
> > generic case, small n: (basecase loop)
> > generic case, large n: (fall back to calling an mpn_mul_generic
> function
> > that selects between different algorithms)
>
> Interesting idea! It's backwards to how assembly code is currently
> organized, as far as I'm aware we have no cases of assembly routines
> calling (or jumping to, as would be the case here) complex C code.
>
> I like the idea two. Cutting down constant time overhead in GMP would
> be a major speedup for most applications.
Interesting, but writing an assembly mpn_mul means writing manyversions...
> And I have never been happy with the mpn_mul, mpn_mul_n, mpn_sqr
> organisation nor the current implementation. Having said that, I don't
> know how to fix it without new compromises.
On the generic (or no-asm) side, we could at least swap the first branches
in mpn_mul. Currently we have:
if (un == vn)
{
if (up == vp)
mpn_sqr (prodp, up, un);
else
mpn_mul_n (prodp, up, vp, un);
}
else if (vn < MUL_TOOM22_THRESHOLD)
{ /* plain schoolbook multiplication */
If we start with the if (vn < MUL_TOOM22_THRESHOLD) branch, we can
(almost) directly consider _mul_basecase for small operands. Of course
this is a slowdown for code that (should we keep on supporting this?)
calls mpn_mul(r,a,n,a,n) instead of mpn_sqr(r,a,n).
Ĝis,
m
--
http://bodrato.it/
More information about the gmp-devel
mailing list