mpn_mul is embarrassingly slow

Torbjörn Granlund tg at
Sun Apr 22 19:09:34 UTC 2018

"Marco Bodrato" <bodrato at> writes:

  Interesting, but writing an assembly mpn_mul means writing manyversions...

The idea is not to "write mpn_mul in assembly" in the sense to write any
of what is currently in generic/mul.c in assembly.

I suppose as assembly mul.c would look like this:

  if (un == 1)  // vn == 1 too
  else if (((up xor vp) or (un xor vn)) == 0)
    tailcall mpn_sqr
    tailcall mpn_mul_toom22_or_greater_written_in_C
  mpn_mul_basecase code here

On systems where tailcalls are possible (like amd64 ELF systems) this
will be very low-overhead also for sqr.

  On the generic (or no-asm) side, we could at least swap the first branches
  in mpn_mul. Currently we have:

    if (un == vn)
        if (up == vp)
  	mpn_sqr (prodp, up, un);
  	mpn_mul_n (prodp, up, vp, un);
    else if (vn < MUL_TOOM22_THRESHOLD)
      { /* plain schoolbook multiplication */

  If we start with the if (vn < MUL_TOOM22_THRESHOLD) branch, we can
  (almost) directly consider _mul_basecase for small operands. Of course
  this is a slowdown for code that (should we keep on supporting this?)
  calls mpn_mul(r,a,n,a,n) instead of mpn_sqr(r,a,n).

It is surely silly that we don't have any mpn call for when it is known
that the multiplier and multiplicand are distinct.

Please encrypt, key id 0xC8601622

More information about the gmp-devel mailing list