mpn_mul is embarrassingly slow
Torbjörn Granlund
tg at gmplib.org
Sun Apr 22 19:09:34 UTC 2018
"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
Interesting, but writing an assembly mpn_mul means writing manyversions...
The idea is not to "write mpn_mul in assembly" in the sense to write any
of what is currently in generic/mul.c in assembly.
I suppose as assembly mul.c would look like this:
if (un == 1) // vn == 1 too
umul_ppmm
else if (((up xor vp) or (un xor vn)) == 0)
tailcall mpn_sqr
else if (ABOVE_THRESHOLD (vm, MUL_TOOM22_THRESHOLD))
tailcall mpn_mul_toom22_or_greater_written_in_C
mpn_mul_basecase code here
On systems where tailcalls are possible (like amd64 ELF systems) this
will be very low-overhead also for sqr.
On the generic (or no-asm) side, we could at least swap the first branches
in mpn_mul. Currently we have:
if (un == vn)
{
if (up == vp)
mpn_sqr (prodp, up, un);
else
mpn_mul_n (prodp, up, vp, un);
}
else if (vn < MUL_TOOM22_THRESHOLD)
{ /* plain schoolbook multiplication */
If we start with the if (vn < MUL_TOOM22_THRESHOLD) branch, we can
(almost) directly consider _mul_basecase for small operands. Of course
this is a slowdown for code that (should we keep on supporting this?)
calls mpn_mul(r,a,n,a,n) instead of mpn_sqr(r,a,n).
It is surely silly that we don't have any mpn call for when it is known
that the multiplier and multiplicand are distinct.
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list