mpn_mul is embarrassingly slow
Niels Möller
nisse at lysator.liu.se
Fri Apr 20 19:24:38 UTC 2018
"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:
> On the generic (or no-asm) side, we could at least swap the first branches
> in mpn_mul. Currently we have:
>
> if (un == vn)
> {
> if (up == vp)
> mpn_sqr (prodp, up, un);
> else
> mpn_mul_n (prodp, up, vp, un);
> }
> else if (vn < MUL_TOOM22_THRESHOLD)
> { /* plain schoolbook multiplication */
>
> If we start with the if (vn < MUL_TOOM22_THRESHOLD) branch, we can
> (almost) directly consider _mul_basecase for small operands. Of course
> this is a slowdown for code that (should we keep on supporting this?)
> calls mpn_mul(r,a,n,a,n) instead of mpn_sqr(r,a,n).
I think it still makes sense to recognize up == vp && un == vn and call
mpn_sqr. Not sure how important that is for the smallest sizes, though? I just
checked https://gmplib.org/devel/thres/SQR_BASECASE_THRESHOLD.html, and
it's zero for most machines. For those machines, no matter in which
order we do the tests, mpn_mul(r,a,n,a,n) is going to have some more overhead
than mpn_sqr(r,a,n), and programs should call mpn_sqr directly whenever
possible.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list