mpn_mul is embarrassingly slow

Fri Apr 20 19:24:38 UTC 2018

"Marco Bodrato" <bodrato at mail.dm.unipi.it> writes:

> On the generic (or no-asm) side, we could at least swap the first branches
> in mpn_mul. Currently we have:
>
>   if (un == vn)
>     {
>       if (up == vp)
> 	mpn_sqr (prodp, up, un);
>       else
> 	mpn_mul_n (prodp, up, vp, un);
>     }
>   else if (vn < MUL_TOOM22_THRESHOLD)
>     { /* plain schoolbook multiplication */
>
> If we start with the if (vn < MUL_TOOM22_THRESHOLD) branch, we can
> (almost) directly consider _mul_basecase for small operands. Of course
> this is a slowdown for code that (should we keep on supporting this?)
> calls mpn_mul(r,a,n,a,n) instead of mpn_sqr(r,a,n).

I think it still makes sense to recognize up == vp && un == vn and call
mpn_sqr. Not sure how important that is for the smallest sizes, though? I just
checked https://gmplib.org/devel/thres/SQR_BASECASE_THRESHOLD.html, and
it's zero for most machines. For those machines, no matter in which
order we do the tests, mpn_mul(r,a,n,a,n) is going to have some more overhead
than mpn_sqr(r,a,n), and programs should call mpn_sqr directly whenever
possible.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.