mpn_mul is embarrassingly slow

Tue Apr 24 19:40:29 UTC 2018

tg at gmplib.org (Torbjörn Granlund) writes:

> nisse at lysator.liu.se (Niels Möller) writes:
>
>   I would prefer the opposite change for GMP7, to have all multiplication
>   functions return, but *not* store, the high limb of the product. Which
>   also should work nicely with tail calls.
>
> I believe this would work nicely for mul_basecase but not for toom or
> fft multiply.  These latter would need extra temp space, or code
> contortions.

Maybe. One would have to try it out to say for sure. I had a quick look
at toom22_mul. The two balanced recursive products (evaluation at 0 and
-1) would immediately store the returned high limb. 

For the third recursive multiply (high parts, interpreted as evaluation
at infinity), it's fairly natural to keep the high limb in a scalar
variable/register, to be returned after final carry propagation.

But I agree there's some potential for contortion: the final carry
propagation now uses MPN_INCR_U and MPN_DECR_U. This gets more awkward
when we want the carry to possibly spill over the edge of the memory
area and propagate into a register. The L(vinf) + H(vinf) addition may
also become less nice, but being the only unbalanced addition (mpn_add,
not mpn_add_n) in this code, it's in some way awkward already.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.