mpn_mul is embarrassingly slow
Torbjörn Granlund
tg at gmplib.org
Tue Apr 24 15:52:00 UTC 2018
nisse at lysator.liu.se (Niels Möller) writes:
I would prefer the opposite change for GMP7, to have all multiplication
functions return, but *not* store, the high limb of the product. Which
also should work nicely with tail calls.
I believe this would work nicely for mul_basecase but not for toom or
fft multiply. These latter would need extra temp space, or code
contortions.
I had to look up what clever things we try to do when vn is small but un
is huge. It seems we try to improve locality by breaking U up in pieces
which hopefully fit the cache.
Yes, that code used to improve things very much at some point. (No CPU
sets MUL_BASECASE_MAX_UN, the default 500 means we add some overhead for
modern CPUs where L1D is much greater than 500 limbs.)
> if (un == vn)
> {
> ! mpn_mul_n (prodp, up, vp, un);
> }
I wonder if calling mpn_mul_n from here still is useful? Given that both
basecase and toom functions support unbalanced operands, and we don't
even have any mpn_toom*_mul_n. Main thing that makes mpn_mul_n nicer is
that thresholds get a lot easier with only a single size.
Perhaps one could omit it without drawbacks. Only timing will tell.
:-)
--
Torbjörn
Please encrypt, key id 0xC8601622
More information about the gmp-devel
mailing list