Please update addaddmul_1msb0.asm to support ABI in mingw64
nisse at lysator.liu.se
Wed Oct 6 15:57:15 UTC 2021
Torbjörn Granlund <tg at gmplib.org> writes:
> OK, so the code is 3-ways unrolled. That's always a bit inconvenient
> and tends to cause some code bloat.
I don't remember at all why I did it that way. Maybe it was faster than
two-way, and too few registers for 4-way?
Do you expect one really needs to go beyond 2-way for this type of loop,
where each iteration does a fair amount of work?
> * Accumulate differently, say 4 consecutive limbs at a time, with carry
> being alive. That will require more registers for sure. By using
> adcx and adox, one may accumulate to the same registers in two chains
With adox/adcx, I think it should work to compute both a U and b V
one (or a few) limbs at a time, using the O flag as a short-lived carry
flag and a longer-lived register to hold the high limb. And then add the
results together using C as a long-lived carry, living between iterations.
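As a rough illustration of that data flow, here is a portable C model, not the assembly itself. The name ref_addaddmul_1msb0 and its argument order are made up for this sketch, limbs are assumed to be 64 bits, and unsigned __int128 is a GCC/Clang extension. The variables ha and hb play the role of the registers holding the high limbs, and cy plays the role of the long-lived C flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model: {rp,n} = a*{up,n} + b*{vp,n},
   returning the top limb.  Assumes a, b < 2^63 (msb clear),
   which is what keeps every intermediate within range. */
uint64_t
ref_addaddmul_1msb0 (uint64_t *rp, const uint64_t *up, const uint64_t *vp,
                     size_t n, uint64_t a, uint64_t b)
{
  uint64_t ha = 0, hb = 0;   /* high limbs carried to the next limb */
  uint64_t cy = 0;           /* long-lived carry between iterations */

  for (size_t i = 0; i < n; i++)
    {
      /* Each product chain: a*u + previous high limb fits in 2 limbs. */
      unsigned __int128 ta = (unsigned __int128) a * up[i] + ha;
      unsigned __int128 tb = (unsigned __int128) b * vp[i] + hb;
      ha = (uint64_t) (ta >> 64);
      hb = (uint64_t) (tb >> 64);

      /* Add the two low limbs together with the long-lived carry. */
      unsigned __int128 s = (unsigned __int128) (uint64_t) ta
                            + (uint64_t) tb + cy;
      rp[i] = (uint64_t) s;
      cy = (uint64_t) (s >> 64);
    }
  /* ha, hb <= 2^63 - 2, so this sum cannot overflow a limb. */
  return ha + hb + cy;
}
```

In an adox/adcx version the additions inside ta and tb would presumably go through the O flag, and the addition into s through the C flag. The model says nothing about scheduling or register pressure.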
Is there a neat way to clear the O flag without clobbering C ?
Another thing that I think could give a substantial speedup for gcd in
the lehmer range, is to implement addsubmul_1msb0. If we have a single
carry flag, each iteration would take as input
a, b (both < B/2), two full limbs u, v, and a carry limb which is a two's
complement signed number. Then compute
a u + b v + c
as a 2-limb two's complement number (fits, thanks to restrictions on a,
b). Store low half, high half becomes the c input to next iteration.
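A minimal C sketch of that single-carry iteration follows. It assumes the per-limb quantity is a u - b v + c, as the name addsubmul suggests and as a signed carry would require. The function name ref_addsubmul_1msb0 and its signature are hypothetical, limbs are assumed to be 64 bits, __int128 is a GCC/Clang extension, and the right shift of a negative value relies on those compilers doing an arithmetic shift.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model: {rp,n} = a*{up,n} - b*{vp,n} in two's
   complement, returning the signed top limb.  Assumes a, b < 2^63. */
int64_t
ref_addsubmul_1msb0 (uint64_t *rp, const uint64_t *up, const uint64_t *vp,
                     size_t n, uint64_t a, uint64_t b)
{
  int64_t c = 0;   /* signed carry limb passed between iterations */

  for (size_t i = 0; i < n; i++)
    {
      /* Both products are < 2^127, so they fit in a signed __int128,
         and so does their difference plus the signed carry. */
      __int128 pa = (__int128) ((unsigned __int128) a * up[i]);
      __int128 pb = (__int128) ((unsigned __int128) b * vp[i]);
      __int128 t = pa - pb + c;

      rp[i] = (uint64_t) t;       /* store low half */
      c = (int64_t) (t >> 64);    /* high half is the next carry
                                     (assumes arithmetic shift) */
    }
  return c;
}
```

For example, with a = 3, b = 5 and single limbs u = v = 1, the result limb is 2^64 - 2 and the returned carry is -1, which is -2 as a 2-limb two's complement number.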
If we have adox/adcx, use same strategy as suggested for
addaddmul_1msb0, but subtract rather than add in the chain with long
> I suspect the present code is far from optimal on modern x86 CPUs which
> can sustain 1 64x64->128 multiply per cycle. I feel confident that we
> could reach close to 1 c/l.
That's the fun thing with GMP, there's always ways to improve the code
you wrote some years ago ;-)
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.