Please update addaddmul_1msb0.asm to support ABI in mingw64

Niels Möller nisse at
Wed Oct 6 15:57:15 UTC 2021

Torbjörn Granlund <tg at> writes:

> OK, so the code is 3-ways unrolled.  That's always a bit inconvenient
> and tends to cause some code bloat.

I don't remember at all why I did it that way. Maybe it was faster than
two-way, and too few registers for 4-way?

Do you expect one really needs to go beyond 2-way for this type of loop,
where each iteration does a fair amount of work?

> * Accumulate differently, say 4 consecutive limbs at a time, with carry
>   being alive.  That will require more registers for sure.  By using
>   adcx and adox, one may accumulate to the same registers in two chains
>   semi-simultaneously.

With adox/adcx, I think it should work to compute both a U and b V
one (or a few) limbs at a time, using the O flag as a short-lived carry
and a longer-lived register to hold the high limb. Then add the
results together using C as a long-lived carry, living between iterations.
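
A minimal sketch of that data flow as a portable C model, not actual
adox/adcx code and not GMP's implementation. The function name and
argument order are hypothetical, limbs are assumed to be 64 bits, and
unsigned __int128 is a GCC/Clang extension. The variables ha and hb play
the role of the registers holding the high limbs, the carry absorbed
into them stands in for the short-lived O-flag carry, and cy stands in
for the long-lived C flag:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model: {rp,n} = a*{up,n} + b*{vp,n}, returning
   the high limb.  Requires a, b < B/2 with B = 2^64. */
uint64_t
addaddmul_1msb0_model (uint64_t *rp, const uint64_t *up, const uint64_t *vp,
                       size_t n, uint64_t a, uint64_t b)
{
  uint64_t ha = 0, hb = 0;  /* high limbs of the two product chains */
  uint64_t cy = 0;          /* long-lived carry between iterations */

  assert (a < ((uint64_t) 1 << 63) && b < ((uint64_t) 1 << 63));

  for (size_t i = 0; i < n; i++)
    {
      /* Product plus previous high limb.  The carry out of the low-limb
         addition is short-lived: it is folded into the new high limb
         right away.  Since a, b < B/2, the new high limb stays < B/2. */
      unsigned __int128 pa = (unsigned __int128) a * up[i] + ha;
      unsigned __int128 pb = (unsigned __int128) b * vp[i] + hb;
      uint64_t la = (uint64_t) pa;
      uint64_t lb = (uint64_t) pb;
      ha = (uint64_t) (pa >> 64);
      hb = (uint64_t) (pb >> 64);

      /* Combine the two low limbs; the carry out is 0 or 1 and lives on
         into the next iteration. */
      unsigned __int128 s = (unsigned __int128) la + lb + cy;
      rp[i] = (uint64_t) s;
      cy = (uint64_t) (s >> 64);
    }

  /* ha, hb < B/2 and cy <= 1, so this sum cannot overflow a limb. */
  return ha + hb + cy;
}
```

The model only pins down the arithmetic. Whether the two chains can
actually be interleaved profitably on a given CPU is a separate
scheduling question.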

Is there a neat way to clear the O flag without clobbering C ?

Another thing that I think could give a substantial speedup for gcd in
the Lehmer range is to implement addsubmul_1msb0. If we have a single
carry flag, each iteration would take as input

a, b (both < B/2), two full limbs u, v, and a carry limb which is a two's
complement signed number. Then compute

 a u + b v + c

as a 2-limb two's complement number (fits, thanks to the restrictions on a,
b). Store the low half; the high half becomes the c input to the next iteration.
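
A minimal C sketch of one way to read this, under stated assumptions:
the function name is hypothetical, limbs are 64 bits, and the per-limb
quantity is taken to be a u - b v + c (the formula above is written with
a plus sign; the minus is my reading of the name addsubmul). It relies
on __int128, a GCC/Clang extension, and on arithmetic right shift of
negative values, which both compilers provide. With a, b < B/2 and c a
signed limb, the value lies strictly between -2^127 and 2^127, so it
fits in two limbs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model: {rp,n} = low n limbs of
   a*{up,n} - b*{vp,n}, returning the signed high limb.
   Requires a, b < B/2 with B = 2^64. */
int64_t
addsubmul_1msb0_model (uint64_t *rp, const uint64_t *up, const uint64_t *vp,
                       size_t n, uint64_t a, uint64_t b)
{
  int64_t c = 0;  /* signed carry limb passed between iterations */

  assert (a < ((uint64_t) 1 << 63) && b < ((uint64_t) 1 << 63));

  for (size_t i = 0; i < n; i++)
    {
      /* Each product is below 2^127, so it fits in a signed __int128. */
      __int128 t = (__int128) a * up[i] - (__int128) b * vp[i] + c;

      rp[i] = (uint64_t) t;       /* store low half (reduced mod 2^64) */
      c = (int64_t) (t >> 64);    /* high half, sign preserved by the
                                     arithmetic shift (GCC/Clang) */
    }
  return c;
}
```

For example, with a = b = 1, U = 5 B and V = 2 B + 1, the model stores
the limbs B - 1 and 2 and returns 0, matching U - V = 3 B - 1.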

If we have adox/adcx, use same strategy as suggested for
addaddmul_1msb0, but subtract rather than add in the chain with long
lived carry.

> I suspect the present code is far from optimal on modern x86 CPUs which
> can sustain 1 64x64->128 multiply per cycle.  I feel confident that we
> could reach close to 1 c/l.

That's the fun thing with GMP, there are always ways to improve the code
you wrote some years ago ;-)


Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
