Please update addaddmul_1msb0.asm to support ABI in mingw64
Niels Möller
nisse at lysator.liu.se
Wed Oct 6 19:54:56 UTC 2021
nisse at lysator.liu.se (Niels Möller) writes:
> If we have adox/adcx, use same strategy as suggested for
> addaddmul_1msb0, but subtract rather than add in the chain with long
> lived carry.
Here's a sketch of a loop that should work for both addaddmul_1msb0 and
addsubmul_1msb0:
L(top):
	mov	(ap, n, 8), %rdx
	mulx	%r8, alo, hi
	adox	ahi, alo
	mov	hi, ahi		C mov could go away with 2-way unrolling
	adox	zero, ahi	C clears O
	mov	(bp, n, 8), %rdx
	mulx	%r9, blo, hi
	adox	bhi, blo
	mov	hi, bhi
	adox	zero, bhi	C clears O
	adc	blo, alo	C or sbb, for addsubmul_1msb0
	mov	alo, (rp, n, 8)
	inc	n
	jnz	L(top)
L(done):
	adc	bhi, ahi	C no carry out, thanks to msb0
	mov	ahi, %rax	C return value
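To pin down the intended semantics, here's a C model of the addadd
variant (hypothetical names, and assuming a GMP-style interface that
computes {rp,n} = u*{ap,n} + v*{bp,n} and returns the high limb; the
comments map back to the instructions above):

#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb;

limb
addaddmul_1msb0_model (limb *rp, const limb *ap, const limb *bp,
		       size_t n, limb u, limb v)	/* u, v < 2^63 */
{
  limb ahi = 0, bhi = 0, cy = 0;	/* high accumulators and CF */
  for (size_t i = 0; i < n; i++)
    {
      unsigned __int128 pa = (unsigned __int128) u * ap[i];
      unsigned __int128 pb = (unsigned __int128) v * bp[i];
      /* adox ahi, alo; mov hi, ahi; adox zero, ahi */
      limb alo = (limb) pa + ahi;
      ahi = (limb) (pa >> 64) + (alo < ahi);
      /* the same for the b chain */
      limb blo = (limb) pb + bhi;
      bhi = (limb) (pb >> 64) + (blo < bhi);
      /* adc blo, alo -- the long-lived carry chain */
      limb t = alo + blo;
      limb r = t + cy;
      cy = (t < alo) | (r < t);
      rp[i] = r;
    }
  return ahi + bhi + cy;	/* final adc; no overflow, thanks to msb0 */
}

For the addsub variant, the combining add becomes a subtraction with a
long-lived borrow instead.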
(BTW, do I get the operand order right for mulx? I'm confused by the
docs, which use the generally different Intel conventions.)
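For reference, my reading of the manuals: in Intel syntax the first
operand of mulx is the high destination,

	mulx	rhi, rlo, src		; Intel: rhi:rlo = rdx * src

and AT&T syntax reverses the whole operand list,

	mulx	src, rlo, rhi		C AT&T: rhi:rlo = %rdx * src

which matches the usage in the loop above (high half in the last
operand).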
Note that in this form, I think we could allow full-limb inputs (%r8,
%r9), except that the final adc could then carry out, so we'd need to
return a 65-bit value.
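Spelling out the bounds: with msb0 multipliers u, v <= 2^63 - 1, each
high limb satisfies

	hi(u*a) <= floor ((2^63 - 1)(2^64 - 1) / 2^64) = 2^63 - 2

so the final adc computes at most (2^63 - 1) + (2^63 - 1) + 1 =
2^64 - 1 and cannot carry out. With full-limb u, v the high limbs can
reach 2^64 - 2 each; the in-loop adox zero, ahi still clears O, but the
final sum can reach 2^65 - 1, hence the 65-bit return value.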
For the addadd case, this could be simplified by adding ahi and bhi
together early (since there can be no overflow), eliminating a few of
the adox instructions.
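In C, that simplification looks roughly as follows (same hypothetical
interface as the model above); only one combined high word and one
carry survive each iteration:

#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb;

limb
addaddmul_1msb0_early (limb *rp, const limb *ap, const limb *bp,
		       size_t n, limb u, limb v)	/* u, v < 2^63 */
{
  limb hi = 0, cy = 0;
  for (size_t i = 0; i < n; i++)
    {
      unsigned __int128 pa = (unsigned __int128) u * ap[i];
      unsigned __int128 pb = (unsigned __int128) v * bp[i];
      limb lo = (limb) pa + (limb) pb;
      limb c1 = lo < (limb) pa;		/* carry from the low halves */
      limb t = lo + hi;			/* add previous high word */
      limb r = t + cy;
      cy = (t < lo) | (r < t);
      rp[i] = r;
      /* combined high word for the next limb position; msb0 means
	 (pa >> 64) + (pb >> 64) + c1 <= 2^64 - 3, so no overflow */
      hi = (limb) (pa >> 64) + (limb) (pb >> 64) + c1;
    }
  return hi + cy;
}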
Now, the question is whether it can beat mul_1 + addmul_1. I don't know.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.