Please update addaddmul_1msb0.asm to support ABI in mingw64
Torbjörn Granlund
tg at gmplib.org
Thu Oct 7 09:50:03 UTC 2021
nisse at lysator.liu.se (Niels Möller) writes:
Here's a sketch of a loop that should work for both addaddmul_1msb0 and
addsubmul_1msb0:
L(top):
mov (ap, n, 8), %rdx
mulx %r8, alo, hi
adox ahi, alo
mov hi, ahi C 2-way unroll.
adox zero, ahi C Clears O
mov (bp, n, 8), %rdx
mulx %r9, blo, hi
adox bhi, blo
mov hi, bhi
adox zero, bhi C clears O
adc blo, alo C Or sbb, for addsubmul_1msb0
mov alo, (rp, n, 8)
inc n
jnz L(top)
L(done):
adc bhi, ahi C No carry out, thanks to msb0
mov ahi, %rax C Return value
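For reference, the loop is meant to compute {rp,n} = {ap,n}*u + {bp,n}*v,
where u (in r8) and v (in r9) both have their top bit clear, returning the
final high limb. A plain-C model of that computation (my sketch for
illustration, not GMP code; the function name and u128 typedef are mine):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;  /* GCC/Clang extension */

/* rp[] = ap[]*u + bp[]*v, returning the high limb.  The msb0 condition
   (u, v < 2^63) guarantees the final ahi + bhi + cy sum cannot
   overflow a limb, matching the "No carry out" comment in the asm. */
static uint64_t
addaddmul_1msb0_ref (uint64_t *rp, const uint64_t *ap, const uint64_t *bp,
                     size_t n, uint64_t u, uint64_t v)
{
  uint64_t ahi = 0, bhi = 0, cy = 0;
  for (size_t i = 0; i < n; i++)
    {
      u128 ta = (u128) ap[i] * u + ahi;          /* mulx + adox chain, a side */
      u128 tb = (u128) bp[i] * v + bhi;          /* mulx + adox chain, b side */
      u128 s  = (uint64_t) ta + (u128)(uint64_t) tb + cy;  /* adc blo, alo */
      rp[i] = (uint64_t) s;
      ahi = (uint64_t) (ta >> 64);
      bhi = (uint64_t) (tb >> 64);
      cy  = (uint64_t) (s >> 64);
    }
  return ahi + bhi + cy;   /* final adc bhi, ahi: no overflow for msb0 u,v */
}
```

The addsubmul_1msb0 variant would subtract the b-side sum instead of adding
it, with the corresponding borrow handling.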
Neat!
Some unrolling would save several instructions:
Put r8 (aka u0) in rdx across k ways of unrolling, and feed the ap[...]
limbs to mulx directly. Then accumulate with an adox chain, reducing the
number of adox-with-zero instructions needed. Same for r9/v0.
(BTW, did I get the operand order right for mulx? I'm confused by the docs,
which use the generally different Intel conventions.)
Your use looks right.
Now, question is if it can beat mul_1 + addmul_1. I don't know.
It surely has potential.
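For comparison, the mul_1 + addmul_1 route makes two passes over the data:
first rp[] = ap[]*u, then rp[] += bp[]*v. A C model of that baseline (plain-C
stand-ins with mpn_mul_1/mpn_addmul_1 semantics, not GMP's implementations):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* rp[] = ap[]*u, returning the carry-out limb (mpn_mul_1 semantics). */
static uint64_t
mul_1 (uint64_t *rp, const uint64_t *ap, size_t n, uint64_t u)
{
  uint64_t cy = 0;
  for (size_t i = 0; i < n; i++)
    {
      unsigned __int128 t = (unsigned __int128) ap[i] * u + cy;
      rp[i] = (uint64_t) t;
      cy = (uint64_t) (t >> 64);
    }
  return cy;
}

/* rp[] += bp[]*v, returning the carry-out limb (mpn_addmul_1 semantics). */
static uint64_t
addmul_1 (uint64_t *rp, const uint64_t *bp, size_t n, uint64_t v)
{
  uint64_t cy = 0;
  for (size_t i = 0; i < n; i++)
    {
      unsigned __int128 t = (unsigned __int128) bp[i] * v + rp[i] + cy;
      rp[i] = (uint64_t) t;
      cy = (uint64_t) (t >> 64);
    }
  return cy;
}
```

The fused loop saves the second pass over rp (one store and one load per
limb), which is where its potential advantage lies.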
--
Torbjörn
Please encrypt, key id 0xC8601622