Please update addaddmul_1msb0.asm to support ABI in mingw64

Thu Oct 7 09:50:03 UTC 2021

nisse at lysator.liu.se (Niels Möller) writes:

  Here's a sketch of a loop, that should work for both addaddmul_1msb0 and
  addsubmul_1msb0:

  L(top):
  	mov	(ap, n, 8), %rdx
  	mulx 	%r8, alo, hi
  	adox	ahi, alo
  	mov	hi, ahi			C 2-way unroll.
  	adox	zero, ahi		C Clears O

  	mov	(bp, n, 8), %rdx
  	mulx 	%r9, blo, hi
  	adox	bhi, blo
  	mov	hi, bhi
  	adox	zero, bhi		C clears O

  	adc	blo, alo		C Or sbb, for addsubmul_1msb0
  	mov	alo, (rp, n, 8)
  	inc	n
  	jnz	L(top)

  L(done):
  	adc	bhi, ahi		C No carry out, thanks to msb0
  	mov	ahi, %rax		C Return value

Neat!
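
For checking an assembly version against something simple, here is a minimal
C model of what the quoted loop computes: rp[] = ap[]*u0 + bp[]*v0, with the
top limb returned.  The function name and argument order are hypothetical
(not taken from the GMP sources), and unsigned __int128 is a GCC/Clang
extension, so treat this as a sketch under those assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model, not GMP's code.
   Computes rp[0..n-1] = low n limbs of ap[]*u0 + bp[]*v0 and returns
   the top limb.  Requires u0 and v0 to have their top bit clear. */
static uint64_t ref_addaddmul_1msb0(uint64_t *rp, const uint64_t *ap,
                                    const uint64_t *bp, size_t n,
                                    uint64_t u0, uint64_t v0)
{
    assert((u0 >> 63) == 0 && (v0 >> 63) == 0);  /* the msb0 condition */

    uint64_t ahi = 0, bhi = 0;  /* pending high limbs of the two streams */
    unsigned carry = 0;         /* models the C flag between iterations */

    for (size_t i = 0; i < n; i++) {
        /* mulx plus the adox pair: product plus previous high limb. */
        unsigned __int128 a = (unsigned __int128)ap[i] * u0 + ahi;
        unsigned __int128 b = (unsigned __int128)bp[i] * v0 + bhi;
        uint64_t alo = (uint64_t)a, blo = (uint64_t)b;
        ahi = (uint64_t)(a >> 64);
        bhi = (uint64_t)(b >> 64);

        /* adc blo, alo */
        unsigned __int128 s = (unsigned __int128)alo + blo + carry;
        rp[i] = (uint64_t)s;
        carry = (unsigned)(s >> 64);
    }
    /* adc bhi, ahi: fits in one limb because of msb0. */
    return ahi + bhi + carry;
}
```

With the top bit of u0 and v0 clear, each pending high limb is at most
2^63 - 2, so the final ahi + bhi + carry cannot wrap.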

Some unrolling would save several instructions:

Put r8 (aka u0) in rdx and keep it there across the k unrolled multiplies.
Supply the ap[...] limbs to mulx directly, as memory operands.  Then
accumulate with an adox chain, reducing the need for adox zero.  Same for
r9/v0.
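
One way to read that suggestion, modelled in C for a single product stream:
do K multiplies by the same u0 first, then fold each high limb into the next
low limb along one carry chain, so the "add zero" step happens once per block
instead of once per limb.  The function name, the unroll factor K, and the
restriction that n is a multiple of K are all assumptions made for this
sketch; unsigned __int128 is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

#define K 4  /* hypothetical unroll factor */

/* Hypothetical model of one unrolled stream: rp[] = ap[] * u0, returning
   the top limb.  In this sketch n must be a multiple of K. */
static uint64_t mul_1_unrolled_model(uint64_t *rp, const uint64_t *ap,
                                     size_t n, uint64_t u0)
{
    uint64_t top = 0;  /* high limb carried from one block to the next */

    for (size_t i = 0; i < n; i += K) {
        uint64_t lo[K], hi[K];

        /* K independent multiplies sharing the multiplier u0, which is
           the role rdx plays for mulx. */
        for (int j = 0; j < K; j++) {
            unsigned __int128 p = (unsigned __int128)ap[i + j] * u0;
            lo[j] = (uint64_t)p;
            hi[j] = (uint64_t)(p >> 64);
        }

        /* One carry chain: each low limb absorbs the previous high limb. */
        unsigned carry = 0;
        uint64_t prev = top;
        for (int j = 0; j < K; j++) {
            unsigned __int128 s = (unsigned __int128)lo[j] + prev + carry;
            rp[i + j] = (uint64_t)s;
            carry = (unsigned)(s >> 64);
            prev = hi[j];
        }

        /* The single "add zero" per block; the sum cannot overflow. */
        top = prev + carry;
    }
    return top;
}
```

In the real routine two such streams (u0 with ap, v0 with bp) would be
combined with adc or sbb, as in the quoted loop.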

  (BTW, do I get the operand order right for mulx? I'm confused by the docs
  that use the generally different Intel conventions).

Your use looks right.

  Now, question is if it can beat mul_1 + addmul_1. I don't know.

It surely has potential.

-- 
Torbjörn
Please encrypt, key id 0xC8601622