Please update addaddmul_1msb0.asm to support ABI in mingw64
Niels Möller
nisse at lysator.liu.se
Tue Oct 5 18:56:56 UTC 2021
Marco Bodrato <bodrato at mail.dm.unipi.it> writes:
> Well, I added one more move to order the cases as you suggest. The
> code gets a little bit shorter.
Thanks, looks good to me. I think one more instruction is easy to move,
see below.
> I also renamed registers, so that a push/pop couple is needed only if
> the loop is used; this may save a couple of cycles when the size is
> small. Does this make sense?
Makes sense.
> L(end): mul %r9
> add %rax, %r11
> adc %rdx, %r10
> cmp $1, R32(n)
> ja L(two)
> jnz L(nul)
>
> mov -8(ap), %rax <-- 1
I think this instruction and the one marked "2" below can be moved to
the start of the L(one): part, just before the mul %r8 ("3" below).
Slightly worse scheduling, though.
> mov %r11, -16(rp)
> mov %r10, %r11
> jmp L(one)
I had hoped this jump and preceding instructions could be eliminated, to
get a structure like
        ja      L(two)
        jz      L(one)
L(nul): (no jumps to this label left)
        ...
        fall through
L(one):
        ...
        fall through
L(two):
        ...
        function exit
But might need other move instructions, to get the right data into the
right registers?
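To make the intended shape concrete, here is a rough C model of such a
fall-through ladder. It is only a sketch under stated assumptions: it uses
32-bit limbs so that plain uint64_t arithmetic suffices, the names
addaddmul_model and step are made up for illustration, and the unroll
factor of three is a guess, not taken from the asm file. The mapping of
case labels to L(nul), L(one) and L(two) follows the sketch above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One limb step: compute a*u + b*v + carry, store the low limb and
   return the high limb. With u, v < 2^31 the 64-bit sum cannot overflow. */
static uint32_t step(uint32_t *r, uint32_t a, uint32_t b,
                     uint32_t u, uint32_t v, uint32_t carry)
{
    uint64_t t = (uint64_t)a * u + (uint64_t)b * v + carry;
    *r = (uint32_t)t;
    return (uint32_t)(t >> 32);
}

/* Hypothetical model of {rp,n} = {ap,n}*u + {bp,n}*v, returning the
   carry limb. Not the GMP implementation. */
uint32_t addaddmul_model(uint32_t *rp, const uint32_t *ap, const uint32_t *bp,
                         size_t n, uint32_t u, uint32_t v)
{
    assert((u >> 31) == 0 && (v >> 31) == 0);   /* "msb0" precondition */
    uint32_t cy = 0;
    size_t i = 0;

    /* Main loop, unrolled by three (the factor is an assumption). */
    for (; n - i >= 3; i += 3) {
        cy = step(&rp[i],     ap[i],     bp[i],     u, v, cy);
        cy = step(&rp[i + 1], ap[i + 1], bp[i + 1], u, v, cy);
        cy = step(&rp[i + 2], ap[i + 2], bp[i + 2], u, v, cy);
    }

    /* Tail ladder: enter at the right rung, then only fall through. */
    switch (n - i) {
    case 2:     /* like L(nul): two limbs left */
        cy = step(&rp[i], ap[i], bp[i], u, v, cy);
        i++;
        /* fall through */
    case 1:     /* like L(one): one limb left */
        cy = step(&rp[i], ap[i], bp[i], u, v, cy);
        /* fall through */
    default:    /* like L(two): nothing left, just return the carry */
        break;
    }
    return cy;
}
```

In C the compiler sorts out which value lives where at each entry point;
in the asm, each rung must find the accumulator in the registers the next
rung expects, which is where the extra moves would come from.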
> L(nul): mov -16(ap), %rax
> mov %r11, -24(rp)
> mul %r8
> add %rax, %r10
> mov -16(bp), %rax
> mov $0, R32(%r11)
> adc %rdx, %r11
> mul %r9
> add %rax, %r10
> mov -8(ap), %rax <-- 2
> adc %rdx, %r11
> mov %r10, -16(rp)
> L(one): mul %r8 <-- 3
> add %rax, %r11
> mov -8(bp), %rax
> mov $0, R32(%r10)
> adc %rdx, %r10
> mul %r9
> add %rax, %r11
> adc %rdx, %r10
>
> L(two): mov %r11, -8(rp)
> mov %r10, %rax
> L(ret): pop %rbp
> FUNC_EXIT()
> ret
> EPILOGUE()
So I think your version is an improvement as is, and perhaps not worth
the effort to try to eliminate a few more instructions in this rather
obscure function.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.