[PATCH] Rewrite T3/T4 {add,sub}_n.asm
David Miller
davem at davemloft.net
Thu Apr 4 23:55:53 CEST 2013
From: Torbjorn Granlund <tg at gmplib.org>
Date: Thu, 04 Apr 2013 23:50:03 +0200
> David Miller <davem at davemloft.net> writes:
>
> T3 seems to be much more sensitive to the loop alignment than T4 is.
> For example, if I take out the ALIGN(16) and the annulling branch
> from add_n.asm, T3 takes an extra cycle to execute (thus 8.5 c/l).
>
> I put ALIGN(32) in mul_1.asm and aormul_2.asm. Should these be
> ALIGN(16) for a start?

I'm unsure of the exact requirements; let me do some more
research on this.

> Pushed, after having remapped g2 to g5 for the benefit of my emulation
> code. (Yes, I should fix that...)

I put that in there just to make sure you were paying attention. :-)

> (Also updated gmplib.org/devel/asm.html.)

Looks good, thanks.

BTW, even after the ALIGN(2) fix, invert_limb.asm is still partially
busted; we need to add PIC code. I'll take care of this when I get
a chance.