[PATCH] Rewrite T3/T4 {add,sub}_n.asm

Thu Apr 4 23:55:53 CEST 2013

From: Torbjorn Granlund <tg at gmplib.org>
Date: Thu, 04 Apr 2013 23:50:03 +0200

> David Miller <davem at davemloft.net> writes:
> 
>   T3 seems to be much more sensitive to the loop alignment than T4 is.
>   For example, if I take out the ALIGN(16) and the annulling branch
>   from add_n.asm, T3 takes an extra cycle to execute (thus 8.5 c/l).
>   
> I put ALIGN(32) in mul_1.asm and aormul_2.asm.  Should these be
> ALIGN(16) for a start?

I'm unsure of the exact requirements; let me do some more
research on this.

> Pushed, after having remapped g2 to g5 for the benefit of my emulation
> code.  (Yes, I should fix that...)

I put that in there just to make sure you were paying attention. :-)

> (Also updated gmplib.org/devel/asm.html.)

Looks good, thanks.

BTW, even after the ALIGN(2) fix, invert_limb.asm is still partially
busted; we need to add PIC code.  I'll take care of this when I get
a chance.