mul_1 2-way on T3

David Miller davem at
Wed Apr 3 02:28:40 CEST 2013

From: David Miller <davem at>
Date: Tue, 02 Apr 2013 20:24:51 -0400 (EDT)

> Only loop like mul_1a.asm (and potentially mul_1b.asm) can, because
> only they have enough cycles in the loop to retire multiplies without
> positive accumulation into the OoO buffer.

Actually, mul_1b.asm cannot achieve 6 cycles per loop iteration because
of the 3-cycle load-to-use latency there.

This causes an OoO queue-up, which in turn makes the dependent mulx's
queue up in OoO that much longer, and their dependent instructions
likewise, and so on, and so forth, eventually overflowing the OoO

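As a rough illustration of the accumulation argument, here is a toy
model in Python. It is only a sketch under stated assumptions, not a
model of the real T3 pipeline: the window size, the instruction count
per iteration, and the 6 vs. 8 cycle figures are all made-up numbers,
and the function names are hypothetical. It only shows that if
iterations enter the OoO window faster than they retire, occupancy
grows without bound until the window is full.

```python
def occupancy_after(cycles, insns_per_iter, issue_cycles, retire_cycles):
    """Instructions still sitting in the window after `cycles` cycles.

    Toy assumptions: one loop iteration enters the window every
    `issue_cycles` cycles and one iteration retires every
    `retire_cycles` cycles. Nothing can retire before it has entered.
    """
    entered = cycles // issue_cycles
    retired = min(entered, cycles // retire_cycles)
    return (entered - retired) * insns_per_iter


def first_overflow_cycle(window_entries, insns_per_iter,
                         issue_cycles, retire_cycles, max_cycles=100000):
    """First cycle at which occupancy exceeds the window, or None."""
    for cycle in range(max_cycles):
        occ = occupancy_after(cycle, insns_per_iter,
                              issue_cycles, retire_cycles)
        if occ > window_entries:
            return cycle
    return None


# Hypothetical numbers: 128-entry window, 10 instructions per iteration.
balanced = first_overflow_cycle(128, 10, issue_cycles=6, retire_cycles=6)
stalled = first_overflow_cycle(128, 10, issue_cycles=6, retire_cycles=8)
print("balanced loop overflows at cycle:", balanced)
print("stalled loop overflows at cycle:", stalled)
```

In the balanced case the occupancy stays at zero in this model, while
in the stalled case it climbs steadily and eventually exceeds the
window, which is the queue-up effect described above.
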
I'm convinced now that mul_1a.asm is the only loop which will execute
