Possible new T3-T5 mul_1
Torbjorn Granlund
tg at gmplib.org
Wed Apr 3 01:05:05 CEST 2013
David Miller <davem at davemloft.net> writes:
From: Torbjorn Granlund <tg at gmplib.org>
Date: Tue, 02 Apr 2013 21:59:17 +0200
>         .global main
> main:   save    %sp, -176, %sp
>         sethi   %hi(2800000000), %g5
> 1:      addcc   %g7, %g7, %l0
>         addxccc %g7, %g7, %l1
>         addxccc %g7, %g7, %l2
>         addxccc %g7, %g7, %l3
>         addcc   %g7, %g7, %l4
>         addxccc %g7, %g7, %l5
>         addxccc %g7, %g7, %l6
>         addxccc %g7, %g7, %l7
>         brnz    %g5, 1b
>          dec    %g5
>         ret
>          restore
This runs in 4.922 seconds.
Good, so 5 cycles per loop iteration. (Your system runs not at 2.8 GHz
as I assumed, but slightly faster.)
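
For anyone who wants to redo the arithmetic, here is a rough sketch in
Python. The loop count and the timing come from the messages above. The
function and variable names are made up for illustration, and the
"whole number of cycles per iteration" step is an assumption about how
the measurement should be read, not something measured.

```python
# Figures taken from the thread above.
ITERATIONS = 2_800_000_000    # loop count loaded by sethi %hi(2800000000)
ELAPSED_S = 4.922             # reported run time in seconds
ASSUMED_HZ = 2.8e9            # clock rate assumed when the count was chosen
INSNS_PER_ITERATION = 10      # 8 adds + brnz + dec (delay slot)


def cycles_per_iteration(elapsed_s, clock_hz, iterations):
    """Cycles spent per loop iteration at a given clock rate."""
    return elapsed_s * clock_hz / iterations


def implied_clock_hz(elapsed_s, cycles_per_iter, iterations):
    """Clock rate implied if one iteration takes cycles_per_iter cycles."""
    return cycles_per_iter * iterations / elapsed_s


# With the count equal to the assumed clock rate, seconds read as cycles.
raw = cycles_per_iteration(ELAPSED_S, ASSUMED_HZ, ITERATIONS)

# Assumption: the true figure is a whole number of cycles per iteration.
whole = round(raw)

clock = implied_clock_hz(ELAPSED_S, whole, ITERATIONS)
ipc = INSNS_PER_ITERATION / whole

print(f"{raw:.3f} cycles/iteration at the assumed clock")
print(f"{whole} cycles/iteration implies {clock / 1e9:.3f} GHz")
print(f"{ipc:.1f} instructions per cycle")
```

Under these assumptions the implied clock comes out a little above
2.84 GHz, which is the "slightly faster" in the remark above.
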
I have to admit that I'm a bit surprised.
It is not really a high-performance pipeline, but it has some aspects of
high-performance pipelines. Carry register renaming has been around
since at least AMD K7.
I rescheduled addmul_2 and mul_2. If I have not misunderstood this
pipeline, we should finally reach 3.5 c/l and 3 c/l (cycles per limb),
respectively.
Attachment: sparct34-aormul_2.asm (application/octet-stream, 5574 bytes)
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20130403/51d13694/attachment.obj>
--
Torbjörn