Possible new T3-T5 mul_1
tg at gmplib.org
Tue Apr 2 07:34:07 CEST 2013
David Miller <davem at davemloft.net> writes:
Does the current tree even build for you when targeting t3/t4? All
of the umulxhi's in ultrasparct3/mul_1.asm et al. don't put the
parameters inside parentheses and your compat macros seem to
Oops, this is the result of defining these things via config.m4 rather
than on a file-by-file basis. I didn't attempt a full build (which I
cannot, unless I hack configure to support this stuff automatically).
Anyways, we need something like the patch below to get the tree
Thanks. I'll push this soonish.
Also attached are the before and after speed output for the
existing T3 mul_1.asm and your 4-way unrolled variant which
appears to converge to 3 c/l.
Excellent, it sustains 2 insn/cycle then! As expected, it is slower at
This version probably overschedules loads; I'll try another variant some
day which fixes that.
I thought that perhaps we'd need even more mulx scheduling. This
variant schedules more than mul_2/addmul_2, but not as much as the
latency suggests; the rest is taken care of by OoO execution.
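
For readers less familiar with the routine under discussion, here is a rough C sketch of what a mul_1 loop computes: each limb of the source is multiplied by a single limb, with the low and high halves of each 64x64 product obtained separately, much as the mulx/umulxhi pair does in the assembly. This is only an illustration and not the GMP code. The names ref_mul_1, mul_lo and mul_hi are made up for this example, and the high-half helper assumes the unsigned __int128 extension offered by GCC and Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Low 64 bits of a 64x64-bit product (the role mulx plays). */
static uint64_t mul_lo(uint64_t a, uint64_t b)
{
    return a * b;
}

/* High 64 bits of a 64x64-bit product (the role umulxhi plays).
   Assumption: the compiler supports unsigned __int128 (GCC/Clang). */
static uint64_t mul_hi(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Hypothetical reference routine, not GMP's implementation:
   rp[0..n-1] = up[0..n-1] * v, returning the carry-out limb. */
uint64_t ref_mul_1(uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
    uint64_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t lo = mul_lo(up[i], v);
        uint64_t hi = mul_hi(up[i], v);

        lo += carry;          /* add the carry limb from the previous step */
        hi += (lo < carry);   /* propagate wraparound; hi <= 2^64 - 2, so no overflow */

        rp[i] = lo;
        carry = hi;
    }
    return carry;
}
```

In the C version the two multiplies per limb are independent of the carry chain, which is what lets an assembly loop issue them well ahead of the additions that consume them.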