Possible new T3-T5 mul_1

Torbjorn Granlund tg at gmplib.org
Tue Apr 2 07:34:07 CEST 2013

David Miller <davem at davemloft.net> writes:

  Does the current tree even build for you when targeting t3/t4?  All
  of the umulxhi's in ultrasparct3/mul_1.asm et al. don't put the
  parameters inside of parentheses and your compat macros seem to
  require this.

Oops, this is the result of defining these things via config.m4 rather
than on a file-by-file basis.  I didn't attempt a full build (which I
cannot do unless I hack configure to support this stuff automatically).

  Anyways, we need something like the patch below to get the tree
  building again.

Thanks.  I'll push this soonish.

  Also attached are the before and after speed output for the
  existing T3 mul_1.asm and your 4-way unrolled variant which
  appears to converge to 3 c/l.

Excellent, it sustains 2 insn/cycle then!  As expected, it is slower at

This version probably overschedules loads; I'll try another variant some
day that fixes that.

I thought that perhaps we'd need even more mulx scheduling.  This
variant schedules more than mul_2/addmul_2, but not as much as the
latency suggests; the rest is taken care of by OoO execution.
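
For readers following the thread, here is a minimal C sketch of what a
mul_1 loop computes per limb.  It is only a reference model, not the
ultrasparct3 assembly: the names ref_mul_1, mulx_lo and umulx_hi are
hypothetical, the two helpers merely stand in for the mulx (low half)
and umulxhi (high half) instructions, and unsigned __int128 is assumed
to be available (a GCC/Clang extension).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for mulx: low 64 bits of a 64x64-bit product. */
static uint64_t mulx_lo(uint64_t a, uint64_t b)
{
    return a * b;
}

/* Hypothetical stand-in for umulxhi: high 64 bits of a 64x64-bit product.
   Assumes the compiler provides unsigned __int128. */
static uint64_t umulx_hi(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Reference model: rp[0..n-1] = up[0..n-1] * v, returning the carry-out limb. */
static uint64_t ref_mul_1(uint64_t *rp, const uint64_t *up, size_t n, uint64_t v)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t lo = mulx_lo(up[i], v);
        uint64_t hi = umulx_hi(up[i], v);
        lo += carry;            /* add the carry limb from the previous step */
        hi += (lo < carry);     /* propagate the wraparound; hi cannot overflow */
        rp[i] = lo;
        carry = hi;
    }
    return carry;
}
```

Each limb needs one low-half and one high-half multiply plus the carry
arithmetic, a load and a store; those are the operations an unrolled
assembly loop has to schedule.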

