[PATCH] T3/T4 sparc shifts, plus more timings

Torbjorn Granlund tg at gmplib.org
Wed Mar 27 22:07:47 CET 2013


David Miller <davem at davemloft.net> writes:

  As an aside I think we can get it down to 2.5 cycles per limb on
  T4 with 4-way unrolling, and 3.0 cycles per limb with 2-way
  unrolling.
  
  The idea is to decrease the bookkeeping instructions by only
  maintaining base pointers which do not change, and then we have an
  offset which operates as the loop index.
  
  So we'd have an 'n_off' instead of 'n', and then in some local
  registers we'd hold:
  
  l3:	up - 8
  l4:	up - 16
  l5:	rp - 8
  l6:	rp - 16
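
For readers following along, here is a rough C rendering of the offset-as-loop-index idea, applied to a 2-way unrolled left shift. It is only a sketch: the function name lshift_off, the 64-bit limb type, and the high-to-low traversal are assumptions made for illustration, not code from the patch. The base pointers rp and up never change; n_off is the only value updated per iteration. Assuming 8-byte limbs, up[n_off - 1] and up[n_off - 2] are loads from the fixed bases "up - 8" and "up - 16" plus the scaled offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the patch itself: shift the n-limb number at up
   left by cnt bits (requires n >= 1 and 1 <= cnt <= 63), store the result
   at rp, and return the bits shifted out of the top limb. */
uint64_t lshift_off(uint64_t *rp, const uint64_t *up, size_t n, unsigned cnt)
{
    unsigned tnc = 64 - cnt;
    size_t n_off = n - 1;          /* the only loop-carried index */
    uint64_t hi = up[n_off];
    uint64_t out = hi >> tnc;      /* bits that fall off the top */

    /* 2-way unrolled: two result limbs per iteration, one index update. */
    while (n_off >= 2) {
        uint64_t mid = up[n_off - 1];   /* base "up - 8"  + offset */
        uint64_t lo  = up[n_off - 2];   /* base "up - 16" + offset */
        rp[n_off]     = (hi  << cnt) | (mid >> tnc);
        rp[n_off - 1] = (mid << cnt) | (lo  >> tnc);
        hi = lo;
        n_off -= 2;
    }

    /* Wind-down for the limb the unrolled loop did not cover. */
    if (n_off == 1) {
        uint64_t lo = up[0];
        rp[1] = (hi << cnt) | (lo >> tnc);
        hi = lo;
    }
    rp[0] = hi << cnt;
    return out;
}
```

The saving in the assembly version comes from replacing several pointer increments per iteration with a single update of the offset; a C compiler may or may not generate that addressing pattern from this source.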
  
A clever trick!  But you will probably get 2.75 c/l for 4-way, not 2.5
c/l.  We'll need infinite unrolling for 2.5...
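
One way to read these numbers (an assumption fitted to the figures quoted in this thread, not something measured or stated here): suppose the loop body itself costs 2.5 cycles per limb, and each iteration carries one extra cycle of loop overhead that is shared by the u limbs of a u-way unrolled loop. Then the cost per limb c(u) would be:

```latex
\[
  c(u) = 2.5 + \frac{1}{u}, \qquad
  c(2) = 3.0, \quad c(4) = 2.75, \quad \lim_{u \to \infty} c(u) = 2.5
\]
```

Under that model the overhead term shrinks with every doubling of the unroll factor but never vanishes, so 2.5 c/l is reached only in the limit.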

-- 
Torbjörn


More information about the gmp-devel mailing list