Some secondary asm T3,T4,T5 functions

  I suspect these are simply OoO unit artifacts.
  I wonder if CPU cycle counter reads really flush the full pipeline, or
  whether OoO execution can overlap part of it.
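I think you hit the nail on the head.  I suppose we could chain the calls
together in the speed test, for example by passing each call's return value
as the input to the next call, which should keep the out-of-order unit from
overlapping successive calls.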
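
Something like the following is what I have in mind. It is only a rough
sketch in C, not the actual test harness: work() is a made-up stand-in for
one of the asm routines, and run_independent(), run_chained() and time_loop()
are names I invented for illustration. The point is just the difference
between the two loops: in the chained one, each call needs the previous
result before it can start.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for one of the asm routines being timed. */
static uint32_t work(uint32_t x) {
    return x * 2654435761u + 1u;
}

/* Independent calls: every input is known up front, so the CPU is free
   to overlap work from different iterations. */
uint32_t run_independent(uint32_t seed, int n) {
    uint32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc ^= work(seed + (uint32_t)i);
    return acc;
}

/* Chained calls: each call takes the previous return value as its input,
   creating a data dependency between successive calls. */
uint32_t run_chained(uint32_t seed, int n) {
    uint32_t v = seed;
    for (int i = 0; i < n; i++)
        v = work(v);
    return v;
}

/* Time one of the loops with clock(); the result is stored through 'out'
   so the compiler cannot discard the calls. Returns elapsed CPU seconds. */
double time_loop(uint32_t (*fn)(uint32_t, int), uint32_t seed, int n,
                 uint32_t *out) {
    clock_t t0 = clock();
    *out = fn(seed, n);
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling time_loop(run_chained, 1, n, &out) and time_loop(run_independent, 1, n, &out)
with the same n would then give a latency-style figure and a throughput-style
figure to compare. With a real asm routine the compiler cannot inline the
call, which this C stand-in does not guarantee.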


