SSE2 rshift for 64bit Core2

Torbjorn Granlund tg at
Sat Mar 22 03:07:41 CET 2008

Peter Cordes <peter at> writes:

   Is there an svn/git/whatever repository for gmp that has stuff like that in
  it?  I wish I'd known there was already an optimized version.

I don't have any public repository.  I've got lots of code in various
stages of development and of varying quality.

   Your loop doesn't look much different, except you don't reuse the same pair
  of registers so much.  That lets you separate the loads and stores from the
  shifts to pipeline it, so the OOO core has less re-ordering to do.  I
  thought register renaming would take care of everything, but I guess not.
  Your better-pipelined loop might unroll better than mine.

Register renaming does its job for write-after-read and
write-after-write hazards.  I use several registers for better
scheduling.  OoO never does a perfect job; scheduling manually when
possible is very
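
To make the point about spreading work over several registers concrete,
here is a minimal C sketch, not the code from this thread and not GMP's
mpn_rshift.  The name rshift_sketch and its interface are assumptions
made for illustration: it shifts an n-limb number (n >= 1) right by cnt
bits (1 <= cnt <= 63) and returns the shifted-out bits left-justified in
a limb.  Each iteration does its loads first, then the shifts into fresh
temporaries, then the stores.  A C compiler is free to reschedule all of
this, so the sketch only shows the dependency structure that hand-written
assembly would pin down.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only.
   Requires n >= 1 and 1 <= cnt <= 63. */
uint64_t rshift_sketch(uint64_t *dst, const uint64_t *src,
                       size_t n, unsigned cnt)
{
    unsigned tnc = 64 - cnt;          /* complementary shift count */
    uint64_t out = src[0] << tnc;     /* bits shifted out at the low end */
    size_t i = 0;

    /* Two limbs per iteration, using separate temporaries so that no
       result has to wait for a register to be reused. */
    for (; i + 2 < n; i += 2) {
        uint64_t a = src[i];          /* loads */
        uint64_t b = src[i + 1];
        uint64_t c = src[i + 2];
        uint64_t r0 = (a >> cnt) | (b << tnc);   /* shifts */
        uint64_t r1 = (b >> cnt) | (c << tnc);
        dst[i] = r0;                  /* stores */
        dst[i + 1] = r1;
    }

    /* Leftover limb below the top one, if any. */
    for (; i + 1 < n; i++)
        dst[i] = (src[i] >> cnt) | (src[i + 1] << tnc);

    dst[n - 1] = src[n - 1] >> cnt;   /* top limb gets zeros shifted in */
    return out;
}
```
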

   Maybe at some point I'll try to make a version tuned for K8.  I might wait
  until I have access to some K10 hardware, too.

The mpn/x86_64/mpn_?shift code in GMP 4.2.2 is pretty decent for K8 at
2.5 c/l.

I suspect clever 128-bit code could approach 1 c/l for K10.

