SSE2 rshift for 64bit Core2
tg at swox.com
Sat Mar 22 03:07:41 CET 2008
Peter Cordes <peter at cordes.ca> writes:
> Is there an svn/git/whatever repository for gmp that has stuff like that in
> it? I wish I'd known there was already an optimized version.
I don't have any public repository. I've got lots of code in various
stages of development and of varying quality.
> Your loop doesn't look much different, except you don't reuse the same pair
> of registers so much. That lets you separate the loads and stores from the
> shifts to pipeline it, so the OOO core has less re-ordering to do. I
> thought register renaming would take care of everything, but I guess not.
> Your better-pipelined loop might unroll better than mine.
Register renaming does its job for write-after-read and
write-after-write hazards. I use several registers for better scheduling.
OoO never does a perfect job; scheduling manually when possible is very
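
To make the point about using several registers concrete, here is a rough
C sketch of a limb-wise right shift. It is not the GMP code and not the SSE2
loop under discussion; the name rshift_limbs, the 64-bit limb type and the
two-way unrolling are all assumptions made for illustration. Each iteration
loads into its own temporaries before any shift is done, so the loads,
shifts and stores of neighbouring limbs do not depend on one another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not GMP's implementation): shift the n-limb number
   at up right by cnt bits, store n limbs at rp, and return the bits that
   were shifted out, placed in the high end of the returned limb.
   Assumes n >= 1 and 1 <= cnt <= 63. */
uint64_t rshift_limbs(uint64_t *rp, const uint64_t *up, size_t n, unsigned cnt)
{
    assert(n >= 1 && cnt >= 1 && cnt <= 63);

    unsigned tnc = 64 - cnt;        /* complementary shift count */
    uint64_t a = up[0];             /* limb currently being shifted down */
    uint64_t retval = a << tnc;     /* bits that fall off the low end */
    size_t i = 0;

    /* Two limbs per iteration. b and c are separate temporaries, and both
       loads are issued before the shifts and the stores. */
    for (; i + 2 < n; i += 2) {
        uint64_t b = up[i + 1];
        uint64_t c = up[i + 2];
        uint64_t r0 = (a >> cnt) | (b << tnc);
        uint64_t r1 = (b >> cnt) | (c << tnc);
        rp[i] = r0;
        rp[i + 1] = r1;
        a = c;                      /* carry the last loaded limb forward */
    }

    /* At most one limb pair is left over after the unrolled loop. */
    for (; i + 1 < n; i++) {
        uint64_t b = up[i + 1];
        rp[i] = (a >> cnt) | (b << tnc);
        a = b;
    }

    rp[n - 1] = a >> cnt;           /* top limb gets zeros shifted in */
    return retval;
}
```

Whether a compiler keeps this schedule is a separate matter; the sketch only
shows the data flow that a hand-scheduled assembly loop would preserve.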
> Maybe at some point I'll try to make a version tuned for K8. I might wait
> until I have access to some K10 hardware, too.
The mpn/x86_64/mpn_?shift code in GMP 4.2.2 is pretty decent for K8 at
I suspect clever 128-bit code could approach 1 c/l for K10.