Better tabselect

Torbjorn Granlund tg at gmplib.org
Thu Apr 11 23:55:18 CEST 2013


I've written a few variants of tabselect that use a different table
traversal order.  I think of the new order as horizontal, which makes
the old one vertical.
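
For readers who have not looked at the attachments, here is a minimal C
sketch of what the two traversal orders could look like.  It rests on
assumptions rather than on the attached code: the table is taken to be
nents entries of n limbs each, stored back to back; "vertical" is read
as walking whole entries one after another while rewriting the result,
and "horizontal" as finishing one result limb across all entries before
moving on.  The names limb_t, tabselect_entry_major and
tabselect_limb_major are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical stand-in for a machine limb */

/* Assumed "vertical" order: visit the entries one at a time and merge
   each into rp under a mask, so rp is read and rewritten nents times. */
static void
tabselect_entry_major (limb_t *rp, const limb_t *tab,
                       size_t n, size_t nents, size_t which)
{
  for (size_t k = 0; k < nents; k++)
    {
      /* All ones when k == which, all zeros otherwise. */
      limb_t mask = -(limb_t) (k == which);
      for (size_t i = 0; i < n; i++)
        rp[i] = (rp[i] & ~mask) | (tab[k * n + i] & mask);
    }
}

/* Assumed "horizontal" order: for each limb position, scan every entry,
   accumulate the selected limb in a local, and store to rp only once. */
static void
tabselect_limb_major (limb_t *rp, const limb_t *tab,
                      size_t n, size_t nents, size_t which)
{
  for (size_t i = 0; i < n; i++)
    {
      limb_t acc = 0;
      for (size_t k = 0; k < nents; k++)
        {
          limb_t mask = -(limb_t) (k == which);
          acc |= tab[k * n + i] & mask;
        }
      rp[i] = acc;
    }
}
```

Both functions read every table entry regardless of which, and they
give the same result whenever which < nents.  A real version would
handle several limbs per outer iteration so they stay in registers, and
would need checking that the compiler does not turn the comparison into
a branch; this sketch makes no such guarantee.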

First, an arm neon variant, which I think has turned out nicely thanks
to neon's elegance.  It improves the A9 performance by ~100% and the A15
performance by ~30% compared with the old neon tabselect code.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: arm-neon-horis-tabselect-w8.asm
Type: application/octet-stream
Size: 2884 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20130411/8d44f07c/attachment.obj>
-------------- next part --------------

Next, an x86_64 sse2 variant using the same pattern, but not as
elegant...  We had no sse2 tabselect before, so this is a huge
improvement thanks to the new operation order and sse2.  This is the
best code for all x86_64 cpus except atom and via nano.  The problem on
via nano is a strong alignment dependency (which we could handle at the
expense of code complexity).

-------------- next part --------------
A non-text attachment was scrubbed...
Name: x64-sse-horis-tabselect-w8.asm
Type: application/octet-stream
Size: 4158 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20130411/8d44f07c/attachment-0001.obj>
-------------- next part --------------

Finally, an x86_64 version using plain 64-bit instructions.  It is less
polished and less unrolled.  Best for atom, and probably for via nano.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: x64-horis-tabselect.asm
Type: application/octet-stream
Size: 2631 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20130411/8d44f07c/attachment-0002.obj>
-------------- next part --------------

Improvements welcome!

I think we also need to write new tabselect code for ppc64, sparc64, and
perhaps x86_32.  The latter could use a variant of our
x64-sse-horis-tabselect-w8.asm, at least on some intel cpus.

-- 
Torbjörn

