Better tabselect
Torbjorn Granlund
tg at gmplib.org
Thu Apr 11 23:55:18 CEST 2013
I've written a few variants of tabselect that use a different table
traversal order. I think of the new order as horizontal, which makes
the old one vertical.
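To make the two traversal orders concrete, here is a rough C sketch. It is one reading of "vertical" versus "horizontal", not a transcription of the attached assembly. The names limb_t, eq_mask, tabselect_vertical, tabselect_horizontal and the block width BLOCK are all made up for illustration, and the table is assumed to hold nents entries of n limbs each, stored back to back. Plain C also gives no guarantee that the compiler keeps the masking branch-free.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumption: 64-bit limbs */

enum { BLOCK = 4 };        /* hypothetical number of limbs kept in "registers" */

/* All ones if a == b, else zero, computed without a comparison branch. */
static limb_t eq_mask(size_t a, size_t b)
{
    limb_t d = (limb_t)a ^ (limb_t)b;
    return ((d | (0 - d)) >> 63) - 1;
}

/* "Vertical" order (assumed): outer loop over table entries, inner loop
   over limbs. The result area rp is read and written once per entry. */
static void tabselect_vertical(limb_t *rp, const limb_t *tab,
                               size_t n, size_t nents, size_t which)
{
    for (size_t i = 0; i < n; i++)
        rp[i] = 0;
    for (size_t k = 0; k < nents; k++) {
        limb_t mask = eq_mask(k, which);
        for (size_t i = 0; i < n; i++)
            rp[i] |= tab[k * n + i] & mask;
    }
}

/* "Horizontal" order (assumed): take a block of limb positions, sweep
   across all table entries accumulating into local variables, and store
   each result limb exactly once. */
static void tabselect_horizontal(limb_t *rp, const limb_t *tab,
                                 size_t n, size_t nents, size_t which)
{
    for (size_t i0 = 0; i0 < n; i0 += BLOCK) {
        size_t w = (n - i0 < BLOCK) ? n - i0 : BLOCK;
        limb_t acc[BLOCK] = {0};
        for (size_t k = 0; k < nents; k++) {
            limb_t mask = eq_mask(k, which);
            for (size_t j = 0; j < w; j++)
                acc[j] |= tab[k * n + i0 + j] & mask;
        }
        for (size_t j = 0; j < w; j++)
            rp[i0 + j] = acc[j];
    }
}
```

Both functions read every table entry regardless of which, so the memory access pattern does not depend on the selected index. In this sketch the difference is only in how often the destination is touched and how the accumulators map onto registers.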
First, an ARM NEON variant, which I think has turned out nicely thanks
to NEON's elegance. It improves A9 performance by ~100% and A15
performance by ~30% compared with the old NEON tabselect code.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arm-neon-horis-tabselect-w8.asm
Type: application/octet-stream
Size: 2884 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20130411/8d44f07c/attachment.obj>
-------------- next part --------------
Next, an x86_64 SSE2 variant using the same pattern, though not as
elegant. We had no SSE2 tabselect before, so this is a huge improvement
thanks to the new operation order and SSE2. This is the best code for
all x86_64 CPUs except Atom and VIA Nano. The VIA Nano problem is a
strong alignment dependency (which we could handle at the expense of
code complexity).
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x64-sse-horis-tabselect-w8.asm
Type: application/octet-stream
Size: 4158 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20130411/8d44f07c/attachment-0001.obj>
-------------- next part --------------
Finally, an x86_64 version using plain 64-bit instructions. It is less
polished and less unrolled. Best for Atom, and probably for VIA Nano.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x64-horis-tabselect.asm
Type: application/octet-stream
Size: 2631 bytes
Desc: not available
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20130411/8d44f07c/attachment-0002.obj>
-------------- next part --------------
Improvements welcome!
I think we also need to write new tabselect code for ppc64, sparc64,
and perhaps x86_32. The latter could use a variant of our
x64-sse-horis-tabselect-w8.asm, at least on some Intel CPUs.
--
Torbjörn