shoup at cs.nyu.edu
Wed Mar 23 20:38:03 UTC 2016
This may be a bit off topic, but I figure the people on this list
might know something about this.
In some code I've been developing lately (NTL related, of course),
I've been making more use of the __uint128_t type that is available
on gcc (and its clang and icc clones). It's all ifdef'd properly, so I
only use it when it actually works.
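
As a rough illustration of that kind of guard (a sketch only, not the actual NTL code): gcc and clang predefine the macro __SIZEOF_INT128__ when the 128-bit types are usable, so a check could look like the following. The names HAVE_UINT128, u128 and wide_bytes are hypothetical, and whether icc defines the same macro is an assumption I have not verified.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature macro: 1 when the compiler provides __uint128_t. */
#if defined(__SIZEOF_INT128__)
#define HAVE_UINT128 1
typedef __uint128_t u128;   /* hypothetical shorthand for the wide type */
#else
#define HAVE_UINT128 0
#endif

/* Size in bytes of the wide type, or 0 when it is not available. */
static inline size_t wide_bytes(void)
{
#if HAVE_UINT128
    return sizeof(u128);
#else
    return 0;
#endif
}
```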
Anyway, I find that on x86-64 machines with recent versions of gcc, the compiler
does a pretty good job of code generation, much better than I recall it doing
some years ago. However, I was wondering about 64-bit ARM
machines. I don't have access to such a machine, but I tried some code
out at https://gcc.godbolt.org (which is a very convenient site, by the way).
I was somewhat surprised that the code generated there by gcc-4.8 for
64-bit ARM was terrible: a 64x64->128 mul gets mapped to
a generic 128x128->128 function call.
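
For concreteness, the kind of code in question looks roughly like the sketch below, assuming a compiler that provides __uint128_t. The helper name mul_64x64 is hypothetical. Both operands are 64-bit values widened to 128 bits, so ideally the compiler emits a single widening multiply (or a low/high multiply pair) rather than a call to a generic 128-bit multiply routine (presumably libgcc's __multi3).

```c
#include <stdint.h>

/* Hypothetical helper: full 64x64->128 product, split into high and low words.
   Assumes the compiler supports __uint128_t. */
static inline void mul_64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    __uint128_t p = (__uint128_t)a * b;  /* widen one operand, then multiply */
    *hi = (uint64_t)(p >> 64);           /* upper 64 bits of the product */
    *lo = (uint64_t)p;                   /* lower 64 bits of the product */
}
```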
So I'm starting to question whether relying on __uint128_t is such a good idea.
Maybe it would be better for me to isolate all of that code so that I can
just drop in appropriate assembly (as in GMP's longlong.h),
as an alternative.
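
A minimal sketch of what such an isolation layer might look like, loosely modelled on the umul_ppmm idea from longlong.h. The wrapper name umul_wide is hypothetical, and the choice and ordering of the three variants is only an assumption about how one might arrange it: inline assembly using the AArch64 umulh instruction, the __uint128_t version, and a portable fallback built from 32-bit halves.

```c
#include <stdint.h>

/* Hypothetical wrapper: (*hi, *lo) = a * b as a full 128-bit product. */
#if defined(__GNUC__) && defined(__aarch64__)
static inline void umul_wide(uint64_t *hi, uint64_t *lo, uint64_t a, uint64_t b)
{
    uint64_t h;
    /* umulh yields the upper 64 bits of the unsigned 64x64 product. */
    __asm__("umulh %0, %1, %2" : "=r"(h) : "r"(a), "r"(b));
    *hi = h;
    *lo = a * b;  /* ordinary multiply gives the lower 64 bits */
}
#elif defined(__SIZEOF_INT128__)
static inline void umul_wide(uint64_t *hi, uint64_t *lo, uint64_t a, uint64_t b)
{
    __uint128_t p = (__uint128_t)a * b;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
}
#else
static inline void umul_wide(uint64_t *hi, uint64_t *lo, uint64_t a, uint64_t b)
{
    /* Portable schoolbook multiply on 32-bit halves. */
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1, p10 = a1 * b0, p11 = a1 * b1;
    /* Sum of the middle column; at most about 2^34, so it cannot overflow. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
#endif
```

Callers would use only umul_wide, so a better variant for a given target could be dropped in without touching the rest of the code.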
I could also ask gcc people what their plans for future optimizations
in this area are, but I don't know who or where to ask.
Any advice or insights would be appreciated.