marc.glisse at inria.fr
Wed Mar 23 21:23:09 UTC 2016
On Wed, 23 Mar 2016, Victor Shoup wrote:
> This may be a bit off topic, but I figure the people on this list
> might know something about this.
> In some code I've been developing lately (NTL related, of course),
> I've been making more use of the __uint128_t type that is available
> on gcc (and its clang and icc clones). It's all ifdef'd properly, so I
> only use it when it actually works.
> Anyway, I find that on x86-64 machines and recent gcc's, the compiler
> does a pretty good job of code generation...much better than I recall
> some years ago. However, I was wondering about the 64-bit ARM
> machine. I don't have access to such a machine, but I tried some code
> out at https://gcc.godbolt.org (which is a very convenient site, by the way).
> I was somewhat surprised that the code generated there by gcc-4.8 for
> 64-bit ARM was terrible: a 64x64->128 mul gets mapped to
> a generic 128x128->128 function call.
You realize ARM64 barely existed at the time of gcc-4.8? If gcc-5, or
better yet a snapshot of gcc-6, still generates suboptimal code, please
report to https://gcc.gnu.org/bugzilla/ with a testcase, and the asm you
would like gcc to generate instead.
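For illustration, a testcase for such a report could be as small as the sketch below. The struct and function names are hypothetical, not taken from NTL or GMP. On AArch64 one would hope to see a mul/umulh pair for this function rather than a call to a generic 128x128->128 multiplication routine.

```c
#include <stdint.h>

/* Hypothetical testcase: the names are illustrative only. */
typedef struct {
    uint64_t lo;  /* low 64 bits of the product */
    uint64_t hi;  /* high 64 bits of the product */
} u128_parts;

/* Full 64x64->128 multiplication written with __uint128_t.
   Only one operand is widened before the multiply, so the compiler
   can see that both inputs fit in 64 bits. */
u128_parts mul_64x64_128(uint64_t a, uint64_t b)
{
    __uint128_t p = (__uint128_t)a * b;
    u128_parts r = { (uint64_t)p, (uint64_t)(p >> 64) };
    return r;
}
```

Compiling this with -O2 -S on the target and attaching the resulting asm next to the asm you would prefer makes the report easy to act on.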
> So I'm starting to question whether relying on __uint128_t is such a good idea.
> Maybe it would be better for me to isolate all of that code so that I can
> just drop in appropriate assembly (as in GMP's longlong.h),
> as an alternative.
It is always a compromise...
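One way to keep both options open is to route every wide multiplication through a single wrapper, roughly in the spirit of umul_ppmm in longlong.h. What follows is only a sketch under assumptions: mul_wide, mul_wide_portable and HAVE_UINT128 are hypothetical names, and the availability check relies on the __SIZEOF_INT128__ macro that gcc and clang predefine when the 128-bit type exists. An assembly version for a given target could later replace the body of mul_wide without touching the callers.

```c
#include <stdint.h>

/* HAVE_UINT128 is an assumed configuration macro, not an existing one. */
#if defined(__SIZEOF_INT128__)
#define HAVE_UINT128 1
#endif

/* Portable fallback: schoolbook multiplication on 32-bit halves. */
static inline void mul_wide_portable(uint64_t *hi, uint64_t *lo,
                                     uint64_t a, uint64_t b)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1;
    uint64_t p10 = a1 * b0, p11 = a1 * b1;
    /* Sum of the three terms that land in bits 32..63; cannot overflow. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Single entry point used by the rest of the code (hypothetical name). */
static inline void mul_wide(uint64_t *hi, uint64_t *lo,
                            uint64_t a, uint64_t b)
{
#if defined(HAVE_UINT128)
    __uint128_t p = (__uint128_t)a * b;
    *lo = (uint64_t)p;
    *hi = (uint64_t)(p >> 64);
#else
    mul_wide_portable(hi, lo, a, b);
#endif
}
```

Callers would write mul_wide(&hi, &lo, a, b) and never mention __uint128_t directly, so the choice between the compiler's code, the portable version, and hand-written assembly stays in one place.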
> I could also ask gcc people what their plans for future optimizations
> in this area are, but I don't know who or where to ask.
You could ask on gcc at gcc.gnu.org, but reporting bugs when you see
suboptimal code generated seems much more likely to get you answers, and
by showing constructive interest it may spark further optimizations.
If this is for the development of free software, the GCC compile farm
includes some aarch64 machines on which you could experiment.