Victor Shoup shoup at cs.nyu.edu
Wed Mar 23 20:38:03 UTC 2016

This may be a bit off topic, but I figure the people on this list
might know something about this.

In some code I've been developing lately (NTL related, of course), 
I've been making more use of the __uint128_t type that is available
on gcc (and its clang and icc clones).  It's all ifdef'd properly, so I
only use it when it actually works.

Anyway, I find that on x86-64 machines with recent versions of gcc, the
compiler does a pretty good job of code generation, much better than I
recall from some years ago.  However, I was wondering about 64-bit ARM
machines.  I don't have access to such a machine, but I tried some code
out at https://gcc.godbolt.org (which is a very convenient site, by the way).
I was somewhat surprised that the code generated there by gcc-4.8 for
64-bit ARM was terrible: a 64x64->128 mul gets mapped to
a generic 128x128->128 function call.
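
For concreteness, here is a minimal sketch of the kind of 64x64->128
multiply in question.  The function name mul_64x64_wide is made up for
illustration and is not taken from NTL; the guard assumes the
__SIZEOF_INT128__ macro that gcc and clang predefine when the 128-bit
type is available.

```c
#include <stdint.h>

/* Hypothetical helper (not NTL code): full 64x64->128 product.
   Assumes the compiler predefines __SIZEOF_INT128__ when the
   128-bit integer type is supported, as gcc and clang do. */
#if defined(__SIZEOF_INT128__)
static inline void mul_64x64_wide(uint64_t a, uint64_t b,
                                  uint64_t *hi, uint64_t *lo)
{
    /* Widen one operand so the multiplication is done in 128 bits. */
    __uint128_t p = (__uint128_t)a * b;
    *lo = (uint64_t)p;          /* low 64 bits of the product  */
    *hi = (uint64_t)(p >> 64);  /* high 64 bits of the product */
}
#endif
```

Ideally this compiles to a single widening multiply (or a mul/umulh
pair); the complaint above is that the compiler emitted a call to a
generic 128-bit multiply routine instead.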

So I'm starting to question whether relying on __uint128_t is such a
good idea.  Maybe it would be better for me to isolate all of that code
so that I can just drop in appropriate assembly (as in GMP's longlong.h)
as an alternative.
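
A rough sketch of what such isolation might look like, in the spirit of
(but not copied from) the umul_ppmm macro in GMP's longlong.h.  The
names wide_mul and wide_mul_portable are hypothetical.  The aarch64
branch assumes gcc-style inline assembly and the umulh instruction; the
last branch is a plain C fallback built from 32-bit halves.

```c
#include <stdint.h>

/* Portable fallback: schoolbook multiply on 32-bit halves. */
static inline void wide_mul_portable(uint64_t a, uint64_t b,
                                     uint64_t *hi, uint64_t *lo)
{
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;
    uint64_t p00 = a0 * b0, p01 = a0 * b1;
    uint64_t p10 = a1 * b0, p11 = a1 * b1;
    /* Sum of three values below 2^32 each, so it cannot overflow. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;
    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Single entry point; callers never see which back end is used. */
static inline void wide_mul(uint64_t a, uint64_t b,
                            uint64_t *hi, uint64_t *lo)
{
#if defined(__GNUC__) && defined(__aarch64__)
    /* Assumption: gcc-style inline asm on 64-bit ARM. */
    uint64_t h;
    __asm__("umulh %0, %1, %2" : "=r"(h) : "r"(a), "r"(b));
    *hi = h;
    *lo = a * b;
#elif defined(__SIZEOF_INT128__)
    __uint128_t p = (__uint128_t)a * b;
    *lo = (uint64_t)p;
    *hi = (uint64_t)(p >> 64);
#else
    wide_mul_portable(a, b, hi, lo);
#endif
}
```

With everything routed through one function, swapping in assembly for a
given target is a local change, and the portable version doubles as a
reference to test the others against.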
I could also ask the gcc people what their plans are for future
optimizations in this area, but I don't know whom or where to ask.

Any advice or insights would be appreciated.

More information about the gmp-devel mailing list