tg at gmplib.org
Sun Oct 11 23:26:07 UTC 2015
Victor Shoup <shoup at cs.nyu.edu> writes:
> Within the next couple of years, we can expect to see
> a new instruction on Intel chips: VPMADD52.
> This will be a part of the AVX512 ISA, but it's not clear
> when chips with these instructions will actually ship.
> One variant does an 8-way 52-bit x 52-bit -> low 52 bits
> "fused multiply add" on integers. Another does the same,
> but with the high-order 52 bits of the product.
> Obviously, Intel is going to leverage their SIMD FP hardware
> for this...and one might also infer from this that true 64-bit
> SIMD instructions are nowhere on Intel's roadmap.
> So the question is: what is GMP's roadmap for SIMD development,
> and does it include any plans for VPMADD52? I've been talking to a
> fellow at Intel about this (Shay Gueron), who is potentially interested
> in contributing code to GMP. I'm also interested, because of potential
> applications to my NTL library for faster multi-modular FFTs.
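For readers who want the semantics spelled out, here is a minimal scalar sketch of what one 64-bit lane of the two variants computes, as I understand the published description: only the low 52 bits of each multiplicand are used, and either the low or the high 52 bits of the 104-bit product are added to a 64-bit accumulator. The function names madd52lo and madd52hi are made up for illustration, and the code assumes a compiler with unsigned __int128 (GCC or Clang). It models a single lane, not the 8-way instruction.

```c
#include <stdint.h>

/* Mask selecting the low 52 bits of a 64-bit lane. */
#define MASK52 ((UINT64_C(1) << 52) - 1)

/* Hypothetical model of the "low" variant for one lane:
   acc + low 52 bits of (a mod 2^52) * (b mod 2^52). */
static uint64_t madd52lo(uint64_t acc, uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)(a & MASK52) * (b & MASK52);
    return acc + ((uint64_t)p & MASK52);
}

/* Hypothetical model of the "high" variant for one lane:
   acc + bits 52..103 of the same 104-bit product. */
static uint64_t madd52hi(uint64_t acc, uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)(a & MASK52) * (b & MASK52);
    return acc + (uint64_t)(p >> 52);
}
```

Getting a full 104-bit product therefore takes one call to each function, which is the "two insns instead of one" cost that matters when comparing against the scalar 64x64 -> 128 multiply.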
The current 64x64 -> 128 bit instruction performs 3 times more work than
one way of these new instructions (1.51 times more because of width, 2
times because it produces the full product with one insn instead of
two). So if one could make perfect use of all 8 ways we could hope for
2.3 times better performance at the same clock if the instructions have
the same throughput (which is probably a reasonable assumption).
Less excited already?
Making use of SIMD for a single bignum operation is difficult. One
typically needs to transfer intermediate results at a high rate to the
integer registers for final accumulation.
Due to the divide-and-conquer nature of GMP's multiply algorithms,
operands tend to be fairly small. They also come in multiples of 8
words only about 1/8 of the time...
I'd be impressed if one could get close to 50% utilisation of an 8-way
multiply feature such as this. Now we're at 1.15 times speedup (if we
assume 100% utilisation of plain 64x64->128 mul, which is getting closer
to true for each Intel CPU generation).
Other important operations, such as 2-adic reductions (aka "Montgomery
multiplication") will be even harder to deal with.
SIMD is hard to use on inherently single data.
> One concrete issue: if one wanted to fully exploit VPMADD52 instructions,
> then perhaps that would be a good reason to enable the "nails" feature.
"Nails" used to work a few years ago, but I expect some bitrot now. It
would probably take a day or two to make it work again.
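To illustrate what a nails-style representation means for a 52-bit multiplier, here is a minimal sketch of an n-limb addition where each 64-bit word holds only 52 significant bits and the top 12 bits are kept zero. The names NUMB_BITS, NUMB_MASK and add_n_52 are invented for this example; this is not GMP's internal code, just the general idea under the assumption of 52 data bits per limb.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: 52 significant bits per 64-bit limb, 12 zero bits on top. */
#define NUMB_BITS 52
#define NUMB_MASK ((UINT64_C(1) << NUMB_BITS) - 1)

/* r = a + b over n limbs in radix 2^52; returns the carry out (0 or 1).
   Inputs are assumed normalised, i.e. every limb is <= NUMB_MASK. */
static uint64_t add_n_52(uint64_t *r, const uint64_t *a,
                         const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* The sum is below 2^53 + 1, so it cannot overflow 64 bits. */
        uint64_t s = a[i] + b[i] + carry;
        r[i] = s & NUMB_MASK;      /* keep the 52 significant bits */
        carry = s >> NUMB_BITS;    /* the carry lands in the spare bits */
    }
    return carry;
}
```

The spare bits mean the carry falls out of an ordinary 64-bit add with a shift and a mask, with no need for a carry flag.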
Please encrypt, key id 0xC8601622