Use of AVX instructions in mpn_mul_1
bodrato at mail.dm.unipi.it
Fri Jun 17 14:50:55 CEST 2022
On 2022-06-13 23:17 Thanassis Tsiodras wrote:
> I had a quick look at the x86_64 assembly implementations of the basic
> primitive used in multiplications (mpn_mul_1), and saw this:
> ...I could not find any use of AVX-integer-related multiplication
> I am talking about things like " _mm512_mul_epu32", which at first
> seemed promising (8x32bit multiplications in one instruction generating
> 8x64-bit results in one go).
Four 32x32->64 multiplications perform the same multiplication work as
one 64x64->128. But are "8x32bit multiplications in one instruction"
faster than two 64x64 mul? As you confirm, many additional additions
with carry propagation are needed to combine the partial products.
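
To make the counting above concrete, here is a minimal sketch in plain C
(not GMP code, and not AVX) of one 64x64->128 product assembled from four
32x32->64 partial products. The names u128_parts and mul_64x64_via_32 are
hypothetical, chosen only for this illustration. The additions and shifts
after the four multiplications are the carry-propagation work that a
single scalar mul gets for free.

```c
#include <stdint.h>

/* Hypothetical helper type: the two 64-bit halves of a 128-bit product. */
typedef struct { uint64_t hi, lo; } u128_parts;

/* Illustrative only: build a*b (128 bits) from four 32x32->64 products. */
static u128_parts mul_64x64_via_32(uint64_t a, uint64_t b)
{
    /* Split each operand into 32-bit halves, held in 64-bit variables. */
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* The four 32x32->64 partial products. */
    uint64_t p00 = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p01 = a_lo * b_hi;   /* weight 2^32 */
    uint64_t p10 = a_hi * b_lo;   /* weight 2^32 */
    uint64_t p11 = a_hi * b_hi;   /* weight 2^64 */

    /* Sum the column at weight 2^32. Three 32-bit values cannot
       overflow 64 bits; the upper half of mid is the carry into hi. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)p00;
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}
```

For example, mul_64x64_via_32(UINT64_MAX, UINT64_MAX) yields
hi = 0xFFFFFFFFFFFFFFFE and lo = 1. A vectorised variant would have to do
this same recombination across lanes, which is where the extra cost comes
from.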
So the question is, does using AVX reduce the resources needed for a
> I can't see a way to do that optimally. Is that the reason GMP asm code
> seems to prefer the simple 64x64 => 128 instructions? (mul %rcx)
When you find an implementation with AVX that is more efficient than our
current implementation, you can contribute it to the project :-)