Use of AVX instructions in mpn_mul_1

Fri Jun 17 14:50:55 CEST 2022

Hi Thanassis,

On 2022-06-13 23:17 Thanassis Tsiodras wrote:
> I had a quick look at the x86_64 assembly implementations of the basic
> primitive used in multiplications (mpn_mul_1), and saw this:

> ...I could not find any use of AVX-integer-related multiplication
> instructions.
> I am talking about things like "_mm512_mul_epu32", which at first glance
> seemed promising (8x32bit multiplications in one instruction generating
> 8x64-bit results in one go).

Four 32x32->64 multiplications do the same multiplication work as one 
64x64->128. But are "8x32bit multiplications in one instruction" 
faster than two 64x64 muls? As you note yourself, many extra additions 
with carry propagation are needed.

So the question is, does using AVX reduce the resources needed for a 
multiplication?
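
To make the count above concrete, here is a minimal sketch in C of how 
one 64x64->128 product can be assembled from four 32x32->64 partial 
products. The helper name mul64_via_32 is hypothetical (it is not a GMP 
function), and the code is only meant to show the additions and carry 
handling that come on top of the multiplications themselves.

```c
#include <stdint.h>

/* Hypothetical helper, not part of GMP: computes the 128-bit product
   a * b using only 32x32->64 multiplications. */
static void mul64_via_32(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    /* Split both operands into 32-bit halves. */
    uint64_t a0 = (uint32_t)a, a1 = a >> 32;
    uint64_t b0 = (uint32_t)b, b1 = b >> 32;

    /* The four 32x32->64 partial products. */
    uint64_t p00 = a0 * b0;
    uint64_t p01 = a0 * b1;
    uint64_t p10 = a1 * b0;
    uint64_t p11 = a1 * b1;

    /* Middle column (bits 32..63 of the result). The sum of three
       values below 2^32 cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    /* Recombine, propagating the carry of the middle column upwards. */
    *lo = (mid << 32) | (uint32_t)p00;
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}
```

Even if a vector instruction delivered the four partial products at 
once, the recombination in the second half of the function would still 
have to be done somehow.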

> I can't see a way to do that optimally. Is that the reason GMP asm code
> seems to prefer the simple 64x64 => 128 instructions?  (mul %rcx)

If you find an AVX implementation that is more efficient than our 
current one, you can contribute it to the project :-)

Ĝis,
m