Leaky multiply instruction on Cortex-A75

Tue Dec 18 16:19:05 UTC 2018

tg at gmplib.org (Torbjörn Granlund) writes:

  There is no UMULHI instruction.  UMULH is our 64b x 64b -> 64b highhalf
  instruction.

  The other instructions have 32-bit operands.

It might be worth noting that multiply instructions with "long" in
their names perform shorter multiplications than those without it.  The
long ones have at least one 32-bit operand.
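
To make the operand widths concrete, here is a minimal C sketch of the
high half of a 64b x 64b product built from four 32b x 32b -> 64b
products, which is the shape of a "long" multiply such as UMULL.  The
function names are made up for illustration, and the reference version
assumes the GCC/Clang unsigned __int128 extension; this is not GMP code.

```c
#include <stdint.h>

/* 32b x 32b -> 64b: the shape of a "long" multiply such as UMULL.
   (Hypothetical helper name, for illustration only.) */
static uint64_t mul32x32(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* High 64 bits of a 64b x 64b product, from four long multiplies. */
static uint64_t umulh_from_long(uint64_t a, uint64_t b)
{
    uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
    uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);

    uint64_t p00 = mul32x32(a0, b0);
    uint64_t p01 = mul32x32(a0, b1);
    uint64_t p10 = mul32x32(a1, b0);
    uint64_t p11 = mul32x32(a1, b1);

    /* Sum the three pieces that land in bits 32..63 of the product.
       Each piece is below 2^32, so the sum cannot overflow 64 bits. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    /* The carry out of bits 32..63 is mid >> 32. */
    return p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
}

/* Reference: the value a single UMULH produces
   (assumes the GCC/Clang unsigned __int128 extension). */
static uint64_t umulh_ref(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}
```

The two functions should agree for all inputs; the first one just spells
out how much work is hidden behind one full-width multiply.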

It is not clear what we should do about ARM Inc's arm64 GMP performance.
My approach with karatsuba might not be the best one; we have cortex-a15
neon code which runs at 1.3 c/l for addmul_2.  A 64-bit limb product
takes four 32-bit products, so this corresponds to 4 x 1.3 = 5.2 c/l for
an arm64 addmul_1.  This matches my ideal karatsuba code (which stays
clear of neon).
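
For readers unfamiliar with the trick, here is a minimal C sketch of one
way karatsuba can apply at this size: forming a 64b x 64b -> 128b product
from three 32b x 32b -> 64b multiplies instead of four.  This is an
assumption about the general technique, not the actual GMP assembly; the
function name is hypothetical, and the sums are done in the GCC/Clang
unsigned __int128 type for brevity rather than speed.

```c
#include <stdint.h>

typedef unsigned __int128 u128;  /* GCC/Clang extension, assumed here */

/* Full 64b x 64b -> 128b product using three 32b x 32b -> 64b
   multiplies (subtractive karatsuba).  Hypothetical name. */
static u128 mul64_karatsuba(uint64_t a, uint64_t b)
{
    uint32_t a0 = (uint32_t)a, a1 = (uint32_t)(a >> 32);
    uint32_t b0 = (uint32_t)b, b1 = (uint32_t)(b >> 32);

    uint64_t z0 = (uint64_t)a0 * b0;   /* low  x low  */
    uint64_t z2 = (uint64_t)a1 * b1;   /* high x high */

    /* |a1 - a0| and |b1 - b0| each fit in 32 bits. */
    uint32_t da = a1 >= a0 ? a1 - a0 : a0 - a1;
    uint32_t db = b1 >= b0 ? b1 - b0 : b0 - b1;
    int neg = (a1 < a0) != (b1 < b0);  /* sign of (a1-a0)*(b1-b0) */
    uint64_t zd = (uint64_t)da * db;   /* third and last multiply */

    /* a1*b0 + a0*b1 = z0 + z2 - (a1-a0)*(b1-b0); needs up to 65 bits. */
    u128 mid = (u128)z0 + z2;
    if (neg)
        mid += zd;
    else
        mid -= zd;

    return ((u128)z2 << 64) + (mid << 32) + z0;
}
```

The saved multiply is paid for with extra subtractions, a sign fixup and
a 65-bit middle term, which is where the bookkeeping cost comes from.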

So perhaps the way forward is using neon, with all the tribulations?

PS. I imply no critique against neon; it is a very fine set of
instructions. It just hurts to do bignum using SIMD, even well-designed
SIMD like neon.

-- 
Torbjörn
Please encrypt, key id 0xC8601622