Computing A mod d (for small odd d without division and multiplication)

Torbjörn Granlund tg at gmplib.org
Sun Mar 15 20:38:12 CET 2026


marco.bodrato at tutanota.com writes:

  Actually, I did not touch the inner loop, I just simplified the outer one,
  removing the unneeded rems[] array, and the unnecessary acc variable.

Right.

  The time needed to initialize the computation and the effect of cache misses change a lot
  between bases that are not far from one another. To use this strategy we not only have to
  write an efficient inner loop, but also have to think about how to handle "thresholds"...

Always a pain.

  Does ARM have SIMD 64-bit addition with carry? Really? Interesting!

I am not aware of any add-with-carry SIMD insns.

Arm has means of computing carry-out for all elements of a vector
register (CMHI, CMHS).  (I have not looked at the newer variable-length
vector stuff (SVE?).)
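
A minimal portable C sketch of that carry-out idea, not taken from GMP:
each lane is added modulo 2^64, and an unsigned "higher than" compare of
an input against the wrapped sum yields an all-ones mask exactly where
the add overflowed, which is the kind of result CMHI produces.  The
function name and the array-of-lanes layout are assumptions made for
illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: lane-wise 64-bit add plus a carry-out mask.
   sum[i]   = (a[i] + b[i]) mod 2^64
   carry[i] = all ones if the add wrapped, else 0 (an unsigned
              "a > sum" compare, as a CMHI-style instruction gives). */
static void
add_lanes_with_carry_out (uint64_t *sum, uint64_t *carry,
                          const uint64_t *a, const uint64_t *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint64_t s = a[i] + b[i];                 /* wraps modulo 2^64 */
      sum[i] = s;
      carry[i] = (a[i] > s) ? ~(uint64_t) 0 : 0;
    }
}
```

Subtracting the mask from the next-higher lane (or adding its low bit)
would then propagate the carry in a separate step.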

IIRC, PowerPC has even more powerful instructions, including add with
carry-in from a 3rd input vector register, and separate instructions for
generating carry-out.
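
As a rough scalar model of such a pair of operations, here is a C sketch
with one function producing the three-input sum and another producing
only the carry-out of that same sum.  The 64-bit lane width, the
function names and the 0/1 carry encoding are assumptions for
illustration; the actual instructions may use different element widths
and conventions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum[i] = (a[i] + b[i] + cin[i]) mod 2^64,
   where only the low bit of cin[i] is used as carry-in. */
static void
add_lanes_carry_in (uint64_t *sum, const uint64_t *a, const uint64_t *b,
                    const uint64_t *cin, size_t n)
{
  for (size_t i = 0; i < n; i++)
    sum[i] = a[i] + b[i] + (cin[i] & 1);
}

/* Hypothetical helper: cout[i] = carry-out (0 or 1) of the same
   three-input add, computed without storing the sum. */
static void
carry_out_lanes (uint64_t *cout, const uint64_t *a, const uint64_t *b,
                 const uint64_t *cin, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      uint64_t t = a[i] + b[i];            /* may wrap once */
      uint64_t s = t + (cin[i] & 1);       /* may wrap once more */
      /* At most one of the two partial adds can wrap. */
      cout[i] = (uint64_t) (t < a[i]) | (uint64_t) (s < t);
    }
}
```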

There are machines which implement this in the gcc compiler farm.

-- 
Torbjörn
Please encrypt, key id 0xC8601622

