Computing A mod d (for small odd d without division and multiplication)
Torbjörn Granlund
tg at gmplib.org
Sun Mar 15 20:38:12 CET 2026
marco.bodrato at tutanota.com writes:
Actually, I did not touch the inner loop, I just simplified the outer one,
removing the unneeded rems[] array, and the unnecessary acc variable.
Right.
The time needed to initialize the computation, and the effect of cache misses, change a lot
for different bases, even ones not far from one another. To use this strategy we not only have
to write an efficient inner loop, but we also have to think about how to handle "thresholds"...
Always a pain.
Does ARM have SIMD 64-bit addition with carry? Really? Interesting!
I am not aware of any add-with-carry SIMD insns.
Arm has means of computing carry-out for all elements of a vector
register (CMHI, CMHS). (I have not looked at the newer variable-length
vector stuff (SVE?).)
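
To illustrate the carry-out idea, here is a minimal portable C sketch that models a two-lane vector with plain arrays. It is only a model of the semantics, not actual NEON code: the name lanes_add_carry_out and the LANES constant are made up for this example. The point is that an unsigned "higher" comparison of an input against the wrapped sum (which is what a CMHI-style compare computes per element) yields an all-ones mask exactly in the lanes that carried out.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 2  /* hypothetical: two 64-bit lanes, as in a 128-bit register */

/* Lane-wise addition modulo 2^64 with a separate carry-out mask.
   carry[i] is all ones if a[i] + b[i] wrapped around, else zero.
   The comparison a[i] > sum[i] mirrors an unsigned "higher" compare. */
static void
lanes_add_carry_out (uint64_t sum[LANES], uint64_t carry[LANES],
                     const uint64_t a[LANES], const uint64_t b[LANES])
{
  for (size_t i = 0; i < LANES; i++)
    {
      sum[i] = a[i] + b[i];                          /* wraps mod 2^64 */
      carry[i] = (a[i] > sum[i]) ? ~(uint64_t) 0 : 0; /* all-ones mask */
    }
}
```

Since the mask is all ones, i.e. -1 in two's complement, subtracting it from an accumulator lane adds one for each carry, so carries can be counted without a dedicated add-with-carry instruction.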
IIRC, PowerPC has even more powerful instructions, even add with
carry-in in a 3rd input vector register, and separate instructions for
generating carry-out.
There are machines which implement this in the gcc compiler farm.
--
Torbjörn
Please encrypt, key id 0xC8601622