Best way to carry on 2-input architecture?

Torbjörn Granlund tg at
Sun Aug 17 15:05:14 UTC 2014

Both Niels and I have looked into ISAs which support GMP operations.

My work is available here:

You're right that "umulhi" and addition as well as subtraction with
carry/borrow are critical operations.  For multiply, throughput is
more important than low latency, except that division by few-word
divisors depends on short latency.

I think a separate carry flag might not be good for modern designs.
It is better to keep carry/borrow state in a plain register.  You might
want to take a look at the Itanic GMP assembly code, which achieves 1 c/l
(cycle per limb) without any separate carry flag (albeit with some
Itanic-specific condition bit trickery).
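
To illustrate what keeping the carry in a plain register means, here is a
minimal C sketch of multi-limb addition where the carry is an ordinary
variable recovered by unsigned comparisons.  The names limb_t and add_n
are hypothetical, chosen for this example; this is not GMP's code and
not the Itanic assembly.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical name: one 64-bit word */

/* rp[0..n-1] = ap[0..n-1] + bp[0..n-1]; returns the carry-out (0 or 1).
   The carry lives in the ordinary variable cy, never in a flag. */
limb_t add_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t a = ap[i];
        limb_t s = a + bp[i];      /* wraps modulo 2^64 */
        limb_t c1 = s < a;         /* 1 if the first add wrapped */
        limb_t r = s + cy;
        limb_t c2 = r < s;         /* 1 if adding the carry-in wrapped */
        rp[i] = r;
        cy = c1 | c2;              /* both cannot be 1 at the same time */
    }
    return cy;
}
```

Each limb costs two adds and two compares here, which is the overhead a
combined add-with-carry-in-register instruction would remove.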

You might consider the alternative of a = b + c + (d bitand 1) and a
corresponding subtract, with both low and high (i.e. carry) variants.
Of course, encoding 4 separate operands and needing 3 register reads has
a cost.  A trick for denser coding is to either enforce that e.g. a ==
d, or require that d be one of (say) the low 8 registers.
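
The semantics of those four operations can be modelled in C as follows.
This is only a sketch under assumptions: 64-bit words, the hypothetical
names addc_lo/addc_hi/subb_lo/subb_hi, and the unsigned __int128
extension of GCC and Clang to get at the bits above the word.

```c
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical name: one 64-bit word */

/* Low word of b + c + (d bitand 1). */
static inline limb_t addc_lo(limb_t b, limb_t c, limb_t d)
{
    return b + c + (d & 1);
}

/* High word (the carry-out, 0 or 1) of b + c + (d bitand 1). */
static inline limb_t addc_hi(limb_t b, limb_t c, limb_t d)
{
    unsigned __int128 t = (unsigned __int128)b + c + (d & 1);
    return (limb_t)(t >> 64);
}

/* Low word of b - c - (d bitand 1). */
static inline limb_t subb_lo(limb_t b, limb_t c, limb_t d)
{
    return b - c - (d & 1);
}

/* Borrow-out (0 or 1) of b - c - (d bitand 1). */
static inline limb_t subb_hi(limb_t b, limb_t c, limb_t d)
{
    unsigned __int128 s = (unsigned __int128)c + (d & 1);
    return (limb_t)(b < s);    /* borrow when b < c + carry-in */
}
```

Since only the low bit of d is read, the result of a previous addc_hi or
subb_hi can be fed straight back in as d for the next word.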

For multiply, d = (a * b + c) mod B (B being the word base) and
d = floor((a * b + c) / B) are very useful.  Again, the encoding and
read port problem might arise.
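
As a sketch of why this pair is useful, the C model below defines the two
operations for B = 2^64 and uses them for a multiply-by-one-limb loop.
The names muladd_lo, muladd_hi and mul_1 are hypothetical, and unsigned
__int128 is a GCC/Clang extension assumed here.  Note that a * b + c
cannot overflow two words, since (B-1)^2 + (B-1) = (B-1) * B < B^2.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* hypothetical name: one 64-bit word */

/* (a * b + c) mod B, with B = 2^64. */
static inline limb_t muladd_lo(limb_t a, limb_t b, limb_t c)
{
    return a * b + c;      /* unsigned arithmetic wraps modulo 2^64 */
}

/* floor((a * b + c) / B), with B = 2^64. */
static inline limb_t muladd_hi(limb_t a, limb_t b, limb_t c)
{
    unsigned __int128 t = (unsigned __int128)a * b + c;
    return (limb_t)(t >> 64);
}

/* rp[0..n-1] = up[0..n-1] * v; returns the most significant limb.
   The running high word is carried in the plain variable cy. */
limb_t mul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t u = up[i];              /* read first, so rp may alias up */
        rp[i] = muladd_lo(u, v, cy);
        cy = muladd_hi(u, v, cy);
    }
    return cy;
}
```

The loop needs no add-with-carry at all: the c operand absorbs the high
word from the previous step.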

Please encrypt, key id 0xC8601622

More information about the gmp-devel mailing list