Risc V greatly underperforms

Torbjörn Granlund tg at gmplib.org
Tue Sep 21 10:03:43 UTC 2021

A carry bit helps for some codes, GMP being a prime example.

Keeping carry/borrow conditions in plain registers can be made to work
well too.  But then you need good ways of computing carry/borrow, and
good ways of inputting the carry/borrow result to dependent add/subtract
instructions.

Risc V has OK ways of computing borrow but not carry.  Risc V lacks good
ways of inputting carry/borrow to dependent add/subtract instructions.

For the subtraction c = a - b we could compare a and b independently of
the subtraction using sltu.  The sltu instruction is of course a
subtraction which puts the borrow-out in its result register.
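
As a rough illustration in C (the names sub_limb and sub_result are made
up for this sketch, and 64-bit limbs are an assumption), the borrow and
the difference both depend only on the inputs.  On RV64 an unsigned
comparison like a < b would typically compile to sltu, though the exact
instruction choice is up to the compiler.

```c
#include <stdint.h>

/* Hypothetical helper type for this sketch: one limb plus borrow-out. */
typedef struct {
    uint64_t diff;
    uint64_t borrow;
} sub_result;

/* One-limb subtraction c = a - b with borrow-out. */
static sub_result sub_limb(uint64_t a, uint64_t b)
{
    sub_result r;
    r.borrow = a < b;  /* needs only a and b, not the difference */
    r.diff = a - b;    /* wraps modulo 2^64 when a < b */
    return r;
}
```

Neither statement reads the other's result, so the two can issue in the
same cycle on a core wide enough to do so.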

But for the addition c = a + b we don't have anything which computes the
carry-out; a "compute carry-out from add" instruction would be needed.
We now need to first perform the add, then use sltu on the result, thus
creating a very unfortunate dependency.  Instruction dependencies are a
major performance killer for super-scalar processors.
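
The same kind of C sketch for addition (add_limb and add_result are again
made-up names, with 64-bit limbs assumed) shows the dependency: the
carry test has to read the sum.

```c
#include <stdint.h>

/* Hypothetical helper type for this sketch: one limb plus carry-out. */
typedef struct {
    uint64_t sum;
    uint64_t carry;
} add_result;

/* One-limb addition c = a + b with carry-out. */
static add_result add_limb(uint64_t a, uint64_t b)
{
    add_result r;
    r.sum = a + b;        /* wraps modulo 2^64 on overflow */
    r.carry = r.sum < a;  /* must wait for the sum: wrapped iff sum < a */
    return r;
}
```

Here the comparison cannot start until the add has finished, which is
the serial dependency described above.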

We also don't have any way of efficiently consuming a computed
carry/borrow result.  3-input add/subtract would have solved that
(together with 3-input sltu and a 3-input "compute carry from add").

This all means that on Risc V, multi-word subtraction could be made to
run at 2 cycles/word while multi-word addition is limited to 3 cycles/word,
in both cases assuming a very wide super-scalar core.  Remember that
other contemporary CPUs do these in 1+epsilon cycles/word, and that
without needing to do wide super-scalar dispatch.
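
To make the two figures concrete, here is a minimal C sketch of
multi-word add and subtract with the carry/borrow kept in a plain
variable.  The function names add_n_sketch and sub_n_sketch, the 64-bit
limb type, and the least-significant-limb-first layout are assumptions
for illustration; this is not GMP's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* rp[] = ap[] + bp[] over n limbs, least significant first.
   Returns the carry-out (0 or 1). */
uint64_t add_n_sketch(uint64_t *rp, const uint64_t *ap,
                      const uint64_t *bp, size_t n)
{
    uint64_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s = ap[i] + bp[i];  /* independent of cy */
        uint64_t c1 = s < ap[i];     /* independent of cy */
        uint64_t r = s + cy;         /* chain step 1: consume carry-in */
        uint64_t c2 = r < s;         /* chain step 2: needs r */
        rp[i] = r;
        cy = c1 | c2;                /* chain step 3: at most one is set */
    }
    return cy;
}

/* rp[] = ap[] - bp[] over n limbs, least significant first.
   Returns the borrow-out (0 or 1). */
uint64_t sub_n_sketch(uint64_t *rp, const uint64_t *ap,
                      const uint64_t *bp, size_t n)
{
    uint64_t bw = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t b1 = ap[i] < bp[i];  /* independent of bw */
        uint64_t d = ap[i] - bp[i];   /* independent of bw */
        uint64_t b2 = d < bw;         /* chain step 1: compares inputs only */
        rp[i] = d - bw;               /* off the chain: feeds only the store */
        bw = b1 | b2;                 /* chain step 2 */
    }
    return bw;
}
```

If each commented step maps to one single-cycle instruction, the
loop-carried chain from one carry to the next is three operations long
for addition and two for subtraction, which seems consistent with the
3 and 2 cycles/word figures above.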

I use multi-word add/subtract here as an example of the inefficiencies
of Risc V.  But the weak instruction set of Risc V shows in any
integer-heavy application, as others have pointed out before me.

Please encrypt, key id 0xC8601622
