Risc V greatly underperforms
tg at gmplib.org
Tue Sep 21 10:03:43 UTC 2021
A carry bit helps for some codes, GMP being a prime example.
Keeping carry/borrow conditions in plain registers can be made to work
well too. But then you need good ways of computing carry/borrow, and
good ways of inputting the carry/borrow result to dependent add/subtract
instructions.
Risc V has OK ways of computing borrow but not carry. Risc V lacks good
ways of inputting carry/borrow to dependent add/subtract instructions.
For the subtraction c = a - b we could compare a and b independent of
the subtraction using sltu. The sltu instruction is of course a
subtraction which puts the borrow-out in its result register.
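A minimal C sketch of the subtraction case, for illustration only. The
names sub_result and sub_with_borrow_out are hypothetical, and whether a
compiler emits exactly one sub and one sltu for it is an assumption, not
a guarantee. The point is that the comparison reads only a and b, so it
does not have to wait for the difference.

```c
#include <stdint.h>

/* Hypothetical helper type: difference plus borrow-out in a plain word. */
typedef struct {
    uint64_t diff;
    uint64_t borrow;
} sub_result;

/* One-word c = a - b with the borrow-out kept in an ordinary register. */
static sub_result sub_with_borrow_out(uint64_t a, uint64_t b)
{
    sub_result r;
    r.diff   = a - b;   /* sub: wraps modulo 2^64 */
    r.borrow = a < b;   /* sltu: reads only a and b, independent of the sub */
    return r;
}
```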
But for the addition c = a + b we don't have anything which computes the
carry-out; a "compute carry-out from add" would be needed. We now need
to first perform the add, then use sltu on the result, thus creating a
very unfortunate dependency. Instruction dependencies are a major
performance killer for super-scalar processors.
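The addition case can be sketched the same way in C. Again the names
add_result and add_with_carry_out are hypothetical, and the mapping to
one add followed by one sltu is an assumption about the generated code.
Here the comparison takes the sum as an input, which is the dependency
described above.

```c
#include <stdint.h>

/* Hypothetical helper type: sum plus carry-out in a plain word. */
typedef struct {
    uint64_t sum;
    uint64_t carry;
} add_result;

/* One-word c = a + b with the carry-out kept in an ordinary register. */
static add_result add_with_carry_out(uint64_t a, uint64_t b)
{
    add_result r;
    r.sum   = a + b;       /* add: wraps modulo 2^64 */
    r.carry = r.sum < a;   /* sltu: needs the add result, so it must wait */
    return r;
}
```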
We also don't have any way of efficiently consuming a computed
carry/borrow result. 3-input add/subtract would have solved that
(together with 3-input sltu and a 3-input "compute carry from add").
This all means that on Risc V, multi-word subtraction could be made to
run at 2 cycles/word while multi-word addition is limited to 3 cycles/word,
in both cases assuming a very wide super-scalar core. Remember that
other contemporary CPUs do these in 1+epsilon cycles/word, and that
without needing to do wide super-scalar dispatch.
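One way to read the 2 versus 3 figures is to look at the chain of
operations that leads from one word's carry/borrow to the next word's.
The C sketch below marks that chain in comments. It assumes each of add,
sub, sltu and or takes one cycle and that everything off the chain is
hidden by a wide enough core. The function names sub_n_sketch and
add_n_sketch are hypothetical and this is not GMP's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* rp[] = ap[] - bp[] over n words, least significant first.
   Returns the final borrow. Hypothetical sketch, not GMP's mpn_sub_n. */
static uint64_t sub_n_sketch(uint64_t *rp, const uint64_t *ap,
                             const uint64_t *bp, size_t n)
{
    uint64_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t a = ap[i], b = bp[i];
        uint64_t d  = a - b;        /* off the chain */
        uint64_t b1 = a < b;        /* off the chain: needs only a and b */
        uint64_t b2 = d < borrow;   /* chain step 1: sltu on incoming borrow */
        rp[i] = d - borrow;         /* off the chain: only feeds the store */
        borrow = b1 | b2;           /* chain step 2: or */
    }
    return borrow;
}

/* rp[] = ap[] + bp[] over n words, least significant first.
   Returns the final carry. Hypothetical sketch, not GMP's mpn_add_n. */
static uint64_t add_n_sketch(uint64_t *rp, const uint64_t *ap,
                             const uint64_t *bp, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t a = ap[i], b = bp[i];
        uint64_t s  = a + b;        /* off the chain */
        uint64_t c1 = s < a;        /* off the chain, but waits for s */
        uint64_t r  = s + carry;    /* chain step 1: add the incoming carry */
        uint64_t c2 = r < carry;    /* chain step 2: sltu needs r first */
        rp[i] = r;
        carry = c1 | c2;            /* chain step 3: or */
    }
    return carry;
}
```

In the subtraction loop the borrow out of d - borrow can be tested as
d < borrow without forming the difference first, so the chain is sltu
then or. In the addition loop the carry test has to look at the sum
that already includes the incoming carry, which adds one step.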
I use multi-word add/subtract here as an example of the inefficiencies
of Risc V. But the weak instruction set of Risc V shows in any
integer-heavy application, as others have pointed out before me.
Please encrypt, key id 0xC8601622