Risc V greatly underperforms
tg at gmplib.org
Mon Sep 20 09:53:13 UTC 2021
It seems safe to assume that most people on this list have heard of
Risc V by now, the license-free instruction set.
I trust that far fewer have looked at the technical details. I have,
though, as we implement critical inner loops for GMP in assembly.
My conclusion is that Risc V is a terrible architecture. It has a
uniquely weak instruction set. Any task will require more instructions
on Risc V than on any contemporary instruction set. Sure, it is
"clean", but there was no reason to be naive just to make it clean.
I believe that an average computer science student could come up with
a better instruction set than Risc V in a single term project. It is,
more or less, a watered-down version of the 30-year-old Alpha ISA after
all. (Alpha made sense in its time, with the transistor budget
available then.)
Let's look at some examples of how Risc V underperforms. First,
addition of a double-word integer with carry-out:
add t0, a4, a6 // add low words
sltu t6, t0, a4 // compute carry-out from low add
add t1, a5, a7 // add hi words
sltu t2, t1, a5 // compute carry-out from high add
add t4, t1, t6 // add carry from low add to high result
sltu t3, t4, t1 // compute carry out from the carry add
add t6, t2, t3 // combine carries
Same for 64-bit arm:
adds x12, x6, x10
adcs x13, x7, x11
Same for 64-bit x86:
add %r8, %rax
adc %r9, %rdx
(Some additional move insn might be needed for x86 due to the
2-operand nature of this arch.)
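For those who prefer C to assembly, here is a minimal sketch of the Risc V sequence above, one statement per instruction. The type dword_sum and the function add_dword are hypothetical names made up for this illustration; they are not GMP code. The sketch assumes 64-bit words and relies only on unsigned wrap-around and unsigned compare, which is all Risc V offers here.

```c
#include <stdint.h>

/* Hypothetical helper type for this sketch: a two-word sum plus carry-out. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
    uint64_t carry;   /* 0 or 1 */
} dword_sum;

/* Add (a_hi:a_lo) + (b_hi:b_lo) using only add and unsigned compare.
   Each statement corresponds to one instruction of the Risc V sequence. */
static dword_sum add_dword(uint64_t a_lo, uint64_t a_hi,
                           uint64_t b_lo, uint64_t b_hi)
{
    uint64_t lo  = a_lo + b_lo;          /* add  t0, a4, a6 */
    uint64_t c0  = lo < a_lo;            /* sltu t6, t0, a4 */
    uint64_t hi0 = a_hi + b_hi;          /* add  t1, a5, a7 */
    uint64_t c1  = hi0 < a_hi;           /* sltu t2, t1, a5 */
    uint64_t hi  = hi0 + c0;             /* add  t4, t1, t6 */
    uint64_t c2  = hi < hi0;             /* sltu t3, t4, t1 */
    dword_sum r  = { lo, hi, c1 + c2 };  /* add  t6, t2, t3 */
    return r;
}
```

The two high-word carries c1 and c2 can never both be set: if the high add wrapped, its result is at most 2^64-2, so adding a carry of 1 cannot wrap again. That is why a plain add suffices to combine them.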
If we generalise this to GMP's arbitrarily wide addition, we will end
up with 2 to 3 times more instructions, and go from just over 1 cycle
per 64-bit result word to 3 cycles per word. The 3 cycles will happen
even for a wide implementation which could execute a large number of
instructions in parallel. The critical path will be add->sltu->add,
which is three dependent instructions, i.e. 3 cycles.
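The dependency chain can be seen in a minimal C sketch of multi-word addition. The function add_n below is a hypothetical stand-in written for this illustration, not GMP's actual code, and it assumes 64-bit words stored least significant first. The comments mark which operations depend on the carry from the previous word.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a multi-word add; not GMP's implementation.
   Adds the n-word numbers a and b (least significant word first),
   stores the sum in r, and returns the carry-out (0 or 1). */
static uint64_t add_n(uint64_t *r, const uint64_t *a,
                      const uint64_t *b, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + b[i];  /* add:  independent of previous word */
        uint64_t c1 = s < a[i];     /* sltu: independent of previous word */
        uint64_t t  = s + carry;    /* add:  waits for the incoming carry */
        uint64_t c2 = t < s;        /* sltu: waits for t */
        carry = c1 + c2;            /* add:  waits for c2, feeds next word */
        r[i] = t;
    }
    return carry;
}
```

The first two operations of each iteration can be executed ahead of time for all words, but the last three form a chain from one word's carry to the next. With an add-with-carry instruction, that chain is a single instruction per word.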
I have heard that Risc V proponents say that these problems are known
and could be fixed by having the hardware fuse dependent instructions.
Perhaps that could lessen the instruction set shortcomings, but will
it fix the 3x worse performance for cases like the one outlined here?
Why not provide a decent instruction set instead?
I don't think designing a decent ISA is terribly difficult. Designing
a great ISA is. Designing something like Risc V is trivial.
Full disclosure: I have no financial or other interest in any computer
architecture mentioned here or not mentioned here. I really like the
idea of a license-free ISA.
Please encrypt, key id 0xC8601622