Risc V greatly underperforms

Wed Oct 6 14:36:25 UTC 2021

Hi Torbjörn,

On 10/6/21 12:47 PM, Torbjörn Granlund wrote:
> Hans Petter Selasky <hps at selasky.org> writes:
> 
>    Then you get a penalty. But the penalty might not be so big assuming
>    random input. Adding one to a number is pretty cheap and you only need
>    to continue traversing the data words making up the number when the
>    increment overflows. Which in turn gets you a variable number of
>    iterations.
> 
> Not good, side-channel leakage.
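A minimal C sketch makes the side-channel point concrete. The helpers `incr_vartime` and `incr_consttime` are hypothetical and assume 64-bit limbs; this is not GMP code. The first returns as soon as a limb stops wrapping, so its running time reveals how many low limbs were all ones. The second always walks every limb and propagates the carry arithmetically.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumed 64-bit limb */

/* Variable-time increment: exits as soon as a limb does not wrap, so the
 * iteration count (and thus timing) depends on the data.
 * Returns the carry out of the top limb. */
limb_t incr_vartime(limb_t *rp, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (++rp[i] != 0)
            return 0;              /* carry absorbed, stop early */
    }
    return 1;                      /* carried out of the top limb */
}

/* Increment that always touches all n limbs and computes the carry
 * with a comparison instead of an early exit. */
limb_t incr_consttime(limb_t *rp, size_t n)
{
    limb_t carry = 1;
    for (size_t i = 0; i < n; i++) {
        limb_t s = rp[i] + carry;
        carry = (limb_t)(s < carry);   /* 1 iff this limb wrapped */
        rp[i] = s;
    }
    return carry;
}
```

Even the second version is only constant-time if the compiler emits a branch-free comparison, for example `sltu` on RISC-V or `setb`/`adc` on x86. Real constant-time code needs its generated assembly checked.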
> 
>    How microcode works and what instruction sequences are optimal for a
>    bignum adder, I will not go into. My point is just that x86
>    instructions are parsed before they are executed. Almost like a VM.
> 
> Ahum.  But Risc V instructions are not "parsed" you say?
> 
>    I would guess that if RISC-V executed "N" instructions at a time on
>    the same logical core w/o using microcode, the performance would be
>    comparable to x86. Then it would be up to the compiler to layout the
>    instructions correctly and not the microcode.
> 
> You guess wrong.
> 
> Most instructions today have a latency of a single cycle, be it Risc V,
> some x86 core, or Arm.
> 
> Arm has the most powerful instruction set.  But x86 also has powerful
> instructions, albeit very messy from both a programmer's perspective and
> from the hardware's perspective.
> 
> Now you claim that something magic (parsing, microcode) slows things
> down on x86.  Somehow, a single-cycle instruction on x86 is really
> magically slower than a single-cycle instruction on Risc V.

No, this is not what I tried to express. I meant that if the Risc-V were 
modified to issue a fixed number, N, of instructions in parallel per 
clock instead of just one, its performance would be comparable to 
that of x86.

> You're dead wrong.
> 
> X86 will use many fewer instructions than Risc V for any task since X86
> has many more instructions and many instructions are also more powerful.
> Typically, instructions run in a single cycle and do not involve
> "microcode".
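The task this thread is about is multi-precision (bignum) addition. The rough, portable C sketch below recovers each carry with unsigned comparisons. The function `add_n` is hypothetical. Its argument order follows GMP's `mpn_add_n`, but it is not GMP's code. On an ISA without a carry flag, such as base RISC-V, the limb step compiles to roughly two adds, two `sltu` compares and an `or`. On x86, a single `adc` folds the add, the carry-in and the carry-out into one instruction.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;   /* assumed 64-bit limb */

/* rp[] = up[] + vp[] over n limbs; returns the carry out.
 * The carry is recovered from unsigned wrap-around with comparisons,
 * which an ISA without a carry flag must do explicitly. */
limb_t add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s  = up[i] + vp[i];
        limb_t c1 = s < up[i];     /* carry from the limb add */
        limb_t r  = s + cy;
        limb_t c2 = r < s;         /* carry from adding the carry-in */
        rp[i] = r;
        cy = c1 | c2;              /* at most one of c1, c2 can be set */
    }
    return cy;
}
```

The instruction counts above are approximate. They ignore loads, stores and loop overhead, which both architectures pay.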

Yes, for this particular task. But if, for example, you needed X86 to 
count, add, subtract or compare in some permuted representation, because 
that happened to be optimal for the problem at hand, then X86 would no 
longer fit the purpose either, and you would end up spending multiple 
X86 instructions to handle the missing pieces.

An example of simple permuted counting would be to give every odd bit of 
the variable a negative weight instead of a positive one. How would you 
handle that on x86? I guess you would first have to convert from the 
permuted representation to a linear one and then back again.
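Giving every odd bit a negative weight is one reading of the idea. Bit i then has weight (-2)^i, which is the negabinary (base -2) system. The minimal C sketch below assumes that interpretation and a 32-bit word. It follows the convert, add and convert back route guessed at above. The names `NEG_MASK`, `nb_to_bin`, `bin_to_nb` and `nb_add` are hypothetical. Each conversion costs one XOR and one add or subtract on an ordinary binary machine.

```c
#include <stdint.h>

/* Assumed layout: bit i has weight (-2)^i, so the odd bit positions
 * (marked by NEG_MASK) count negatively. All arithmetic is mod 2^32. */
#define NEG_MASK UINT32_C(0xAAAAAAAA)

/* Negabinary bit pattern -> ordinary two's-complement bits. */
uint32_t nb_to_bin(uint32_t nb)
{
    return (nb ^ NEG_MASK) - NEG_MASK;
}

/* Ordinary two's-complement bits -> negabinary bit pattern. */
uint32_t bin_to_nb(uint32_t x)
{
    return (x + NEG_MASK) ^ NEG_MASK;
}

/* Negabinary addition: convert both operands, add in binary,
 * convert the sum back. */
uint32_t nb_add(uint32_t a, uint32_t b)
{
    return bin_to_nb(nb_to_bin(a) + nb_to_bin(b));
}
```

For example, 2 is `110` in negabinary (4 - 2), so `bin_to_nb(2)` yields `0x6`. Adding the patterns `0x1` (1) and `0x6` (2) yields `0x7`, since `111` in negabinary is 4 - 2 + 1 = 3.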

> Risc V will never compete with Arm or x86 for integer scientific tasks
> (including cryptography).  It won't even come close.  It would need to
> run at clock speeds several times higher than the competition to come
> close.

Simply put, it is not wise to declare something impossible; history has 
taught us that over and over again. The only things that are truly 
impossible are logical contradictions, to put it like that :-)

> (Modern CPUs are complex, and surely many instructions are not executed
> as simply as a plain add.  Some instructions are internally split, e.g.,
> "add mem,reg" might be split into a load and a register-based add.  But
> the opposite is also true: some instruction pairs are glued together so
> that later stages see them as a single instruction.)
> 

Right.

I guess we are far off-topic on this e-mail thread. Let's not start 
another flamewar on which CPU is the best :-)

--HPS