Risc V greatly underperforms

Wed Oct 6 10:12:18 UTC 2021

On 10/6/21 10:48 AM, Torbjörn Granlund wrote:
> Hans Petter Selasky <hps at selasky.org> writes:
> 
>    If GMP could utilize multiple cores when doing bignum
>    multiplication and addition, I think the picture would look different.
> 
>    For example, for addition you could split the number into two parts and
>    then speculate on whether there is a carry into the higher part or not.
> 
> And if the guess is wrong, then what?

Hi,

Then you get a penalty. But the penalty might not be so big, assuming
random input. Adding one to a number is pretty cheap: you only need to
continue traversing the data words making up the number while the
increment overflows, which in turn gives you a variable number of
iterations.
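
To make the idea concrete, here is a minimal sketch in Python. It is an illustration only, not GMP code: the 64-bit limb size, the least-significant-first limb lists, and the names add_limbs, increment and speculative_add are all my assumptions. The two half additions are written one after the other, but they do not depend on each other, so they could run on two cores.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit limbs, least significant limb first


def add_limbs(a, b, carry=0):
    """Ripple-carry addition of two equal-length limb lists."""
    out = []
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> 64
    return out, carry


def increment(limbs):
    """Add 1 in place. Returns (carry_out, number_of_limbs_touched)."""
    for i in range(len(limbs)):
        limbs[i] = (limbs[i] + 1) & MASK
        if limbs[i] != 0:
            # No overflow in this limb, so the carry stops here.
            return 0, i + 1
    return 1, len(limbs)


def speculative_add(a, b):
    """Add a and b by halves, guessing that no carry enters the high half.

    Returns (sum_limbs, carry_out, fixup_limbs), where fixup_limbs is the
    number of limbs rewritten because the guess was wrong.
    """
    half = len(a) // 2
    # These two calls are independent and could run in parallel.
    lo, lo_carry = add_limbs(a[:half], b[:half])
    hi, hi_carry = add_limbs(a[half:], b[half:])  # guess: carry-in is 0
    touched = 0
    if lo_carry:
        # Wrong guess: the penalty is an increment of the high half.
        inc_carry, touched = increment(hi)
        hi_carry |= inc_carry  # the two carries can never both be set
    return lo + hi, hi_carry, touched
```

The third return value is the penalty: it is 0 when the guess was right, usually 1 when it was wrong, and equal to the length of the high half only when every limb of the speculative high sum is all ones.
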

> It is well known that, in a model which ignores caches and memory
> bandwidth, one can get 2n/k + log(k) word operation steps for n-word
> addition on k execution agents.  Agent i computes the sum of block i with
> both carry-in = 1 and carry-in = 0 and saves both results.  The log(k)
> term is for serially choosing the proper block depending on whether
> carry-in happened to specific blocks.
> 
> On a cached system, I would expect this algorithm to just slow things
> down.
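
The scheme described in the quoted paragraph can be sketched as follows. This is a rough Python illustration under my own assumptions (64-bit limbs, least-significant-first lists, the hypothetical names add_block and carry_select_add), not anything taken from GMP. Note that the selection pass below is a plain serial loop over the k blocks; reaching the log(k) term would need a parallel-prefix combination of the carries, which this sketch does not attempt.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit limbs, least significant limb first


def add_block(a, b, carry):
    """Ripple-carry addition of one block with a given carry-in."""
    out = []
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> 64
    return out, carry


def carry_select_add(a, b, k):
    """Add equal-length limb lists a and b using k blocks (k >= 1)."""
    n = len(a)
    size = max(1, -(-n // k))  # ceil(n / k) limbs per block, at least 1

    # Phase 1: every block computes both candidate sums. The blocks do not
    # depend on each other, so each could be handled by its own agent.
    candidates = []
    for start in range(0, n, size):
        blk_a = a[start:start + size]
        blk_b = b[start:start + size]
        candidates.append((add_block(blk_a, blk_b, 0),
                           add_block(blk_a, blk_b, 1)))

    # Phase 2: walk the blocks in order and keep the candidate that matches
    # the carry actually produced by the previous block.
    result, carry = [], 0
    for with_carry0, with_carry1 in candidates:
        limbs, carry = with_carry1 if carry else with_carry0
        result.extend(limbs)
    return result, carry
```

Each block does two n/k-limb additions, which is where the 2n/k term comes from. The sketch also shows the cost on a real machine: every block reads its inputs twice and stores two results, so memory traffic roughly doubles compared with a single ripple-carry pass.
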
> 
>    I thought that RISC-V would produce cheaper and more cores, and that
>    single core performance was not that critical.
> 
> Slow cores are useful in some applications, sure.
> 
>    Talking about x86, don't forget that there is microcode below each
>    instruction.
> 
> This is a false statement.  Even if it were true, how is that relevant
> for this discussion?  The relevant instructions run in one cycle.

I will not go into how microcode works or which instruction sequences are
optimal for a bignum adder. My point is just that x86 instructions are
parsed before they are executed, almost like in a VM.

I would guess that if RISC-V executed "N" instructions at a time on the
same logical core without using microcode, the performance would be
comparable to x86. Then it would be up to the compiler, and not the
microcode, to lay out the instructions correctly.

--HPS