Risc V greatly underperforms

Hans Petter Selasky hps at selasky.org
Wed Oct 6 10:12:18 UTC 2021

On 10/6/21 10:48 AM, Torbjörn Granlund wrote:
> Hans Petter Selasky <hps at selasky.org> writes:
>    If GMP could utilize multiple cores when doing bignum
>    multiplication and addition, I think the picture would look different.
>    For example, for addition you could split the number into two parts and
>    then speculate on whether a carry propagates into the higher part.
> And if the guess is wrong, then what?


Then you pay a penalty. But the penalty need not be large, assuming
random input. Adding one to a number is cheap: you only need to keep
traversing the words that make up the number while the increment
overflows, which in turn gives a variable number of iterations.
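The speculation described above can be sketched in Python, modeling numbers as little-endian lists of 64-bit limbs. This is a sequential model, not a GMP routine: the names `add_n`, `increment`, and `speculative_add` are hypothetical, and the two half-additions run one after the other here even though they are independent and could run on separate cores. The third return value counts how many high limbs the fix-up had to touch.

```python
MASK = (1 << 64) - 1  # limbs are modeled as 64-bit unsigned words


def to_limbs(x, n):
    """Split a non-negative int into n little-endian 64-bit limbs."""
    return [(x >> (64 * i)) & MASK for i in range(n)]


def from_limbs(limbs, carry=0):
    """Rebuild an int from limbs plus a final carry-out bit."""
    value = sum(limb << (64 * i) for i, limb in enumerate(limbs))
    return value + (carry << (64 * len(limbs)))


def add_n(a, b, carry=0):
    """Schoolbook addition of equal-length limb lists; returns (limbs, carry_out)."""
    out = []
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> 64
    return out, carry


def increment(limbs):
    """Add 1 in place, stopping at the first limb that does not wrap to zero.

    Returns (carry_out, number_of_limbs_touched).
    """
    for i in range(len(limbs)):
        limbs[i] = (limbs[i] + 1) & MASK
        if limbs[i] != 0:
            return 0, i + 1
    return 1, len(limbs)


def speculative_add(a, b):
    """Add two n-limb numbers as two independent halves, guessing carry-in 0 for the high half."""
    h = len(a) // 2
    # The next two additions do not depend on each other.
    lo, c_lo = add_n(a[:h], b[:h])
    hi, c_hi = add_n(a[h:], b[h:])  # speculative: assumes no carry from the low half
    fixup = 0
    if c_lo:
        # Wrong guess: propagate the missing carry into the high half.
        c_inc, fixup = increment(hi)
        c_hi |= c_inc  # at most one of c_hi and c_inc can be 1
    return lo + hi, c_hi, fixup
```

With random operands the low half carries out roughly half the time, and the fix-up then almost always stops at the first high limb; only long runs of all-ones limbs make it expensive.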

> It is well known that, in a model which ignores caches and memory bandwidth,
> one can get 2n/k + log(k) word operation steps for n-word addition
> on k execution agents.  Agent i computes the sum of block i with both
> carry-in = 1 and carry-in = 0 and saves both results.  The log(k) term is
> for serially choosing the proper result for each block depending on
> whether a carry actually came into that block.
> On a cached system, I would expect this algorithm to just slow things
> down.
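The carry-select scheme in the quoted paragraph can be modeled the same way. In this hypothetical sketch (`carry_select_add` and its helpers are illustrative names, not GMP functions), a loop stands in for the k independent agents, each producing both carry-in variants of its block, so each does about 2n/k word additions. The selection pass here is a plain linear scan over the blocks, i.e. O(k) rather than the quoted log(k), which would need a tree-style (parallel-prefix) combination of the block carries.

```python
MASK = (1 << 64) - 1  # limbs are modeled as 64-bit unsigned words


def to_limbs(x, n):
    """Split a non-negative int into n little-endian 64-bit limbs."""
    return [(x >> (64 * i)) & MASK for i in range(n)]


def from_limbs(limbs, carry=0):
    """Rebuild an int from limbs plus a final carry-out bit."""
    value = sum(limb << (64 * i) for i, limb in enumerate(limbs))
    return value + (carry << (64 * len(limbs)))


def add_n(a, b, carry=0):
    """Schoolbook addition of equal-length limb lists; returns (limbs, carry_out)."""
    out = []
    for x, y in zip(a, b):
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> 64
    return out, carry


def carry_select_add(a, b, k):
    """Carry-select addition of two n-limb numbers split into about k blocks."""
    n = len(a)
    size = max(1, -(-n // k))  # ceil(n / k) limbs per block, at least 1
    # Phase 1: each block is independent work for one agent.
    blocks = []
    for start in range(0, n, size):
        xa, xb = a[start:start + size], b[start:start + size]
        blocks.append((add_n(xa, xb, 0), add_n(xa, xb, 1)))
    # Phase 2: walk the blocks, picking the variant that matches the real carry-in.
    out, carry = [], 0
    for with0, with1 in blocks:
        limbs, carry = with1 if carry else with0
        out.extend(limbs)
    return out, carry
```

The cost of phase 1 is the doubled work per block; on a real machine the extra memory traffic for the second result is exactly what the cache objection above is about.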
>    I thought that RISC-V would give cheaper and more numerous cores, and
>    that single-core performance was not that critical.
> Slow cores are useful in some applications, sure.
>    Talking about x86, don't forget that there is microcode below each
>    instruction.
> This is a false statement.  Even if it were true, how is that relevant
> for this discussion?  The relevant instructions run in one cycle.

How microcode works, and which instruction sequences are optimal for a
bignum adder, I will not go into. My point is just that x86 instructions
are decoded before they are executed, almost like in a VM.

I would guess that if RISC-V executed "N" instructions at a time on the
same logical core without using microcode, the performance would be
comparable to x86. Then it would be up to the compiler, not the
microcode, to lay out the instructions correctly.

