Performance on riscv32

Mon Mar 30 18:28:34 CEST 2026

Niels Möller <nisse at lysator.liu.se> writes:

  I'm not that familiar with riscv, but to me the generated code looks
  pretty good under the architectural limitations, and I see no obvious
  micro-optimizations (only a single move instruction that appears a bit
  redundant). But I may be missing something.

The one thing you could do is to unroll the code (by means of
-funroll-loops, presumably).  That would mitigate the problem with
RiscV's weak addressing.
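
To illustrate what unrolling buys: RiscV loads and stores only take a
base register plus a constant offset, with no indexed or auto-increment
forms, so a one-limb-per-iteration loop spends instructions on bumping
three pointers for every limb.  A rough hand-unrolled sketch follows.
The name add_n_unrolled is made up for this example (it is not GMP or
mini-gmp code), and whether -funroll-loops gives something equivalent
for your loop would need checking in the generated assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rp[] = ap[] + bp[] over n 32-bit limbs,
   returns the carry out (0 or 1).  No carry flag is assumed, so the
   carry is recovered with unsigned comparisons. */
static uint32_t add_n_unrolled(uint32_t *rp, const uint32_t *ap,
                               const uint32_t *bp, size_t n)
{
    uint32_t cy = 0;

    /* One limb of add-with-carry at constant offset i from the
       current pointers. */
#define ADD_LIMB(i) do {                                 \
        uint32_t a = ap[i];                              \
        uint32_t s = a + bp[i];                          \
        uint32_t c1 = s < a;      /* carry from a + b */ \
        uint32_t r = s + cy;                             \
        cy = c1 | (r < s);        /* carry from + cy */  \
        rp[i] = r;                                       \
    } while (0)

    /* Four limbs per iteration: the offsets 0..3 become immediate
       displacements, and the pointers are updated once per four limbs. */
    for (; n >= 4; n -= 4) {
        ADD_LIMB(0); ADD_LIMB(1); ADD_LIMB(2); ADD_LIMB(3);
        ap += 4; bp += 4; rp += 4;
    }
    /* Leftover limbs, one at a time. */
    for (; n > 0; n--) {
        ADD_LIMB(0);
        ap++; bp++; rp++;
    }
#undef ADD_LIMB
    return cy;
}
```

Note that unrolling only amortizes the pointer and loop overhead; the
compare-based carry recovery is still paid for every limb.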

  When benchmarking, my ed25519 code is about 50% slower than the
  monocypher C library, for the ed25519 signing operation (10 million
  cycles vs 7 million). That library appears to use arithmetic based on
  nail bits (in GMP terminology), to avoid dealing with low-level carry
  propagation (and it also has the advantage of specialized code for the
  size of interest). So I wonder, is it possible to get reasonable speed
  with fullsize limbs (no nails) on this platform? If I could switch from
  mini-gmp to full gmp (a bit challenging due to the rather limited
  environment with no normal libc), and revive GMP nails code, would that
  make sense for performance?

Just like for Alpha and MIPS, nails are necessary.  In fact, the quite
new RiscV is not all that different from those decades old
architectures.
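
For readers unfamiliar with the idea, here is a minimal sketch of why
nails help on a machine without a carry flag.  It assumes 32-bit words
holding 28 value bits plus 4 nail bits; that split is picked only for
illustration and is not what monocypher or GMP uses.  The macro names
imitate GMP's GMP_NAIL_BITS and GMP_NUMB_MASK but are local to this
example, and both functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define NAIL_BITS 4                      /* spare high bits per limb */
#define NUMB_BITS (32 - NAIL_BITS)       /* value bits per limb */
#define NUMB_MASK ((UINT32_C(1) << NUMB_BITS) - 1)

/* Lazy addition: plain word adds, no carry handling at all.  Any
   overflow out of the value bits simply accumulates in the nail bits.
   With normalized inputs, a few such additions can be chained before
   the nails fill up and a normalization pass is required. */
static void nails_add_lazy(uint32_t *rp, const uint32_t *ap,
                           const uint32_t *bp, size_t n)
{
    for (size_t i = 0; i < n; i++)
        rp[i] = ap[i] + bp[i];
}

/* Normalization: push what has accumulated in the nail bits into the
   next limb.  Returns the carry out of the top limb. */
static uint32_t nails_normalize(uint32_t *rp, size_t n)
{
    uint32_t cy = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = rp[i] + cy;
        rp[i] = t & NUMB_MASK;
        cy = t >> NUMB_BITS;
    }
    return cy;
}
```

The carry extraction becomes a shift and a mask, done once per
normalization pass instead of a compare per limb per addition.  The
price is more limbs for the same number of bits.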

Unfortunately, we've let nails rot in GMP.  I don't expect it to be
terribly hard to make it work again.

There are optional SIMD instructions and I believe they tried to address
some of the shortcomings of the basic instruction set there.  They even
have some carry-support, IIRC.  I have no idea if it is well-designed
enough to be practically useful, though.

And beware of the "modular" design of RiscV!  Not only is this SIMD
stuff optional (which is unsurprising).  Need rotate instructions?
Those are optional!  The full instruction set is weak; the mandatory
basic instruction set is extremely limited.
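
In C, the usual way to cope with the optional rotates is the portable
shift-and-or idiom below.  This is a sketch, and rotl32 is just a
conventional name rather than a function from any library mentioned
here.  Compilers commonly recognize the pattern and emit a rotate
instruction when the target has one (on RiscV that means the Zbb
extension); otherwise it costs two shifts and an or, plus computing the
complementary shift count when the count is not a constant.

```c
#include <stdint.h>

/* Rotate x left by n bits.  Both shift counts are reduced mod 32 so
   that n == 0 (and n >= 32) never produces an undefined 32-bit shift. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```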

-- 
Torbjörn
Please encrypt, key id 0xC8601622