Performance on riscv32
Niels Möller
nisse at lysator.liu.se
Mon Mar 30 16:47:45 CEST 2026
Hi,
I'm trying to do ed25519 operations on a slow riscv32 system. I'm now
using nettle + mini-gmp, with a umul_ppmm patched to use
(uint64_t) u * v, which results in reasonable code using the mul and
mulhu instructions.
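For illustration, a minimal sketch of what such a patch might look like, assuming 32-bit limbs. The function name umul_ppmm_sketch and its pointer-based signature are hypothetical, chosen so the example compiles on its own; mini-gmp's real umul_ppmm is a macro with a different shape.

```c
#include <stdint.h>

/* Hypothetical stand-in for the patched umul_ppmm: computes the full
   64-bit product of two 32-bit limbs and splits it into high and low
   words. On rv32 with the M extension, a compiler would be expected to
   use mulhu for the high half and mul for the low half. */
static inline void
umul_ppmm_sketch (uint32_t *w1, uint32_t *w0, uint32_t u, uint32_t v)
{
  uint64_t p = (uint64_t) u * v;
  *w1 = (uint32_t) (p >> 32);   /* high word */
  *w0 = (uint32_t) p;           /* low word */
}
```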
I get this mpn_addmul_1 inner loop:
28: 4010 lw a2, 0x0(s0)
2a: 032636b3 mulhu a3, a2, s2
2e: 03260633 mul a2, a2, s2
32: 962a add a2, a2, a0
34: 4098 lw a4, 0x0(s1)
36: 00a63533 sltu a0, a2, a0
3a: 9536 add a0, a0, a3
3c: 0411 addi s0, s0, 0x4
3e: 9732 add a4, a4, a2
40: 00c73633 sltu a2, a4, a2
44: 9532 add a0, a0, a2
46: 00448613 addi a2, s1, 0x4
4a: c098 sw a4, 0x0(s1)
4c: 84b2 mv s1, a2
4e: fcb61de3 bne a2, a1, 0x28 <mpn_addmul_1+0x28>
As I understand it, sltu + add is needed for each carry propagation.
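To make the mapping from source to instructions concrete, here is a sketch of an addmul_1 style loop in C, assuming 32-bit limbs. It is written to mirror the disassembly above, not copied from mini-gmp, and the name addmul_1_sketch is hypothetical. Each unsigned comparison followed by an addition corresponds to one sltu + add pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes rp[0..n-1] += up[0..n-1] * v and returns the carry-out limb.
   Illustrative only; assumes 32-bit limbs without nails. */
static uint32_t
addmul_1_sketch (uint32_t *rp, const uint32_t *up, size_t n, uint32_t v)
{
  uint32_t cy = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t p = (uint64_t) up[i] * v;
      uint32_t lo = (uint32_t) p;          /* mul */
      uint32_t hi = (uint32_t) (p >> 32);  /* mulhu */

      lo += cy;
      hi += lo < cy;    /* sltu + add: carry from adding the incoming carry */

      uint32_t r = rp[i] + lo;
      hi += r < lo;     /* sltu + add: carry from adding the destination limb */

      rp[i] = r;
      cy = hi;          /* cannot overflow: up*v + cy + rp < 2^64 */
    }
  return cy;
}
```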
For comparison, plain mpn_add_n gets compiled to this inner loop:
c: 419c lw a5, 0x0(a1)
e: 4214 lw a3, 0x0(a2)
10: 973e add a4, a4, a5
12: 00f737b3 sltu a5, a4, a5
16: 96ba add a3, a3, a4
18: 00e6b733 sltu a4, a3, a4
1c: 973e add a4, a4, a5
1e: c114 sw a3, 0x0(a0)
20: 0511 addi a0, a0, 0x4
22: 0611 addi a2, a2, 0x4
24: 0591 addi a1, a1, 0x4
26: ff0513e3 bne a0, a6, 0xc <mpn_add_n+0xc>
Besides the 4(!) instructions for carry propagation, the lack of
indexed addressing also looks somewhat costly.
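The carry handling counted above can be seen in a C sketch of the add_n loop, again assuming 32-bit limbs. The name add_n_sketch is hypothetical and the code is an illustration, not mini-gmp's source. Without a carry flag, each of the two additions needs its own unsigned comparison to detect wraparound, and the two carry bits are then combined.

```c
#include <stddef.h>
#include <stdint.h>

/* Computes rp[0..n-1] = ap[0..n-1] + bp[0..n-1] and returns the carry
   out (0 or 1). Illustrative only. */
static uint32_t
add_n_sketch (uint32_t *rp, const uint32_t *ap, const uint32_t *bp, size_t n)
{
  uint32_t cy = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t s = ap[i] + cy;   /* add: fold in the incoming carry */
      uint32_t c1 = s < cy;      /* sltu: did that addition wrap? */
      uint32_t r = s + bp[i];    /* add: the actual limb sum */
      uint32_t c2 = r < s;       /* sltu: did that addition wrap? */
      cy = c1 + c2;              /* add: at most one of c1, c2 is set */
      rp[i] = r;
    }
  return cy;
}
```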
I'm not that familiar with riscv, but to me the generated code looks
pretty good given the architectural limitations, and I see no obvious
micro-optimizations (only a single move instruction that appears a bit
redundant). But I may be missing something.
When benchmarking, my ed25519 code is about 50% slower than the
monocypher C library, for the ed25519 signing operation (10 million
cycles vs 7 million). That library appears to use arithmetic based on
nail bits (in GMP terminology), to avoid dealing with low-level carry
propagation (and it also has the advantage of specialized code for the
size of interest). So I wonder, is it possible to get reasonable speed
with full-size limbs (no nails) on this platform? If I could switch from
mini-gmp to full gmp (a bit challenging due to the rather limited
environment with no normal libc), and revive GMP's nails code, would that
make sense for performance?
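As a rough illustration of the nails idea, the sketch below stores 26 value bits in each 32-bit word, leaving 6 nail bits as headroom, so additions can skip carry detection and be normalized later in one pass. The limb width, the macro names, and the function names are assumptions made for this example; it is not monocypher's representation or GMP's old nails code.

```c
#include <stddef.h>
#include <stdint.h>

#define NAIL_BITS 6                        /* assumed for illustration */
#define NUMB_BITS (32 - NAIL_BITS)         /* 26 value bits per limb */
#define NUMB_MASK ((UINT32_C(1) << NUMB_BITS) - 1)

/* Limb-wise addition with no carry handling at all: one add per limb.
   The excess accumulates in the nail bits. The caller must normalize
   before any limb could reach 2^32 - 2^NAIL_BITS. */
static void
nails_add_lazy (uint32_t *rp, const uint32_t *ap, const uint32_t *bp, size_t n)
{
  for (size_t i = 0; i < n; i++)
    rp[i] = ap[i] + bp[i];
}

/* Pushes the excess bits of each limb into the next one, using a shift
   and a mask instead of comparisons. Returns the carry out of the top
   limb. Precondition: every limb is below 2^32 - 2^NAIL_BITS, so that
   adding the incoming carry cannot wrap. */
static uint32_t
nails_normalize (uint32_t *rp, size_t n)
{
  uint32_t cy = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint32_t t = rp[i] + cy;
      rp[i] = t & NUMB_MASK;
      cy = t >> NUMB_BITS;
    }
  return cy;
}
```

The trade-off is that a 255-bit value needs more limbs (10 instead of 8 with this layout), so multiplication does more partial products in exchange for cheaper carry handling.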
Regards,
/Niels
--
Niels Möller. PGP key CB4962D070D77D7FCB8BA36271D8F1FF368C6677.
Internet email is subject to wholesale government surveillance.