Writing mpz_powm for GPU/CUDA

Niall Emmart nemmart at hotmail.com
Wed Jun 12 11:05:15 UTC 2019

Hi Dan,

Can I suggest you have a look at CGBN? It's a throughput-oriented bignum library for CUDA, which you can find here: https://github.com/NVlabs/CGBN

It efficiently supports moderate-size numbers, in the range of roughly 128 to 16K bits. The modexp performance on a recent GPU (Volta or later) is quite extraordinary by comparison.

I'm not sure whether this is relevant to your use case, but note that the CGBN modexp API is not constant-time and is therefore susceptible to side-channel attacks.
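
To illustrate in general terms why a modexp that is not constant-time can leak information, here is a minimal Python sketch of textbook left-to-right square-and-multiply. This is not CGBN's algorithm, and the function name modexp_count and the chosen values are purely illustrative assumptions. It only shows that the amount of work depends on the bits of the exponent, which is the kind of data-dependent behaviour a timing attack exploits.

```python
def modexp_count(base, exp, mod):
    """Textbook left-to-right binary modexp (illustrative, not CGBN's method).

    Returns (result, number_of_extra_multiplications).
    """
    result = 1 % mod
    mults = 0
    for bit in bin(exp)[2:]:
        result = (result * result) % mod        # a squaring for every bit
        if bit == "1":
            result = (result * base) % mod      # a multiply only for 1 bits
            mults += 1
    return result, mults


mod = (1 << 127) - 1        # arbitrary odd modulus chosen for the demo
base = 3
sparse = 1 << 63            # 64-bit exponent with a single 1 bit
dense = (1 << 64) - 1       # 64-bit exponent with all 64 bits set

r_sparse, m_sparse = modexp_count(base, sparse, mod)
r_dense, m_dense = modexp_count(base, dense, mod)
print(m_sparse, m_dense)    # the multiply counts differ with the exponent bits
```

Both exponents have the same bit length, yet the number of multiplications differs, so the running time depends on secret exponent bits whenever the exponent is secret. If the exponent is public, as in plain repeated squaring, this concern may not apply.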

Let me know if you have any questions.


From: gmp-discuss <gmp-discuss-bounces at gmplib.org> on behalf of Dan Cline <decline at umass.edu>
Sent: Monday, June 10, 2019 9:32 PM
To: gmp-discuss at gmplib.org
Subject: Writing mpz_powm for GPU/CUDA

So I have an application that requires modular squaring of thousands of
4096-bit numbers at once, and I was planning to write CUDA code to do all of
the multiplications in parallel. How much work would it be to port just
the mpz_powm function to CUDA? I know that the latency of each single
multiplication won't be as good as on the CPU, but I'm looking for
throughput here, not latency.
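
The workload described above can be sketched as a small CPU reference in Python, which may be useful for checking the output of a GPU implementation. It is only a sketch under stated assumptions: the batch size, the use of a separate random odd modulus per instance, and the helper names random_odd_modulus and batch_modsquare are all hypothetical and not taken from the message. Python's built-in pow(x, 2, n) plays the role that mpz_powm with exponent 2 would play in GMP.

```python
import secrets

BITS = 4096     # operand size mentioned in the message
BATCH = 1000    # assumed batch size; the message only says "thousands"


def random_odd_modulus(bits):
    # Hypothetical helper: random odd modulus with the top bit set,
    # so every modulus is exactly `bits` bits long.
    return secrets.randbits(bits) | (1 << (bits - 1)) | 1


def batch_modsquare(xs, ns):
    # Each (x, n) pair is independent of the others, which is why the
    # batch maps naturally onto parallel hardware.
    return [pow(x, 2, n) for x, n in zip(xs, ns)]


ns = [random_odd_modulus(BITS) for _ in range(BATCH)]
xs = [secrets.randbelow(n) for n in ns]
rs = batch_modsquare(xs, ns)
```

Because no instance depends on any other, the per-instance latency matters less than how many instances complete per second, which matches the throughput goal stated in the message.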

Dan Cline
