Writing mpz_powm for GPU/CUDA

Dan Cline decline at umass.edu
Tue Jun 11 01:32:17 UTC 2019

I have an application that requires the modular squaring of thousands of
4096-bit numbers at once, and I am planning to write CUDA code to perform
all of the multiplications in parallel. How much work would it be to port
just the mpz_powm function to CUDA? I know that the latency of each
individual multiplication won't be as good as on the CPU, but I'm looking
for throughput here, not latency.
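
To make the target operation concrete, here is a minimal CPU reference
sketch in Python (not CUDA) that could serve as ground truth when checking
a GPU port. For non-negative inputs, pow(x, 2, m) gives the same value that
mpz_powm returns with an exponent of 2. The names batch_modsquare, to_limbs
and from_limbs, and the choice of 32-bit little-endian limbs, are
assumptions made for illustration; they are not part of GMP or of any CUDA
library.

```python
import random

LIMB_BITS = 32                      # assumed limb width for a GPU layout
NUM_LIMBS = 4096 // LIMB_BITS       # 128 limbs per 4096-bit operand


def to_limbs(value):
    """Split a non-negative int into NUM_LIMBS little-endian 32-bit limbs."""
    mask = (1 << LIMB_BITS) - 1
    return [(value >> (LIMB_BITS * i)) & mask for i in range(NUM_LIMBS)]


def from_limbs(limbs):
    """Reassemble an int from little-endian 32-bit limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def batch_modsquare(values, modulus):
    """CPU reference: square every value modulo `modulus`, one at a time."""
    return [pow(v, 2, modulus) for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical odd 4096-bit modulus and a batch of 1000 operands.
    modulus = rng.getrandbits(4096) | (1 << 4095) | 1
    batch = [rng.getrandbits(4096) % modulus for _ in range(1000)]
    reference = batch_modsquare(batch, modulus)
    # A GPU port would fill flat limb arrays like this one and be compared
    # against `reference` after converting its output back with from_limbs.
    flat_input = [limb for v in batch for limb in to_limbs(v)]
    print(len(reference), len(flat_input))
```

Each squaring is independent of the others, so one number per thread (or
per group of cooperating threads) is the natural mapping; the sketch fixes
only the expected results, not how a kernel should compute them.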

Dan Cline
