Writing mpz_powm for GPU/CUDA
Dan Cline
decline at umass.edu
Tue Jun 11 01:32:17 UTC 2019
So I have an application that requires modular squaring of thousands of
4096-bit numbers at once, and I was planning on writing CUDA code to do
all of the multiplications in parallel. How much work would it be to port
just the mpz_powm function to CUDA? I know that the latency of each
individual multiplication won't be as good as on the CPU, but I'm looking
for throughput here, not latency.
Thanks,
Dan Cline