AMD Ryzen 5 7600X gmpbench submission (result 8010.2) -- new: 9176.2

hermann at stamm-wilbrandt.de hermann at stamm-wilbrandt.de
Sun Jul 9 11:55:12 CEST 2023


On 2023-07-08 23:24, Torbjörn Granlund wrote:
> The result is 14.5% increase in GMPbench value, new value is 9176.2 !
> 
> That's much better, but still not as good as it should be with some
> optimisation.
> 

I had to upgrade 7600X PC Bios to version 1.24 to avoid CPU burn risk 
for Ryzen 7000 CPUs.
Therefore I took baseline with 1.24 Bios, only small differences to 
before:
https://github.com/Hermann-SW/QuadraticRegression/commit/89cd05b2cc7de406b8dc6cecc99c4d56ba8fe361

This is the libgmpxx code used to compute "powm(_, qnr, p/4, p)" and 
measuring runtimes:
https://github.com/Hermann-SW/RSA_numbers_factored/blob/main/c%2B%2B/sqrtm1.cc

While daily snapshot did increase gmpbench value from 8010.2 to 9176.2,
for sqrtm1.cc runtimes increased a bit:

+--------+--------+--------+--------+
! 100355 ! 200700 ! 272770 ! 330855 ! #digits(p) for prime p
+--------+--------+--------+--------+
!  447.1 ! 2144.4 ! 3748.0 ! 5311.2 ! gmp 6.2.1 (baseline with 1.24 
Bios)
+--------+--------+--------+--------+
!  453.1 ! 2158.4 ! 3803.1 | 5403.0 ! gmp-6.2.99-20230706074853.tar.zst
+--------+--------+--------+--------+


> Specifically, there seems to be something slightly off
> with the multiply performance.
> 
Above measurements seem to agree with your statement.


> Unfortunately, I don't have Zen4 hardware, nor do I the time to 
> optimise
> GMP for Zen4 anytime soon.
> 
Is there any doc or postings on how to do CPU specific optimizations for 
gmplib?


While determining sqrtm1="sqrt(-1) (mod p)" is really compute intensive 
for 100Ks digit primes,
other computations are superfast. In this libgmpxx code x and y are 
provided as constants,
the sum of their squares being smallest known 1million digit prime:
https://github.com/Hermann-SW/RSA_numbers_factored/blob/main/c%2B%2B/sqrtm1.smallest_known_1million_digit_prime.cc

Determining sqrtm1 from x and y takes only 0.2s with GMP powm().
Determining sum of squares from sqrtm1 takes long with GMP gcd(),
but takes only 0.23s with GMP based PARI lib halfgcdii():
(I learned about that from Bill Allombert on pari-users mailing list)

hermann at 7600x:~/RSA_numbers_factored/c++$ 
./sqrtm1.smallest_known_1million_digit_prime
a = y^(-1) (mod p) [powm]; a *= x; a %= p
0.204478s
[M,V] = halfgcdii(sqrtm1, p)
0.226863s
[x,y] = [V[2], M[2,1]]
0s
done
hermann at 7600x:~/RSA_numbers_factored/c++$


Regards,

Hermann Stamm-Wilbrandt.


More information about the gmp-devel mailing list