AMD Ryzen 5 7600X gmpbench submission (result 8010.2) -- new: 9176.2
hermann at stamm-wilbrandt.de
hermann at stamm-wilbrandt.de
Sun Jul 9 11:55:12 CEST 2023
On 2023-07-08 23:24, Torbjörn Granlund wrote:
> The result is 14.5% increase in GMPbench value, new value is 9176.2 !
>
> That's much better, but still not as good as it should be with some
> optimisation.
>
I had to upgrade 7600X PC Bios to version 1.24 to avoid CPU burn risk
for Ryzen 7000 CPUs.
Therefore I took baseline with 1.24 Bios, only small differences to
before:
https://github.com/Hermann-SW/QuadraticRegression/commit/89cd05b2cc7de406b8dc6cecc99c4d56ba8fe361
This is the libgmpxx code used to compute "powm(_, qnr, p/4, p)" and
measuring runtimes:
https://github.com/Hermann-SW/RSA_numbers_factored/blob/main/c%2B%2B/sqrtm1.cc
While daily snapshot did increase gmpbench value from 8010.2 to 9176.2,
for sqrtm1.cc runtimes increased a bit:
+--------+--------+--------+--------+
! 100355 ! 200700 ! 272770 ! 330855 ! #digits(p) for prime p
+--------+--------+--------+--------+
! 447.1 ! 2144.4 ! 3748.0 ! 5311.2 ! gmp 6.2.1 (baseline with 1.24
Bios)
+--------+--------+--------+--------+
! 453.1 ! 2158.4 ! 3803.1 | 5403.0 ! gmp-6.2.99-20230706074853.tar.zst
+--------+--------+--------+--------+
> Specifically, there seems to be something slightly off
> with the multiply performance.
>
Above measurements seem to agree with your statement.
> Unfortunately, I don't have Zen4 hardware, nor do I the time to
> optimise
> GMP for Zen4 anytime soon.
>
Is there any doc or postings on how to do CPU specific optimizations for
gmplib?
While determining sqrtm1="sqrt(-1) (mod p)" is really compute intensive
for 100Ks digit primes,
other computations are superfast. In this libgmpxx code x and y are
provided as constants,
the sum of their squares being smallest known 1million digit prime:
https://github.com/Hermann-SW/RSA_numbers_factored/blob/main/c%2B%2B/sqrtm1.smallest_known_1million_digit_prime.cc
Determining sqrtm1 from x and y takes only 0.2s with GMP powm().
Determining sum of squares from sqrtm1 takes long with GMP gcd(),
but takes only 0.23s with GMP based PARI lib halfgcdii():
(I learned about that from Bill Allombert on pari-users mailing list)
hermann at 7600x:~/RSA_numbers_factored/c++$
./sqrtm1.smallest_known_1million_digit_prime
a = y^(-1) (mod p) [powm]; a *= x; a %= p
0.204478s
[M,V] = halfgcdii(sqrtm1, p)
0.226863s
[x,y] = [V[2], M[2,1]]
0s
done
hermann at 7600x:~/RSA_numbers_factored/c++$
Regards,
Hermann Stamm-Wilbrandt.
More information about the gmp-devel
mailing list