[PATCH v3 0/4] Add addmul_1, addmul_2, and mul_basecase for IBM z13 and later
Stefan Liebler
stli at linux.ibm.com
Tue Jul 11 14:46:30 CEST 2023
On 23.06.23 11:05, Torbjörn Granlund wrote:
> These improvements are now (finally!) in GMP repo.
Thanks a lot.
>
> I have not run any timing tests, as I trust you to worry about the
> performance.
>
> A mistake we GMP developers have made in the past is counting cycles for
> inner loops for quite large trip counts, and then accidentally adding
> overhead as a side effect of beating down the cycle count. One should
> never forget that most bignum computations probably use moderately large
> numbers, which means decreasing overhead.
>
> Running commands like
>
> tune/speed -p10000000 -C -s1-100 mpn_mul_basecase
> tune/speed -p10000000 -C -s1-100 mpn_addmul_1.0xcafecafecafecafe
>
> are helpful.
>
Thanks for the hint. I think I should do those tuning steps for the
different CPU levels and also add fat binary support, as posted by
Marius some time ago:
[RFC] Add fat binary support for s390x
https://gmplib.org/list-archives/gmp-devel/2021-September/006012.html
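
A minimal sketch of how the two runs quoted above could be scripted and
saved once per machine, assuming a built GMP tree with tune/speed reachable
from the current directory. Only the two command lines come from the mail
above; the SPEED variable, the function names and the label argument are
made up for illustration, and selecting a particular CPU level is not
covered here.

```shell
#!/bin/sh
# Hypothetical helper, not part of GMP: wraps the two tune/speed
# invocations quoted in the mail so they can be repeated per machine.

# Path to the speed binary; assumption: run from the top of a built GMP tree.
SPEED=${SPEED:-tune/speed}

# Print the two command lines from the mail, one per line.
speed_cmds() {
    for routine in mpn_mul_basecase mpn_addmul_1.0xcafecafecafecafe; do
        echo "$SPEED -p10000000 -C -s1-100 $routine"
    done
}

# Run each command and collect the output in results-<label>.txt.
# The label is a free-form tag chosen by the caller (e.g. the machine name).
run_speed() {
    label=$1
    speed_cmds | while read -r cmd; do
        echo "# $cmd"
        # Unquoted on purpose so the line splits into program and arguments.
        $cmd < /dev/null
    done > "results-$label.txt"
}
```

After sourcing the file, "speed_cmds" only prints the command lines, and
"run_speed z13" would write results-z13.txt, so the small-size numbers
from different machines can be compared side by side.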