[PATCH v3 0/4] Add addmul_1, addmul_2, and mul_basecase for IBM z13 and later

Stefan Liebler stli at linux.ibm.com
Tue Jul 11 14:46:30 CEST 2023


On 23.06.23 11:05, Torbjörn Granlund wrote:
> These improvements are now (finally!) in GMP repo.
Thanks a lot.
> 
> I have not run any timing tests, as I trust you to worry about the
> performance.
> 
> A mistake we GMP developers have made in the past is counting cycles for
> inner loops for quite large trip counts, and then accidentally adding
> overhead as a side effect of beating down the cycle count.  One should
> never forget that most bignum computations probably use only moderately
> large numbers, which means that decreasing overhead matters.
> 
> Running commands like
> 
>   tune/speed -p10000000 -C -s1-100 mpn_mul_basecase
>   tune/speed -p10000000 -C -s1-100 mpn_addmul_1.0xcafecafecafecafe
> 
> are helpful.
> 
Thanks for the hint. I think I should do those tuning steps for the
different CPU levels and also add fat binary support, as posted by
Marius some time ago:
[RFC] Add fat binary support for s390x
https://gmplib.org/list-archives/gmp-devel/2021-September/006012.html
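
To make the overhead point quoted above concrete, here is a rough sketch of a
cost model in which an n-limb call costs a fixed overhead plus a per-limb
inner-loop cost. The struct, the function names, and all cycle numbers are
hypothetical and only illustrate the trade-off; they are not measurements of
the z13 code. As far as I understand it, the average computed here corresponds
to the cycles-per-limb figure that tune/speed prints with -C.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical cost model: cost(n) = overhead + cycles_per_limb * n.
   The numbers used with it are made up for illustration.  */
struct kernel_model
{
  double overhead_cycles;  /* call setup, loop entry/exit, wind-down code */
  double cycles_per_limb;  /* steady-state cost of the inner loop */
};

/* Average cycles per limb for an n-limb operand.  */
static double
avg_cycles_per_limb (struct kernel_model k, long n)
{
  assert (n > 0);
  return (k.overhead_cycles + k.cycles_per_limb * (double) n) / (double) n;
}

/* Smallest n in [1, limit] at which kernel a is strictly cheaper than
   kernel b, or -1 if a never wins within that range.  */
static long
crossover (struct kernel_model a, struct kernel_model b, long limit)
{
  long n;
  for (n = 1; n <= limit; n++)
    if (avg_cycles_per_limb (a, n) < avg_cycles_per_limb (b, n))
      return n;
  return -1;
}

/* Print both averages for n = lo..hi, so the two models can be compared
   in the same size range as "tune/speed -s1-100".  */
static void
print_table (struct kernel_model a, struct kernel_model b, long lo, long hi)
{
  long n;
  for (n = lo; n <= hi; n++)
    printf ("%4ld  %8.3f  %8.3f\n", n,
            avg_cycles_per_limb (a, n), avg_cycles_per_limb (b, n));
}
```

With an assumed "tight loop" kernel of 40 cycles overhead and 2.0 c/l against
an assumed "low overhead" kernel of 10 cycles overhead and 2.5 c/l, the tight
loop only becomes cheaper above 60 limbs, which is why the -s1-100 range is the
interesting one to look at.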

