div_qr_1n_pi1

Marco Bodrato bodrato at mail.dm.unipi.it
Mon Jun 14 18:29:53 UTC 2021


Ciao,

On 2021-06-06 22:16, Torbjörn Granlund wrote:
> nisse at lysator.liu.se (Niels Möller) writes:
> 
>   Maybe we should have some macrology for that? Or do all relevant
>   processors and compilers support efficient cmov these days? I'm sticking
>   to masking expressions for now.
> 
> Let's not trust results from compiler generated code for these things.
> The mixture of inline asm and plain code is hard for compilers to deal
> with.  Very subtle things can make a huge cycle count difference.

Of course, mixing asm and plain code does not leave the compiler much 
freedom...

Should we check whether the compiler supports a larger type (e.g. 
unsigned __int128) and, if it does, define the common macros add_ssaaaa 
and umul_ppmm in terms of it? In that case the compiler should also be 
able to optimise across the longlong-defined operations.
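
A minimal sketch of what that might look like, assuming a 64-bit limb and 
a compiler that provides unsigned __int128 (GCC and Clang on 64-bit 
targets). The type names limb_t and dlimb_t are made up for illustration, 
and this is not the actual longlong.h code:

```c
#include <stdint.h>

/* Assumptions: 64-bit limbs, and a compiler with unsigned __int128.
   Real code would need a guard, e.g. a test on __SIZEOF_INT128__. */
typedef uint64_t limb_t;            /* hypothetical stand-in for UWtype */
typedef unsigned __int128 dlimb_t;  /* hypothetical double-limb type */

/* (sh,sl) = (ah,al) + (bh,bl); the carry out of the high limb is dropped. */
#define add_ssaaaa(sh, sl, ah, al, bh, bl)                                \
  do {                                                                    \
    dlimb_t ssaaaa_a = ((dlimb_t) (limb_t) (ah) << 64) | (limb_t) (al);   \
    dlimb_t ssaaaa_b = ((dlimb_t) (limb_t) (bh) << 64) | (limb_t) (bl);   \
    dlimb_t ssaaaa_s = ssaaaa_a + ssaaaa_b;                               \
    (sh) = (limb_t) (ssaaaa_s >> 64);                                     \
    (sl) = (limb_t) ssaaaa_s;                                             \
  } while (0)

/* (ph,pl) = m0 * m1, the full two-limb product. */
#define umul_ppmm(ph, pl, m0, m1)                                         \
  do {                                                                    \
    dlimb_t ppmm_p = (dlimb_t) (limb_t) (m0) * (limb_t) (m1);             \
    (ph) = (limb_t) (ppmm_p >> 64);                                       \
    (pl) = (limb_t) ppmm_p;                                               \
  } while (0)
```

With definitions like these the compiler sees only plain C arithmetic, so 
it is free to schedule and combine the operations with the surrounding 
code. Whether the result is actually as fast as the inline asm would have 
to be measured.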

Ĝis,
m

