div_qr_1n_pi1
Marco Bodrato
bodrato at mail.dm.unipi.it
Mon Jun 14 18:29:53 UTC 2021
Hi,
On 2021-06-06 22:16, Torbjörn Granlund wrote:
> nisse at lysator.liu.se (Niels Möller) writes:
>
> Maybe we should have some macrology for that? Or do all relevant
> processors and compilers support efficient cmov these days? I'm
> sticking to masking expressions for now.
>
> Let's not trust results from compiler generated code for these things.
> The mixture of inline asm and plain code is hard for compilers to deal
> with. Very subtle things can make a huge cycle count difference.
Of course, mixing asm and plain code does not leave the compiler much
freedom...
Should we check whether the compiler supports a larger type (e.g. unsigned
__int128) and, if it does, define the common macros add_ssaaaa and
umul_ppmm based on it? In that case the compiler should also be able to
optimise across the operations defined in longlong.h.
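A minimal sketch of what I mean, assuming 64-bit limbs and a GCC- or Clang-style compiler that defines __SIZEOF_INT128__. The names limb_t, dlimb_t and check_macros are hypothetical, not taken from GMP, and the real longlong.h would need to cover many more configurations:

```c
#include <assert.h>
#include <stdint.h>

#ifndef __SIZEOF_INT128__
#error "this sketch assumes the compiler provides unsigned __int128"
#endif

typedef uint64_t limb_t;            /* hypothetical stand-in for a 64-bit limb */
typedef unsigned __int128 dlimb_t;  /* double-limb type (compiler extension) */

/* (sh,sl) = (ah,al) + (bh,bl); the carry out of the high word is dropped. */
#define add_ssaaaa(sh, sl, ah, al, bh, bl)                              \
  do {                                                                  \
    dlimb_t s_ = (((dlimb_t) (limb_t) (ah) << 64) | (limb_t) (al))      \
               + (((dlimb_t) (limb_t) (bh) << 64) | (limb_t) (bl));     \
    (sh) = (limb_t) (s_ >> 64);                                         \
    (sl) = (limb_t) s_;                                                 \
  } while (0)

/* (ph,pl) = a * b, the full 128-bit product of two 64-bit limbs. */
#define umul_ppmm(ph, pl, a, b)                                         \
  do {                                                                  \
    dlimb_t p_ = (dlimb_t) (limb_t) (a) * (limb_t) (b);                 \
    (ph) = (limb_t) (p_ >> 64);                                         \
    (pl) = (limb_t) p_;                                                 \
  } while (0)

/* Small usage example: carry propagation and a full-width product. */
static void check_macros(void)
{
  limb_t h, l;

  /* (0, 2^64-1) + (0, 1) = (1, 0): the low-word carry reaches the high word. */
  add_ssaaaa(h, l, 0, UINT64_MAX, 0, 1);
  assert(h == 1 && l == 0);

  /* (2^64-1)^2 = 2^128 - 2^65 + 1, i.e. high = 2^64-2, low = 1. */
  umul_ppmm(h, l, UINT64_MAX, UINT64_MAX);
  assert(h == UINT64_MAX - 1 && l == 1);
}
```

Since everything here is plain C, the compiler is free to schedule and combine these operations with the surrounding code; whether the resulting code is actually as good as the asm versions would have to be measured.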
Bye,
m