Why assembler version of addmul_1 is so fast?
Richard at Damon-Family.org
Sun Feb 2 13:04:54 UTC 2020
On 2/1/20 6:45 PM, Torbjörn Granlund wrote:
> The main problems are that the full integer product of two multiplied
> integer variables is not accessible even if the underlying hardware can
> provide the full product. Most high-level languages only return the low
> half of such products.
A half-way decent optimizer, for 32 bit ints and 64 bit longs (or 64/128
bit), should be able to optimize the following to use such an instruction:
int i, j;
long k = ((long)i)*j;
Yes, technically the long * long multiply that is requested might require
doing the partial multiplies and adding their results, but the peephole
optimizer can see that both operands fit within the smaller type, so only
the low-order partial product is really needed.
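
To make the idea concrete, here is a minimal sketch of the same pattern at both widths. It uses the fixed-width types from <stdint.h> instead of int and long, so it does not depend on the platform's type sizes. The function names are made up for illustration, and unsigned __int128 is a GCC/Clang extension, not standard C. Whether the compiler really emits a single widening multiply depends on the compiler, the target, and the optimization level, so check the generated assembly before relying on it.

```c
#include <stdint.h>

/* 32x32 -> 64 (hypothetical helper name).
   Casting one operand to int64_t makes the multiply 64-bit in the source,
   but both inputs are known to be sign-extended 32-bit values, so an
   optimizer may use one widening multiply instruction. */
static int64_t mul_32x32_to_64(int32_t i, int32_t j)
{
    return (int64_t)i * j;
}

/* 64x64 -> 128, unsigned (hypothetical helper name).
   Assumes the compiler provides the unsigned __int128 extension
   (GCC and Clang on 64-bit targets). */
static void umul_64x64_to_128(uint64_t a, uint64_t b,
                              uint64_t *hi, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    *lo = (uint64_t)p;          /* low 64 bits of the product */
    *hi = (uint64_t)(p >> 64);  /* high 64 bits of the product */
}
```

Compiling with something like "gcc -O2 -S" and reading the output is the easiest way to see whether a given compiler turns each function into one multiply.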