Why assembler version of addmul_1 is so fast?

Sun Feb 2 13:04:54 UTC 2020

On 2/1/20 6:45 PM, Torbjörn Granlund wrote:
> The main problems are that the full integer product of two multiplied
> integer variables is not accessible even if the underlying hardware can
> provide the full product.  Most high-level languages only return the low
> half of such products.

A half-way decent optimizer, for 32-bit ints and 64-bit longs (or 64/128 
bit), should be able to optimize the following to use such an instruction:

int i, j;

long k = ((long)i)*j;

Yes, technically the long * long that is requested might require doing 
the partial multiplies and adding their results, but the peephole 
optimizer can see that both values fit within the smaller bound, so only 
the low-order partial product is really needed.
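
To make the idea concrete, here is a minimal sketch of the widening
idiom with fixed-width types, so it does not depend on how wide int and
long happen to be on a given platform. The function names are mine, for
illustration only, and the 64x64 -> 128 case assumes the unsigned
__int128 extension that GCC and Clang offer on 64-bit targets; it is not
standard C.

```c
#include <stdint.h>

/* 32x32 -> 64: widen one operand first, so the multiply is done in
   64 bits and the full product is kept. */
static int64_t widening_mul32(int32_t i, int32_t j)
{
    return (int64_t)i * j;
}

/* 64x64 -> 128, unsigned. Assumes the GCC/Clang extension
   unsigned __int128 (not part of standard C). */
static void widening_mul64(uint64_t a, uint64_t b,
                           uint64_t *hi, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    *lo = (uint64_t)p;          /* low 64 bits of the product  */
    *hi = (uint64_t)(p >> 64);  /* high 64 bits of the product */
}
```

With optimization enabled, compilers of this kind typically turn each
function into a single hardware multiply plus register moves rather than
a multi-word multiplication, but that is worth checking in the generated
assembly for the target at hand.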

-- 
Richard Damon