tg at gmplib.org
Sat Feb 25 23:03:35 CET 2012
nisse at lysator.liu.se (Niels Möller) writes:
Here's a patch adding a new function mpn_mul_2c. Like mpn_mul_2, but
accepting a single-limb input carry.
I'd like to have it (and also mpn_addmul_2c) for generating diagonal
terms in sqr_basecase, but there may be other uses.
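For readers following the thread, the semantics described above can be sketched in portable C. This is only an illustration, not the patch itself: the name ref_mul_2c, the 32-bit limb_t type and the argument order are assumptions made here, and the sketch assumes the mul_2 convention of storing the low n+1 limbs of the product and returning the most significant limb.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb, not GMP's mp_limb_t */

/* Hypothetical reference routine: computes {up,n} * {vp,2} + carry.
   Stores the low n+1 limbs at rp[0..n] and returns the top limb.
   The n+2 limb result cannot overflow, since with B = 2^32
   (B^n - 1)(B^2 - 1) + (B - 1) < B^(n+2). */
limb_t ref_mul_2c(limb_t *rp, const limb_t *up, size_t n,
                  const limb_t *vp, limb_t carry)
{
    limb_t c0 = carry;  /* pending carry at the current limb position */
    limb_t c1 = 0;      /* pending carry one limb position higher */

    for (size_t i = 0; i < n; i++) {
        /* up[i]*vp[0] + c0 <= (B-1)^2 + (B-1), which fits in 64 bits */
        uint64_t p0 = (uint64_t)up[i] * vp[0] + c0;
        /* up[i]*vp[1] + c1 + high(p0) <= (B-1)^2 + 2(B-1) = B^2 - 1 */
        uint64_t p1 = (uint64_t)up[i] * vp[1] + c1 + (p0 >> 32);

        rp[i] = (limb_t)p0;
        c0 = (limb_t)p1;
        c1 = (limb_t)(p1 >> 32);
    }
    rp[n] = c0;
    return c1;
}
```

With carry set to 0 this reduces to the plain mul_2 behaviour assumed above, so the extra argument is the only difference between the two entry points in this sketch.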
Are you rewriting x86_64 sqr_basecase with calls to mul_2? If that's
faster than the present code, then I think a version with these mul_2c
inlined will be even better.
Or is this experimental stuff? In that case, are there reasons to
expect an x86_64 mul_2c to be actually used? What for?
In the x86_64 assembly, I was tempted to move the initial
multiplication earlier, but when I tried I made mpn_mul_2 run a cycle
slower (problem is that n_param is in %rdx which collides with the
multiplication). Instead I had to duplicate the code for selecting the
loop entrypoint, and leave the old mul_2 code path unchanged.
That's life. I too have done that a few times.
Added support in devel/try.c, but there are no other testcases.
Looks OK, except if the x86_64 asm mul_2c will never be used, I think
that change is somewhat questionable, and could be kept local.