mpn_mul_2c

Sat Feb 25 23:03:35 CET 2012

nisse at lysator.liu.se (Niels Möller) writes:

  Here's a patch adding a new function mpn_mul_2c. Like mpn_mul_2, but
  accepting a single-limb input carry.

  I'd like to have it (and also mpn_addmul_2c) for generating diagonal
  terms in sqr_basecase, but there may be other uses.
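
For readers unfamiliar with the function, the semantics described above
can be sketched in portable C. This is only an illustration, not the
patch itself: the name ref_mul_2c, the 32-bit limb_t type, and the
position of the carry argument are assumptions made here, and the
result convention (n+1 limbs stored, most significant limb returned) is
assumed to mirror mpn_mul_2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit limb, so a limb product fits in uint64_t. */
typedef uint32_t limb_t;

/* Illustrative reference only (not the patch under discussion):
   compute {up, n} * {vp, 2} + c, store the low n+1 limbs at rp,
   and return the most significant limb. */
static limb_t
ref_mul_2c (limb_t *rp, const limb_t *up, size_t n,
            const limb_t *vp, limb_t c)
{
  limb_t lo = c;   /* pending limb at position i; starts as the carry-in */
  limb_t hi = 0;   /* pending limb at position i+1 */
  size_t i;

  assert (n >= 1);

  for (i = 0; i < n; i++)
    {
      /* Neither sum can overflow 64 bits:
         (B-1)^2 + 2(B-1) = B^2 - 1, with B = 2^32. */
      uint64_t p0 = (uint64_t) up[i] * vp[0] + lo;
      uint64_t p1 = (uint64_t) up[i] * vp[1] + hi + (p0 >> 32);

      rp[i] = (limb_t) p0;
      lo = (limb_t) p1;
      hi = (limb_t) (p1 >> 32);
    }

  rp[n] = lo;
  return hi;
}
```

With c = 0 the sketch reduces to a plain two-limb multiply, which is
the sense in which mul_2c is "like mpn_mul_2" with a carry-in.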

Are you rewriting x86_64 sqr_basecase with calls to mul_2?  If that's
faster than the present code, then I think a version with these mul_2c
inlined will be even better.

Or is this experimental stuff?  In that case, are there reasons to
expect an x86_64 mul_2c to be actually used?  What for?

  In the x86_64 assembly, I was tempted to move the initial
  multiplication earlier, but when I tried I made mpn_mul_2 run a cycle
  slower (problem is that n_param is in %rdx which collides with the
  multiplication). Instead I had to duplicate the code for selecting the
  loop entrypoint, and leave the old mul_2 code path unchanged.

That's life.  I too have done that a few times.

  Added support in devel/try.c, but there are no other testcases.
  Comments appreciated.

Looks OK, except if the x86_64 asm mul_2c will never be used, I think
that change is somewhat questionable, and could be kept local.

-- 
Torbjörn