Performance of addlsh_n and sublsh_n

Thu Feb 3 14:04:06 CET 2011

Torbjorn Granlund <tg at gmplib.org> writes:

> I think you might be mistaken about that you will not get carry between
> iterations.

You're right about the assembler. I was looking at the C implementation
of addmul_1, which as far as I see uses a single limb (cl) as the only
recurrency variable. The iteration adds one product and two limbs, with
maximum value

  (B-1)^2 + 2 (B-1) = B^2-1

This fits in two words (with no margin), and the high limb is the
recurrency variable. It's the same property that makes it useful to have
a single-limb return value from addmul_1.

I guess the assembler code (including my sketch) may be organized
differently, to avoid an adc $0, ..... But that costs a second
recurrency variable for the carry.

Which is kind-of obvious: Whenever you have adc $0 in this type of loop,
you can most likely delay the add of that carry, and instead and
replace an add by an adc in the next iteration. Assuming you can arrange
so that the carry isn't destroyed before use.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.