div_qr_1 interface

Mon Oct 21 15:33:34 CEST 2013

Torbjorn Granlund <tg at gmplib.org> writes:

> I looked at the logic following this:
>
>         sbb     U2, U2          C 7 13
>
> You negate the U2 copy in Q2.  It seems that replacing three adc by
> sbb could avoid the neg.

The problem is the final use, where Q2 is added, with carry, to a
different register. It's tempting to replace

	adc	Q1I, Q2

with

	sbb	Q2, Q1I

and a negated Q2, but I'm afraid that will get the sense of the carry
wrong. Do you see any trick to get that right without negating Q2
somewhere along the way?
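
To make the carry concern concrete, here is a small C model of the two
instructions on 64-bit words. The helper names adc64 and sbb64 are
hypothetical, introduced only for this illustration; they model the
arithmetic and the carry flag, not register allocation or which operand
is overwritten.

```c
#include <assert.h>
#include <stdint.h>

/* Model of adc: returns a + b + cin (mod 2^64) and stores the
   carry out (0 or 1) in *cout.  cin must be 0 or 1. */
static uint64_t adc64(uint64_t a, uint64_t b, unsigned cin, unsigned *cout)
{
    uint64_t s = a + b;
    unsigned c1 = s < a;        /* carry from a + b */
    uint64_t r = s + cin;
    unsigned c2 = r < s;        /* carry from adding cin */
    *cout = c1 | c2;            /* at most one of them can be set */
    return r;
}

/* Model of sbb: returns a - b - bin (mod 2^64) and stores the
   borrow out (0 or 1) in *bout.  bin must be 0 or 1. */
static uint64_t sbb64(uint64_t a, uint64_t b, unsigned bin, unsigned *bout)
{
    uint64_t d = a - b;
    unsigned b1 = a < b;        /* borrow from a - b */
    uint64_t r = d - bin;
    unsigned b2 = d < bin;      /* borrow from subtracting bin */
    *bout = b1 | b2;            /* at most one of them can be set */
    return r;
}
```

With a = 5, q = 3 and carry in 1, adc64(5, 3, 1, &c) gives 9 with c = 0,
while sbb64(5, -q, 1, &c) gives 7 with c = 1. Subtracting the negated
value turns a + q + cy into a + q - cy, and for nonzero q the flag that
comes out is the complement of the carry that adc would have produced.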

> It might also be possible to replace the early loop "and" stuff by
> cmov.

Maybe, but the simple way to do conditional addition with lea + cmov
won't do, since we also need carry out.

Does it matter if we do

	mov	B2, r
	and	mask, r

or

	mov	$0, r
	cmovc	B2, r

?
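
As a sanity check on the values only, the two sequences can be modeled
in C as below. The function names are hypothetical, and the sketch
assumes that mask is 0 or all ones and was derived from the same carry
that cmovc would test (for instance by an sbb of a register with
itself).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed origin of the mask: 0 - carry, i.e. 0 or all ones. */
static uint64_t mask_from_carry(unsigned carry)
{
    return (uint64_t)0 - carry;
}

/* Models: mov B2, r ; and mask, r */
static uint64_t select_and(uint64_t b2, uint64_t mask)
{
    return b2 & mask;
}

/* Models: mov $0, r ; cmovc B2, r */
static uint64_t select_cmov(uint64_t b2, unsigned carry)
{
    return carry ? b2 : 0;
}
```

Under that assumption both variants leave the same value in r. Where
they differ is in what they depend on and what they disturb: the and
variant reads the mask register and rewrites the flags (clearing
carry), while mov with an immediate and cmovc leave the flags untouched
but need the carry flag to still be live at that point.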

> To optimise register usage, I sometimes annotate the code with live
> ranges for each register.  That will help with register coalescing.

There are lots of possibilities, since the computations for Q and U are
mostly independent. The data flow is something like

                      load U limb
                           |
                          _V_
          U2, U1I, U0 -> |___| -> U2, U1O, U0 
           \   |    ______/ cy
           _V__V___V_
Q1I, Q0-> |__________|  -> Q1O, Q0
                    \
                     V
               store Q limb

> (T is rather short-lived, perhaps its register could serve two usages?)

It could perhaps be eliminated.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.