div_qr_1 interface
Niels Möller
nisse at lysator.liu.se
Mon Oct 21 15:33:34 CEST 2013
Torbjorn Granlund <tg at gmplib.org> writes:
> I looked at the logic following this:
>
> sbb U2, U2 C 7 13
>
> You negate the U2 copy in Q2. It seems that replacing three adc by sbb
> could avoid the neg.
The problem is the final use, where Q2 is added, with carry, to a
different register. It's tempting to replace
adc Q1I, Q2
with
sbb Q2, Q1I
and a negated Q2, but I'm afraid that will get the sense of the carry
wrong. Do you see any trick to get that right without negating Q2
somewhere along the way?
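To make the carry problem concrete, here is a small C model of the two instructions. It is only a sketch: `model_adc`, `model_sbb` and the `flagged` struct are hypothetical helpers written for this illustration, not GMP code, and the operand values are made up. Operands follow the AT&T order used above, so `adc Q1I, Q2` is `model_adc(q2, q1i, cf)` and `sbb Q2, Q1I` is `model_sbb(q1i, q2, cf)`.

```c
#include <stdint.h>

/* Result of a 64-bit operation together with the resulting carry flag. */
typedef struct {
    uint64_t value;
    unsigned carry;
} flagged;

/* Model of "adc src, dst": dst + src + CF, carry set on unsigned overflow. */
flagged model_adc(uint64_t dst, uint64_t src, unsigned cf_in)
{
    uint64_t sum = dst + src;
    unsigned c1 = sum < dst;           /* overflow from adding src */
    uint64_t sum2 = sum + cf_in;
    unsigned c2 = sum2 < sum;          /* overflow from adding the carry */
    flagged out = { sum2, c1 | c2 };
    return out;
}

/* Model of "sbb src, dst": dst - src - CF, carry set on borrow. */
flagged model_sbb(uint64_t dst, uint64_t src, unsigned cf_in)
{
    uint64_t diff = dst - src;
    unsigned b1 = dst < src;           /* borrow from subtracting src */
    uint64_t diff2 = diff - cf_in;
    unsigned b2 = diff < cf_in;        /* borrow from subtracting the carry */
    flagged out = { diff2, b1 | b2 };
    return out;
}
```

With q1i = 5, q2 = 1 and carry in set, `model_adc(1, 5, 1)` gives 7, while `model_sbb(5, (uint64_t)-1, 1)` gives 5: the incoming carry is subtracted instead of added. With carry in clear the two values agree, but the sbb form leaves the carry flag set (a borrow) where adc leaves it clear, so the outgoing carry has the opposite sense as well.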
> It might also be possible to replace the early loop "and" stuff by
> cmov.
Maybe, but the simple way to do conditional addition with lea + cmov
won't do, since we also need carry out.
Does it matter if we do
mov B2, r
and mask, r
or
mov $0, r
cmovc B2, r
?
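As a sanity check on the two variants, the following C sketch models both selections and the conditional addition with carry out. The function names and the `sum_carry` struct are hypothetical, chosen for this illustration only, and it assumes the mask is 0 or all ones and matches the carry flag (mask == -CF).

```c
#include <stdint.h>

/* Model of "mov B2, r ; and mask, r", with mask equal to 0 or all ones. */
uint64_t select_and(uint64_t b2, uint64_t mask)
{
    return b2 & mask;
}

/* Model of "mov $0, r ; cmovc B2, r": r = B2 if the carry flag is set. */
uint64_t select_cmovc(uint64_t b2, unsigned cf)
{
    return cf ? b2 : 0;
}

/* Sum and carry out of a conditional addition. */
typedef struct {
    uint64_t sum;
    unsigned carry;
} sum_carry;

/* u + (cf ? b2 : 0), keeping the carry out that a plain add would
   produce. An lea computes the same sum but does not write the flags,
   which is why lea + cmov alone does not give the carry. */
sum_carry cond_add(uint64_t u, uint64_t b2, unsigned cf)
{
    uint64_t addend = select_cmovc(b2, cf);
    uint64_t sum = u + addend;
    sum_carry out = { sum, sum < u };
    return out;
}
```

Both selections produce the same value in r. One difference in the real instructions is the flags: `and` overwrites them (clearing the carry flag), while `mov` and `cmovc` leave them untouched. Whether that matters for scheduling or latency here is not something this model can show.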
> To optimise register usage, I sometimes annotate the code with live
> ranges for each register. That will help with register coalescing.
There are lots of possibilities, since the computations for Q and U are
mostly independent. The data flow is something like
                      load U limb
                           |
                          _V_
          U2, U1I, U0 -> |___| -> U2, U1O, U0
            \   |   ______/ cy
            _V__V___V_
Q1I, Q0 -> |__________| -> Q1O, Q0
                       \
                        V
                   store Q limb
> (T is rather short-lived, perhaps its register could serve two usages?)
It could perhaps be eliminated.
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.