div_qr_1n_pi1
Marco Bodrato
bodrato at mail.dm.unipi.it
Thu Jun 3 20:29:04 UTC 2021
Ciao,
Il 2021-06-03 12:40 Torbjörn Granlund ha scritto:
> If we dare use cmov (and its presumed side-channel leakage) we could
> probably shorten the critical path by a cycle. The "sbb" and "and"
> would go away.
Using masks does not always give the fastest code. I tried the following
variation on Niels' code, and, on my laptop with "g++-10 -O2
-mtune=icelake-client -march=icelake-client", the resulting code is
comparable (faster?) with the current asm.
*************** mpn_div_qr_1n_pi1 (mp_ptr qp, mp_srcptr
*** 245,266 ****
* + | q0|
* -+---+---+---+
* | q2| q1| q0|
* +---+---+---+
*/
! umul_ppmm (p1, t, u1, dinv);
! add_ssaaaa (q2, q1, -u2, u2 & dinv, CNST_LIMB(0), u1);
! add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), p1);
! add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), q0);
! q0 = t;
umul_ppmm (p1, p0, u1, B2);
- ADDC_LIMB (cy, u0, u0, u2 & B2);
- u0 -= (-cy) & d;
/* Final q update */
! add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), cy);
qp[j+1] = q1;
MPN_INCR_U (qp+j+2, n-j-2, q2);
add_mssaaaa (u2, u1, u0, u0, up[j], p1, p0);
}
--- 245,264 ----
* + | q0|
* -+---+---+---+
* | q2| q1| q0|
* +---+---+---+
*/
! ADDC_LIMB (q2, q1, q0, u1);
! umul_ppmm (t, q0, u1, dinv);
! ADDC_LIMB (cy, u0, u0, u2 ? B2 : 0);
! u0 -= cy ? d : 0;
! add_ssaaaa (q2, q1, q2, q1, -u2, u2 ? dinv : 0);
umul_ppmm (p1, p0, u1, B2);
/* Final q update */
! add_ssaaaa (q2, q1, q2, q1, CNST_LIMB(0), t + cy);
qp[j+1] = q1;
MPN_INCR_U (qp+j+2, n-j-2, q2);
add_mssaaaa (u2, u1, u0, u0, up[j], p1, p0);
}
$ build/tune/speed -p10000000 -s1-100 -f1.6 -C
mpn_div_qr_1n_pi1.9999999999999999999 ...
ASM-code C-code
1 2.1227 1 3.6125
2 3.1758 2 3.9425
3 3.4567 3 3.8861
4 3.4758 4 3.8606
6 3.7857 6 3.8764
9 3.9912 9 3.9676
14 4.0304 14 4.0531
22 4.3461 22 4.1798
35 4.4161 35 4.2080
56 4.4744 56 4.2833
89 4.4896 89 4.2950
> (I am a bit fixated with side-channel leakage; our present
> implementations of these particular functions are not side-channel
> silent.)
We should write a fast version, and then a sec_ one :-)
Ĝis,
m
More information about the gmp-devel
mailing list