Alternative div_qr_1
Niels Möller
nisse at lysator.liu.se
Sat Jun 19 16:08:31 CEST 2010
nisse at lysator.liu.se (Niels Möller) writes:
> seems to be compiled to a branch rather than a cmov by gcc-4.3.2. Maybe
> gcc-4.4.4 or gcc-4.5.0 is smarter.
Now I've tested it on K7, using gcc-4.4.4.
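For readers without the earlier message at hand, the branch-versus-cmov question in the quoted text concerns a conditional adjustment step. The sketch below is only an illustration, not the code under test: the names cond_sub_branch, cond_sub_mask and cond_sub_selfcheck are hypothetical, and 32-bit words are assumed. The first form leaves the choice between a branch and a cmov to the compiler; the second uses a mask so that no conditional jump is needed at the source level.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration, not the GMP code under test.
   Conditional form: the compiler is free to emit either a branch
   or a conditional move for this.  */
static uint32_t
cond_sub_branch (uint32_t r, uint32_t d, int cond)
{
  if (cond)
    r -= d;
  return r;
}

/* Mask form: mask is all ones when cond is nonzero, otherwise zero,
   so d is subtracted only when cond holds.  */
static uint32_t
cond_sub_mask (uint32_t r, uint32_t d, int cond)
{
  uint32_t mask = -(uint32_t) (cond != 0);
  return r - (mask & d);
}

/* Small self-check that both forms agree, including on wraparound.  */
static void
cond_sub_selfcheck (void)
{
  assert (cond_sub_branch (10u, 3u, 1) == cond_sub_mask (10u, 3u, 1));
  assert (cond_sub_branch (10u, 3u, 0) == cond_sub_mask (10u, 3u, 0));
  assert (cond_sub_branch (5u, 7u, 1) == cond_sub_mask (5u, 7u, 1));
}
```

Which instructions a given gcc version emits for either form has to be checked in the generated assembly (for example with gcc -S); nothing here predicts it.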
Old C code:
$ ./speed -c -s 2-10 mpn_mod_1_1.0xabcdef01 mpn_mod_1_1.0xabcdef1
overhead 6.06 cycles, precision 10000 units of 7.15e-10 secs, CPU freq 1398.94 MHz
      mpn_mod_1_1.0xabcdef01  mpn_mod_1_1.0xabcdef1
2                     158.30                #147.70
3                     168.33                #161.66
4                     185.58                #174.25
5                     197.98                #186.25
6                     210.61                #198.02
7                     228.48                #210.13
8                     239.98                #222.27
9                     252.36                #234.64
10                    264.12                #246.82
$ ./speed -C -s 1500 mpn_mod_1_1.0xabcdef01 mpn_mod_1_1.0xabcdef1
overhead 6.07 cycles, precision 10000 units of 7.15e-10 secs, CPU freq 1398.94 MHz
      mpn_mod_1_1.0xabcdef01  mpn_mod_1_1.0xabcdef1
1500                #12.1487                12.1553
New C code:
$ ./speed -c -s 2-10 mpn_mod_1_1.0xabcdef01 mpn_mod_1_1.0xabcdef1
overhead 6.05 cycles, precision 10000 units of 7.15e-10 secs, CPU freq 1398.94 MHz
      mpn_mod_1_1.0xabcdef01  mpn_mod_1_1.0xabcdef1
2                    #113.06                 131.28
3                    #124.21                 143.37
4                    #134.32                 153.46
5                    #148.41                 163.58
6                    #156.46                 176.68
7                    #165.58                 185.76
8                    #177.02                 212.00
9                    #198.96                 219.04
10                   #211.00                 228.17
$ ./speed -C -s 1500 mpn_mod_1_1.0xabcdef01 mpn_mod_1_1.0xabcdef1
overhead 6.05 cycles, precision 10000 units of 7.15e-10 secs, CPU freq 1398.94 MHz
      mpn_mod_1_1.0xabcdef01  mpn_mod_1_1.0xabcdef1
1500                 10.3747                #9.5360
I have no explanation for the difference in cycles per limb between the
normalized and the unnormalized divisor (10.37 vs 9.54 at size 1500).
For comparison, current K7 assembler code:
$ ./speed -c -s 2-10 mpn_mod_1_1.0xabcdef01 mpn_mod_1_1.0xabcdef1
overhead 6.06 cycles, precision 10000 units of 7.15e-10 secs, CPU freq 1398.94 MHz
      mpn_mod_1_1.0xabcdef01  mpn_mod_1_1.0xabcdef1
2                     170.84                #136.43
3                     179.93                #145.55
4                     189.04                #153.64
5                     196.11                #161.71
6                     205.40                #169.89
7                     213.45                #178.16
8                     221.65                #186.05
9                     229.71                #194.16
10                    243.76                #197.49
$ ./speed -C -s 1500 mpn_mod_1_1.0xabcdef01 mpn_mod_1_1.0xabcdef1
overhead 6.06 cycles, precision 10000 units of 7.15e-10 secs, CPU freq 1398.94 MHz
      mpn_mod_1_1.0xabcdef01  mpn_mod_1_1.0xabcdef1
1500                  8.1747                #8.1527
I don't understand why the normalized case seems to have a more expensive
precomputation (but I haven't looked at the code).
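One rough way to separate a fixed cost from a per-limb cost in the small-size tables is a straight line through two measurements. The sketch below assumes the time is roughly linear in the size; struct linear_fit and fit_two_points are hypothetical names, not part of the speed program, and the "fixed" part covers everything that does not scale with size, not only the precomputation.

```c
#include <assert.h>

/* Hypothetical helper, not part of GMP's speed program.
   Models t(n) = fixed + per_limb * n from two measurements.  */
struct linear_fit
{
  double per_limb;   /* slope: cycles per additional limb */
  double fixed;      /* intercept: cycles not proportional to size */
};

static struct linear_fit
fit_two_points (double n_lo, double t_lo, double n_hi, double t_hi)
{
  struct linear_fit f;

  assert (n_hi > n_lo);          /* need two distinct sizes */
  f.per_limb = (t_hi - t_lo) / (n_hi - n_lo);
  f.fixed = t_lo - f.per_limb * n_lo;
  return f;
}
```

Applied to the K7 assembler rows for sizes 2 and 10, fit_two_points (2, 170.84, 10, 243.76) gives about 9.1 cycles per limb and about 153 cycles fixed for the 0xabcdef01 column, while fit_two_points (2, 136.43, 10, 197.49) gives about 7.6 and about 121 for the 0xabcdef1 column. The slopes at these small sizes differ from the figures measured at size 1500, so this is only an estimate.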
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.