div_qr_1 interface
Niels Möller
nisse at lysator.liu.se
Sun Oct 20 23:02:56 CEST 2013
nisse at lysator.liu.se (Niels Möller) writes:
> I'll try to get the x86_64 assembly for mpn_div_qr_1n_pi1 in soon.
Pushed first working version now, see
http://gmplib.org:8000/gmp/file/tip/mpn/x86_64/div_qr_1n_pi1.asm
On my core2 laptop:
$ ./speed -s 2-10,100,500 -C mpn_divrem_1.0x9999999999999999 mpn_div_qr_1.0x9999999999999999
overhead 6.13 cycles, precision 10000 units of 8.33e-10 secs, CPU freq 1200.00 MHz
      mpn_divrem_1.0x9999999999999999  mpn_div_qr_1.0x9999999999999999
2      60.6420                         #39.9427
3     #40.9839                          55.0469
4     #43.7667                          44.4534
5      44.6333                         #38.9055
6      39.6259                         #34.4167
7      34.0063                         #32.4018
8      30.1364                         #28.5745
9      29.6472                         #27.4599
10     29.1270                         #26.7300
100    24.7920                         #20.6700
500    24.4400                         #19.7600
So here it's a clear win, except for an ugly regression at n = 3 (n = 4
is also marginally slower).
On shell, the same command gives:
2     #37.4379                          51.1157
3     #30.0256                          61.0904
4     #25.8058                          27.0781
5     #23.2717                          24.2831
6     #21.7520                          22.4346
7     #20.5219                          21.1111
8     #19.4783                          20.1101
9     #18.7726                          19.3369
10    #18.3271                          18.7228
100   #13.8063                          13.8175
500   #13.2670                          13.2750
So here the new code is epsilon slower for the larger sizes, and much
slower for n = 2 and n = 3. Maybe the loopmixer can help.
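
For anyone who wants the two runs side by side as ratios, here is a
rough sketch in Python. The timings are copied from the speed output
above; the names core2, shell and ratios are made up for this
illustration and are not part of GMP or the speed program.

```python
# Timings copied from the two speed runs above, one row per size:
# (size, mpn_divrem_1 time, mpn_div_qr_1 time).
core2 = [
    (2, 60.6420, 39.9427),
    (3, 40.9839, 55.0469),
    (4, 43.7667, 44.4534),
    (5, 44.6333, 38.9055),
    (6, 39.6259, 34.4167),
    (7, 34.0063, 32.4018),
    (8, 30.1364, 28.5745),
    (9, 29.6472, 27.4599),
    (10, 29.1270, 26.7300),
    (100, 24.7920, 20.6700),
    (500, 24.4400, 19.7600),
]
shell = [
    (2, 37.4379, 51.1157),
    (3, 30.0256, 61.0904),
    (4, 25.8058, 27.0781),
    (5, 23.2717, 24.2831),
    (6, 21.7520, 22.4346),
    (7, 20.5219, 21.1111),
    (8, 19.4783, 20.1101),
    (9, 18.7726, 19.3369),
    (10, 18.3271, 18.7228),
    (100, 13.8063, 13.8175),
    (500, 13.2670, 13.2750),
]


def ratios(rows):
    """Map size -> old/new; a value above 1 means mpn_div_qr_1 is faster."""
    return {n: old / new for n, old, new in rows}


if __name__ == "__main__":
    for name, rows in (("core2", core2), ("shell", shell)):
        print(name)
        for n, r in ratios(rows).items():
            print(f"  n={n:<4d} old/new = {r:.3f}")
```

A ratio above 1 means the new mpn_div_qr_1 came out ahead at that size,
and a ratio below 1 marks a regression.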
Regards,
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.