div_qr_1 interface

Niels Möller nisse at lysator.liu.se
Sun Oct 20 23:02:56 CEST 2013


nisse at lysator.liu.se (Niels Möller) writes:

> I'll try to get the x86_64 assembly for mpn_div_qr_1n_pi1 in soon.

I've pushed a first working version now, see
http://gmplib.org:8000/gmp/file/tip/mpn/x86_64/div_qr_1n_pi1.asm

On my core2 laptop:

$ ./speed -s 2-10,100,500 -C mpn_divrem_1.0x9999999999999999 mpn_div_qr_1.0x9999999999999999
overhead 6.13 cycles, precision 10000 units of 8.33e-10 secs, CPU freq 1200.00 MHz
        mpn_divrem_1.0x9999999999999999 mpn_div_qr_1.0x9999999999999999
2             60.6420      #39.9427
3            #40.9839       55.0469
4            #43.7667       44.4534
5             44.6333      #38.9055
6             39.6259      #34.4167
7             34.0063      #32.4018
8             30.1364      #28.5745
9             29.6472      #27.4599
10            29.1270      #26.7300
100           24.7920      #20.6700
500           24.4400      #19.7600

So here it's a clear win for most sizes, except for an ugly regression
at n = 3 (and a marginal loss at n = 4).

On shell, the same command gives:

2            #37.4379       51.1157
3            #30.0256       61.0904
4            #25.8058       27.0781
5            #23.2717       24.2831
6            #21.7520       22.4346
7            #20.5219       21.1111
8            #19.4783       20.1101
9            #18.7726       19.3369
10           #18.3271       18.7228
100          #13.8063       13.8175
500          #13.2670       13.2750

So here the new code is much slower for n = 2 and 3, a few percent
slower for n = 4 to 10, and only epsilon slower (under 0.1%) for n = 100
and 500. Maybe the loopmixer can help.
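
The relative timings in the two tables above can be recomputed with a
short script. This is only a sketch: the numbers are copied by hand from
the speed output in this message, the helper name `ratios` is made up
for illustration, and a ratio above 1.0 means mpn_div_qr_1 was the
faster of the two.

```python
# Timings copied from the two speed runs quoted in this message.
# Each row is (n, mpn_divrem_1, mpn_div_qr_1).
core2 = [
    (2, 60.6420, 39.9427), (3, 40.9839, 55.0469), (4, 43.7667, 44.4534),
    (5, 44.6333, 38.9055), (6, 39.6259, 34.4167), (7, 34.0063, 32.4018),
    (8, 30.1364, 28.5745), (9, 29.6472, 27.4599), (10, 29.1270, 26.7300),
    (100, 24.7920, 20.6700), (500, 24.4400, 19.7600),
]
shell = [
    (2, 37.4379, 51.1157), (3, 30.0256, 61.0904), (4, 25.8058, 27.0781),
    (5, 23.2717, 24.2831), (6, 21.7520, 22.4346), (7, 20.5219, 21.1111),
    (8, 19.4783, 20.1101), (9, 18.7726, 19.3369), (10, 18.3271, 18.7228),
    (100, 13.8063, 13.8175), (500, 13.2670, 13.2750),
]


def ratios(rows):
    """Map n to old/new time; above 1.0 means mpn_div_qr_1 is faster."""
    return {n: old / new for n, old, new in rows}


for name, rows in (("core2", core2), ("shell", shell)):
    print(name)
    for n, r in ratios(rows).items():
        print(f"  n={n:<4d} divrem_1/div_qr_1 = {r:.3f}")
```

Ratios below 1.0 mark the sizes where the new code loses, which makes
the small-n regressions easy to spot on both machines.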

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

