More div_qr_2 assembler
Niels Möller
nisse at lysator.liu.se
Wed Mar 30 23:35:00 CEST 2011
I've hacked a mpn_div_qr_2u_pi1 (based on the previous mpn_div_qr_2n_pi1 which
handles the normalized case). Attached below.
Results on my core2 laptop:
$ ./speed -o cycles-broken -p 100000 -C -s3-10,50,200,500,1500 mpn_divrem_2 mpn_div_qr_2n mpn_div_qr_2u
clock_gettime is 1.000ns accurate
overhead 6.04 cycles, precision 100000 units of 1.00e-09 secs, CPU freq 1200.00 MHz
mpn_divrem_2 mpn_div_qr_2n mpn_div_qr_2u
3 #36.0929 41.9874 58.5886
4 #35.9422 39.5447 51.7249
5 #35.0578 38.5444 47.5463
6 #36.0060 37.4358 45.4954
7 #35.1112 36.6949 43.4555
8 #34.5703 36.2022 42.9322
9 #34.5919 36.1517 41.4659
10 #34.7529 37.0281 40.9467
50 34.5641 #33.4350 35.3366
200 34.3937 #33.0381 34.3683
500 34.2357 #32.8356 33.4413
1500 34.0592 #30.2301 30.9165
$ ./speed -o cycles-broken -p 100000 -c -s3 mpn_divrem_2 mpn_div_qr_2n mpn_div_qr_2u
clock_gettime is 1.000ns accurate
overhead 6.04 cycles, precision 100000 units of 1.00e-09 secs, CPU freq 1200.00 MHz
mpn_divrem_2 mpn_div_qr_2n mpn_div_qr_2u
3 #109.79 125.96 172.92
So for the unnormalized case, we get an additional constant overhead of
47 cycles (the main cost here is the more complicated handling of qh,
which may be almost a full limb), and then the normalization seems to
cost a cycle per quotient limb. Loopmixing both the normalized and
unnormalized loops might be worthwhile.
Current code uses shld (and shrd at the end). This means that the
normalized function could set the shift count to zero and jump to the
unnormalized loop after having determined qh, reducing code size a bit.
But since SHLD is slow on some processors, we'd need code without shld
(depending on the SHLD_SLOW configure variable), and then it gets harder
to handle a zero shift count.
As Torbjörn has observed, the current interface where qh is returned
separately leads to a bit of code duplication in the unnormalized case.
/nisse
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: div_qr_2u_pi1.asm
URL: <http://gmplib.org/list-archives/gmp-devel/attachments/20110330/c7f256a9/attachment.ksh>
-------------- next part --------------
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list