divexact_1 and bdiv_q_1
bodrato at mail.dm.unipi.it
Fri Jan 21 20:23:18 CET 2011
Ciao,
I added support for mpn_bdiv_q_1 in tests/devel/try, but its return
value is not tested.
> On Thu, January 20, 2011 2:09 pm, Niels Möller wrote:
>> I'd prefer that the most significant quotient limb is not stored in
>> memory, letting the caller do
>>
>> qp[n-1] = mpn_bdiv_q_1(...)
>>
>> if desired.
If the divisor is odd and the caller does not need the most significant
quotient limb, it is possible to call mpn_bdiv_q_1 with size n-1 ...
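
A minimal sketch of why the size n-1 call works, assuming 32-bit limbs and
an odd divisor. The names sketch_binvert_limb and sketch_bdiv_q_1 are
hypothetical and this is not the GMP code: it only illustrates that, in
Hensel (2-adic) division, quotient limb i depends on dividend limbs 0..i
alone, so a shorter call yields the same low limbs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inverse of an odd limb modulo 2^32, by Newton iteration. */
static uint32_t sketch_binvert_limb(uint32_t d)
{
    assert(d & 1);                 /* only odd values are invertible mod 2^32 */
    uint32_t inv = d;              /* d*d == 1 mod 8, so 3 bits are correct */
    for (int i = 0; i < 4; i++)
        inv *= 2 - d * inv;        /* each step doubles the correct bits */
    return inv;
}

/* Hensel quotient: stores q with q*d == u modulo 2^(32*n), for odd d.
   Limbs are processed from least to most significant. */
static void sketch_bdiv_q_1(uint32_t *qp, const uint32_t *up, size_t n,
                            uint32_t d)
{
    uint32_t dinv = sketch_binvert_limb(d);
    uint32_t carry = 0;            /* high part of q*d owed by the next limb */

    for (size_t i = 0; i < n; i++) {
        uint32_t u = up[i];
        uint32_t borrow = u < carry;          /* u - carry wraps around */
        uint32_t q = (u - carry) * dinv;      /* low limb of q*d is u - carry */
        qp[i] = q;
        carry = (uint32_t)(((uint64_t)q * d) >> 32) + borrow;
    }
}
```

Calling sketch_bdiv_q_1 on the same dividend with sizes n and n-1 stores
identical values in the low n-1 quotient limbs, because the loop never
looks ahead.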
Anyway, I rearranged some of the other x86/*/dive_1.asm files to obtain
the corresponding bdiv_q_1.asm. My old laptop uses the code in
x86/p6/bdiv_q_1.asm, but I measured something strange:
$ tune/speed -s 2-9000 -p100000 -f2 -C mpn_bdiv_q_1.171 mpn_pi1_bdiv_q_1.171 mpn_divexact_1.171
overhead 6.01 cycles, precision 100000 units of 7.14e-10 secs, CPU freq 1400.00 MHz
      mpn_bdiv_q_1.171  mpn_pi1_bdiv_q_1.171  mpn_divexact_1.171
   2           21.4806              #15.8675             24.9519
   4           17.0482              #13.4631             18.6241
   8           14.1526              #12.1004             14.7924
  16           12.4156              #11.4608             12.9125
  32           12.0303              #11.1521             12.0450
  64           11.2851              #10.9555             11.3148
 128           11.1476              #11.0156             11.3020
 256           10.9572              #10.9535             11.0582
 512          #10.8829               11.0549             11.0038
1024          #10.8668               11.3008             11.2293
2048          #10.9371               11.3928             11.2434
4096          #10.8800               11.3191             11.2822
8192          #10.7373               11.2311             10.9432
The two functions use exactly the same core loop: mpn_bdiv_q_1 performs
some initialization, then jumps into the code of mpn_pi1_bdiv_q_1... But,
for some strange reason, the former seems faster when the size is large
(from 512 limbs upward in the table above)...
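
The wrapper/core split described above can be sketched in C as follows.
This is only an illustration under stated assumptions: 32-bit limbs, and
the hypothetical names toy_binvert, toy_pi1_bdiv_q_1 and toy_bdiv_q_1. It
is not the x86 assembly, where the wrapper jumps into the core instead of
calling it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inverse of an odd limb modulo 2^32, by Newton iteration. */
static uint32_t toy_binvert(uint32_t d)
{
    uint32_t inv = d;              /* correct to 3 bits for any odd d */
    for (int i = 0; i < 4; i++)
        inv *= 2 - d * inv;        /* each step doubles the correct bits */
    return inv;
}

/* Core loop: the inverse and the shift count are precomputed by the caller.
   d must be odd, dinv its inverse mod 2^32, and 0 <= shift <= 31. */
static void toy_pi1_bdiv_q_1(uint32_t *qp, const uint32_t *up, size_t n,
                             uint32_t d, uint32_t dinv, unsigned shift)
{
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t u = up[i];
        if (shift != 0) {          /* limb i of the dividend shifted right */
            uint32_t next = (i + 1 < n) ? up[i + 1] : 0;
            u = (u >> shift) | (next << (32 - shift));
        }
        uint32_t borrow = u < carry;
        uint32_t q = (u - carry) * dinv;
        qp[i] = q;
        carry = (uint32_t)(((uint64_t)q * d) >> 32) + borrow;
    }
}

/* Wrapper: the initialization only, then the shared core loop. */
static void toy_bdiv_q_1(uint32_t *qp, const uint32_t *up, size_t n,
                         uint32_t d)
{
    unsigned shift = 0;

    assert(d != 0);
    while ((d & 1) == 0) {         /* strip the factors of two from d */
        d >>= 1;
        shift++;
    }
    toy_pi1_bdiv_q_1(qp, up, n, d, toy_binvert(d), shift);
}
```

In this sketch the wrapper adds a fixed cost per call (the shift count and
the inverse), so it can explain the gap at small sizes but not the
reversal at large ones.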
Best regards,
Marco
--
http://bodrato.it/software/strassen.html