divexact_1 and bdiv_q_1

bodrato at mail.dm.unipi.it bodrato at mail.dm.unipi.it
Fri Jan 21 20:23:18 CET 2011


Ciao,

I added support for mpn_bdiv_q_1 in tests/devel/try, but the returned
value is not tested.

> On Thu, January 20, 2011 2:09 pm, Niels Möller wrote:
>> I'd prefer that the most significant quotient limb is not stored in
>> memory, letting the caller do
>>
>>   qp[n-1] = mpn_bdiv_q_1(...)
>>
>> if desired.

If the divisor is odd and the caller does not need the most significant
quotient limb, it is possible to call mpn_bdiv_q_1 with size n-1 ...
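
To illustrate why the shorter call works, here is a minimal sketch of single-limb Hensel (2-adic) division for an odd divisor. It is not the GMP code: the names binv32, toy_pi1_bdiv_q_1 and toy_bdiv_q_1, the 32-bit limb type and the choice to return the final carry are all assumptions made for this example. Quotient limb i depends only on dividend limbs 0..i, so a call with size n-1 yields the same low n-1 limbs as a call with size n.

```c
#include <stddef.h>
#include <stdint.h>

/* Inverse of an odd d modulo 2^32, by Newton iteration (toy helper). */
uint32_t binv32(uint32_t d)
{
    uint32_t inv = d;              /* d*d == 1 (mod 8): 3 bits are correct */
    for (int i = 0; i < 4; i++)
        inv *= 2u - d * inv;       /* each step doubles the correct bits */
    return inv;
}

/* Toy analogue of a "pi1" routine: the inverse dinv is precomputed.
   Stores n quotient limbs q with q*d == u (mod 2^(32n)), d odd.
   Returning the final carry is a choice of this sketch only. */
uint32_t toy_pi1_bdiv_q_1(uint32_t *qp, const uint32_t *up, size_t n,
                          uint32_t d, uint32_t dinv)
{
    uint32_t c = 0;                               /* carry from lower limbs */
    for (size_t i = 0; i < n; i++) {
        uint32_t s = up[i] - c;                   /* wraps modulo 2^32 */
        uint32_t borrow = up[i] < c;
        uint32_t q = s * dinv;                    /* low limb of q*d equals s */
        qp[i] = q;
        c = (uint32_t)(((uint64_t)q * d) >> 32) + borrow;
    }
    return c;
}

/* Toy wrapper: compute the inverse, then reuse the pi1 loop. */
uint32_t toy_bdiv_q_1(uint32_t *qp, const uint32_t *up, size_t n, uint32_t d)
{
    return toy_pi1_bdiv_q_1(qp, up, n, d, binv32(d));
}
```

With this sketch, a caller that does not want the top limb simply passes n-1 and never reads up[n-1].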

Anyway, I rearranged some of the other x86/*/dive_1.asm files to obtain the
corresponding bdiv_q_1.asm. My old laptop uses the code in
x86/p6/bdiv_q_1.asm, but I measure something strange:

$ tune/speed -s 2-9000 -p100000 -f2 -C mpn_bdiv_q_1.171 mpn_pi1_bdiv_q_1.171 mpn_divexact_1.171
overhead 6.01 cycles, precision 100000 units of 7.14e-10 secs, CPU freq 1400.00 MHz
        mpn_bdiv_q_1.171 mpn_pi1_bdiv_q_1.171 mpn_divexact_1.171
2             21.4806      #15.8675       24.9519
4             17.0482      #13.4631       18.6241
8             14.1526      #12.1004       14.7924
16            12.4156      #11.4608       12.9125
32            12.0303      #11.1521       12.0450
64            11.2851      #10.9555       11.3148
128           11.1476      #11.0156       11.3020
256           10.9572      #10.9535       11.0582
512          #10.8829       11.0549       11.0038
1024         #10.8668       11.3008       11.2293
2048         #10.9371       11.3928       11.2434
4096         #10.8800       11.3191       11.2822
8192         #10.7373       11.2311       10.9432

The two functions use exactly the same core loop: mpn_bdiv_q_1 performs
some initialization, then jumps into the code of mpn_pi1_bdiv_q_1... But,
for some strange reason, the first seems faster when the size is large...

Best regards,
Marco

-- 
http://bodrato.it/software/strassen.html


