mpz_jacobi
Niels Möller
nisse at lysator.liu.se
Thu Apr 22 11:44:24 CEST 2010
Torbjorn Granlund <tg at gmplib.org> writes:
> * div1 and div2 identical to those in hgcd2.c,
>
> These are typically inlined, aren't they? Then the only problem is
> source code duplication.
Right.
> * mpn_hgcd2_jacobi almost the same as mpn_hgcd2 (but for performance
> this makes some sense. Benchmarking of jacobi vs gcd in the Lehmer
> range would be of some interest).
>
> Cannot speed already do that for us?
It can. Before:
overhead 0.000000010 secs, precision 10000 units of 1.67e-09 secs, CPU freq 600.00 MHz
mpn_gcd mpz_jacobi
1 #0.000000743 1.0570
2 #0.000001509 3.0255
3 #0.000005574 1.4893
4 #0.000008193 1.5063
5 #0.000010814 1.5662
6 #0.000013368 1.5956
7 #0.000016032 1.6998
8 #0.000018702 1.6430
9 #0.000021455 1.7903
10 #0.000024659 1.7064
11 #0.000027480 1.8332
12 #0.000030912 1.7154
13 #0.000034158 1.8432
14 #0.000037302 1.7355
15 #0.000039882 1.8945
16 #0.000042906 1.8057
17 #0.000045804 1.8985
18 #0.000049174 1.7665
19 #0.000052844 1.9063
20 #0.000056709 1.7568
21 #0.000059450 1.9489
22 #0.000062424 1.8196
23 #0.000065798 1.9711
24 #0.000069897 1.8136
25 #0.000073083 2.0123
26 #0.000077093 1.8539
27 #0.000080217 2.0286
28 #0.000083045 1.9071
29 #0.000086297 2.0778
30 #0.000091596 1.8784
After:
overhead 0.000000010 secs, precision 10000 units of 1.67e-09 secs, CPU freq 600.00 MHz
mpn_gcd mpz_jacobi
1 #0.000000743 1.2433
2 #0.000001512 1.3652
3 #0.000005549 1.0678
4 #0.000008194 1.0657
5 #0.000010799 1.0695
6 #0.000013371 1.0557
7 #0.000016004 1.0543
8 #0.000018701 1.0539
9 #0.000021446 1.0475
10 #0.000024703 1.0419
11 #0.000027499 1.0584
12 #0.000030861 1.0490
13 #0.000034139 1.0377
14 #0.000037136 1.0399
15 #0.000039961 1.0417
16 #0.000042900 1.0509
17 #0.000045740 1.0469
18 #0.000049126 1.0469
19 #0.000052971 1.0331
20 #0.000056848 1.0257
21 #0.000059378 1.0400
22 #0.000062482 1.0476
23 #0.000065990 1.0470
24 #0.000069855 1.0388
25 #0.000073120 1.0455
26 #0.000076873 1.0381
27 #0.000080559 1.0373
28 #0.000083166 1.0501
29 #0.000086714 1.0561
30 #0.000091266 1.0447
It looks like the linear term is quite significant, and that Jacobi is
some 4% slower than gcd (if I run speed -C -d, I get the difference to
about 70 cycles per limb. All this running on my x86_32 laptop. The
direct additional cost is one o two table lookups per quotient, but
there may also be the indirect cost of higher register pressure in
hgcd2.
I haven't looked at the assembler code for mpn_hgcd2 and
mpn_hgcd2_jacobi (these functions are the main part of for the linear
term of the running time).
> * The basecase function mpn_jacobi_2 is longer and more complicated than
> I like. Maybe it can be simplified to eliminate some of the cases.
>
> Is the JACOBI_BASE_METHOD stuff still relevant?
There are two basecase functions:
jacbase.c:mpn_jacobi_base, which handles single limb numbers, and ses
JACOBI_BASE_METHOD to select between 4 different variants.
jacobi_lehmer.c:mpn_jacobi_2, which handles double limb numbers (and
calls mpn_jacobi_base once they are reduced to single limbs) Uses
JACOBI_2_METHOD to select between two different variants.
BTW, it would be nice if the thresholds web page showed
JACOBI_BASE_METHOD and other tuned *_METHOD. It would be even better if
it also showed the next best method (needs additional output from
tuneup).
Currently, I have no idea which values of JACOBI_BASE_METHOD are still
relevant.
> Oh, so the new code is never used from mpz, currently?
Correct.
> Are here any tests of the new code now, at the mpn level?
No. To test it, it must be enabled in the source code and then
tests/mpz/t-jac will test it.
/Niels
--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.
More information about the gmp-devel
mailing list