From nisse at lysator.liu.se Wed Mar 10 17:21:37 2010
From: nisse at lysator.liu.se (Niels Möller)
Date: Wed, 10 Mar 2010 17:21:37 +0100
Subject: mpn_jacobi_base
Message-ID:

For fun, I've adapted the branch-free binary gcd loop from mpn_gcd_1 to
mpn_jacobi_base. Seems to work, disabled by default, and pushed into the
repository. Included below.

Depends on a reasonably fast count_trailing_zeros. One could try
replacing that with a loop or with a table lookup, although the
corresponding table-based code in gcd_1.c didn't seem so promising;
maybe it uses too small a table. Considerations should be the same for
gcd and jacobi.

On shell:

$ ./speed -c -s10-60 -t10 mpn_jacobi_base_1 mpn_jacobi_base_2 mpn_jacobi_base_3 mpn_jacobi_base_4
overhead 6.06 cycles, precision 10000 units of 3.74e-10 secs, CPU freq 2672.99 MHz
     mpn_jacobi_base_1 mpn_jacobi_base_2 mpn_jacobi_base_3 mpn_jacobi_base_4
10             #104.73            143.50            146.80            112.92
20              240.51            383.43            370.81           #217.87
30              364.91            610.03            591.08           #315.81
40              492.92            844.08            810.62           #406.76
50              624.25           1081.43           1037.55           #500.09
60              756.88           1316.02           1259.01           #593.46

Regards,
/Niels

/* Computes (a/b) for odd b and any a. The initial bit is taken as a
 * parameter. We have no need for the convention that the sign is in
 * bit 1, internally we use bit 0. */

/* FIXME: Could try table-based count_trailing_zeros. */
int
mpn_jacobi_base (mp_limb_t a, mp_limb_t b, int bit)
{
  int c;

  ASSERT (b & 1);

  if (a == 0)
    /* Ok, here we use that the sign is bit 1, after all. */
    return b == 1 ? (1-(bit & 2)) : 0;

  bit >>= 1;

  /* Below, we represent a and b shifted right so that the least
     significant one bit is implicit. */

  b >>= 1;

  count_trailing_zeros (c, a);
  bit ^= c & (b ^ (b >> 1));

  /* We may have c==GMP_LIMB_BITS-1, so we can't use a>>c+1. */
  a >>= c;
  a >>= 1;

  while (a != b)
    {
      mp_limb_t t = a - b;
      mp_limb_t bgta = LIMB_HIGHBIT_TO_MASK (t);

      /* If b > a, invoke reciprocity */
      bit ^= (bgta & a & b);

      /* b <-- min (a, b) */
      b += (bgta & t);

      /* a <-- |a - b| */
      a = (t ^ bgta) - bgta;

      /* Number of trailing zeros is the same no matter if we look at
       * t or a, but using t gives more parallelism. */
      count_trailing_zeros (c, t);
      c ++;
      /* (2/b) = -1 if b = 3 or 5 mod 8 */
      bit ^= c & (b ^ (b >> 1));
      a >>= c;
    }

  return a == 0 ? 1-2*(bit & 1) : 0;
}

--
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.

From zanoni at volterra.uniroma2.it Mon Mar 15 17:06:43 2010
From: zanoni at volterra.uniroma2.it (Alberto Zanoni)
Date: Mon, 15 Mar 2010 17:06:43 +0100
Subject: New algorithm for cube (third power) computation
Message-ID: <201003151706.43476.zanoni@volterra.uniroma2.it>

Hi all,

as anticipated in

http://gmplib.org/list-archives/gmp-devel/2010-January/001442.html

a new algorithm for the computation of the cube of a long integer has
recently been discovered, based on a splitting in two à la Karatsuba
and on an ad hoc unbalanced Toom-3 method.

The description and some details are contained in a preprint you can
find at

http://bodrato.it/papers/zanoni.html#CIVV2010

The file:

http://bodrato.it/papers/zanoni/AnotherSugarCube.pdf

and (improvable) GMP code:

http://bodrato.it/papers/zanoni/cube.c

It seems that in some cases the new algorithm can be effective (see the
graphs in the preprint), but some work still has to be done, in
particular to see what happens if the new cube algorithm is "mixed"
with the binary algorithm for generic exponentiation.

Any comment, observation, or suggestion is welcome.
--
Alberto Zanoni
Centro Interdipartimentale "Vito Volterra"
Universita' degli Studi di Roma "Tor Vergata"
Via Columbia 2
00133 Roma, Italia

From bodrato at mail.dm.unipi.it Mon Mar 15 20:20:12 2010
From: bodrato at mail.dm.unipi.it (bodrato at mail.dm.unipi.it)
Date: Mon, 15 Mar 2010 20:20:12 +0100 (CET)
Subject: New algorithm for cube (third power) computation
In-Reply-To: <201003151706.43476.zanoni@volterra.uniroma2.it>
References: <201003151706.43476.zanoni@volterra.uniroma2.it>
Message-ID: <57700.151.21.94.50.1268680812.squirrel@mail.dm.unipi.it>

Ciao!

> The file:
>
> http://bodrato.it/papers/zanoni/AnotherSugarCube.pdf
>
> and (improvable) GMP code.
>
> http://bodrato.it/papers/zanoni/cube.c

Sorry Alberto, both links were broken because I forgot to set the
permissions correctly. They should work now.

Regards,
Marco

--
http://bodrato.it/

From arndt at jjj.de Wed Mar 17 11:26:51 2010
From: arndt at jjj.de (Joerg Arndt)
Date: Wed, 17 Mar 2010 11:26:51 +0100
Subject: New algorithm for cube (third power) computation
In-Reply-To: <201003151706.43476.zanoni@volterra.uniroma2.it>
References: <201003151706.43476.zanoni@volterra.uniroma2.it>
Message-ID: <20100317102651.GA11733@jjj.de>

To Alberto: I answered your private mail and got this:

---------------------------
This message was created automatically by mail delivery software.

A message that you sent could not be delivered to one or more of its
recipients. This is a permanent error. The following address(es) failed:

  zanoni at volterra.uniroma2.it
    SMTP error from remote mail server after end of data:
    host volterra.uniroma2.it [160.80.46.10]: 550 Your message was
    not delivered for policy reasons.
---------------------------

Your IT department needs some serious spanking ;-)

cheers, jj

From stphanef3724 at gmail.com Fri Mar 19 23:39:56 2010
From: stphanef3724 at gmail.com (Stéphane Fillion)
Date: Fri, 19 Mar 2010 18:39:56 -0400
Subject: Factorial project
Message-ID:

Is someone working on decomposition of factorial into powers of primes?
The latest copyright in fac_ui.c is 2003. I may be interested in working
on it. Maybe only for factor of 3 at first.

From bodrato at mail.dm.unipi.it Sat Mar 20 09:59:07 2010
From: bodrato at mail.dm.unipi.it (bodrato at mail.dm.unipi.it)
Date: Sat, 20 Mar 2010 09:59:07 +0100 (CET)
Subject: Factorial project
In-Reply-To:
References:
Message-ID: <38297.151.21.94.50.1269075547.squirrel@mail.dm.unipi.it>

Dear Stéphane,

> Is someone working on decomposition of factorial into power of prime? The
> latest copyright in fac_ui.c is 2003.I may be interested in working on it.

Yes and no... I'm (very slowly) working on the factorial and on the
binomial, but I'm not implementing the decomposition into prime powers;
I'm following Peter Luschny's suggestions (swing factorial and the
like).

My code is quite messy now, that's why I have not published it yet...

Since you ask, I'll try to clean it up (somehow) and publish a
preliminary version as soon as possible, so that you can help me with
further development if you want to.

Best regards,
Marco

--
http://bodrato.it/

From bodrato at mail.dm.unipi.it Wed Mar 24 16:16:20 2010
From: bodrato at mail.dm.unipi.it (bodrato at mail.dm.unipi.it)
Date: Wed, 24 Mar 2010 16:16:20 +0100 (CET)
Subject: Factorial project
In-Reply-To:
References:
Message-ID: <35773.151.21.94.50.1269443780.squirrel@mail.dm.unipi.it>

Dear Stéphane,

> Is someone working on decomposition of factorial into power of prime? The
> latest copyright in fac_ui.c is 2003.I may be interested in working on it.
> Maybe only for factor of 3 at first.

Sorry for the delay.
At last I was able to (somehow) clean up the code, extract some graphs
and publish it. You can find my new implementation of fac_ui.c on my web
pages at:

http://bodrato.it/software/combinatorics.html#fac_ui

For the factorial I implemented three different algorithms: two are
naive and used for smaller sizes, the third one is based on the "Divide,
Swing, and Conquer" technique suggested by Peter Luschny.

You can simply plug my implementation into mpz/fac_ui.c, recompile, and
see what happens :-) To really use the new code, the two thresholds
(FAC_DSC_THRESHOLD and FAC_ODD_THRESHOLD) should be tuned.

The code is almost ready for the library... But I'll be happy if anyone
wants to read/comment/modify it, or simply make suggestions.

Meanwhile I also re-implemented the binomial; the code is in the SAME
FILE as the factorial. They are not "mixed", but they share a lot of
code. There are sections (marked by comments) in the file, and it will
be necessary to split them into separate files...

By the way, the binomial code is also asymptotically fast for pairs
like (n,n/3). It is NOT fast for pairs (n,k) with a small k.

There are many algorithms ready in the file, but unused. Two
implementations for the binomial are based on a Divide&Conquer
technique, following the observation

binomial(n,a+b)=binomial(n,a)*binomial(n-a,b)/binomial(a+b,a) .

Both implementations are _VERY_ naive, but a range is shown where they
are the fastest of the implemented algorithms. Further effort should be
spent on a general function using D&C.

All the information about the new binomial implementation can be found
here:

http://bodrato.it/software/combinatorics.html#bin_uiui

To experiment with the new binomial code as well, you can replace the
file mpz/bin_uiui.c with the two lines:

#define OPERATION_bin_uiui
#include "fac_ui.c"

A lot of information is already available on the web page; here I simply
show timings measured on my laptop, before and after insertion of the
new code.
          CURRENT IMPLEMENTATION    |     MY IMPLEMENTATION
           mpz_fac_ui  mpz_bin_uiui |    mpz_fac_ui  mpz_bin_uiui
1         0.000000015  #0.000000014 |  #0.000000010   0.000000016
3        #0.000000015   0.000000077 |  #0.000000010   0.000000016
9        #0.000000016   0.000000096 |  #0.000000010   0.000000258
27        0.000000297  #0.000000230 |  #0.000000194   0.000000836
81        0.000001838  #0.000000966 |  #0.000000826   0.000001691
243       0.000007195  #0.000004460 |   0.000005391  #0.000004627
729       0.000047999  #0.000030608 |   0.000040048  #0.000013118
2187      0.000333120  #0.000229881 |   0.000263925  #0.000042094
6561     #0.002040726   0.002549343 |   0.001588065  #0.000151584
19683    #0.011657843   0.022333856 |   0.008881944  #0.000609732
59049    #0.080858608   0.198782997 |   0.044500135  #0.002539522
177147   #0.288538411   3.637505847 |   0.197642792  #0.011212615
531441   #1.381138015  33.224500000 |   0.961643126  #0.049363970

As you can see, bin_uiui was very slow for large operands, slower than
factorial! My implementation is slower for small operands, because it is
NOT tuned and does NOT switch between the many implementations.

Any comment will be very appreciated!

Regards,
Marco

--
http://bodrato.it/
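
Editorial note on the Divide&Conquer observation quoted in the message
above, binomial(n,a+b)=binomial(n,a)*binomial(n-a,b)/binomial(a+b,a):
the identity can be checked on word-sized values with a minimal sketch
in plain C. This is not Marco's code and does not use GMP. The names
binom_small and binom_split are hypothetical, and the bound n <= 30 is
an assumption chosen so that every intermediate product fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: binomial(n, k) by the multiplicative formula.
   Assumes n <= 30 so that no intermediate product overflows 64 bits. */
uint64_t
binom_small (unsigned n, unsigned k)
{
  uint64_t r = 1;
  unsigned i;

  assert (n <= 30);
  if (k > n)
    return 0;
  if (k > n - k)
    k = n - k;                  /* use the symmetry binomial(n,k) = binomial(n,n-k) */

  for (i = 1; i <= k; i++)
    /* Before this step r = binomial(n-k+i-1, i-1); the division is exact
       because r * (n-k+i) / i = binomial(n-k+i, i). */
    r = r * (n - k + i) / i;

  return r;
}

/* Hypothetical helper: binomial(n, a+b) computed from three smaller
   binomials, following the identity quoted above. Requires a + b <= n. */
uint64_t
binom_split (unsigned n, unsigned a, unsigned b)
{
  assert (a + b <= n && n <= 30);
  /* The product equals n! / (a! b! (n-a-b)!), so the division is exact. */
  return binom_small (n, a) * binom_small (n - a, b) / binom_small (a + b, a);
}
```

For example, binom_split (10, 1, 2) evaluates 10 * 36 / 3 and agrees
with binom_small (10, 3). A real D&C implementation would also have to
choose how to split k into a and b, which this sketch leaves open.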