Improved timing for chudnovsky.c to save to file
cino hilliard
hillcino368 at hotmail.com
Tue Nov 27 13:07:50 CET 2007
> From: gmp-discuss-request at swox.com
> Subject: gmp-discuss Digest, Vol 51, Issue 15
> To: gmp-discuss at swox.com
> Date: Sun, 25 Nov 2007 12:00:01 +0100
>
> Send gmp-discuss mailing list submissions to
> gmp-discuss at swox.com
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://gmplib.org/mailman/listinfo/gmp-discuss
> or, via email, send a message with subject or body 'help' to
> gmp-discuss-request at swox.com
>
> You can reach the person managing the list at
> gmp-discuss-owner at swox.com
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of gmp-discuss digest..."
>
>
> Today's Topics:
>
> 1. Sample functions: mpf_out_raw, mpf_inp_raw (Jim White)
Thanks Jim for the functions. I was able to implement the mpf_out_raw and not
the mpf_inp_raw. I will continue to look at it.
In the meantime, the compression feature of mpz_out_raw motivated me to persue
an mpf to mpz conversion routine which I call the power method. What we do is multiply
the mpf number by 10^digits. This produces a big integer out of the mantissa. Then we
use the mpz_set_f to set the big integer part to to an mpz. Having that, we can take
advantage of the mpz_out_raw and mpz_inp_raw. This method timing is 8.7 times faster
than converting by strings. My guess is the speed of this power method is the result of
low level optimization of the power function performed in the compiler. This would make
sense to me at least, since all that has to be done is remove the 0. at the beginning and
the exponent at the end in the low level code.
The real plus here is that we have a compressed file.
Jim,
Maybe you can implement your functions into the code below and see how it compares
with the 6.6 sec I got for 10^7 places.
I present this in the chudpi2.c program below to demonstrate
this routine with some timings as follows for 10000000 digits.
chudpi2 output
..........................................................................................................
Time to compute = 75.734
String method time to convert mpf to mpz and save to zpi.bin = 57.234
Power method time to convert mpf to mpz and save to zpi2.bin = 6.594
.............................................................................................................
Chudnovski pi is no slouch. Here is the stats for Pifast and compress option. I was umable to
estimate the time to compress and save to file though.
Program : PiFast version 4.3 (fix 1), by Xavier Gourdon
Computation of 10000000 digits of Pi
Method used : Chudnovsky
Size of FFT : 1024 K
Physical memory used : ~ 61355 K
Disk memory used : ~ 0.00 Meg
------------------------------------------------------------
Computation run information :
Start : Tue Nov 27 05:14:00 2007
End : Tue Nov 27 05:14:57 2007
Duration : 56.73 seconds
============================================================
Total computation time : 56.73 seconds (~ 0.02 hours)
============================================================
CODE chudpi2.c Begins here
/* Pi computation using Chudnovsky's algortithm.
* Copyright 2002, 2005 Hanhong Xue (macroxue at yahoo dot com)
* Slightly modified 2005 by Torbjorn Granlund (tege at swox dot com) to allow
more than 2G digits to be computed.
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
* 1. Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright notice,
* this list of conditions and the following disclaimer in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS OR
* IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
* EVENT SHALL THE AUTHORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
* PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
* OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
* WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
* OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
* ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Some more modifications by Cino Hilliard to speed up saving output to a file.
*/
#include
#include
#include
#include
#include
#include
#include "gmp.h"
#define A 13591409
#define B 545140134
#define C 640320
#define D 12
#define BITS_PER_DIGIT 3.32192809488736234787
#define DIGITS_PER_ITER 14.1816474627254776555
#define DOUBLE_PREC 53
char *prog_name;
#if CHECK_MEMUSAGE
#undef CHECK_MEMUSAGE
#define CHECK_MEMUSAGE \
do { \
char buf[100]; \
snprintf (buf, 100, \
"ps aguxw | grep '[%c]%s'", prog_name[0], prog_name+1); \
system (buf); \
} while (0)
#else
#undef CHECK_MEMUSAGE
#define CHECK_MEMUSAGE
#endif
#define timer GetTickCount()/1000.0
////////////////////////////////////////////////////////////////////////////
// r = sqrt(x)
mpf_t t1, t2;
void
my_sqrt_ui(mpf_t r, unsigned long x)
{
unsigned long prec, bits, prec0;
prec0 = mpf_get_prec(r);
if (prec0<=DOUBLE_PREC) {
mpf_set_d(r, sqrt(x));
return;
}
bits = 0;
for (prec=prec0; prec>DOUBLE_PREC;) {
int bit = prec&1;
prec = (prec+bit)/2;
bits = bits*2+bit;
}
mpf_set_prec_raw(t1, DOUBLE_PREC);
mpf_set_d(t1, 1/sqrt(x));
while (prec full
mpf_mul_ui(t2, t2, x);
mpf_ui_sub(t2, 1, t2);
mpf_set_prec_raw(t2, prec/2);
mpf_div_2exp(t2, t2, 1);
mpf_mul(t2, t2, t1); // half x half -> half
mpf_set_prec_raw(t1, prec);
mpf_add(t1, t1, t2);
} else {
prec = prec0;
/* t2=x*t1, t1 = t2+t1*(x-t2*t2)/2; */
mpf_set_prec_raw(t2, prec/2);
mpf_mul_ui(t2, t1, x);
mpf_mul(r, t2, t2); // half x half -> full
mpf_ui_sub(r, x, r);
mpf_mul(t1, t1, r); // half x half -> half
mpf_div_2exp(t1, t1, 1);
mpf_add(r, t1, t2);
break;
}
prec -= (bits&1);
bits /=2;
}
}
// r = y/x WARNING: r cannot be the same as y.
void
my_div(mpf_t r, mpf_t y, mpf_t x)
{
unsigned long prec, bits, prec0;
prec0 = mpf_get_prec(r);
if (prec0<=DOUBLE_PREC) {
mpf_set_d(r, mpf_get_d(y)/mpf_get_d(x));
return;
}
bits = 0;
for (prec=prec0; prec>DOUBLE_PREC;) {
int bit = prec&1;
prec = (prec+bit)/2;
bits = bits*2+bit;
}
mpf_set_prec_raw(t1, DOUBLE_PREC);
mpf_ui_div(t1, 1, x);
while (prec full
mpf_ui_sub(t2, 1, t2);
mpf_set_prec_raw(t2, prec/2);
mpf_mul(t2, t2, t1); // half x half -> half
mpf_set_prec_raw(t1, prec);
mpf_add(t1, t1, t2);
} else {
prec = prec0;
/* t2=y*t1, t1 = t2+t1*(y-x*t2); */
mpf_set_prec_raw(t2, prec/2);
mpf_mul(t2, t1, y); // half x half -> half
mpf_mul(r, x, t2); // full x half -> full
mpf_sub(r, y, r);
mpf_mul(t1, t1, r); // half x half -> half
mpf_add(r, t1, t2);
break;
}
prec -= (bits&1);
bits /=2;
}
}
//////////////////////////////////////////////////////////////////////
//////
#define min(x,y) ((x)(y)?(x):(y))
typedef struct {
unsigned long max_facs;
unsigned long num_facs;
unsigned long *fac;
unsigned long *pow;
} fac_t[1];
typedef struct {
long int fac;
long int pow;
long int nxt;
} sieve_t;
sieve_t *sieve;
long int sieve_size;
fac_t ftmp, fmul;
#define INIT_FACS 32
void
fac_show(fac_t f)
{
long int i;
for (i=0; i< s) {
fac_clear(f);
fac_init_size(f, s);
}
}
// f = base^pow
inline void
fac_set_bp(fac_t f, unsigned long base, long int pow)
{
long int i;
assert(base0; i++, base = sieve[base].nxt) {
f[0].fac[i] = sieve[base].fac;
f[0].pow[i] = sieve[base].pow*pow;
}
f[0].num_facs = i;
assert(i<=f[0].max_facs);
}
// r = f*g
inline void
fac_mul2(fac_t r, fac_t f, fac_t g)
{
long int i, j, k;
for (i=j=k=0; i< g[0].fac[j]) {
r[0].fac[k] = f[0].fac[i];
r[0].pow[k] = f[0].pow[i];
i++;
} else {
r[0].fac[k] = g[0].fac[j];
r[0].pow[k] = g[0].pow[j];
j++;
}
}
for (; i0) {
if (jnum_facs, fg->num_facs));
for (i=j=k=0; inum_facs && jnum_facs; ) {
if (fp->fac[i] == fg->fac[j]) {
c = min(fp->pow[i], fg->pow[j]);
fp->pow[i] -= c;
fg->pow[j] -= c;
fmul->fac[k] = fp->fac[i];
fmul->pow[k] = c;
i++; j++; k++;
} else if (fp->fac[i] < fg->fac[j]) {
i++;
} else {
j++;
}
}
fmul->num_facs = k;
assert(k <= fmul->max_facs);
if (fmul->num_facs) {
bs_mul(gcd, 0, fmul->num_facs);
mpz_tdiv_q(p, p, gcd);
mpz_tdiv_q(g, g, gcd);
fac_compact(fp);
fac_compact(fg);
}
}
////////////////////////////////////////////////////////////////////////////
int out=0;
mpz_t *pstack, *qstack, *gstack;
fac_t *fpstack, *fgstack;
long int top = 0;
double progress=0, percent;
#define p1 (pstack[top])
#define q1 (qstack[top])
#define g1 (gstack[top])
#define fp1 (fpstack[top])
#define fg1 (fgstack[top])
#define p2 (pstack[top+1])
#define q2 (qstack[top+1])
#define g2 (gstack[top+1])
#define fp2 (fpstack[top+1])
#define fg2 (fgstack[top+1])
clock_t gcd_time = 0;
// binary splitting
void
bs(unsigned long a, unsigned long b, unsigned gflag, long int level)
{
unsigned long i, mid;
int ccc;
if (b-a==1) {
/*
g(b-1,b) = (6b-5)(2b-1)(6b-1)
p(b-1,b) = b^3 * C^3 / 24
q(b-1,b) = (-1)^b*g(b-1,b)*(A+Bb).
*/
mpz_set_ui(p1, b);
mpz_mul_ui(p1, p1, b);
mpz_mul_ui(p1, p1, b);
mpz_mul_ui(p1, p1, (C/24)*(C/24));
mpz_mul_ui(p1, p1, C*24);
mpz_set_ui(g1, 2*b-1);
mpz_mul_ui(g1, g1, 6*b-1);
mpz_mul_ui(g1, g1, 6*b-5);
mpz_set_ui(q1, b);
mpz_mul_ui(q1, q1, B);
mpz_add_ui(q1, q1, A);
mpz_mul (q1, q1, g1);
if (b%2)
mpz_neg(q1, q1);
i=b;
while ((i&1)==0) i>>=1;
fac_set_bp(fp1, i, 3); // b^3
fac_mul_bp(fp1, 3*5*23*29, 3);
fp1[0].pow[0]--;
fac_set_bp(fg1, 2*b-1, 1); // 2b-1
fac_mul_bp(fg1, 6*b-1, 1); // 6b-1
fac_mul_bp(fg1, 6*b-5, 1); // 6b-5
if (b>(int)(progress)) {
printf("."); fflush(stdout);
progress += percent*2;
}
} else {
/*
p(a,b) = p(a,m) * p(m,b)
g(a,b) = g(a,m) * g(m,b)
q(a,b) = q(a,m) * p(m,b) + q(m,b) * g(a,m)
*/
mid = a+(b-a)*0.5224; // tuning parameter
bs(a, mid, 1, level+1);
top++;
bs(mid, b, gflag, level+1);
top--;
if (level == 0)
puts ("");
ccc = level == 0;
if (ccc) CHECK_MEMUSAGE;
if (level>=4) { // tuning parameter
clock_t t = clock();
fac_remove_gcd(p2, fp2, g1, fg1);
gcd_time += clock()-t;
}
if (ccc) CHECK_MEMUSAGE;
mpz_mul(p1, p1, p2);
if (ccc) CHECK_MEMUSAGE;
mpz_mul(q1, q1, p2);
if (ccc) CHECK_MEMUSAGE;
mpz_mul(q2, q2, g1);
if (ccc) CHECK_MEMUSAGE;
mpz_add(q1, q1, q2);
if (ccc) CHECK_MEMUSAGE;
fac_mul(fp1, fp2);
if (gflag) {
mpz_mul(g1, g1, g2);
fac_mul(fg1, fg2);
}
}
if (out&2) {
printf("p(%ld,%ld)=",a,b); fac_show(fp1);
if (gflag)
printf("g(%ld,%ld)=",a,b); fac_show(fg1);
}
}
void
build_sieve(long int n, sieve_t *s)
{
long int m, i, j, k;
sieve_size = n;
m = (long int)sqrt(n);
memset(s, 0, sizeof(sieve_t)*n/2);
s[1/2].fac = 1;
s[1/2].pow = 1;
for (i=3; i<=n; i+=2) {
if (s[i/2].fac == 0) {
s[i/2].fac = i;
s[i/2].pow = 1;
if (i<=m) {
for (j=i*i, k=i/2; j<=n; j+=i+i, k++) {
if (s[j/2].fac==0) {
s[j/2].fac = i;
if (s[k].fac == i) {
s[j/2].pow = s[k].pow + 1;
s[j/2].nxt = s[k].nxt;
} else {
s[j/2].pow = 1;
s[j/2].nxt = k;
}
}
}
}
}
}
}
int main(int argc, char *argv[])
{
mpf_t pi,qi;
mpz_t piz,piz2;
FILE *fptr1,*fptr2;
mp_exp_t expptr;
char *pi_str;
pi_str = calloc(200000000,1);
char fil[24],fil2[24];
strcpy(fil,"zpi.bin");
fptr1=fopen(fil,"wb+");
strcpy(fil2,"zpi2.bin");
fptr2=fopen(fil2,"wb+");
double t,ttofile;
long int d=100,i, depth=1, terms,value,j;
unsigned long psize, qsize;
clock_t begin, mid0, mid1, mid2, mid3, mid4, end;
mpz_init(piz);
mpz_init(piz2);
prog_name = argv[0];
if (argc>1)
d = strtoul(argv[1], 0, 0);
if (argc>2)
out = atoi(argv[2]);
terms = d/DIGITS_PER_ITER;
while ((1L<<depth)
More information about the gmp-discuss
mailing list