Improved timing for chudnovsky.c to save to file

Tue Nov 27 13:07:50 CET 2007

> From: gmp-discuss-request at swox.com
> Subject: gmp-discuss Digest, Vol 51, Issue 15
> To: gmp-discuss at swox.com
> Date: Sun, 25 Nov 2007 12:00:01 +0100
>
> Send gmp-discuss mailing list submissions to
> gmp-discuss at swox.com
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://gmplib.org/mailman/listinfo/gmp-discuss
> or, via email, send a message with subject or body 'help' to
> gmp-discuss-request at swox.com
>
> You can reach the person managing the list at
> gmp-discuss-owner at swox.com
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of gmp-discuss digest..."
>
>
> Today's Topics:
>
> 1. Sample functions: mpf_out_raw, mpf_inp_raw (Jim White)

Thanks Jim for the functions. I was able to implement the mpf_out_raw and not 
the mpf_inp_raw. I will continue to look at it. 

In the meantime, the compression feature of mpz_out_raw motivated me to persue
an mpf to mpz conversion routine which I call the power method. What we do is multiply 
the mpf number by 10^digits. This produces a big integer out of the mantissa. Then we 
use the mpz_set_f  to set the big integer part to to an mpz. Having that, we can take 
advantage of the mpz_out_raw and mpz_inp_raw. This method timing is 8.7 times faster 
than converting by strings. My guess is the speed of this power method is the result of
low level optimization of the power function performed in the compiler. This would make
sense to me at least, since all that has to be done is remove the 0. at the beginning and 
the exponent at the end in the low level code.

The real plus here is that we have a compressed file.

Jim,
Maybe you can implement your functions into  the code below and see how it compares 
with the 6.6 sec I got for 10^7 places. 

I present this in the chudpi2.c program below to demonstrate
this routine with some timings as follows for 10000000 digits.

chudpi2 output
..........................................................................................................
Time to compute = 75.734
String method time to convert mpf to mpz and save to  zpi.bin   = 57.234
Power  method time to convert mpf to mpz and save to zpi2.bin   =  6.594
.............................................................................................................

Chudnovski pi is no slouch. Here is the stats for Pifast and compress option. I was umable to 
estimate the time to compress and save to file though. 

Program : PiFast version 4.3 (fix 1), by Xavier Gourdon
Computation of 10000000 digits of Pi
Method used : Chudnovsky
Size of FFT : 1024 K
Physical memory used : ~ 61355 K
Disk memory used : ~ 0.00 Meg
------------------------------------------------------------
Computation run information :

Start : Tue Nov 27 05:14:00 2007
End   : Tue Nov 27 05:14:57 2007
Duration : 56.73 seconds
============================================================
Total computation time : 56.73 seconds (~ 0.02 hours)
============================================================

CODE chudpi2.c Begins here

/* Pi computation using Chudnovsky's algortithm.
 * Copyright 2002, 2005 Hanhong Xue (macroxue at yahoo dot com)
 * Slightly modified 2005 by Torbjorn Granlund (tege at swox dot com) to allow
   more than 2G digits to be computed.
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 * 1. Redistributions of source code must retain the above copyright notice,
 * this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright notice,
 * this list of conditions and the following disclaimer in the documentation
 * and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
 * EVENT SHALL THE AUTHORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
 * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
 * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
 * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
 * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 Some more modifications by Cino Hilliard to speed up saving output to a file.

*/

#include 
#include 
#include 
#include 
#include 
#include 
#include "gmp.h"

#define A   13591409
#define B   545140134
#define C   640320
#define D   12

#define BITS_PER_DIGIT   3.32192809488736234787
#define DIGITS_PER_ITER  14.1816474627254776555
#define DOUBLE_PREC      53

char *prog_name;

#if CHECK_MEMUSAGE
#undef CHECK_MEMUSAGE
#define CHECK_MEMUSAGE							\
  do {									\
    char buf[100];							\
    snprintf (buf, 100,							\
	      "ps aguxw | grep '[%c]%s'", prog_name[0], prog_name+1);	\
    system (buf);							\
  } while (0)
#else
#undef CHECK_MEMUSAGE
#define CHECK_MEMUSAGE
#endif
#define timer GetTickCount()/1000.0

////////////////////////////////////////////////////////////////////////////

// r = sqrt(x)

mpf_t t1, t2;

void
my_sqrt_ui(mpf_t r, unsigned long x)
{
  unsigned long prec, bits, prec0;

  prec0 = mpf_get_prec(r);

  if (prec0<=DOUBLE_PREC) {
    mpf_set_d(r, sqrt(x));
    return;
  }

  bits = 0;
  for (prec=prec0; prec>DOUBLE_PREC;) {
    int bit = prec&1;
    prec = (prec+bit)/2;
    bits = bits*2+bit;
  }

  mpf_set_prec_raw(t1, DOUBLE_PREC);
  mpf_set_d(t1, 1/sqrt(x));

  while (prec full
      mpf_mul_ui(t2, t2, x);
      mpf_ui_sub(t2, 1, t2);
      mpf_set_prec_raw(t2, prec/2);
      mpf_div_2exp(t2, t2, 1);
      mpf_mul(t2, t2, t1);         // half x half -> half
      mpf_set_prec_raw(t1, prec);
      mpf_add(t1, t1, t2);
    } else {
      prec = prec0;
      /* t2=x*t1, t1 = t2+t1*(x-t2*t2)/2; */
      mpf_set_prec_raw(t2, prec/2);
      mpf_mul_ui(t2, t1, x);
      mpf_mul(r, t2, t2);          // half x half -> full
      mpf_ui_sub(r, x, r);
      mpf_mul(t1, t1, r);          // half x half -> half
      mpf_div_2exp(t1, t1, 1);
      mpf_add(r, t1, t2);
      break;
    }
    prec -= (bits&1);
    bits /=2;
  }
}

// r = y/x   WARNING: r cannot be the same as y.
void
my_div(mpf_t r, mpf_t y, mpf_t x)
{
  unsigned long prec, bits, prec0;

  prec0 = mpf_get_prec(r);

  if (prec0<=DOUBLE_PREC) {
    mpf_set_d(r, mpf_get_d(y)/mpf_get_d(x));
    return;
}
  bits = 0;
  for (prec=prec0; prec>DOUBLE_PREC;) {
    int bit = prec&1;
    prec = (prec+bit)/2;
    bits = bits*2+bit;
  }

  mpf_set_prec_raw(t1, DOUBLE_PREC);
  mpf_ui_div(t1, 1, x);

  while (prec full
      mpf_ui_sub(t2, 1, t2);
      mpf_set_prec_raw(t2, prec/2);
      mpf_mul(t2, t2, t1);         // half x half -> half
      mpf_set_prec_raw(t1, prec);
      mpf_add(t1, t1, t2);
    } else {
      prec = prec0;
      /* t2=y*t1, t1 = t2+t1*(y-x*t2); */
      mpf_set_prec_raw(t2, prec/2);
      mpf_mul(t2, t1, y);          // half x half -> half
      mpf_mul(r, x, t2);           // full x half -> full
      mpf_sub(r, y, r);
      mpf_mul(t1, t1, r);          // half x half -> half
      mpf_add(r, t1, t2);
      break;
    }
    prec -= (bits&1);
    bits /=2;
  }
}

//////////////////////////////////////////////////////////////////////
//////

#define min(x,y) ((x)(y)?(x):(y))

typedef struct {
  unsigned long max_facs;
  unsigned long num_facs;
  unsigned long *fac;
  unsigned long *pow;
} fac_t[1];

typedef struct {
  long int fac;
  long int pow;
  long int nxt;
} sieve_t;

sieve_t *sieve;
long int sieve_size;
fac_t   ftmp, fmul;

#define INIT_FACS 32

void
fac_show(fac_t f)
{
  long int i;
  for (i=0; i< s) {
    fac_clear(f);
    fac_init_size(f, s);
  }
}

// f = base^pow
inline void
fac_set_bp(fac_t f, unsigned long base, long int pow)
{
  long int i;
  assert(base0; i++, base = sieve[base].nxt) {
    f[0].fac[i] = sieve[base].fac;
    f[0].pow[i] = sieve[base].pow*pow;
  }
  f[0].num_facs = i;
  assert(i<=f[0].max_facs);
}

// r = f*g
inline void
fac_mul2(fac_t r, fac_t f, fac_t g)
{
  long int i, j, k;

  for (i=j=k=0; i< g[0].fac[j]) {
      r[0].fac[k] = f[0].fac[i];
      r[0].pow[k] = f[0].pow[i];
      i++;
    } else {
      r[0].fac[k] = g[0].fac[j];
      r[0].pow[k] = g[0].pow[j];
      j++;
    }
  }
  for (; i0) {
      if (jnum_facs, fg->num_facs));
  for (i=j=k=0; inum_facs && jnum_facs; ) {
    if (fp->fac[i] == fg->fac[j]) {
      c = min(fp->pow[i], fg->pow[j]);
      fp->pow[i] -= c;
      fg->pow[j] -= c;
      fmul->fac[k] = fp->fac[i];
      fmul->pow[k] = c;
      i++; j++; k++;
    } else if (fp->fac[i] < fg->fac[j]) {
      i++;
    } else {
      j++;
    }
  }
  fmul->num_facs = k;
  assert(k <= fmul->max_facs);

  if (fmul->num_facs) {
    bs_mul(gcd, 0, fmul->num_facs);
    mpz_tdiv_q(p, p, gcd);
    mpz_tdiv_q(g, g, gcd);
    fac_compact(fp);
    fac_compact(fg);
  }
}

////////////////////////////////////////////////////////////////////////////

int      out=0;
mpz_t   *pstack, *qstack, *gstack;
fac_t  *fpstack, *fgstack;
long int      top = 0;
double   progress=0, percent;

#define p1 (pstack[top])
#define q1 (qstack[top])
#define g1 (gstack[top])
#define fp1 (fpstack[top])
#define fg1 (fgstack[top])

#define p2 (pstack[top+1])
#define q2 (qstack[top+1])
#define g2 (gstack[top+1])
#define fp2 (fpstack[top+1])
#define fg2 (fgstack[top+1])

clock_t gcd_time = 0;

// binary splitting
void
bs(unsigned long a, unsigned long b, unsigned gflag, long int level)
{
  unsigned long i, mid;
  int ccc;

  if (b-a==1) {
    /*
      g(b-1,b) = (6b-5)(2b-1)(6b-1)
      p(b-1,b) = b^3 * C^3 / 24
      q(b-1,b) = (-1)^b*g(b-1,b)*(A+Bb).
    */
    mpz_set_ui(p1, b);
    mpz_mul_ui(p1, p1, b);
    mpz_mul_ui(p1, p1, b);
    mpz_mul_ui(p1, p1, (C/24)*(C/24));
    mpz_mul_ui(p1, p1, C*24);

    mpz_set_ui(g1, 2*b-1);
    mpz_mul_ui(g1, g1, 6*b-1);
    mpz_mul_ui(g1, g1, 6*b-5);

    mpz_set_ui(q1, b);
    mpz_mul_ui(q1, q1, B);
    mpz_add_ui(q1, q1, A);
    mpz_mul   (q1, q1, g1);
    if (b%2)
      mpz_neg(q1, q1);

    i=b;
    while ((i&1)==0) i>>=1;
    fac_set_bp(fp1, i, 3);    // b^3
    fac_mul_bp(fp1, 3*5*23*29, 3);
    fp1[0].pow[0]--;

    fac_set_bp(fg1, 2*b-1, 1);   // 2b-1
    fac_mul_bp(fg1, 6*b-1, 1);   // 6b-1
    fac_mul_bp(fg1, 6*b-5, 1);   // 6b-5

    if (b>(int)(progress)) {
      printf("."); fflush(stdout);
      progress += percent*2;
    }

  } else {
    /*
      p(a,b) = p(a,m) * p(m,b)
      g(a,b) = g(a,m) * g(m,b)
      q(a,b) = q(a,m) * p(m,b) + q(m,b) * g(a,m)
    */
    mid = a+(b-a)*0.5224;     // tuning parameter
    bs(a, mid, 1, level+1);

    top++;
    bs(mid, b, gflag, level+1);
    top--;

    if (level == 0)
      puts ("");

    ccc = level == 0;

    if (ccc) CHECK_MEMUSAGE;

    if (level>=4) {           // tuning parameter
      clock_t t = clock();
      fac_remove_gcd(p2, fp2, g1, fg1);
      gcd_time += clock()-t;
    }

    if (ccc) CHECK_MEMUSAGE;
    mpz_mul(p1, p1, p2);

    if (ccc) CHECK_MEMUSAGE;
    mpz_mul(q1, q1, p2);

    if (ccc) CHECK_MEMUSAGE;
    mpz_mul(q2, q2, g1);

    if (ccc) CHECK_MEMUSAGE;
    mpz_add(q1, q1, q2);

    if (ccc) CHECK_MEMUSAGE;
    fac_mul(fp1, fp2);

    if (gflag) {
      mpz_mul(g1, g1, g2);
      fac_mul(fg1, fg2);
    }
  }

  if (out&2) {
    printf("p(%ld,%ld)=",a,b); fac_show(fp1);
    if (gflag)
      printf("g(%ld,%ld)=",a,b); fac_show(fg1);
  }
}

void
build_sieve(long int n, sieve_t *s)
{
  long int m, i, j, k;

  sieve_size = n;
  m = (long int)sqrt(n);
  memset(s, 0, sizeof(sieve_t)*n/2);

  s[1/2].fac = 1;
  s[1/2].pow = 1;

  for (i=3; i<=n; i+=2) {
    if (s[i/2].fac == 0) {
      s[i/2].fac = i;
      s[i/2].pow = 1;
      if (i<=m) {
	for (j=i*i, k=i/2; j<=n; j+=i+i, k++) {
	  if (s[j/2].fac==0) {
	    s[j/2].fac = i;
	    if (s[k].fac == i) {
	      s[j/2].pow = s[k].pow + 1;
	      s[j/2].nxt = s[k].nxt;
	    } else {
	      s[j/2].pow = 1;
	      s[j/2].nxt = k;
	    }
	  }
	}
      }
    }
  }
}

int main(int argc, char *argv[])
{
mpf_t pi,qi;
mpz_t piz,piz2;
FILE *fptr1,*fptr2;
mp_exp_t expptr;
 char *pi_str;
 pi_str = calloc(200000000,1);
char fil[24],fil2[24];
strcpy(fil,"zpi.bin");
fptr1=fopen(fil,"wb+");
strcpy(fil2,"zpi2.bin");
fptr2=fopen(fil2,"wb+");
double t,ttofile;
  long int d=100,i, depth=1, terms,value,j;
  unsigned long psize, qsize;
  clock_t begin, mid0, mid1, mid2, mid3, mid4, end;
  mpz_init(piz);
  mpz_init(piz2);
  prog_name = argv[0];
  if (argc>1)
    d = strtoul(argv[1], 0, 0);
  if (argc>2)
    out = atoi(argv[2]);

  terms = d/DIGITS_PER_ITER;
  while ((1L<<depth)