digits of Pi

Tue Jan 12 17:56:12 CET 2010

Paul Zimmermann <Paul.Zimmermann at loria.fr> writes:

  with GMP 5.0.0 and MPFR, it is possible to compute 10^9 bits of Pi in less
  than one hour on a 2.83Ghz Core 2:

  patate% time ./tconst_pi 1000000000 0 0
  Seed GMP_CHECK_RANDOMIZE=1263667646 (include this in bug reports)
  2885.349u 192.229s 52:16.55 98.1%       0+0k 280+16io 3pf+0w

Cool!

A 3.4 GHz Phenom, using one core, should need about 1900 seconds then,
not far from 1/2 hour.

The present FFT code isn't great for these huge operands.  The new
mul_fft.c which you, Pierrick and Alexander wrote also runs poorly for
huge operands (although it perhaps runs somewhat less poorly...).

Fortunately, this problem was solved in 2008, principally by Niels
Möller, building on Tommy Färnqvists efforts from a few years ago.  We
just have to integrate that new FFT code.

The new code uses several word-size primes a2^b+1 = B-epsilson where B
is the limb base and b is as large as possible to allow high order
principal roots of unity.  The code currently uses a bounded number of
primes, which means it gets a slightly worse complexity than the mod
2^k+1 FFT, but that might be more of a theoretical problem than a
practical one.

It is an interesting problem with cache friendly "large ring" FFT.  By
large ring I mean a coefficient ring that is a function of the input
operand sizes.  It is my present opinion that the only reason why the
small primes FFT beats the mod 2^k+1 FFT is that the former can more
easily be made to get close to 100% cache hit.

-- 
Torbjörn