Arithmetic without limitations?
fabrice at bellard.org
Thu Feb 11 18:48:58 CET 2010
On 02/11/10 14:38, Torbjorn Granlund wrote:
> Paul Zimmermann<Paul.Zimmermann at loria.fr> writes:
> > My idea for GMP has long been to make "hierarchical locality" take care
> > of it all. A row in the k-dimensional matrix would fit into L1
> > cache, a plane would fit into memory, further dimensions would live in
> > swap space (not explicit files).
> I'm not sure this will work. Here is a concrete example, on a Core 2 with
> 16Gb of RAM and 4Gb of swap. I'm trying to multiply two numbers of 6e9
> decimal digits, thus using about 2.5G of memory each.
> With GMP 5.0.1, top says:
> The developments I was talking about are not in GMP 5.0.1. The FFT
> there has poor locality (which is mainly a property of its large
> coefficient FFT). Attempting to compute a large product with operands too
> large for main memory will just result in early retirement of the swap
> disk. :-)
> Besides, one will need lots of swap space for computing with large
> numbers. That's the natural way; You need to compute with a huge data
> set? Configure a huge swap area! 4 Gb (which I take as 4 gibibyte) is
> not good for huge computations, and really strangely small for a machine
> with 16 gibibyte RAM.
> Special explicit swap files in a general purpose library is not imho a
> good design. In a special purpose program, perfectly fine.
Relying on the OS swap is possible, but to get good performance you
will need to give the OS hints so that it prefetches from the disk:
unless it uses very clever heuristics, it won't be able to prefetch
the data correctly on its own. The problem shows up with the
non-contiguous accesses needed to compute a DFT (you need either a
matrix transposition or DFTs on matrix columns, and both access memory
non-contiguously).
Overall, it is probably as difficult to give hints to the OS as to
do the corresponding disk I/Os directly!
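To make the "hints" idea concrete, here is a minimal sketch assuming Linux and a row-major matrix of doubles that lives in a mapped (possibly swapped-out) region. It uses madvise() with MADV_WILLNEED to ask the kernel to start bringing in the pages of a column block before they are read. The function names and the matrix layout are my own illustration, not anything from GMP, and how much the kernel actually prefetches for swapped-out memory depends on the kernel version.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes madvise() and MADV_WILLNEED on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: hint that columns [col0, col0 + ncols) of every
   row of a row-major rows x cols matrix will be needed soon.
   Returns the number of madvise() calls that succeeded. */
size_t hint_column_block(double *base, size_t rows, size_t cols,
                         size_t col0, size_t ncols)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t ok = 0;

    assert(base != NULL && col0 + ncols <= cols);
    for (size_t r = 0; r < rows; r++) {
        uintptr_t start = (uintptr_t)(base + r * cols + col0);
        uintptr_t end = start + ncols * sizeof(double);
        /* madvise() wants a page-aligned start address. */
        uintptr_t aligned = start & ~(uintptr_t)(page - 1);
        if (madvise((void *)aligned, end - aligned, MADV_WILLNEED) == 0)
            ok++;
    }
    return ok;
}

/* The strided access itself: one short run per row, far apart in memory. */
double sum_column_block(const double *base, size_t rows, size_t cols,
                        size_t col0, size_t ncols)
{
    double s = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = col0; c < col0 + ncols; c++)
            s += base[r * cols + c];
    return s;
}
```

Note that the hint loop needs exactly the same knowledge of the access pattern as an explicit read loop would, which is the point made above.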
In my case, where I used explicit disk I/Os, I found that doing raw
I/Os (the O_DIRECT flag of the open() syscall) was very beneficial for
performance. It shows that the OS (= Linux) disk cache is far from
optimal in this particular case, where the I/O patterns are very regular
and where it makes no sense to cache the data for later use.
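For readers who have not used it, a minimal sketch of a raw read on Linux follows. O_DIRECT requires the buffer, the file offset and the transfer size to be suitably aligned; the 4096-byte value below is an assumption (the real requirement depends on the device and the filesystem), and the helper names are hypothetical. The fallback to a buffered open is only there so the sketch still works on filesystems that refuse O_DIRECT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0        /* not available: degrade to ordinary buffered I/O */
#endif

#define IO_ALIGN 4096     /* assumed alignment, see the note above */

/* Open for reading, bypassing the page cache when the filesystem allows it. */
int open_raw(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)      /* filesystem refuses O_DIRECT */
        fd = open(path, O_RDONLY);
    return fd;
}

/* Allocate a buffer aligned for raw I/O; size must be a multiple of IO_ALIGN. */
void *alloc_io_buffer(size_t size)
{
    void *p = NULL;
    assert(size % IO_ALIGN == 0);
    if (posix_memalign(&p, IO_ALIGN, size) != 0)
        return NULL;
    return p;
}

/* One aligned read at an aligned offset. Returns the byte count from
   pread() (possibly short at end of file) or -1 on error. */
ssize_t read_raw(int fd, void *buf, size_t size, off_t offset)
{
    ssize_t n;
    assert(size % IO_ALIGN == 0 && offset % IO_ALIGN == 0);
    do {
        n = pread(fd, buf, size, offset);
    } while (n < 0 && errno == EINTR);
    return n;
}
```

With a regular access pattern such as reading whole matrix blocks, the program already knows what it needs next, so it can issue large aligned reads itself instead of paying for a cache it will never hit.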
Another point is that it is very convenient to have one file per
mpz/mpf/... on the disk in case you want to restart a huge computation
from a known checkpoint.
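As an illustration of the one-file-per-number checkpoint, here is a small sketch that assumes a number is simply an array of 64-bit limbs stored in native byte order. The file layout (limb count followed by the limbs) and the function names are hypothetical. For real mpz_t values, GMP's mpz_out_raw() and mpz_inp_raw() play a similar role. The file is written under a temporary name and then renamed, so an interrupted run does not leave a half-written checkpoint under the final name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Save n limbs to path. Returns 0 on success, -1 on error.
   No fsync is done here, so this does not protect against power loss. */
int checkpoint_save(const char *path, const uint64_t *limbs, uint64_t n)
{
    char tmp[4096];
    FILE *f;
    int ok;

    assert(path != NULL && (limbs != NULL || n == 0));
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;
    f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;
    ok = fwrite(&n, sizeof n, 1, f) == 1
         && fwrite(limbs, sizeof *limbs, n, f) == n;
    if (fclose(f) != 0)
        ok = 0;
    if (!ok || rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Load a checkpoint. Returns a malloc'ed limb array and stores the limb
   count in *n_out, or returns NULL if the file is missing or truncated. */
uint64_t *checkpoint_load(const char *path, uint64_t *n_out)
{
    uint64_t n = 0;
    uint64_t *limbs = NULL;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return NULL;
    if (fread(&n, sizeof n, 1, f) == 1) {
        limbs = malloc(n ? n * sizeof *limbs : 1);
        if (limbs != NULL && fread(limbs, sizeof *limbs, n, f) != n) {
            free(limbs);
            limbs = NULL;
        }
    }
    fclose(f);
    if (limbs != NULL)
        *n_out = n;
    return limbs;
}
```

Restarting then amounts to loading the files of the last completed step and continuing from there.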
> one could consider an optional interface in GMP where one makes
> available explicit swap files, I haven't thought about that.)
I think it should be possible to disable the compilation of the "out of
core" support because it won't be useful for many users. Having
separate functions for disk-aware mpz/mpf/... is a good idea if you
want to avoid modifying the existing code. It has the advantage of not
adding extra tests for the disk case to the existing code.
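A rough sketch of what I mean, with invented names throughout (WANT_OUT_OF_CORE, mem_add_1 and disk_add_1 are not GMP identifiers): the in-memory routine stays exactly as it is, and the disk-aware variant is a separate function that is compiled only when the build switch is on. The disk version here assumes limbs stored in native byte order in a file opened for update.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#ifndef WANT_OUT_OF_CORE        /* hypothetical build switch */
#define WANT_OUT_OF_CORE 1
#endif

/* Existing in-memory path: add v to the n-limb number x and return the
   carry out. It contains no test for the disk case. */
uint64_t mem_add_1(uint64_t *x, size_t n, uint64_t v)
{
    for (size_t i = 0; i < n && v != 0; i++) {
        x[i] += v;
        v = x[i] < v;           /* 1 if the addition wrapped, else 0 */
    }
    return v;
}

#if WANT_OUT_OF_CORE
/* Separate disk-aware variant: the n limbs live in file f (opened "r+b")
   and are processed chunk by chunk. Returns 0 on success, -1 on I/O
   error, and stores the carry out in *carry_out. A real implementation
   would use 64-bit file offsets (fseeko) for huge operands. */
int disk_add_1(FILE *f, size_t n, uint64_t v, uint64_t *carry_out)
{
    uint64_t chunk[512];
    size_t done = 0;

    assert(f != NULL && carry_out != NULL);
    while (done < n && v != 0) {
        size_t k = n - done < 512 ? n - done : 512;
        long pos = (long)(done * sizeof chunk[0]);

        if (fseek(f, pos, SEEK_SET) != 0
            || fread(chunk, sizeof chunk[0], k, f) != k)
            return -1;
        v = mem_add_1(chunk, k, v);     /* reuse the unchanged core */
        if (fseek(f, pos, SEEK_SET) != 0
            || fwrite(chunk, sizeof chunk[0], k, f) != k)
            return -1;
        done += k;
    }
    *carry_out = v;
    return fflush(f) == 0 ? 0 : -1;
}
#endif
```

Building with -DWANT_OUT_OF_CORE=0 drops the disk code entirely, and users of the in-memory functions pay nothing for it either way.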