Memory usage for large multiplications
tg at gmplib.org
Wed Feb 3 11:06:56 CET 2010
bodrato at mail.dm.unipi.it writes:
> FYI, Linux has the ability to overallocate memory, allocating it
> really only when needed. In such a case, the former strategy is
> better. But I wouldn't recommend it, as in case of lack of memory,
> expect random processes to be killed by the OOM Killer. :(
The OOM killer can kill a process even in the "just in time allocation"
scenario, when two processes concurrently ask for memory.
By the way, the kernel can delay real allocation, but it can not detect
when memory is not needed any more and free it.
Maybe the itch/scratch strategy should be limited to a few megabytes (or
to the small-to-middle-sizes algorithms)? And bigger memory areas should
be always allocated and freed on the fly?
We need to make a few things clear to ourselves:
* Allocating scratch in a caller speeds things up for small operations
in various ways. We should allocate as little as possible for
spatial locality and thus better cache utilisation ad TLB
utilisation. But spending many cycles on minimising allocation is
not a good stratagy here.
* Allocating scratch space in a caller for huge operations will not
save any time. But reusing scratch by passing already allocated
scratch can save space.
* Saving memory for the largest operations is very important. If we
compare the code used for the recent Pi record and GMP using
gmp-chudnovski.c, the record Pi code is many times more memory
efficient. One reason for GMP's memory apetite is that the
scratch/itch interface is not implemented throughout, so e.g. the FFT
code will not e able to make use of existing scratch memory that
callers have available.
So the scratch/itch interface for large and small operations serve
different purposes, saving time and saving space. We need to keep that
in mind. Concretely: We should not be afraid to spend thousands of
cycles for conserving memory when we have operands of millions of limbs,
according to a principle of "relative overhead".
What should we do?
I think we should as a first step make large operations accept but not
require scratch space, while small operations should continue to require
it. (Marco already did this for GMP 5.0 in at least one place.)
As a possible second step, we might want to introduce a more complex
scratch management system for large operations. Often, a caller has
several unused pieces of memory, it would make sense to allow it to pass
them all. ("Here are n bytes, and here are m bytes, use them as you
We will probably never reach the specialised Pi program's memory
efficiency, since we cannot glue GMP's routines as closely together as
is possible in such a program. But I think we could get close.
More information about the gmp-devel