mpz_t caching

Fri Sep 4 09:26:52 UTC 2015

On 2015-09-04 10:24:31 +0200, Niels Möller wrote:
> Vincent Lefevre <vincent at vinc17.net> writes:
> 
> > In 2014, Patrick Pelissier (in Cc) implemented a mpz_t allocation
> > cache for MPFR, redefining mpz_init and mpz_clear, in order to
> > avoid some deallocations/allocations (via the indirect call to
> > the allocation functions) when mpz_t's cleared and init'ed again
> > a bit after. I've attached the patch that was applied to MPFR.
> 
> I take it this was done to improve performance?

Yes.

> Do you have any information on what applications benefitted, and
> some numbers for typical performance improvement?

There were just some tests done by Patrick in 2014:

Before:
[precision is 333 bits]
exp(x)     took 0.006989 ms (262143 eval in 1832 ms)
log(x)     took 0.011536 ms (131071 eval in 1512 ms)
sin(x)     took 0.007965 ms (131071 eval in 1044 ms)
cos(x)     took 0.005875 ms (262143 eval in 1540 ms)
arccos(x)  took 0.054689 ms (32767 eval in 1792 ms)
arctan(x)  took 0.042116 ms (32767 eval in 1380 ms)

After:
[precision is 333 bits]
exp(x)     took 0.006744 ms (262143 eval in 1768 ms)
log(x)     took 0.011841 ms (131071 eval in 1552 ms)
sin(x)     took 0.007813 ms (131071 eval in 1024 ms)
cos(x)     took 0.005615 ms (262143 eval in 1472 ms)
arccos(x)  took 0.043580 ms (32767 eval in 1428 ms)
arctan(x)  took 0.035401 ms (32767 eval in 1160 ms)

(knowing that mpz_t is not used intensively everywhere in MPFR).

> If I understand the patch correctly, the idea is essentially to keep
> your own free-list for the limb storage pointed to by _mp_d?

Yes.

> If you beat libc's malloc, and by how much, is going to be quite
> system dependent.

The question is more whether one always beats the system malloc.

> > Note: A drawback is that it may be necessary to free the caches at
> > the end of the program so that tools like valgrind don't complain.
> 
> And there are also thread-safety issues; currently GMP leaves all such
> problems to the provided allocation function, while a cache would need
> some thread-local storage (or locks, but I guess that's highly
> undesirable if the idea is to gain performance).

That's true that in MPFR, we use thread-local storage for these things.

-- 
Vincent Lefèvre <vincent at vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)