[PPL-devel] mpz_addmul_ui with ui = 1

Fri Feb 6 09:56:55 CET 2009

On Wed, 4 Feb 2009, Abramo Bagnara wrote:

> with that two checks to avoid calls to mpz_addmul and mpz_submul permits
> to avoid 110958265 calls to mpz_addmul (94.3%) and 50079219 calls to
> mpz_submul (88,6%).

Indeed those percentages are the interesting data. For such a distribution 
it is obviously more efficient to inline the frequent trivial case.

> This show me that the inlining of some trivial tests can give good
> results when the test is true very often.

Ideally, this would be the compilers job. It would need:
- link time optimization, to have access to the gmp code when compiling 
your program. Several compilers can do this, and maybe gcc-4.5
- profile feedback, to know about this distribution
- partial inlining or arbitrary de-inlining

The last point seems like the hardest to find in a compiler, and writing 
the function as:

mpz_mul(...){
  if(test for 0){trivial case}
  else{call mpz_mul_nonzero}
}
mpz_mul_nonzero(...){current mpz_mul code, minus the check for 0}

may help the compiler a lot.

> 1) the GMP library does not check for any special values unless this is
> needed for correctness and the user code is responsible to add the
> trivial tests that increase efficiency for specific data set
>
> 2) the GMP library move the trivial checks in gmp.h to permit their
> inlining. Note that this approach does not conflict with 1: it would be
> feasible to have mpz_addmul defined in gmp.h and mpz_addmul_raw defined
> in GMP library, then the user code could call the version more suitable
> for its needs, the important part is that library version does not check
> for special values.

Indeed, with this approach, the compiler does not even need LTO or 
profiling. However, this is making the code a bit heavier by forcing those 
inlines for every operation, even for codes where the numbers are never 0. 
Don't know how much that matters.

> 3) (current situation) the GMP library does some special value checks
> and user code may decide to inline the very same checks. This makes the
> 'else' path heavier than needed.

I believe that the check for 0 in mpz_mul actually falls in your first 
category, it is necessary for correctness (however, mpz_mul also checks 
whether one of the arguments fits in a single limb, which fits your 
description better). And the else path is supposedly already heavy enough 
that the penalty should not be too bad. Did you see a significant 
performance gain using 2) instead of 3)?

(I am not a gmp developer, just another user, so feel free to ignore my 
questions if you think the answers won't help convince the real 
developers)

-- 
Marc Glisse