fac_ui rewrote.

Torbjorn Granlund tg at gmplib.org
Thu Dec 8 18:32:02 CET 2011


Torbjorn Granlund <tg at gmplib.org> writes:

  Using "configure --enable-alloca=debug --enable-assert" I tracked down
  this problem.  I pushed a fix that I feel confident to be correct.
  Please review it.
  
There ars still problems with kolga.bibsys.no, a POWER6 machine.
It looks like a compiler bug, but I am not certain.

An error looks like:

mpz_fac_ui(21) wrong
  got  6442746444000
  want 51090942171709440000

The failing function seem to be mpz_bc_fac_1.


I took a look at micro-optimisation of the new code.  There are several
integer divisions by non-constant divisors.  These are extremely
expensive, typically 50 to 200 cycles in modern CPUs.

In block_resieve, it seems 'step' is of the form x*2^t, with t
monotonically increasing.  Perhaps one could make use of that, if x is
limited?

In various places, you divide by n in order to figure out the number of
products to accumulate into a limb.  One could instead use
count_leading_zeros on n (after proper casting) and then use shifting;
this would be an estimate losing some accuracy.  For better accuracy,
one may use a little table, along these lines:

    count_leading_zeros (cnt, n);
    bits4 = n >> (GMP_LIMB_BITS - (cnt - 4));     /* extract 4 bits */
    max = 1 + magictab[bits4] >> (cnt - 4);

(One may actually perhaps let magictab be a constant in an unsigned long
and "index" using bit shift and masks.)

-- 
Torbjörn


More information about the gmp-devel mailing list