fac_ui rewrote.
Torbjorn Granlund
tg at gmplib.org
Thu Dec 8 18:32:02 CET 2011
Torbjorn Granlund <tg at gmplib.org> writes:
Using "configure --enable-alloca=debug --enable-assert" I tracked down
this problem. I pushed a fix that I feel confident to be correct.
Please review it.
There ars still problems with kolga.bibsys.no, a POWER6 machine.
It looks like a compiler bug, but I am not certain.
An error looks like:
mpz_fac_ui(21) wrong
got 6442746444000
want 51090942171709440000
The failing function seem to be mpz_bc_fac_1.
I took a look at micro-optimisation of the new code. There are several
integer divisions by non-constant divisors. These are extremely
expensive, typically 50 to 200 cycles in modern CPUs.
In block_resieve, it seems 'step' is of the form x*2^t, with t
monotonically increasing. Perhaps one could make use of that, if x is
limited?
In various places, you divide by n in order to figure out the number of
products to accumulate into a limb. One could instead use
count_leading_zeros on n (after proper casting) and then use shifting;
this would be an estimate losing some accuracy. For better accuracy,
one may use a little table, along these lines:
count_leading_zeros (cnt, n);
bits4 = n >> (GMP_LIMB_BITS - (cnt - 4)); /* extract 4 bits */
max = 1 + magictab[bits4] >> (cnt - 4);
(One may actually perhaps let magictab be a constant in an unsigned long
and "index" using bit shift and masks.)
--
Torbjörn
More information about the gmp-devel
mailing list