braintwo at gmail.com
Wed Sep 4 05:18:50 UTC 2019
Yes I've tried that too. It was the same exact speed, so I started to
profile the function, which hasn't been easy.
I think the vast majority (98%+) of the function is spent in calls to
If I don't change the small primes for trail division than the same number
of calls get made to mpz_rabinmiller
and speed should be largely unchanged (minus what is a frustrating amount
of noise in tune/speed due to randomness).
I'm working on getting a setup where I can profile tune/speed for
1. number of false calls to mpz_millerrabin
2. overall time spent in mpz_millerrabin
When I use gprof I see 98% of the cum time is in __gmpn_redc_1,
__gmpn_sqr_basecase, and __gmpn_powm
which I'm interpreting as mpz_millerrabin but I'd love to see the call flow
like with gperftool and pprof instead of
the flat profile (which is all I can get gprof to produce)
On Tue, Sep 3, 2019 at 9:44 PM Niels Möller <nisse at lysator.liu.se> wrote:
> Seth Troisi <braintwo at gmail.com> writes:
> > The idea is to try and avoid performing modulo by storing the distance to
> > the
> > next multiple of each prime. This avoids divisions, unfortunately it
> > requires
> > writing back to the table each time a multiple is passed.
> Have you tried divisibility test based on 2-adic inverse? It works like
> Let p be an odd prime, and x any number that fits in a uint16_t (say),
> and you want to know if p divides x.
> p_inv = p^-1 (mod 2^16)
> p_limit = floor (2^16 / p) = floor ( (2^16 - 1) / p)
> Then p divides x if and only if x * p_inv mod 2^16 <= p_limit.
> The reason this works is that (i) multiplication by p_inv mod 2^16 is a
> one-to-one mapping, and (ii) whenever p divides x, it agrees with normal
> division, p_inv * x mod 2^16 = x / p.
> Therefore all the multiples of p are mapped to the interval 0 <= x *
> p_inv mod 2^16 <= p_limit, and non-multiples are mapped to larger values.
> So if you tabulate p_inv and p_limit together with the prime table,
> r = (moduli[i] + incr) % prime;
> if (r == 0) goto next;
> could be replaced with
> if ((uint16_t)((moduli[i] + incr) * p_inv) <= p_limit)
> goto next;
> which a multiply instead of a division. I would expect it to be faster
> than the loop
> while (incr > distance[i])
> distance[i] += prime;
> even though it's a bit unclear to me how many times it typically runs
> and how well predictable the branch is.
> Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
> Internet email is subject to wholesale government surveillance.
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 4954 bytes
Desc: not available
More information about the gmp-devel