mpz_nextprime
Seth Troisi
braintwo at gmail.com
Wed Sep 4 05:18:50 UTC 2019
Yes I've tried that too. It was the same exact speed, so I started to
profile the function, which hasn't been easy.
I think the vast majority (98%+) of the function is spent in calls to
mpz_rabinmiller.
If I don't change the small primes for trail division than the same number
of calls get made to mpz_rabinmiller
and speed should be largely unchanged (minus what is a frustrating amount
of noise in tune/speed due to randomness).
I'm working on getting a setup where I can profile tune/speed for
1. number of false calls to mpz_millerrabin
2. overall time spent in mpz_millerrabin
When I use gprof I see 98% of the cum time is in __gmpn_redc_1,
__gmpn_sqr_basecase, and __gmpn_powm
which I'm interpreting as mpz_millerrabin but I'd love to see the call flow
like with gperftool and pprof instead of
the flat profile (which is all I can get gprof to produce)
On Tue, Sep 3, 2019 at 9:44 PM Niels Möller <nisse at lysator.liu.se> wrote:
> Seth Troisi <braintwo at gmail.com> writes:
>
> > The idea is to try and avoid performing modulo by storing the distance to
> > the
> > next multiple of each prime. This avoids divisions, unfortunately it
> > requires
> > writing back to the table each time a multiple is passed.
>
> Have you tried divisibility test based on 2-adic inverse? It works like
> this:
>
> Let p be an odd prime, and x any number that fits in a uint16_t (say),
> and you want to know if p divides x.
>
> Precompute
>
> p_inv = p^-1 (mod 2^16)
>
> and
>
> p_limit = floor (2^16 / p) = floor ( (2^16 - 1) / p)
>
> Then p divides x if and only if x * p_inv mod 2^16 <= p_limit.
>
> The reason this works is that (i) multiplication by p_inv mod 2^16 is a
> one-to-one mapping, and (ii) whenever p divides x, it agrees with normal
> division, p_inv * x mod 2^16 = x / p.
>
> Therefore all the multiples of p are mapped to the interval 0 <= x *
> p_inv mod 2^16 <= p_limit, and non-multiples are mapped to larger values.
>
> So if you tabulate p_inv and p_limit together with the prime table,
>
> r = (moduli[i] + incr) % prime;
> if (r == 0) goto next;
>
> could be replaced with
>
> if ((uint16_t)((moduli[i] + incr) * p_inv) <= p_limit)
> goto next;
>
> which a multiply instead of a division. I would expect it to be faster
> than the loop
>
> while (incr > distance[i])
> {
> distance[i] += prime;
> }
>
> even though it's a bit unclear to me how many times it typically runs
> and how well predictable the branch is.
>
> Regards,
> /Niels
>
> --
> Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
> Internet email is subject to wholesale government surveillance.
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mpz_nextprime_faster_mod.diff
Type: text/x-patch
Size: 4954 bytes
Desc: not available
URL: <https://gmplib.org/list-archives/gmp-devel/attachments/20190903/30034f2f/attachment.bin>
More information about the gmp-devel
mailing list