Optimizing Modulus (was Re: Would someone mind elaborating the explanation in the manual?)

David McKen cic_3_b at yahoo.com
Mon Oct 27 16:54:56 CET 2003

Hmm, Interesting discussion.

I kinda want to diverge for a minute. I have just profiled my
program, I
must admit the results are quite startling. I believe the best effect
be carried across if I quote you the first couple lines of the
produced by gprof.

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 70.72    356.85   356.85                             __gmpn_mod_1
 19.14    453.45    96.60                             main
  4.46    475.98    22.53                            
  1.63    484.19     8.21                             __gmpz_sub_ui
  1.45    491.53     7.34                             __gmpz_fdiv_ui

In my searches through the GMP source I have narrowed the field of
calls to the following group of statements. The function that calls
__gmpn_mod_1 is mpz_fdiv_ui but essentially these statements work
to achieve their task. update_func is a pointer to either the
mpz_add_ui or
mpz_sub_ui depending on whether I am going up from teh bottom of the
or down from the top. From what I understand function pointers are
than having an if..then..else statement that would be evaluated every
an iteration occours.

      remainder = mpz_fdiv_ui (up_counter, prime_product_3_29);
      if(!((remainder %  3) &&
           (remainder %  5) &&
          (remainder %  7) &&
           (remainder % 11) &&
          (remainder % 13) &&
           (remainder % 17) &&
           (remainder % 19) &&
           (remainder % 23) &&
           (remainder % 29)) )
        update_func (up_counter, up_counter, 2);

This code is in a loop that iterates 300,000+ times a second. There
is other
code in the loop but essentially the earlier parts of the loop are
to stop access to the remainder of the loop when it can be guaranteed
the latter tests will fail, i.e. weed out useless values. At the
moment I am
at a loss for optimizations as I have already optimized this program
iterating at 1333+ times a second to the current lower bound of it's
potential at 300,000+. I should mention that I am working with a
range of
numbers from 12 digits long to 310 digits, when working on the lower
of the range working upwards the speed jumps to almost 800,000 times

My question is looking at the results of the profiler can anyone
anything that might further speed up my program. As I have just
explained I
am more looking to speed it up as the numbers get bigger than I am
about the smaller numbers. What I am looking for is a faster way to
get the
remainder of a big number and even then all I really care about is
wether it
is an even divide (remander == 0).

Sisyphus, I am reading the book you showed me however I have not come
anything that pertains to my current predicament. If you know of a
that would be helpful I would be ever so appreciative if you would
tell me.

Thanks in Advance
David McKen

----- Original Message ----- 
From: "Brian Hurt" <bhurt at spnz.org>
To: "Kevin Ryde" <user42 at zip.com.au>
Cc: "Sisyphus" <kalinabears at iinet.net.au>; <gmp-discuss at swox.com>
Sent: Monday, October 27, 2003 11:40 AM
Subject: Re: Would someone mind elaborating the explanation in the

> On Mon, 27 Oct 2003, Kevin Ryde wrote:
> > If you're only testing divisibility then for instance n%3==0
> >
> > (n * 0xAAAAAAAB) <= 0x55555555
> >
> > which is one multiply.
> In 32-bit.  In 64-bit, if n = 3, then n * 0xAAAAAAAB becomes
> and the test fails.  Although you can make you life somewhat
simpler by
> only dealing with three cases: 1) 32-bit, where you test that (n *
> 0xAAAAAAAB) <= 0x5555555, 2) 64-bit, where you test that (n *
> 0xAAAAAAAAAAAAAAAB) <= 0x5555555555555555), and 3) everything else,
> you test that (n % 3) == 0.
> I'd keep case 3.
> You snipped too much here:
> > > Which isn't always faster than a divide instruction.
> >
> > True, that should be checked, but almost everywhere one multiply
> > faster than a divide.
> >
> In every architecture I know of, multiply is at least as fast as
> and generally faster.  I'm not sure it's true that a divide is
slower than
> 2 multiplies, a shift, and a subtract (which is what I was
comparing it
> to).
> Brian
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at swox.com
> https://gmplib.org/mailman/listinfo/gmp-discuss

Do you Yahoo!?
Exclusive Video Premiere - Britney Spears

More information about the gmp-discuss mailing list