Optimizing Modulus (was Re: Would someone mind elaborating the explanation in the manual?)

Mon Oct 27 16:54:56 CET 2003

Hmm, Interesting discussion.

I kinda want to diverge for a minute. I have just profiled my
program, I
must admit the results are quite startling. I believe the best effect
will
be carried across if I quote you the first couple lines of the
flat-profile
produced by gprof.

--Begin
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 70.72    356.85   356.85                             __gmpn_mod_1
 19.14    453.45    96.60                             main
  4.46    475.98    22.53                            
__gmpn_modexact_1_odd
  1.63    484.19     8.21                             __gmpz_sub_ui
  1.45    491.53     7.34                             __gmpz_fdiv_ui
--End

In my searches through the GMP source I have narrowed the field of
guilt
calls to the following group of statements. The function that calls
__gmpn_mod_1 is mpz_fdiv_ui but essentially these statements work
together
to achieve their task. update_func is a pointer to either the
mpz_add_ui or
mpz_sub_ui depending on whether I am going up from teh bottom of the
range
or down from the top. From what I understand function pointers are
faster
than having an if..then..else statement that would be evaluated every
time
an iteration occours.

--Begin
      remainder = mpz_fdiv_ui (up_counter, prime_product_3_29);
      if(!((remainder %  3) &&
           (remainder %  5) &&
          (remainder %  7) &&
           (remainder % 11) &&
          (remainder % 13) &&
           (remainder % 17) &&
           (remainder % 19) &&
           (remainder % 23) &&
           (remainder % 29)) )
      {
        update_func (up_counter, up_counter, 2);
        continue;
      }
--End

This code is in a loop that iterates 300,000+ times a second. There
is other
code in the loop but essentially the earlier parts of the loop are
designed
to stop access to the remainder of the loop when it can be guaranteed
that
the latter tests will fail, i.e. weed out useless values. At the
moment I am
at a loss for optimizations as I have already optimized this program
from
iterating at 1333+ times a second to the current lower bound of it's
potential at 300,000+. I should mention that I am working with a
range of
numbers from 12 digits long to 310 digits, when working on the lower
bound
of the range working upwards the speed jumps to almost 800,000 times
a
second.

My question is looking at the results of the profiler can anyone
suggest
anything that might further speed up my program. As I have just
explained I
am more looking to speed it up as the numbers get bigger than I am
worrying
about the smaller numbers. What I am looking for is a faster way to
get the
remainder of a big number and even then all I really care about is
wether it
is an even divide (remander == 0).

Sisyphus, I am reading the book you showed me however I have not come
across
anything that pertains to my current predicament. If you know of a
section
that would be helpful I would be ever so appreciative if you would
tell me.

Thanks in Advance
David McKen

----- Original Message ----- 
From: "Brian Hurt" <bhurt at spnz.org>
To: "Kevin Ryde" <user42 at zip.com.au>
Cc: "Sisyphus" <kalinabears at iinet.net.au>; <gmp-discuss at swox.com>
Sent: Monday, October 27, 2003 11:40 AM
Subject: Re: Would someone mind elaborating the explanation in the
manual?

> On Mon, 27 Oct 2003, Kevin Ryde wrote:
>
> > If you're only testing divisibility then for instance n%3==0
becomes
> >
> > (n * 0xAAAAAAAB) <= 0x55555555
> >
> > which is one multiply.
>
> In 32-bit.  In 64-bit, if n = 3, then n * 0xAAAAAAAB becomes
0x200000001,
> and the test fails.  Although you can make you life somewhat
simpler by
> only dealing with three cases: 1) 32-bit, where you test that (n *
> 0xAAAAAAAB) <= 0x5555555, 2) 64-bit, where you test that (n *
> 0xAAAAAAAAAAAAAAAB) <= 0x5555555555555555), and 3) everything else,
where
> you test that (n % 3) == 0.
>
> I'd keep case 3.
>
> You snipped too much here:
> > > Which isn't always faster than a divide instruction.
> >
> > True, that should be checked, but almost everywhere one multiply
is
> > faster than a divide.
> >
>
> In every architecture I know of, multiply is at least as fast as
divide,
> and generally faster.  I'm not sure it's true that a divide is
slower than
> 2 multiplies, a shift, and a subtract (which is what I was
comparing it
> to).
>
> Brian
>
>
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at swox.com
> https://gmplib.org/mailman/listinfo/gmp-discuss

__________________________________
Do you Yahoo!?
Exclusive Video Premiere - Britney Spears
http://launch.yahoo.com/promos/britneyspears/