bdiv_q_2.c improved

Tue Oct 6 19:38:46 UTC 2015

Cool to see some new division work. How do you measure the speed gain of
your new code? How do you test it, by hooking it in in mpn/divexact.c
(patch in earlier mail) and then make gmp's standard make check?

Some comments below:

jgk at panix.com (Joe keane) writes:

> /* mpn_bdiv_q -- Hensel division with precomputed inverse, returning quotient.
>
>    Contributed to the GNU project by Torbjorn Granlund.
>
>    THE FUNCTIONS IN THIS FILE ARE INTERNAL WITH MUTABLE INTERFACES.  IT IS ONLY
>    SAFE TO REACH THEM THROUGH DOCUMENTED INTERFACES.  IN FACT, IT IS ALMOST
>    GUARANTEED THAT THEY WILL CHANGE OR DISAPPEAR IN A FUTURE GMP RELEASE.
>
> Copyright 2006, 2007, 2009 Free Software Foundation, Inc.

You should obviusly update credits and copyright years (and if some code
is copied from earlier files, you inherit copyright years from them). To
be able include new code in GMP, we ask for assignment of copyright to
the FSF. Are you willing to do that?

>   /* FIXME: Add ASSERTs for allowable overlapping; i.e., that qp = np is OK,
>      but some over N/Q overlaps will not work.  */

You might want MPN_SAME_OR_SEPARATE_P.

>   for (i = nn - 2; i > 0; i--)
>     {
>       mp_limb_t cl;
>       mp_limb_t rl;
>       mp_limb_t zl;
>       mp_limb_t al;
>
>       q = dinv * nx;
>       qp[0] = q;
>
>       umul_ppmm (ypl, xpl, d0, q);
>       ASSERT(xpl == nx);
>
>       umul_ppmm (hpl, lpl, d1, q);
>
>       sum = ypl + lpl;
>       cl = (sum < ypl) + hpl;
>
>       rl = ny;
>       zl = rl - sum;
>       cl += zl > rl;
>       nx = zl;
>       qp++;
>
>       rl = np[2];
>       zl = rl - cl;
>       cl = zl > rl;
>       al = zl - bic;
>       cl += al > zl;
>       ny = al;
>
>       np++;
>       bic = cl;
>     }

A diagram of how the different terms are added up would help
understanding of the algorithm. Maybe add_ssaaaa or sub_ddmmss could be
used for some of the additions.

You might want to experiment with using a two-limb inverse. I haven't
tried to really understand your loop, but going to a larger inverse
typically helps on platforms with high multiplier throughput, where the
loop bottleneck is the dependency chain from one iteration to the next
(via the ny and bic variables in your loop, I think).

>   /* Final limb */
>   q = dinv * dl;
>   qp[1] = q;

A side note: My preference is usually to return the high limb, and leave
to the caller to store it if that's desired (but mpn interfaces are not
consistent on this point). Returning the high limb from bdiv family of
functions would let Jebelean's bidirectional divexact algorithm compute
the low and high half of the quotient with an overlapping word, and
store both halfs directly into the quotient result operand, *without*
having the two versions of that quotient word collide in memory.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26.
Internet email is subject to wholesale government surveillance.