Porting mpn_bdiv_q_1

Sat Nov 20 20:35:29 CET 2010

Hy;

May I ask a question about porting the "mpn_bdiv_q_1" function into another 
project? Essentially I rewrote the function to plain C++, but with "mp_limb_t" 
being ushorts. I pretty much thought the implementation is streightforward 
remappable to different datatypes. Sadly it does not compute well. :) Almost no 
division results correct at all. And I can't see why, I don't have a gdc to step 
through the original implementation, and I didn't try to re-configure gmp to use 
GMP_NUMB_BITS==16 though. I don't even know if it's possible. In 
"mpn_divexact_1" you use a fixed 64 bits (bug?) for wrap-around-shift which 
makes me hesitate to poke around with the datatypes.

The code is here:
http://paradice-insight.us/stuff/divide-smallLUT.hpp

Ultimately I try to make a SSE2-version of the code (there are two muls that can 
be done in parallel as long as limp-size is 16 or 32). Maybe it's of interest to 
you after it computes right. :)

Maybe I don't understand the conditions of the computation: "binvert_limb" works 
for all odd divisors? The reduction to ushort/ushort for an odd divisor:

	/* all types unsigned short int */
         u = dividend;
         l = (u * inverse);	/* mul, lo result */
	result = l;

Is intuitively wrong, isn't it? But if you reduce the various revisions of the 
division code in GMP, it all boils down to that, so it must either be a wrong 
assumption of the datatype-compatibility, or some specific condition for 
divisors that I can't see (mentioned).

Thanks
   Niels