Bit interleaving

Fri Apr 2 21:42:36 CEST 2004

On Fri, 2 Apr 2004 DTAshley at aol.com wrote:

> Very simply, the dividing line is whether the client of the GMP can 
> efficiently do what it wants to do without that functionality.
> 
> If the client can work efficiently without the functionality, exclude it.
> 
> If the client cannot work efficiently without the functionality, include it.

IMHO, how widely used the client(s) should also be taken into account.  
I'd be less inclined to include code that is only needed by one 
special-case client.

This actually brings up the issue I was discussing earlier: what are the 
limits of the GMP library?  My original goal was to expand GMP to be able 
to efficiently handle GF(2^n) computations.  Several of the responses I've 
gotten have indicated to me that not everyone agrees that this would be a 
good thing.

BTW: One of the reasons I've been quiet lately is that I'm less confident
of what operations I need to be fast in GF(2^n).  Optimal normal bases are
great for hard, but suck for software- polynomial bases are better.  But
polynomials in what? GF(2)?  Or GF(2^m)?  Different answers to this
question give very different code structures.  If the answer is GF(2^m), 
probably with m ~ 16, I'm not sure I want to involve GMP at all- my 
fundamental structure is an array of 16-bit values.

> One might notice that a continued fraction expansion is O(log N).  
> There are a "small" number of integer operations involved (an O[log N]
> number of divisions, etc.).  Therefore, asymptotically as the integers
> get larger, the number of operations outside the GMP (divisions, zero
> tests, etc.) will be small compared with the number of operations that
> occur inside the GMP (limb operations). Therefore, the efficiency will
> be good.  Therefore, there is not enough benefit to including continued
> fraction expansion inside the GMP, as the client can do it efficiently
> with the simpler interface provided.

Operations with points on an elliptic curve, in any number system, have 
the same effect.  Once you provide for the basic operations (addition, 
multiplication, division, modulo, etc) to be efficient (which may, in 
turn, simply be providing more fundamental operations in an efficient 
manner), there are a finite number of high level function calls to 
implement ECs.  So they shouldn't be in GMP.

With respect to the original question, is there a better way to implement 
his request behavior than something like:

    for (i = 0; i < nbits; ++i) {
        dst[(i*spread)/GMP_NUMB_BITS] |= 
            ((src[i/GMP_NUMB_BITS] >> (i % GMP_NUMB_BITS)) & 1) 
            << ((i * spread) % GMP_NUMB_BITS);
    }

Some constant-time optimizations are obvious, but the fundamental 
algorithm of pick a bit out of src and place it into dst is the best I can 
think of.  The other alternative would be table lookups (which only work 
for limited-size spreads).

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian