Bitwise logic gmpxx.h problems

Mon Jul 28 19:02:11 CEST 2008

On Sun, 27 Jul 2008, Torbjorn Granlund wrote:

> Here are the new code I am considering for gmpxx.h in GMP 4.2.3.
> Comments?

What I write is really only comments, feel free to ignore any of it. I 
started with a simple answer to your mail and ended up writing a rather 
long rant...

> I will not improve the mpz_init/mpz_clear of other functions for
> 4.2.3, since it is not a release for speed improvements, just bug
> fixes.

I think one possible improvement (obviously not for 4.2.*) is to use more 
in-place operations. For instance, unary compound operations are currently 
implemented as:

__gmp_expr<T, T> temp(expr.val);
Op::eval(p, temp.__get_mp());

I believe it could be safely replaced by:

expr.val.eval(p);
Op::eval(p,p);

which avoids the copy. This gives the compiler a chance to turn "x=-(-x)" 
into a nop instead of two copies.

The unary operator+ is also implemented as making a copy, whereas I guess 
it could be implemented as the identity (although nobody ever uses it so 
it does not matter).

Having more in-place operations would likely require an additional 
argument to __gmp_expr::eval (and __gmp_set_expr) that for instance says 
whether the return pointer is precious (may be used in expr) or simply a 
new value that can be used as temp.

The way gmpxx works has two stages: building the __gmp_expr tree, and 
evaluating it. The tree is nice because when we evaluate it we know there 
is no aliasing taking place between the internal nodes (on the other hand, 
it means that square(x+x) actually performs two additions). The aliasing 
with the return value is the only one that is unknown, and that is one 
reason I think an additional argument giving this information (maybe 
adding at the same time a way to remember that we are doing a += and not 
just any addition, though that can also be checked by comparing pointers 
at runtime) would help. Some rewriting of the tree could be possible, at 
creation time, for instance to replace -(-x) by x or (-x)*y by -(x*y), 
although I am not sure this would give real improvements on not 
stupidly-written code.

I read the doc about in-place operations, and it says that in-place 
mpz_add isn't really useful except when an operand is small, but it often 
avoids an allocation, so it looks useful. I read the code in mpz/aors.h, 
and I find it surprising that the result is required to have one more limb 
than the size of the largest operand even in the case of a substraction 
(for the addition, obviously there can be a carry). Looks like we could 
save a limb, which means a realloc in some cases.

It would also be interesting to make statistics about how often a typical 
gmpxx program calls mpz_set(x,y) with x==y.

Begin most useless rant paragraphs:

I was surprised to see that the allocation policy choice in gmp is at the 
level of the malloc/realloc/free functions. This means that if for 
instance I want a policy like "never allocate less than two limbs" for 
mpz_t, the best I can do is apparently to have the malloc replacement 
allocate more than asked and the realloc replacement do nothing if the new 
size asked is still smaller than already allocated. It would have been 
nicer if I could have made the mpz_t aware of the size actually allocated 
to it. Looks like one of the "bright ideas" in the tasks.

Also, the requirement that realloc and free can't be called on NULL in gmp 
means we can't avoid the allocation of 1 limb even when we know it is 
useless (otherwise it seems like a memset(p,0,sizeof(__mpz_struct)) would 
do in many cases): looks like the lazy allocation thing mentionned in the 
tasks.

End of most useless rant paragraphs :-)

> // Max allocations for plain types when converted to mpz_t
> // FIXME: how do we get the proper max "double" exponent?

I would say DBL_MAX_EXP (from <float.h>) in C and 
std::numeric_limits<double>::max_exponent (from <limits>) in C++ if I 
understand the question.

> #define __GMP_DBL_LIMBS (2 + (1025 - 1) / GMP_NUMB_BITS)

I had never realized double could only represent such small integers that 
they can be allocated on the stack (I thought this trick would be used 
only for signed/unsigned long, and for initialization of a mpf by a 
double).

> #define __GMP_ULI_LIMBS (1 + (8 * sizeof (long) - 1) / GMP_NUMB_BITS)
>
> #define __GMPXX_LOGOP_UI(OP)						\

The name of the macro does not reflect the fact that it is for mpz only 
(not mpq, which could get a similar macro later).

>  mpz_t temp;								\
>  mp_limb_t limbs[__GMP_ULI_LIMBS];					\
>  temp->_mp_d = limbs;							\
>  temp->_mp_alloc = __GMP_ULI_LIMBS;					\
>  mpz_set_ui (temp, l);						\
>  mpz_##OP (z, w, temp)

I thought you could leave the OP part out of the macro, make the macro 
only provide a temp for use by the calling code. If I want to reuse this 
one for sub(unsigned,mpz) I need to call sub(mpz,unsigned) and then neg. 
For division it does not work. Though if the permutation of the arguments 
is the only issue it would be pretty easy to provide a second macro for 
this.

The original macros I thought of were:
(leaving out the trailing \, they are inconvenient in the mail)

#define MPZ_INIT2_STACK(temp,N)
mp_limb_t limbs[N];
temp->_mp_d=limbs;
temp->_mp_alloc=N

#define MPZ_CLEAR_STACK(temp) /* do nothing */

with versions:

#define MPZ_INIT_SET_UI_STACK(temp,l)
MPZ_INIT2_STACK(temp,__GMP_ULI_LIMBS);
mpz_set_ui(temp,l)

modelled after the regular mpz_init2 and mpz_init_set_ui. I even left the 
mpz_t declaration out of it so a #define MPZ_INIT2_STACK mpz_init2 (and 
similarly for mpz_clear) would work.

I don't know what is best. Anyway I believe these macros are not 
interfaces so they can be changed in the next release.

> #define __GMPXX_LOGOP_SI(OP)						\
>  mpz_t temp;								\
>  mp_limb_t limbs[__GMP_ULI_LIMBS];					\
>  temp->_mp_d = limbs;							\
>  temp->_mp_alloc = __GMP_ULI_LIMBS;					\
>  mpz_set_si (temp, l);							\
>  mpz_##OP (z, w, temp)
> #define __GMPXX_LOGOP_D(OP)						\
>  mpz_t temp;								\
>  mp_limb_t limbs[__GMP_DBL_LIMBS];					\
>  temp->_mp_d = limbs;							\
>  temp->_mp_alloc = __GMP_DBL_LIMBS;					\
>  mpz_set_d (temp, d);							\
>  mpz_##OP (z, w, temp)
>
> struct __gmp_binary_and
> {
>  static void eval(mpz_ptr z, mpz_srcptr w, mpz_srcptr v)
>  { mpz_and(z, w, v); }
>
>  static void eval(mpz_ptr z, mpz_srcptr w, unsigned long int l)
>  {  __GMPXX_LOGOP_UI(and);  }
>  static void eval(mpz_ptr z, unsigned long int l, mpz_srcptr w)
>  {  __GMPXX_LOGOP_UI(and);  }
>  static void eval(mpz_ptr z, mpz_srcptr w, signed long int l)
>  {  __GMPXX_LOGOP_SI(and);  }
>  static void eval(mpz_ptr z, signed long int l, mpz_srcptr w)
>  {  __GMPXX_LOGOP_SI(and);  }
>  static void eval(mpz_ptr z, mpz_srcptr w, double d)
>  {  __GMPXX_LOGOP_D(and); }
>  static void eval(mpz_ptr z, double d, mpz_srcptr w)
>  {  __GMPXX_LOGOP_D(and); }
> };
>
> struct __gmp_binary_ior
> {
>  static void eval(mpz_ptr z, mpz_srcptr w, mpz_srcptr v)
>  { mpz_ior(z, w, v); }
>  static void eval(mpz_ptr z, mpz_srcptr w, unsigned long int l)
>  {  __GMPXX_LOGOP_UI(ior);  }
>  static void eval(mpz_ptr z, unsigned long int l, mpz_srcptr w)
>  {  __GMPXX_LOGOP_UI(ior);  }
>  static void eval(mpz_ptr z, mpz_srcptr w, signed long int l)
>  {  __GMPXX_LOGOP_SI(ior);  }
>  static void eval(mpz_ptr z, signed long int l, mpz_srcptr w)
>  {  __GMPXX_LOGOP_SI(ior);  }
>  static void eval(mpz_ptr z, mpz_srcptr w, double d)
>  {  __GMPXX_LOGOP_D(ior); }
>  static void eval(mpz_ptr z, double d, mpz_srcptr w)
>  {  __GMPXX_LOGOP_D(ior); }
> };
>
> struct __gmp_binary_xor
> {
>  static void eval(mpz_ptr z, mpz_srcptr w, mpz_srcptr v)
>  { mpz_xor(z, w, v); }
>  static void eval(mpz_ptr z, mpz_srcptr w, unsigned long int l)
>  {  __GMPXX_LOGOP_UI(xor);  }
>  static void eval(mpz_ptr z, unsigned long int l, mpz_srcptr w)
>  {  __GMPXX_LOGOP_UI(xor);  }
>  static void eval(mpz_ptr z, mpz_srcptr w, signed long int l)
>  {  __GMPXX_LOGOP_SI(xor);  }
>  static void eval(mpz_ptr z, signed long int l, mpz_srcptr w)
>  {  __GMPXX_LOGOP_SI(xor);  }
>  static void eval(mpz_ptr z, mpz_srcptr w, double d)
>  {  __GMPXX_LOGOP_D(xor); }
>  static void eval(mpz_ptr z, double d, mpz_srcptr w)
>  {  __GMPXX_LOGOP_D(xor); }
> };

Aren't you missing the macro change to make operators like &= work? I 
think it was just a PP->P and ZZ->Z to make it mimic operator+=.

While I am writing an email, I might as well add a clean-up. In gmp.h, 
there is:

#define __need_size_t  /* tell gcc stddef.h we only want size_t */
#if defined (__cplusplus)
#include <cstddef>     /* for size_t */
#else
#include <stddef.h>    /* for size_t */
#endif
#undef __need_size_t

It should be changed to:

#if defined (__cplusplus)
#include <cstddef>     /* for size_t */
#else
#define __need_size_t  /* tell gcc stddef.h we only want size_t */
#include <stddef.h>    /* for size_t */
#undef __need_size_t
#endif

The C++ headers are not ready for use with __need_*, as can be seen by 
compiling a file that contains only these lines with g++, and the only 
reason it did not cause problems is that iosfwd (included before in gmp.h) 
already includes cstddef (without any __need_*) in the current g++ 
implementation.

-- 
Marc Glisse