hash of mpz_t and mpq_t

Wed Jun 16 15:37:57 UTC 2021

Thank you all for your valuable feedback
on this matter.

I can see the point that there are many
possibilities for the hash function and this
is surely a good reason for not providing
one.

The argument I can put forward against
this choice are:
--- Such algorithmic criticism likely applies
to many parts of the code. Though here it
is only a user facing functionality.
--- It is probably better for GMP to provide
its own hash function that would avoid many
potential problems of user defined functions
and their associated debugging and reinventions.
--- The functionality can be defined relatively easily:
size_t mpz_hash(mpz_t inp, size_t seed);

Thank you all for your contribution to GMP and
the discussion.

  Mathieu

On Tue, 15 Jun 2021 at 23:24, Marco Bodrato <bodrato at mail.dm.unipi.it>
wrote:

> Ciao Hans :-)
>
> Il 2021-06-15 19:03 Hans Åberg ha scritto:
> >> On 15 Jun 2021, at 18:44, Marco Bodrato <bodrato at mail.dm.unipi.it>
> >> wrote:
> >> Anyway, the features that different users may ask for a "hash"
> >> function are very different...
>
> If one wants a hash giving the same result everywhere, then I agree with
> the way proposed by Torbjörn: using mpz_export and an hash function on
> the result.
> The drawback is that it requires to allocate a new buffer.
> I'd additionally suggest to handle the sign in some way if one does not
> like to obtain the same hash for 1 and -1, 2 and -2, ... n and -n.
>
> > The C++ standard comes with hash functions for use with its containers
> > std::unordered_map etc, which is was asked for. Typically, one wants
> > something fast here.
>
> But maybe "something fast" is not always good enough for all contexts.
> If a hash must is needed for "random" numbers, I'd suggest
> mpz_get_ui(Z), it is fast.
>
> On the other side, if all the numbers are powers of the same (possibly
> even) base, then mpz_sizeinbase(Z,2) can be the faster, and better
> choice to avoid collisions.
> The hash you proposed, has many collision if applied on powers of 2, and
> always return the same result on any power of 2^128. Is this a problem?
> It depends on the context.
>
> If one doesn't care to obtain different results on different machines,
> then building a hash from the array pointed by mpz_limbs_read(Z) seems a
> good idea (it does not need an additional buffer); possibly taking into
> account not only the mpz_size(Z), but also the mpz_sign(Z).
>
> And for mpq_t, the numref and denref suggested by Torbjörn, bring us
> back to the mpz_t level.
>
> Ĝis,
> m
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss
>