hash of mpz_t and mpq_t

Marco Bodrato bodrato at mail.dm.unipi.it
Tue Jun 15 21:24:06 UTC 2021


Ciao Hans :-)

Il 2021-06-15 19:03 Hans Åberg ha scritto:
>> On 15 Jun 2021, at 18:44, Marco Bodrato <bodrato at mail.dm.unipi.it> 
>> wrote:
>> Anyway, the features that different users may ask for a "hash" 
>> function are very different...

If one wants a hash giving the same result everywhere, then I agree with 
the way proposed by Torbjörn: using mpz_export and an hash function on 
the result.
The drawback is that it requires to allocate a new buffer.
I'd additionally suggest to handle the sign in some way if one does not 
like to obtain the same hash for 1 and -1, 2 and -2, ... n and -n.

> The C++ standard comes with hash functions for use with its containers
> std::unordered_map etc, which is was asked for. Typically, one wants
> something fast here.

But maybe "something fast" is not always good enough for all contexts.
If a hash must is needed for "random" numbers, I'd suggest 
mpz_get_ui(Z), it is fast.

On the other side, if all the numbers are powers of the same (possibly 
even) base, then mpz_sizeinbase(Z,2) can be the faster, and better 
choice to avoid collisions.
The hash you proposed, has many collision if applied on powers of 2, and 
always return the same result on any power of 2^128. Is this a problem? 
It depends on the context.

If one doesn't care to obtain different results on different machines, 
then building a hash from the array pointed by mpz_limbs_read(Z) seems a 
good idea (it does not need an additional buffer); possibly taking into 
account not only the mpz_size(Z), but also the mpz_sign(Z).

And for mpq_t, the numref and denref suggested by Torbjörn, bring us 
back to the mpz_t level.

Ĝis,
m


More information about the gmp-discuss mailing list