Add mpz_inp_str to mini-gmp

Sun Jul 10 09:20:08 UTC 2016

>
> I'd like to hear Torbjörn's opinion on using white space as terminator,
> i.e., reading a file containing "12345678 " with base 8 will result in
> an error, which I suspect differs from the behavior of mpz_inp_str in
> the real GMP.
>
Correct, my implementation of mpz_inp_str behaves differently from the
main GMP, but please notice that, unlike the main gmp, my implementation is
consistent between mpz_inp_str and mpz_set_str. This is because my
mpz_inp_str function simply reads in a string and passes that string to
mpz_set_str. The main gmp library does its own processing of a string,
separate from mpz_set_str. You alluded to the idea of using a helper
function so that mpz_inp_str and mpz_set_str could both try to guess the
base of a number without having repeated code. I don't think that is
necessary. I think mpz_set_str should do all of the work, and all
mpz_inp_str should do is create a valid string and pass it to mpz_set_str.
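
A minimal sketch of the "read a string, then hand it to mpz_set_str" idea,
not the actual patch. The helper name read_token is hypothetical, and
treating any white space as the terminator is an assumption taken from the
discussion above:

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one white-space-delimited token from stream.
   Returns a malloc'd NUL-terminated string, or NULL on allocation
   failure or if EOF is reached before any token character. */
static char *
read_token (FILE *stream)
{
  size_t alloc = 16, n = 0;
  char *buf, *tmp;
  int c;

  /* Skip leading white space. */
  do
    c = getc (stream);
  while (c != EOF && isspace (c));

  if (c == EOF)
    return NULL;

  buf = malloc (alloc);
  if (buf == NULL)
    return NULL;

  while (c != EOF && !isspace (c))
    {
      if (n + 1 >= alloc)
        {
          /* Grow the buffer, keeping room for the trailing NUL. */
          alloc *= 2;
          tmp = realloc (buf, alloc);
          if (tmp == NULL)
            {
              free (buf);
              return NULL;
            }
          buf = tmp;
        }
      buf[n++] = (char) c;
      c = getc (stream);
    }

  /* Push the terminator back; this does nothing when c == EOF. */
  ungetc (c, stream);
  buf[n] = '\0';
  return buf;
}
```

A caller in the style of mpz_inp_str would then pass the returned string to
mpz_set_str (x, tok, base), free it, and report an error if mpz_set_str
returns -1.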

I started to look at implementing a helper function that mpz_inp_str and
mpz_set_str could use to determine the base of an input string. The
difficulty with making a single helper function is that both mpz_inp_str
and mpz_set_str rely on two variables that would need to be "returned".
Since only one variable can be returned the other would need to be passed
by reference. The two specific variables are the sign and the base. It
would make sense to return the base and "return" the sign by writing
through a pointer passed as a parameter. This approach seems
rather sloppy though. The other difficulty I found is that mpz_set_str
needs to guess the base from a char*, whereas mpz_inp_str needs to guess the
base from a file stream, and I don't know how we could use the same logic to
read both a stream and a char*. As mentioned above, I don't think this is
necessary.
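
For reference, the helper discussed above might look roughly like this for
the char* case only. The name guess_base and its signature are hypothetical;
the prefix rules (0x/0X hex, 0b/0B binary, leading 0 octal, otherwise
decimal) are the ones GMP documents for base == 0. It does not address the
file stream case:

```c
/* Hypothetical helper: consume an optional '-' and a base prefix from *sp.
   Returns the base (2, 8, 10 or 16) and stores the sign (+1 or -1)
   through the sign pointer. *sp is advanced past what was consumed. */
static int
guess_base (const char **sp, int *sign)
{
  const char *s = *sp;
  int base = 10;

  *sign = 1;
  if (*s == '-')
    {
      *sign = -1;
      s++;
    }

  if (s[0] == '0')
    {
      if (s[1] == 'x' || s[1] == 'X')
        {
          base = 16;
          s += 2;
        }
      else if (s[1] == 'b' || s[1] == 'B')
        {
          base = 2;
          s += 2;
        }
      else
        /* Leave the leading 0 in place; it is a valid octal digit. */
        base = 8;
    }

  *sp = s;
  return base;
}
```

With this shape, "-0x1F" yields base 16, sign -1, and leaves the pointer at
"1F". The two "return values" are exactly the awkward part mentioned above.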

I don't like providing a problem without providing a solution, so I propose
the following:
1) We keep the logic that checks for sign and base of a number in both
mpz_inp_str and mpz_set_str functions. This would generally reflect the way
the main gmp library currently handles file stream inputs and users could
potentially get different values depending on whether they use mpz_inp_str
or mpz_set_str.

2) We change the way the main gmp library's mpz_inp_str handles a string so
that the output from mpz_inp_str matches the output from mpz_set_str. If
you use "12345678" with base=8 on mpz_inp_str, you'll get 342391 in base 10
(octal 1234567), because mpz_inp_str reads the input until an invalid digit
for the base is found, then converts what it has read up to that point into
an mpz_t number. The implementation of mpz_set_str won't process partial
strings. As soon as it detects a digit that isn't within the expected base,
it returns an error. Changing the way the main gmp library's mpz_inp_str
works could be a rather drastic change and I have no idea how the programs
that rely on it would be affected. However, this would be a forcing function
to standardize the way all strings are handled as input, regardless of
whether the input is from a file stream or a char*.
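
To make the difference between the two behaviors concrete, here is a small
illustration. It is not GMP code: unsigned long stands in for mpz_t, only
digits 0-9 are handled, overflow is ignored, and the function names are
made up for this example:

```c
#include <stddef.h>

/* Value of digit c in the given base, or -1 if c is not a valid digit. */
static int
digit_value (int c, int base)
{
  int d = c - '0';
  return (c >= '0' && c <= '9' && d < base) ? d : -1;
}

/* Prefix style: stop at the first invalid digit and keep what was read.
   Returns the number of characters consumed. */
static size_t
parse_prefix (const char *s, int base, unsigned long *out)
{
  size_t i;
  unsigned long v = 0;

  for (i = 0; digit_value (s[i], base) >= 0; i++)
    v = v * base + digit_value (s[i], base);
  *out = v;
  return i;
}

/* Strict style: any invalid digit makes the whole string an error.
   Returns 0 on success, -1 on failure; *out is written only on success. */
static int
parse_strict (const char *s, int base, unsigned long *out)
{
  unsigned long v;
  size_t n = parse_prefix (s, base, &v);

  if (n == 0 || s[n] != '\0')
    return -1;
  *out = v;
  return 0;
}
```

For "12345678" in base 8, parse_prefix consumes 7 characters and produces
342391, while parse_strict returns -1.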

For what it's worth, my recommendation is #2. You get a little more
standardization of the library in the way strings are interpreted and
handled, and the only cost is one extra O(n) pass over the string, which
is negligible next to the cost of mpz_set_str itself.

Some more detailed comments further below.
>
Thanks for the comments. I hope I'm benefiting the project and not just
wasting your time reviewing amateur code!

> > I have updated mini-gmp to support up to base 62 also.
>
> Please do this as a separate patch.
>
Understood.

>
> >   ungetc (c, stream);
>
> Not sure if ungetc is safe when c == EOF?
>
If c == EOF, ungetc (c, stream) leaves the stream unchanged and returns EOF:
https://www.gnu.org/software/libc/manual/html_node/How-Unread.html#How-Unread
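
A small sketch of why the unconditional ungetc call is safe. The helper
name peek_char is hypothetical and only serves to show the EOF case:

```c
#include <stdio.h>

/* Hypothetical helper: look at the next character without consuming it.
   At end of file getc returns EOF, and ungetc (EOF, stream) fails
   harmlessly: it returns EOF and leaves the stream unchanged. */
static int
peek_char (FILE *stream)
{
  int c = getc (stream);
  ungetc (c, stream);
  return c;
}
```

Calling peek_char at end of file simply returns EOF, and a following getc
still returns EOF.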

--Austyn