converting a decimal string to mpz_t (was: converting a decimal string to mpq_t)

Fri May 2 14:06:00 CEST 2025

> On 2025-05-01 11:59:21 +0200, Torbjorn Granlund wrote:
>> Vincent Lefevre <vincent at vinc17.net> writes:
>>
>>    0.1 is a notation that was used in math before floating point existed:
>>    https://en.wikipedia.org/wiki/Decimal#History
>>
>> What formats should we support around any "/"?
>>
>> {0x}d+{E}
>> {0x}d+.d*{E}
>> {0x}d*.d+{E}
>>
>> E in turn is [eE]{+-}d+
>>
>> {} means optional
>> [] is a range
>> d is a digit in the base
> Note that if the base is >= 15, you cannot use [eE] for the exponent
> (see mpf_inp_str).
> 
>> I cannot recall if we let a leading 0 to mean octal.  If we don't,
>> allowing it would create a compatibility problem.
> AFAIK, leading 0 as meaning octal is nowadays discouraged as being
> confusing, in favor of alternate ways to mean octal (e.g. 0o). And
> this has never been used to mean octal in math.
> 
>> Some examples (assuming the argument "base" is 0):
>>
>> 0.1
>> .1
>> 1.
>> 0x0.f   if a mantissa is in hex, the part after a base point should have the same base
>> 17
>>
>> 0.1e1
>> 0.1e+1
>> 0.1e-1
>>
>>
>> .       disallowed: at least one digit is needed
>> 1.f     disallowed unless the "base" argument >= 16
>>
>> How about the base of an exponent?  E.g., should it also be hex if
>> "base" is hex?
> This could be similar to mpf_inp_str / mpfr_inp_str.

mpz_t obviously cannot store a fractional value "directly", but, as in
the earlier discussion, this could be handled by also setting a "scale"
variable:

for base 10 only:
     int mpz_set_str_dec10 (mpz_ptr rop, const char *str, int *scale)
otherwise:
     int mpz_set_str_dec (mpz_ptr rop, const char *str, int *scale, int base)

This function would set rop as if the string contained no decimal
point (leading zeroes are ignored) and store the number of digits that
followed the point in *scale:

   Input 0.1   Output rop = 1,  scale = 1, return value 0 (successful)
   Input 1     Output rop = 1,  scale = 0, return value 0 (successful)
   Input 1.2   Output rop = 12, scale = 1, return value 0 (successful)
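
To make the intended behaviour concrete, here is a rough sketch for
base 10 that does not touch GMP at all.  The name dec10_split and its
signature are made up for illustration; it only produces the digit
string and the scale, under the assumption that a real implementation
would then do the actual conversion.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of GMP: copy the digits of a base-10
   string such as "-0.125" into out, without the decimal point and
   without leading zeros, and store the number of fractional digits in
   *scale.  Returns 0 on success, -1 on a syntax error or if out (of
   size cap) is too small.  */
static int
dec10_split (const char *str, char *out, size_t cap, int *scale)
{
  size_t n = 0;
  int ndigits = 0, nfrac = 0, seen_point = 0;

  assert (str != NULL && out != NULL && scale != NULL);

  if (*str == '-')
    {
      if (cap < 2)
        return -1;
      out[n++] = *str++;
    }
  for (; *str != '\0'; str++)
    {
      if (*str == '.' && !seen_point)
        {
          seen_point = 1;
          continue;
        }
      if (!isdigit ((unsigned char) *str))
        return -1;              /* also catches a second '.' */
      ndigits++;
      nfrac += seen_point;
      /* Skip leading zeros: no significant digit stored so far.  */
      if (*str == '0' && (n == 0 || (n == 1 && out[0] == '-')))
        continue;
      if (n + 1 >= cap)
        return -1;              /* no room for digit plus NUL */
      out[n++] = *str;
    }
  if (ndigits == 0)
    return -1;                  /* ".", "" or "-": a digit is needed */
  if (n == 0 || (n == 1 && out[0] == '-'))
    {
      /* Every digit was zero: emit a plain "0" (sign dropped).  */
      if (cap < 2)
        return -1;
      n = 0;
      out[n++] = '0';
    }
  out[n] = '\0';
  *scale = nfrac;
  return 0;
}
```

With this, "0.1" gives the digits "1" and scale 1, and "1.2" gives "12"
and scale 1, matching the table above.  The resulting string could then
be passed to the existing mpz_set_str (rop, out, 10); a function inside
libgmp would presumably avoid the intermediate copy.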

The function _could_ also handle scientific notation (depending on the
base; I think it would be fine to accept that for base 10 only).

I'd guess that a function like this is needed commonly enough to be
added to libgmp (and I would assume a function within libgmp to be
fast), but I don't know.

I also don't know if/how libgmp handles the choice of decimal separator
(period vs. comma) and grouping delimiters. Depending on the answer,
that may mean adding another argument ("deduce/period/comma").

Opinions?

Simon