Support underscores for mpz/mpq assignments from strings

Fri Jun 11 13:02:11 UTC 2021

> On 11 Jun 2021, at 10:33, Vincent Lefevre <vincent at vinc17.net> wrote:
> 
> On 2021-06-10 21:11:11 +0200, Hans Åberg wrote:
>> The current international standard is to use as decimal separator either a period '.' or a comma ',', and as number separator spaces ' '.
>> https://en.wikipedia.org/wiki/Decimal_separator#Current_standards
> 
> But note that this is mainly for output (for humans), not to read back
> values.

Humans can copy and paste. If I do that with the numbers on the page below, which use a space as digit separator, and paste them into the calculator app, then it works on macOS, but not on iOS.

https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/aktuella-utbrott/covid-19/statistik-och-analyser/statistik-over-registrerade-vaccinationer-covid-19/

> 
>> However, in a computer context, for writing code, for example the list [1,2,3,4] then becomes the same as [1.2, 3.4], which is not likely what was intended.
> 
> Source code still needs to be readable, understood and maintained
> by humans, which is why a separator may be useful. Since the
> underscore is already used as part of identifiers, it is generally
> a good character to be used as the separator. But it may clash with
> existing features of some languages.

A feature of one programming language may not work well in another language.

> 
>> Therefore I think the support of such formats should be put into special libraries.
> 
> This means that the library would have to parse and copy the value
> for GMP, something already done by GMP: see
> 
>  /* Remove spaces from the string and convert the result from ASCII to a
>     byte array.  */
> 
> in mpz/set_str.c. This is a bit of a waste. IMHO, a GMP function that
> accepts a byte array would be better for use by special libraries, with
> their own parsing rule. Or advise to use mpn_set_str in such cases?

This is a C standard behavior (discarding spaces) that is also present in C++, so changing it may break some programs.
https://en.cppreference.com/w/cpp/string/basic_string/stoul
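
To make the comparison concrete, here is a minimal sketch of what the C library function strtoul does with spaces. The helper name parse_prefix is made up for this illustration. As far as I know, strtoul only skips leading whitespace and stops at an embedded space, whereas the mpz/set_str.c comment quoted above describes removing spaces from the whole string.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative helper (hypothetical name): parse `s` in base 10 with
   strtoul, store the result in *value, and return how many characters
   of `s` were consumed, including any skipped leading whitespace. */
size_t parse_prefix(const char *s, unsigned long *value)
{
    char *end;
    *value = strtoul(s, &end, 10);
    return (size_t)(end - s);
}
```

With this, parse_prefix("  42", &v) consumes all four characters, while parse_prefix("1 234", &v) stops after the first digit.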

When using a lexer generator like Flex, one typically matches the whole number string and then passes it on to a function like mpz_set_str (as opposed to computing the number value in the lexer). Doing these translations is probably not time critical: a parser typically spends most of its time in the actions and the lexer, and less in the parsing proper.

So to facilitate that, one might have a special function that takes the set of characters to be discarded and the set of decimal separators. The international standard mentioned above would require the latter to be a string, like ",.". Setting the former to " " would give the C/C++ behavior, and "" would discard nothing.
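
A rough sketch of what such a function could look like, written as a standalone preprocessing step rather than as a GMP change. The name normalize_number and its parameters are hypothetical and not part of GMP. It only cleans up the string, and treating every separator as '.' is an assumption made for the sake of the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of GMP: copy `src` into `dst`, dropping
   every character listed in `discard` and rewriting every character
   listed in `decimal_seps` to '.'.
   Returns 0 on success, -1 if `dst` is too small. */
int normalize_number(char *dst, size_t dst_size, const char *src,
                     const char *discard, const char *decimal_seps)
{
    size_t n = 0;

    if (dst_size == 0)
        return -1;

    for (; *src != '\0'; src++) {
        char c = *src;

        if (strchr(discard, c) != NULL)
            continue;                 /* digit separator: skip it */
        if (strchr(decimal_seps, c) != NULL)
            c = '.';                  /* unify the decimal separator */
        if (n + 1 >= dst_size)
            return -1;                /* no room for c plus the final NUL */
        dst[n++] = c;
    }
    dst[n] = '\0';
    return 0;
}
```

With discard set to " " and decimal_seps set to ",.", the input "1 234,5" becomes "1234.5". With discard set to "", nothing is dropped. For an integer, the cleaned buffer could then be passed to mpz_set_str(x, buf, 10) as usual.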