Support underscores for mpz/mpq assignments from strings
haberg-1 at telia.com
Fri Jun 11 13:02:11 UTC 2021
> On 11 Jun 2021, at 10:33, Vincent Lefevre <vincent at vinc17.net> wrote:
> On 2021-06-10 21:11:11 +0200, Hans Åberg wrote:
>> The current international standard is to use as decimal separator either a period '.' or a comma ',', and as number separator spaces ' '.
> But note that this is mainly for output (for humans), not to read back
Humans can copy and paste. If I do that with the numbers below that use a space as digit separator, and paste into the calculator app, then it works on macOS, but not on iOS.
>> However, in a computer context, for writing code, for example the list [1,2,3,4] then becomes the same as [1.2, 3.4], which is not likely what was intended.
> Source code still needs to be readable, understood and maintained
> by humans, which is why a separator may be useful. Since the
> underscore is already used as part of identifiers, it is generally
> a good character to be used as the separator. But it may clash with
> existing features of some languages.
A feature of one programming language may not work well in another language.
>> Therefore I think the support of such formats should be put into special libraries.
> This means that the library would have to parse and copy the value
> for GMP, something already done by GMP: see
> /* Remove spaces from the string and convert the result from ASCII to a
> byte array. */
> in mpz/set_str.c. This is a bit of a waste. IMHO, a GMP function that
> accepts a byte array would be better for use by special libraries, with
> their own parsing rule. Or advise to use mpn_set_str in such cases?
This is a C standard (discarding spaces) that is also present in C++, so changing it may break some programs.
When using a lexer program like Flex, one typically matches the whole number string, and then passes it on to a function like mpz_set_str (as opposed to computing the number value in the lexer). Doing these translations is probably not time critical: a parser typically spends most time in the actions and lexer, and less in the parser part.
So to facilitate that, you might have a special function that indicates which characters should be discarded, and the decimal separators. The international standard mentioned above would require the latter to be a string, like ",.". Setting the former to " " would be the C/C++ behavior, "" would not discard anything.
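A rough sketch of what I mean, in plain C. The name normalize_number and its parameters are made up for illustration and are not part of GMP; it only prepares the string, which for integers could then be handed to mpz_set_str. If a character is listed both as discardable and as a decimal separator, this sketch discards it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a GMP function.
   Copies src to dst, dropping every character listed in discard and
   rewriting any character listed in decimal_seps to '.'.
   Returns 0 on success, -1 if dst is too small or if more than one
   decimal separator occurs in src. */
int normalize_number(char *dst, size_t dst_size, const char *src,
                     const char *discard, const char *decimal_seps)
{
    size_t n = 0;
    int seen_sep = 0;

    if (dst_size == 0)
        return -1;

    for (; *src != '\0'; src++) {
        char c = *src;

        /* Characters in the discard set are skipped entirely. */
        if (strchr(discard, c) != NULL)
            continue;

        /* Any accepted decimal separator is normalized to '.'. */
        if (strchr(decimal_seps, c) != NULL) {
            if (seen_sep)
                return -1;
            seen_sep = 1;
            c = '.';
        }

        /* Keep one byte free for the terminating NUL. */
        if (n + 1 >= dst_size)
            return -1;
        dst[n++] = c;
    }
    dst[n] = '\0';
    return 0;
}
```

With discard " " and decimal_seps ",." the input "1 234 567,89" becomes "1234567.89"; with discard "_" and decimal_seps "" the input "1_000_000" becomes "1000000"; with discard "" nothing is removed.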