Functions for reading mpz_t from wide character string

Cade Brown brown.cade at gmail.com
Wed Oct 14 23:30:22 UTC 2020


John,

Does your code need the parser to ignore those characters? In general, I
think your lexer should either report such characters as errors (i.e.
ensure all input is ASCII) or apply a Unicode normalization algorithm
first.
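
A minimal sketch of the first option (rejecting anything outside ASCII in
the lexer), assuming the input arrives as a wchar_t string. The function
name first_non_ascii is hypothetical, not part of GMP or any library:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: return the index of the first character outside
   the ASCII range (0..127), or -1 if the whole string is ASCII.
   The cast to unsigned long also catches negative values on platforms
   where wchar_t is a signed type. */
long first_non_ascii(const wchar_t *s)
{
    for (size_t i = 0; s[i] != L'\0'; i++) {
        if ((unsigned long)s[i] > 127UL)
            return (long)i;
    }
    return -1;
}
```

A lexer could call this once on each input line and report the returned
index as the error position before any number parsing happens.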

Depending on how in-depth you want to get, you can generate a Unicode
database for use in your project (here's an example of one I did:
https://raw.githubusercontent.com/ChemicalDevelopment/kscript/master/tools/gen_unicode.py
).
Using the database, you would organize the data into a hash table or sorted
list, and then query it for Unicode information. You can have it emit the
'decimal equivalent' of a string, and you can check the category of each
character.
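
As a rough illustration of the sorted-list approach, the sketch below
binary-searches a small table to find the decimal value of a code point.
The table is a hand-picked subset of Unicode decimal-digit runs (ASCII,
Arabic-Indic, Extended Arabic-Indic, NKo, Devanagari, Bengali, Thai,
Fullwidth), not a generated database, and the names digit_zero and
decimal_value are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Code point of the digit zero for a few Unicode decimal-digit runs,
   sorted ascending. Each run holds ten consecutive digits 0..9.
   This is an illustrative subset, not the full Unicode data. */
static const uint32_t digit_zero[] = {
    0x0030, 0x0660, 0x06F0, 0x07C0, 0x0966, 0x09E6, 0x0E50, 0xFF10
};

/* Return the decimal value (0..9) of code point cp, or -1 if cp is not
   a digit in any of the runs listed above. */
int decimal_value(uint32_t cp)
{
    size_t lo = 0;
    size_t hi = sizeof digit_zero / sizeof digit_zero[0];

    /* Binary search: after the loop, lo is the number of entries <= cp. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (digit_zero[mid] <= cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return -1;              /* cp is below every run */

    uint32_t offset = cp - digit_zero[lo - 1];
    return offset <= 9 ? (int)offset : -1;
}
```

A generated database would replace the hand-written table with the full
set of runs and could carry the general category alongside each entry.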

Otherwise, see here:
https://stackoverflow.com/questions/4884854/unicode-string-normalization-in-c-c
. The utf8proc library mentioned in some of the answers is a good go-to.
However, it's unlikely that GMP will grow to support such functionality
(in my opinion, that's for the better; you should just generate a
Latinized ASCII string to give to GMP instead).
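
One possible shape for that conversion step, as a sketch only: it accepts
ASCII digits, fullwidth digits (U+FF10 to U+FF19) and an optional leading
minus sign, and rejects everything else. The function name
wide_to_ascii_number and the choice of accepted characters are assumptions
made for illustration:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy a wide-character integer literal into an
   ASCII buffer. Returns 0 on success, -1 if the input is empty,
   contains an unsupported character, or does not fit in out. */
int wide_to_ascii_number(const wchar_t *in, char *out, size_t outsz)
{
    size_t n = 0;

    if (outsz == 0)
        return -1;

    for (size_t i = 0; in[i] != L'\0'; i++) {
        unsigned long c = (unsigned long)in[i];
        char ch;

        if (c >= 0xFF10UL && c <= 0xFF19UL)
            ch = (char)('0' + (c - 0xFF10UL));   /* fullwidth digit */
        else if ((c >= '0' && c <= '9') || (c == '-' && i == 0))
            ch = (char)c;                        /* plain ASCII */
        else
            return -1;                           /* report as an error */

        if (n + 1 >= outsz)
            return -1;                           /* leave room for NUL */
        out[n++] = ch;
    }
    out[n] = '\0';
    return n > 0 ? 0 : -1;
}
```

On success the buffer can be passed to GMP's mpz_set_str(n, buf, 10),
which returns 0 if the whole string is a valid number in the given base.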


Thanks,
----
*Cade Brown*
Research Assistant @ ICL (Innovative Computing Laboratory)
Personal Email: brown.cade at gmail.com
ICL/College Email: cade at utk.edu




On Wed, Oct 14, 2020 at 3:49 AM John Scott <jscott at posteo.net> wrote:

> Hi,
>
> It doesn't appear that GMP has a function to read from a wide string, and
> for my use case a function like gmp_swscanf would be appreciated. Even if
> I were to convert to a multibyte string, I'm not sure whether gmp_sscanf
> can handle that, for example on platforms where __STDC_MB_MIGHT_NEQ_WC__
> may be defined.
>
> In fact I can't find a bignum library that has such a function, so maybe
> I'm going about this wrong. I don't have any code with substance to share,
> but I'm working on writing a calculator in C and writing the lexing and
> parsing bits. I think working with wide characters is my best choice for
> ignoring locales' decimal separators, character classification, etc., but
> corrections and suggestions would be appreciated.
> _______________________________________________
> gmp-discuss mailing list
> gmp-discuss at gmplib.org
> https://gmplib.org/mailman/listinfo/gmp-discuss
>

