Add support for (32-bit) Wide-character Unicode #267

While 16-bit wide-character Unicode is supported, 32-bit Unicode is not.
In my C environment we use wchar_t by default for all characters;
a round trip of encoding to UTF-8, stemming, and decoding from UTF-8 would be quite inefficient for a large number of terms.
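
To illustrate the overhead, here is a minimal sketch of the conversions that would wrap every stemmer call. It uses uint32_t for code points instead of wchar_t so that it behaves the same on platforms where wchar_t is only 16 bits wide. The helper names (utf8_encode, utf8_decode, wide_to_utf8, utf8_to_wide) are hypothetical and not part of libstemmer, and the decoder does not reject overlong sequences, so treat it as an illustration rather than a validated converter.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8. Returns bytes written (1..4), 0 if invalid. */
static size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Decode one UTF-8 sequence. Returns bytes consumed, 0 on malformed input. */
static size_t utf8_decode(const unsigned char *in, size_t len, uint32_t *cp)
{
    size_t need, i;
    uint32_t c;

    if (len == 0)
        return 0;
    if (in[0] < 0x80) {
        *cp = in[0];
        return 1;
    }
    if ((in[0] & 0xE0) == 0xC0)      { need = 2; c = in[0] & 0x1F; }
    else if ((in[0] & 0xF0) == 0xE0) { need = 3; c = in[0] & 0x0F; }
    else if ((in[0] & 0xF8) == 0xF0) { need = 4; c = in[0] & 0x07; }
    else
        return 0;
    if (len < need)
        return 0;
    for (i = 1; i < need; i++) {
        if ((in[i] & 0xC0) != 0x80)
            return 0; /* not a continuation byte */
        c = (c << 6) | (uint32_t)(in[i] & 0x3F);
    }
    *cp = c;
    return need;
}

/* Step 1 of the round trip: wide string -> UTF-8 bytes.
 * Returns bytes written, or (size_t)-1 on invalid input or overflow. */
static size_t wide_to_utf8(const uint32_t *in, size_t n,
                           unsigned char *out, size_t cap)
{
    size_t i, j, used = 0;
    for (i = 0; i < n; i++) {
        unsigned char tmp[4];
        size_t k = utf8_encode(in[i], tmp);
        if (k == 0 || used + k > cap)
            return (size_t)-1;
        for (j = 0; j < k; j++)
            out[used++] = tmp[j];
    }
    return used;
}

/* Step 2 would be the UTF-8 stemmer call (e.g. sb_stemmer_stem), not shown. */

/* Step 3 of the round trip: UTF-8 bytes -> wide string.
 * Returns code points written, or (size_t)-1 on malformed input or overflow. */
static size_t utf8_to_wide(const unsigned char *in, size_t len,
                           uint32_t *out, size_t cap)
{
    size_t pos = 0, n = 0;
    while (pos < len) {
        uint32_t cp;
        size_t k = utf8_decode(in + pos, len - pos, &cp);
        if (k == 0 || n >= cap)
            return (size_t)-1;
        out[n++] = cp;
        pos += k;
    }
    return n;
}
```

Every term passes through wide_to_utf8 before stemming and utf8_to_wide afterwards, so each character is touched twice more than the stemmer itself needs, plus two temporary buffers per term.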

At the moment, in libstemmer_c/runtime/api.h I'm defining:
`typedef wchar_t symbol;`
and can therefore use the ISO-8859 stemmer variants without any conversion.

For some of the provided languages, however, only the UTF-8 variants exist, which is what leads to this issue.
