Encoding

The application uses UTF-8 encoding for all text, with 1 to 3 bytes per character.

Basic Latin letters, numbers and punctuation use 1 byte.
European and Middle Eastern letters mostly fit into 2 bytes.
Korean, Chinese and Japanese characters use 3 bytes.

4-byte UTF-8 characters (e.g., emojis) are not supported.
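The byte counts above can be checked with a few lines of Python. This is a minimal sketch for illustration only; the helper names `utf8_length` and `find_unsupported` are hypothetical and not part of the Pricefx application.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def find_unsupported(text: str) -> list:
    """Return the characters in `text` that need 4 bytes in UTF-8."""
    return [ch for ch in text if utf8_length(ch) == 4]


# Sample characters: Latin A, e with acute accent, a Chinese character, an emoji.
for ch in ["A", "\u00e9", "\u4e2d", "\U0001F600"]:
    print(repr(ch), utf8_length(ch))

# An empty list means the text contains no 4-byte characters.
print(find_unsupported("Price list"))
```

A check like `find_unsupported` could be run on incoming data before loading, so that 4-byte characters are found early rather than at import time.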

Special Characters

The Pricefx application uses the general collation utf8_general_ci to provide good support for most languages worldwide. This case-insensitive collation applies a level of normalization that essentially treats special characters as their base characters (e.g., Ä = A, Ö = O, Ü = U, ß = s).

No language-specific collations are used.

This affects sorting, comparisons and filtering: special characters are treated as their closest base characters (e.g., Müller is treated the same as Muller).
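To get a feel for which values would be treated as equal, the comparison can be roughly approximated in Python. This is only a sketch under stated assumptions: it strips accents, maps ß to s and ignores case, which matches the examples given here but does not reproduce the database collation exactly. The function name `approx_general_ci_key` is hypothetical.

```python
import unicodedata


def approx_general_ci_key(text: str) -> str:
    """Rough comparison key: strip accents, map sharp s to s, ignore case.

    An approximation for illustration, not the actual collation rules.
    """
    # Split accented letters into base letter + combining mark (e.g. u + diaeresis).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks so only base letters remain.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Map sharp s to s before upper-casing (upper() alone would yield "SS").
    return without_marks.replace("\u00df", "s").upper()


# "Müller" and "Muller" produce the same key, so they would collide.
print(approx_general_ci_key("M\u00fcller") == approx_general_ci_key("Muller"))  # True
```

Two values with the same key should be expected to match each other in filters and to be indistinguishable as keys.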

In customer implementations, do not use national special characters in business keys.

If national special characters are present in the data, it is best to replace them during the ETL phase with a mutually agreed combination of characters (such as replacing Å with ?A?).
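Such a replacement step might look like the following sketch. The mapping table and the function name `clean_business_key` are hypothetical; only the Å example is taken from the text, and the real mapping has to be agreed with the customer.

```python
# Hypothetical mapping; only the entry for A-with-ring comes from the text above.
REPLACEMENTS = {"\u00c5": "?A?"}


def clean_business_key(key: str, replacements: dict = REPLACEMENTS) -> str:
    """Replace mapped characters and reject any remaining non-ASCII character."""
    for special, substitute in replacements.items():
        key = key.replace(special, substitute)
    # Anything still outside ASCII has no agreed replacement yet.
    if not key.isascii():
        raise ValueError(f"Unmapped special character in business key: {key!r}")
    return key


print(clean_business_key("\u00c5B-100"))  # ?A?B-100
```

Raising an error for unmapped characters, rather than silently dropping them, keeps the replacement reversible and makes gaps in the agreed mapping visible during the ETL run.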