UTF-8 Encoding
Basics
Applications in CODESYS can process a wide variety of characters, for example to output an error message in various languages, or to display a visualization in a language selected by the user that accepts user input in a wide variety of languages, characters, or symbols.
If a comprehensive character set is not necessary, or if a project should not be changed, then strings encoded in Latin-1 format can still be used.
Character Set | Code Page Number | Description | Character Encoding |
---|---|---|---|
ASCII | 20127 | | 7-bit encoded character |
DOS-Latin-1 | 819, 850 | | 8-bit encoded character |
Latin-1 | 28591 | | 8-bit encoded character |
Windows 1252 Encoding | 1252 | | 8-bit encoded character |
Unicode | | For more information, see: https://home.unicode.org/ | |
Unicode 14.0 | | 144,697 characters | |
UTF-16 | 1200 | | 16-bit encoded characters. The characters are encoded in either 2 bytes or 4 bytes. |
UTF-8 | 65001 | | Tuple of 8-bit words per character. The characters are encoded with a variable length of 1 to 4 bytes. |
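To make the "Character Encoding" column concrete, the following sketch prints how many bytes a few sample characters occupy in several of the encodings listed above. It is written in Python purely for illustration and is not CODESYS code; the helper names `encoded_size` and `size_table` and the sample characters are assumptions made for this example.

```python
def encoded_size(ch, encoding):
    """Return the number of bytes `ch` occupies in `encoding`,
    or None if the character cannot be represented in it."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


SAMPLES = ["A", "ü", "你", "😀"]
# "utf-16-le" is used so that no byte order mark is counted.
ENCODINGS = ["ascii", "latin-1", "utf-16-le", "utf-8"]


def size_table():
    """Map each sample character to its byte size per encoding."""
    return {ch: {enc: encoded_size(ch, enc) for enc in ENCODINGS}
            for ch in SAMPLES}


if __name__ == "__main__":
    for ch, sizes in size_table().items():
        print(ch, sizes)
```

In this sketch, only the Unicode encodings can represent all four samples, and only UTF-8 varies its length per character from 1 to 4 bytes.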
UTF-8 in CODESYS
Tip
UTF-8 encoding is the encoding with the most comprehensive character set. Therefore, it is recommended that you enable UTF-8 encoding for new projects as well as for existing projects to be used in a new context.
Data Type | Compile Option: UTF8 Encoding for STRING | Which encoding is used project-wide? |
---|---|---|
STRING | Enabled | UTF-8 |
STRING | Disabled | Windows 1252 encoding (default Windows encoding), Latin-1 |
WSTRING | Enabled | UTF-16 |
WSTRING | Disabled | UTF-16 |
In CODESYS, the STRING data type can be encoded in Latin-1 or UTF-8 format. The WSTRING data type always encodes its characters as Unicode in UTF-16.
Encoding a single string literal in UTF-8 format
Even if the project-wide encoding format is set to Latin-1, you can encode a single literal in UTF-8 format. To do this, add the UTF8# type prefix to the literal.
{attribute 'monitoring_encoding' := 'UTF-8'}
strVarUtf8: STRING := UTF8#'你好,世界!ÜüÄäÖö';
For more information, see: Constant: UTF8# String; Pragma Attribute: monitoring_encoding
String conversion for UTF-8 encoding
If you have enabled UTF-8 encoding project-wide, then you can use the string conversion functions as usual.
String manipulation
Use library functions to manipulate your strings.
When STRING variables are manipulated, index access to a variable in ASCII format often produces the desired result. Avoid this construct anyway: it is bad programming style, and with UTF-8 encoding, where a single character can span several bytes, index access leads to unwanted string manipulation.
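The risk of index access can be illustrated outside CODESYS. The following Python sketch (not CODESYS code) overwrites one byte by index, once in a single-byte encoding and once in UTF-8. The helper name `overwrite_byte` and the sample word are assumptions made for this example, and Latin-1 stands in here for any one-byte-per-character format.

```python
def overwrite_byte(text, encoding, index, new_byte):
    """Encode `text`, overwrite the byte at `index`, and return the raw bytes."""
    raw = bytearray(text.encode(encoding))
    raw[index] = new_byte
    return bytes(raw)


word = "Grün"

# Single-byte encoding: index 2 addresses the whole character "ü".
latin1 = overwrite_byte(word, "latin-1", 2, ord("u"))
print(latin1.decode("latin-1"))  # Grun

# UTF-8: "ü" occupies two bytes (0xC3 0xBC), so index 2 hits only the first
# one. The second byte (0xBC) is left behind as a stray continuation byte.
utf8 = overwrite_byte(word, "utf-8", 2, ord("u"))
print(utf8)  # b'Gru\xbcn'

try:
    utf8.decode("utf-8")
except UnicodeDecodeError as err:
    print("no longer valid UTF-8:", err.reason)
```

The same write that cleanly replaces a character in the single-byte case leaves an invalid byte sequence in the UTF-8 case, which is why library functions are the safer choice.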
UTF-8 encoding only for project-wide configuration
UTF-8 encoding is used if the project-wide compile option UTF8 Encoding for STRING is enabled. Library functions and add-ons then also follow this setting.
If you use individual UTF-8 encoded strings, then you need to make sure that they are interpreted correctly wherever they are used. For example, if the setting is not selected, a string variable in the OPC server is converted to UTF-8 before being transferred to a client. Values such as UTF8#'äöü' would then be misinterpreted, because they are already UTF-8 encoded and would be converted a second time. Similar problems can arise when outputting strings in the visualization.
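The kind of misinterpretation described here can be reproduced with a short Python sketch (not CODESYS code, and not the actual OPC server implementation). It assumes that the consumer treats the bytes as Latin-1 and converts them to UTF-8; the variable names are chosen for this example.

```python
original = "äöü"

# Bytes as a UTF-8 encoded literal would hold them: 2 bytes per umlaut.
utf8_bytes = original.encode("utf-8")

# Assumed consumer behavior: the bytes are taken to be Latin-1 ...
misread = utf8_bytes.decode("latin-1")
# ... and are then converted to UTF-8 for transfer.
double_encoded = misread.encode("utf-8")

print(len(utf8_bytes), len(double_encoded))  # 6 12
print(double_encoded.decode("utf-8"))        # Ã¤Ã¶Ã¼
```

Each umlaut arrives at the client as two unrelated characters, which matches the symptom to look for when a single UTF-8 string passes through a component that expects the project-wide encoding.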