Character Encoding - BBj
Description
BBj strings are composed of 8-bit characters. The local encoding format is defined by the host operating system and can be retrieved with the BBj function INFO(1,2). For example, Microsoft Windows systems configured for English and other Western European languages use the 8-bit character set Cp1252.
Standard Java library classes, such as java.lang.String and java.lang.Character, work with 16-bit Unicode characters. BBj provides conversion functions between BBj strings, which use the local 8-bit character set, and Java strings, which use the 16-bit Unicode character set. For example:
0010 REM ' Converting between the local character set and Unicode
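The same kind of conversion can be sketched from the Java side. This is not BBj code and does not call BBjAPI(). It assumes the local character set is Cp1252 (Java charset name `windows-1252`) and uses the standard `java.nio.charset.Charset` class. Bytes are decoded into a Unicode `String` and then encoded back. The class and method names (`LocalUnicodeRoundTrip`, `toUnicode`, `toLocal`) are hypothetical stand-ins for BBjAPI().toUnicode() and BBjAPI().toLocal(), not their implementation.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical sketch: converting between an 8-bit local character set
// and Java's 16-bit Unicode strings with java.nio.charset.
// Assumption: the local character set is Cp1252 ("windows-1252").
class LocalUnicodeRoundTrip {
    static final Charset LOCAL = Charset.forName("windows-1252");

    // Decode 8-bit local bytes into a Unicode String
    // (a stand-in for BBjAPI().toUnicode(), not its implementation).
    static String toUnicode(byte[] local) {
        return new String(local, LOCAL);
    }

    // Encode a Unicode String back into 8-bit local bytes
    // (a stand-in for BBjAPI().toLocal(), not its implementation).
    static byte[] toLocal(String unicode) {
        return unicode.getBytes(LOCAL);
    }

    public static void main(String[] args) {
        // $80$ is the euro sign in Cp1252; $41$ is "A".
        byte[] local = {(byte) 0x80, 0x41};
        String unicode = toUnicode(local);
        System.out.printf("U+%04X U+%04X%n", (int) unicode.charAt(0), (int) unicode.charAt(1));
        // Both characters exist in Cp1252, so the round trip is lossless.
        System.out.println(Arrays.equals(local, toLocal(unicode)));
    }
}
```

Running `main` prints `U+20AC U+0041` and then `true`. Byte $80$ is the euro sign in Cp1252, but in Unicode, position U+0080 is a control code. That is why a real conversion is needed rather than copying byte values.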
When passing characters or strings to Java library classes such as java.lang.Character and java.lang.String, it is important to consider the character encoding format. The encoding does not matter for methods like String.length() and String.substring(), because they do not interpret the characters they handle. It does matter for methods like String.toUpperCase() or Character.getType(), which interpret each character as a Unicode character. Use BBjAPI().toUnicode() to get the Unicode equivalent of any BBj string. Note: This method converts any characters that are undefined in the local character set to the Unicode replacement character '\uFFFD'.
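The Java sketch below shows why the distinction matters. Again, a Cp1252 local character set is assumed. It compares two ways of turning the 8-bit value $9A$ into a Java character. The first widens the byte value directly; the second decodes it through the `windows-1252` charset. In Cp1252, $9A$ is š (U+0161), while U+009A in Unicode is a control character. The helper names `naiveWiden` and `decode` are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical sketch: length() does not depend on the encoding,
// but Character.getType() and String.toUpperCase() do.
// Assumption: the local character set is Cp1252 ("windows-1252").
class EncodingSensitiveMethods {
    static final Charset LOCAL = Charset.forName("windows-1252");

    // Incorrect: widen each 8-bit value straight to a 16-bit char.
    // This is only right where Cp1252 and Unicode agree.
    static String naiveWiden(byte[] local) {
        StringBuilder sb = new StringBuilder();
        for (byte b : local) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    // Correct: decode through the local character set.
    static String decode(byte[] local) {
        return new String(local, LOCAL);
    }

    public static void main(String[] args) {
        byte[] local = {(byte) 0x9A}; // s with caron in Cp1252 (U+0161)
        String naive = naiveWiden(local);
        String decoded = decode(local);

        // Same length either way.
        System.out.println(naive.length() + " " + decoded.length());
        // Different character classification.
        System.out.println(Character.getType(naive.charAt(0)) == Character.CONTROL);
        System.out.println(Character.getType(decoded.charAt(0)) == Character.LOWERCASE_LETTER);
        // Different case conversion.
        System.out.printf("U+%04X U+%04X%n",
                (int) naive.toUpperCase(Locale.ROOT).charAt(0),
                (int) decoded.toUpperCase(Locale.ROOT).charAt(0));
    }
}
```

Both strings have length 1. Only the decoded string is classified as a lowercase letter, and only it upper-cases to U+0160 (Š). The naively widened control character is left unchanged.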
Use the BBjAPI().toLocal() method to convert Unicode strings returned from Java library routines to BBj format. Note: This method converts any Unicode characters that are undefined in the local character set to the "?" character ($3F$).
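The replacement behavior in the other direction can be illustrated the same way. This Java sketch assumes a Cp1252 local character set and is not the BBj implementation. It encodes é (U+00E9), which Cp1252 defines, and the CJK ideograph U+4E2D, which it does not, using `String.getBytes(Charset)`. Java substitutes the charset's default replacement byte, "?", for the unmappable character. The class name `UnmappableToQuestionMark` is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical sketch: encoding Unicode text into an 8-bit character set
// replaces characters the set cannot represent with '?' ($3F$).
// Assumption: the local character set is Cp1252 ("windows-1252").
class UnmappableToQuestionMark {
    static final Charset LOCAL = Charset.forName("windows-1252");

    static byte[] toLocal(String unicode) {
        // getBytes(Charset) substitutes the charset's default replacement
        // byte, '?', for every character the charset cannot map.
        return unicode.getBytes(LOCAL);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) exists in Cp1252; U+4E2D (a CJK ideograph) does not.
        String unicode = "\u00E9\u4E2D";
        StringBuilder out = new StringBuilder();
        for (byte b : toLocal(unicode)) {
            out.append(String.format("$%02X$ ", b & 0xFF));
        }
        System.out.println(out.toString().trim());
    }
}
```

`main` prints `$E9$ $3F$`. The resulting bytes are shown in the same $..$ hex notation used above.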
BBj programs and data files use the local character set exclusively. Conversion between the local character set and Unicode matters only when interacting with Java library routines.
The following program can be used to determine how closely the local 8-bit character set corresponds to Unicode:
0010 PRINT "Characters that differ between ",info(1,2)," and Unicode:",'LF'
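A comparable check can be sketched in Java. This is a hedged equivalent, not the BBj program itself. It decodes each 8-bit value from 0 through 255 through a local charset and lists the values whose decoded character is not the Unicode code point with the same number. The charset defaults to `windows-1252`, which is an assumption; another charset name can be passed as the first argument. The sketch is intended for single-byte character sets, and the class name `LocalVsUnicode` is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: list the 8-bit values whose meaning in a local
// single-byte character set differs from the Unicode code point with
// the same number. Assumption: default local set is Cp1252 ("windows-1252").
class LocalVsUnicode {
    static List<Integer> differingValues(Charset local) {
        List<Integer> differ = new ArrayList<>();
        for (int i = 0; i < 256; i++) {
            String decoded = new String(new byte[]{(byte) i}, local);
            // A value matches when it decodes to exactly one char
            // whose code point equals the byte value.
            if (decoded.length() != 1 || decoded.charAt(0) != i) {
                differ.add(i);
            }
        }
        return differ;
    }

    public static void main(String[] args) {
        Charset local = Charset.forName(args.length > 0 ? args[0] : "windows-1252");
        System.out.println("Characters that differ between " + local.name() + " and Unicode:");
        for (int i : differingValues(local)) {
            String decoded = new String(new byte[]{(byte) i}, local);
            System.out.printf("$%02X$ -> U+%04X%n", i, (int) decoded.charAt(0));
        }
    }
}
```

For windows-1252, every reported value falls in the range $80$ through $9F$. Values $00$ through $7F$ and $A0$ through $FF$ decode to the Unicode code point with the same number. Running it with ISO-8859-1 reports nothing, because ISO-8859-1 matches the first 256 Unicode code points exactly.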
See Also
Character Sets and Character Encoding
For more information about the Unicode standard, see www.unicode.org