UnicodeToUtf8 (Unicode function)

From m204wiki
Revision as of 08:19, 10 December 2010 by 198.242.244.47

Unicode string converted to UTF-8 byte stream

This function converts a Unicode string to a UTF-8 Longstring byte stream.

The UnicodeToUtf8 function is available as of version 7.3 of the Sirius Mods.
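The conversion follows the standard UTF-8 layout, in which each Unicode code point becomes one to four bytes. The sketch below is Python, not Model 204 User Language, and it is not the Sirius Mods implementation. The function name utf8_encode is hypothetical. It applies the RFC 3629 bit patterns by hand so that the byte layout is visible. Rejecting surrogate code points is this sketch's own assumption and is not documented behavior of UnicodeToUtf8.

```python
# Hypothetical illustration in Python; not Sirius Mods code.
def utf8_encode(text: str) -> bytes:
    """Encode each code point of text into UTF-8 by hand (RFC 3629 bit layout)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Assumption of this sketch: surrogates are not Unicode scalar values, so refuse them.
            raise ValueError(f"cannot encode surrogate U+{cp:04X}")
        if cp < 0x80:
            out.append(cp)                          # 1 byte:  0xxxxxxx
        elif cp < 0x800:
            out.append(0xC0 | (cp >> 6))            # 2 bytes: 110xxxxx
            out.append(0x80 | (cp & 0x3F))          #          10xxxxxx
        elif cp < 0x10000:
            out.append(0xE0 | (cp >> 12))           # 3 bytes: 1110xxxx
            out.append(0x80 | ((cp >> 6) & 0x3F))   #          10xxxxxx
            out.append(0x80 | (cp & 0x3F))          #          10xxxxxx
        else:
            out.append(0xF0 | (cp >> 18))           # 4 bytes: 11110xxx
            out.append(0x80 | ((cp >> 12) & 0x3F))  #          10xxxxxx
            out.append(0x80 | ((cp >> 6) & 0x3F))   #          10xxxxxx
            out.append(0x80 | (cp & 0x3F))          #          10xxxxxx
    return bytes(out)


if __name__ == "__main__":
    # Superscript two (U+00B2), the character used in the example on this page.
    print(utf8_encode("\u00b2").hex().upper())  # C2B2
```

In everyday Python code, the built-in str.encode("utf-8") performs the same encoding. The hand-written version only makes the bit patterns explicit.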

Syntax

  %utf8Stream = unicode:UnicodeToUtf8([InsertBOM=bool])
%utf8Stream
A String or Longstring variable to receive the UTF-8 byte stream produced from the method object string.
unicode
A Unicode string.
InsertBOM=bool
The optional (name required) InsertBOM argument is a Boolean:
  • If its value is True, the “Byte Order Mark” (U+FEFF, encoded as X'EFBBBF') is inserted at the start of the output stream.
  • If its value is False, the default, no Byte Order Mark is inserted.
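The following Python sketch shows what InsertBOM controls. It assumes that the Byte Order Mark is simply prepended to an otherwise ordinary UTF-8 encoding. The names unicode_to_utf8 and insert_bom are hypothetical and chosen for illustration; they are not part of any Model 204 API.

```python
# Hypothetical Python analogue of UnicodeToUtf8(InsertBOM=...); not Sirius Mods code.
BOM_UTF8 = "\ufeff".encode("utf-8")  # U+FEFF encoded as UTF-8: b'\xef\xbb\xbf' (X'EFBBBF')


def unicode_to_utf8(text: str, insert_bom: bool = False) -> bytes:
    """Encode text as UTF-8, optionally prefixing the UTF-8 Byte Order Mark."""
    data = text.encode("utf-8")
    # insert_bom defaults to False, mirroring the documented default of InsertBOM.
    return BOM_UTF8 + data if insert_bom else data


print(unicode_to_utf8("\u00b2").hex().upper())                   # C2B2
print(unicode_to_utf8("\u00b2", insert_bom=True).hex().upper())  # EFBBBFC2B2
```

Python's "utf-8-sig" codec also prepends X'EFBBBF' when encoding, so "abc".encode("utf-8-sig") gives the same bytes as unicode_to_utf8("abc", insert_bom=True).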

Exceptions

This function can throw the following exception:

CharacterTranslationException
If the method encounters a translation problem, properties of the exception object may indicate the location and type of the problem. See the CharacterTranslationException exception class.

Usage Notes

  • For more information about UTF-8 conversions, see ?? refid=utf816.
  • The UnicodeToUtf16 method converts a Unicode string to UTF-16.
  • The Utf8ToUnicode method converts a UTF-8 Longstring byte stream to Unicode.

Examples

In the following fragment, UnicodeToUtf8 is used to show how the Unicode U+B2 character (superscript 2) is represented in UTF-8. Appending the StringToHex method is useful for viewing the hex values of characters that do not have displayable EBCDIC equivalents.

The example uses the U constant function to create the Unicode value and the StringToHex method to display the result in hexadecimal.

    %u Unicode Initial('&#xB2;':U)
    Print %u:UnicodeToUtf8:StringToHex

The result is:

    C2B2
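To see where C2B2 comes from, the Python fragment below splits U+00B2 across the two-byte UTF-8 pattern 110xxxxx 10xxxxxx. It is an illustration of the standard UTF-8 rules, not Model 204 code, and the variable names are arbitrary.

```python
# Illustration only: derive the UTF-8 bytes for U+00B2 (superscript two) by hand.
cp = 0xB2            # code point 0xB2 = 1011 0010; fits in 11 bits, so UTF-8 uses 2 bytes

high5 = cp >> 6      # top 5 of the 11 bits:  00010
low6 = cp & 0x3F     # bottom 6 bits:         110010

lead = 0xC0 | high5  # 110xxxxx -> 11000010 = X'C2'
cont = 0x80 | low6   # 10xxxxxx -> 10110010 = X'B2'

print(f"{lead:08b} {cont:08b}")  # 11000010 10110010
print(f"{lead:02X}{cont:02X}")   # C2B2
```

The second byte happens to equal X'B2' because the top two bits of 0xB2 are already 10, so restoring the continuation prefix reproduces the original value. This is a property of this particular code point, not a general rule.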