UnicodeToUtf8 (Unicode function): Difference between revisions

From m204wiki
m (1 revision)
mNo edit summary
 
(35 intermediate revisions by 7 users not shown)
Older revision (wikitext):

<span style="font-size:120%; color:black"><b>Unicode string converted to UTF-8 byte stream</b></span>
[[Category:Intrinsic Unicode methods|UnicodeToUtf8 function]]
[[Category:Intrinsic methods]]
[[Category:System methods]]
<!--DPL?? Category:Intrinsic Unicode methods|UnicodeToUtf8 function: Unicode string converted to UTF-8 byte stream-->
<!--DPL?? Category:Intrinsic methods|UnicodeToUtf8 (Unicode function): Unicode string converted to UTF-8 byte stream-->
<!--DPL?? Category:System methods|UnicodeToUtf8 (Unicode function): Unicode string converted to UTF-8 byte stream-->

This function converts a Unicode string to a UTF-8
Longstring byte stream.

The UnicodeToUtf8 function is available as of version 7.3 of the ''Sirius Mods''.
===Syntax===
  %utf8Stream = unicode:UnicodeToUtf8([InsertBOM=bool])])
<dl>
<dt><i>%utf8Stream</i>
<dd>A String or Longstring variable to receive the method object string
translated to a UTF-8 byte stream.
<dt><i>unicode</i>
<dd>A Unicode string.
<dt><b>InsertBOM=</b><i>bool</i>
<dd>The optional (name required) InsertBOM argument is a Boolean:
<ul>
<li>If its value is <tt>True</tt>, the &ldquo;Byte Order
Mark&rdquo; (U+FEFF, encoded as X'EFBBBF') is inserted at the start of the output
stream.
<li>If its value is <tt>False</tt>, the default,
no Byte Order Mark is inserted.
</ul>
</dl>
===Exceptions===

This function can throw the following exception:
<dl>
<dt>CharacterTranslationException
<dd>If the method encounters a translation problem,
properties of the exception object may indicate the location and type of problem.
See [[CharacterTranslationException exception class]].
</dl>
===Usage Notes===
<ul>
<li>For more information about UTF-8 conversions, see [[??]] refid=utf816.
<li>The UnicodeToUtf16 method (described ??[[UnicodeToUtf16 (Unicode function)|UnicodeToUtf16]])
converts a Unicode string to UTF-16.
<li>The Utf8ToUnicode method (described ??[[Utf8ToUnicode (String function)|Utf8ToUnicode]])
converts a UTF-8 Longstring byte stream to Unicode.
</ul>
===Examples===

In the following fragment, UnicodeToUtf8 is used to show how the
Unicode U+B2 character (superscript 2) is represented in UTF-8.
Appending the StringToHex method is useful for viewing
the hex values of characters that do not have displayable EBCDIC equivalents.

The <tt>U</tt> constant function used in the example is described
??[[U (String function)|U]]; and StringToHex is ??[[StringToHex (String function)|StringToHex]].
<pre>
    %u Unicode Initial('&amp;#xB2;':U)
    Print %u:UnicodeToUtf8:StringToHex
</pre>

The result is:
<pre>
    C2B2
</pre>

Latest revision (wikitext):

{{Template:Unicode:UnicodeToUtf8 subtitle}}
<var>UnicodeToUtf8</var> converts a <var>Unicode</var> string to a UTF-8 <var>Longstring</var> byte stream.

==Syntax==
{{Template:Unicode:UnicodeToUtf8 syntax}}
===Syntax terms===
<table class="syntaxTable">
<tr><th>%string</th>
<td>A <var>String</var> or <var>Longstring</var> variable to receive the method object string translated to a UTF-8 byte stream.</td></tr>
<tr><th>unicode</th>
<td>A <var>Unicode</var> string.</td></tr>
<tr><th><var>InsertBOM</var></th>
<td>The optional, [[Notation conventions for methods#Named parameters|name required]], <var>InsertBOM</var> argument is a [[Boolean enumeration|Boolean]]:
<ul>
<li>If its value is <code>True</code>, the "Byte Order Mark" (U+FEFF, encoded as X'EFBBBF') is inserted at the start of the output stream.
<li>If its value is <code>False</code>, the default, no Byte Order Mark is inserted.
</ul></td></tr>
</table>

===Exceptions===
<var>UnicodeToUtf8</var> can throw the following exception:
<dl>
<dt><var>[[CharacterTranslationException class|CharacterTranslationException]]</var>
<dd>If the method encounters a translation problem, properties of the exception object may indicate the location and type of problem.
</dl>

==Usage notes==
<ul>
<li><var>UnicodeToUtf8</var> is available as of <var class="product">Sirius Mods</var> Version 7.3.
</ul>

==Examples==
In the following fragment, <var>UnicodeToUtf8</var> is used to show how the <var>Unicode</var> U+B2 character (superscript 2) is represented in UTF-8. Appending the <var>StringToHex</var> method is useful for viewing the hex values of characters that do not have displayable EBCDIC equivalents.
<p class="code">%u unicode initial('&amp;#xB2;':[[U (String function)|U]])
print %u:UnicodeToUtf8:[[StringToHex (String function)|stringToHex]]
</p>
The result is:
<p class="output">C2B2
</p>

==See also==
<ul><li>For more information about UTF-8 conversions, see [[Unicode#UTF-8 and UTF-16|"Unicode: UTF-8 and UTF-16"]].
<li><var>[[UnicodeToUtf16 (Unicode function)|UnicodeToUtf16]]</var> converts a <var>Unicode</var> string to UTF-16.
<li><var>[[Utf8ToUnicode (String function)|Utf8ToUnicode]]</var> converts a UTF-8 <var>Longstring</var> byte stream to <var>Unicode</var>.
</ul>
{{Template:Unicode:UnicodeToUtf8 footer}}

Latest revision as of 20:10, 6 November 2012

Translate to UTF-8 (Unicode class)

UnicodeToUtf8 converts a Unicode string to a UTF-8 Longstring byte stream.

Syntax

%string = unicode:UnicodeToUtf8[( [InsertBOM= boolean])]

Syntax terms

%string
A String or Longstring variable to receive the method object string translated to a UTF-8 byte stream.

unicode
A Unicode string.

InsertBOM
The optional, name required, InsertBOM argument is a Boolean:
  • If its value is True, the "Byte Order Mark" (U+FEFF, encoded as X'EFBBBF') is inserted at the start of the output stream.
  • If its value is False, the default, no Byte Order Mark is inserted.
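
The effect of InsertBOM on the output bytes can be cross-checked outside Model 204. The following is a minimal Python sketch, not SOUL code: the helper name unicode_to_utf8 and its insert_bom parameter are hypothetical stand-ins for the method and its named argument, and the sketch only mirrors the behavior described above.

```python
def unicode_to_utf8(text: str, insert_bom: bool = False) -> bytes:
    """Hypothetical stand-in for UnicodeToUtf8 (assumption, not the SOUL API).

    Encodes text as UTF-8 and, when insert_bom is true, prefixes the
    encoded Byte Order Mark.
    """
    data = text.encode("utf-8")
    if insert_bom:
        # U+FEFF encodes to the three bytes EF BB BF in UTF-8.
        data = "\ufeff".encode("utf-8") + data
    return data


if __name__ == "__main__":
    # Superscript 2 (U+00B2) without and with the Byte Order Mark.
    print(unicode_to_utf8("\u00b2").hex().upper())                   # C2B2
    print(unicode_to_utf8("\u00b2", insert_bom=True).hex().upper())  # EFBBBFC2B2
```

With the default, only the encoded characters appear; with the BOM requested, the output grows by exactly three bytes at the start of the stream.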

Exceptions

UnicodeToUtf8 can throw the following exception:

CharacterTranslationException
If the method encounters a translation problem, properties of the exception object may indicate the location and type of problem.

Usage notes

  • UnicodeToUtf8 is available as of Sirius Mods Version 7.3.

Examples

In the following fragment, UnicodeToUtf8 is used to show how the Unicode U+B2 character (superscript 2) is represented in UTF-8. Appending the StringToHex method is useful for viewing the hex values of characters that do not have displayable EBCDIC equivalents.

%u unicode initial('&#xB2;':U)
print %u:UnicodeToUtf8:stringToHex

The result is:

C2B2
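
Why U+B2 becomes the two bytes C2 and B2 follows from the UTF-8 two-byte layout (110xxxxx 10xxxxxx), which covers code points U+0080 through U+07FF. The Python sketch below works through that arithmetic by hand; it is an illustration only, the function name encode_two_byte is hypothetical, and it says nothing about how Model 204 implements the conversion.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Encode one code point using the UTF-8 two-byte form (illustrative helper)."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("the two-byte form covers U+0080 through U+07FF only")
    # Lead byte: marker bits 110, followed by the upper 5 bits of the code point.
    lead = 0xC0 | (code_point >> 6)
    # Trail byte: marker bits 10, followed by the lower 6 bits of the code point.
    trail = 0x80 | (code_point & 0x3F)
    return bytes([lead, trail])


if __name__ == "__main__":
    # U+00B2 is binary 1011 0010: upper bits 00010, lower bits 110010.
    print(encode_two_byte(0xB2).hex().upper())  # C2B2
```

The second byte happening to equal the code point value is a coincidence of this particular character: the lower six bits of X'B2' are X'32', and adding the X'80' marker gives X'B2' again.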

See also

  • For more information about UTF-8 conversions, see "Unicode: UTF-8 and UTF-16".
  • UnicodeToUtf16 converts a Unicode string to UTF-16.
  • Utf8ToUnicode converts a UTF-8 Longstring byte stream to Unicode.