U (String function): Difference between revisions

From m204wiki
m (1 revision)
mNo edit summary
Line 1: Line 1:
{{Template:String:U subtitle}}
{{Template:String:U subtitle}}


The <var>U</var> [[Intrinsic classes|intrinsic]] method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML style hexadecimal character references, XHTML entity references, and <b>'&amp;amp;amp;'</b> references to the represented Unicode character. Since in use the method acts like a Unicode constant, it is also documented with the [[System classes and methods#Constant_methods|"Constant methods"]].
The <var>U</var> [[Intrinsic classes|intrinsic]] method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML style hexadecimal character references and XHTML entity references to the represented Unicode character. Since in use the method acts like a Unicode constant, it is also documented with the [[System classes and methods#Constant_methods|"Constant methods"]].


==Syntax==
==Syntax==
Line 8: Line 8:
<table class="syntaxTable">
<table class="syntaxTable">
<tr><th>%unicode</th>
<tr><th>%unicode</th>
<td>A Unicode string variable to receive the Unicode encoding of the method object <var class="term">string</var>.</td></tr>
<td>A <var>Unicode</var> variable to receive the Unicode string represented by the method object <var class="term">string</var>.</td></tr>
<tr><th>string</th>
<tr><th>string</th>
<td>A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, ''string'' may contain an ampersand (&amp;amp;) in the following cases:<ul><li>As the substring ''''&amp;amp;amp;''''. This substring is converted to a single ampersand (<code>'&amp;'</code>) character.<li>At the start of a hexadecimal character reference (for example, the eight characters <code>'&amp;amp;#x201C;'</code> for the Unicode "Left double quotation mark" <code>'&amp;#x201C;'</code>). The character reference is converted to the referenced character.<li>As of <var class=product>Sirius Mods</var> Version 7.6, an XHTML entity reference (for example, the six characters '&amp;amp;nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.</ul>A decimal character reference (for example, &amp;amp;#172;) is ''not'' allowed.</td></tr>
<td>A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, <var class="term">string</var> may contain an ampersand (<code>&amp;</code>) in the following cases:<ul><li>At the start of the substring <code>&amp;amp;</code>. This substring is converted to a single ampersand (<code>&amp;</code>) character.<li>At the start of a hexadecimal character reference (for example, the eight characters <code>&amp;#x201C;</code> for the Unicode "Left double quotation mark" (<code>&#x201C;</code>)). The character reference is converted to the referenced character.<li>As of <var class=product>Sirius Mods</var> Version 7.6, an XHTML entity reference (for example, the six characters <code>&amp;nbsp;</code> for the "non-breaking-space" character). The entity reference is converted to the referenced character.</ul>A decimal character reference (for example, <code>&amp;#172;</code>) is ''not'' allowed.</td></tr>
</table>
</table>


==Usage notes==
==Usage notes==
<ul>
<ul>
<li><var>U</var> is a compile-time-only equivalent of the <var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var> method of the intrinsic <var>[[String class|string]]</var> class (with its CharacterDecode argument implicitly set to <code>'True'</code>).
<li><var>U</var> is a compile-time-only equivalent of the <var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var> method of the intrinsic <var>[[String class|string]]</var> class (with its <var>CharacterDecode</var> argument implicitly set to <code>True</code>).
<li>Using the <var>U</var> method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a Unicode variable without character reference and without conversion by <var>U</var>.
<li>Using the <var>U</var> method (or <var>EbcdicToUnicode</var>) is necessary for converting to type <var>Unicode</var> if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a <var>Unicode</var> variable otherwise, whereas keyboard-available characters can simply be assigned directly to a <var>Unicode</var> variable without character reference and without conversion by <var>U</var>.
<li>The <var>U</var> method is available as of <var class="product">[[Sirius Mods|Sirius Mods]]</var> Version 7.3.</ul>
<li>The <var>U</var> method is available as of <var class="product">[[Sirius Mods|Sirius Mods]]</var> Version 7.3.</ul>


==Examples==
==Examples==
<ol><li>The first Print statement below displays a plus sign <code>+</code>:
<ol><li>The following <var>Print</var> statement displays a plus sign (<code>+</code>):
<p class="code">%p Unicode Initial('+')
<p class="code">%p Unicode Initial('+')
print %p
print %p
</p>
</p>
<li>The second Print displays a copyright sign <code>&amp;copy;</code>):
<li>The following <var>Print</var> statement displays a copyright sign (<code>&copy;</code>):
<p class="code">%copy Unicode Initial('&amp;amp;copy;':U)
<p class="code">%copy Unicode Initial('&amp;amp;copy;':U)
print %copy
print %copy
</p><li>The third displays <code>2122</code>:
</p><li>The following <var>Print</var> statement displays <code>2122</code>:
<p class="code">%tm Unicode Initial('&amp;amp;#x2122;':U)
<p class="code">%tm Unicode Initial('&amp;#x2122;':U)
print %tm:UnicodeToUtf16:[[StringToHex (String_function)|StringToHex]]
print %tm:UnicodeToUtf16:[[StringToHex (String_function)|StringToHex]]
</p>
</p>


===Note===
===Note===
Simply specifying <code>'print %tm'</code> in the previous example above (or its equivalent <code>'print %tm:UnicodeToEbcdic'</code>) would attempt to translate to EBCDIC and fail because the Unicode trademark character does not translate to a valid EBCDIC character. But the <var>UnicodeToUtf16</var> method can convert the Unicode variable to a byte-stream string, which the <var>StringToHex</var> method converts to its hex representation.</ol>
Simply specifying <code>print %tm</code> in the previous example above would attempt to convert to EBCDIC, but since the Unicode trademark character does not translate to a valid EBCDIC character, the <var>Print</var> output will use a character reference: <code>&amp;#x2122;</code>.</ol>


==See also==
==See also==

Revision as of 19:02, 8 February 2012

Convert EBCDIC string to Unicode constant, including character encoding (String class)


The U intrinsic method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. XML-style hexadecimal character references and XHTML entity references in the string are converted to the Unicode characters they represent. Because the method acts like a Unicode constant in use, it is also documented with the "Constant methods".

Syntax

%unicode = string:U

Syntax terms

%unicode A Unicode variable to receive the Unicode string represented by the method object string.
string A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, string may contain an ampersand (&) in the following cases:
  • At the start of the substring &amp;. This substring is converted to a single ampersand (&) character.
  • At the start of a hexadecimal character reference (for example, the eight characters &#x201C; for the Unicode "Left double quotation mark" (“)). The character reference is converted to the referenced character.
  • As of Sirius Mods Version 7.6, an XHTML entity reference (for example, the six characters &nbsp; for the "non-breaking-space" character). The entity reference is converted to the referenced character.
A decimal character reference (for example, &#172;) is not allowed.
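
The rules above can be modeled outside Model 204. The following Python sketch is only an approximation for illustration, not the actual implementation of U: the function name decode_references is hypothetical, Python's HTML 4 entity table stands in for the XHTML entity set, and rejecting a stray ampersand with an error is an assumption drawn from the list of permitted cases.

```python
import re
from html.entities import name2codepoint  # HTML 4 names, a stand-in for the XHTML set

# One reference: hexadecimal (&#x...;), decimal (&#...;), or named (&name;).
_REFERENCE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def decode_references(text: str) -> str:
    """Hypothetical model of the reference decoding described for U."""
    decoded = []
    position = 0
    while position < len(text):
        if text[position] != "&":
            # Ordinary characters pass through unchanged.
            decoded.append(text[position])
            position += 1
            continue
        match = _REFERENCE.match(text, position)
        if match is None:
            # Assumption: an ampersand outside the permitted cases is an error.
            raise ValueError(f"ampersand at offset {position} does not start a reference")
        body = match.group(1)
        if body.startswith("#x"):
            # Hexadecimal character reference, for example &#x201C;
            decoded.append(chr(int(body[2:], 16)))
        elif body.startswith("#"):
            # Decimal character references such as &#172; are not allowed.
            raise ValueError(f"decimal character reference not allowed: {match.group(0)}")
        elif body in name2codepoint:
            # Named entity reference, including &amp; and &nbsp;
            decoded.append(chr(name2codepoint[body]))
        else:
            raise ValueError(f"unknown entity reference: {match.group(0)}")
        position = match.end()
    return "".join(decoded)
```

In this model, decode_references("&amp;") yields a single ampersand and decode_references("&#x201C;") yields the left double quotation mark, while decode_references("&#172;") raises an error.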

Usage notes

  • U is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic string class (with its CharacterDecode argument implicitly set to True).
  • Using the U method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise. Keyboard-available characters, by contrast, can be assigned directly to a Unicode variable without a character reference and without conversion by U.
  • The U method is available as of Sirius Mods Version 7.3.

Examples

  1. The following Print statement displays a plus sign (+):

    %p Unicode Initial('+')
    print %p

  2. The following Print statement displays a copyright sign (©):

    %copy Unicode Initial('&copy;':U)
    print %copy

  3. The following Print statement displays 2122:

    %tm Unicode Initial('&#x2122;':U)
    print %tm:UnicodeToUtf16:StringToHex

    Note

    Simply specifying print %tm in the previous example would attempt to convert the value to EBCDIC. Because the Unicode trademark character does not translate to a valid EBCDIC character, the Print output uses a character reference instead: &#x2122;.
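
The behavior in the third example and its note can be checked against a rough Python model. This is a sketch under stated assumptions, not Model 204 code: the helper name to_ebcdic_or_reference is hypothetical, big-endian UTF-16 is assumed for UnicodeToUtf16, and Python's cp037 codec is assumed as the EBCDIC code page (the code page actually in effect at a given site may differ).

```python
trademark = "\u2122"  # the character produced by '&#x2122;':U

# UTF-16 (assumed big-endian) bytes of the trademark sign, shown as hex digits.
utf16_hex = trademark.encode("utf-16-be").hex().upper()


def to_ebcdic_or_reference(text: str) -> str:
    """Hypothetical model: keep characters that cp037 can encode,
    and replace the rest with hexadecimal character references."""
    pieces = []
    for character in text:
        try:
            character.encode("cp037")  # assumed EBCDIC code page
            pieces.append(character)
        except UnicodeEncodeError:
            pieces.append("&#x%04X;" % ord(character))
    return "".join(pieces)


print(utf16_hex)                          # 2122
print(to_ebcdic_or_reference(trademark))  # &#x2122;
```

The first line of output corresponds to print %tm:UnicodeToUtf16:StringToHex, and the second to the character reference that the note describes for a plain print %tm.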

See also