U (String function): Difference between revisions
m (tags and link targets) |
m (tags and link targets) |
||
Line 1: | Line 1: | ||
{{Template:String:U subtitle}} | {{Template:String:U subtitle}} | ||
The <var>U</var> [[Intrinsic classes|intrinsic]] method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML style hexadecimal character references, XHTML entity references, and <b>'&amp;'</b> references to the represented Unicode character. Since in use the method acts like a Unicode constant, it is also documented with the [[System classes and methods#Constant_methods|"Constant methods"]]. | The <var>U</var> [[Intrinsic classes|intrinsic]] method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML style hexadecimal character references, XHTML entity references, and <b>'&amp;amp;'</b> references to the represented Unicode character. Since in use the method acts like a Unicode constant, it is also documented with the [[System classes and methods#Constant_methods|"Constant methods"]]. | ||
==Syntax== | ==Syntax== | ||
Line 10: | Line 10: | ||
<td>A Unicode string variable to receive the Unicode encoding of the method object <var class="term">string</var>.</td></tr> | <td>A Unicode string variable to receive the Unicode encoding of the method object <var class="term">string</var>.</td></tr> | ||
<tr><th>string</th> | <tr><th>string</th> | ||
<td>A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, ''string'' may contain an ampersand (&) in the following cases:<ul><li>As the substring ''''&amp;''''. This substring is converted to a single ampersand (<code>'&'</code>) character.<li>At the start of a hexadecimal character reference (for example, the eight characters <code>'&#x201C;'</code> for the Unicode "Left double quotation mark" <code>'“'</code>). The character reference is converted to the referenced character.<li>As of <var class=product>Sirius Mods</var> Version 7.6, an XHTML entity reference (for example, the six characters '&nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.</ul>A decimal character reference (for example, &#172;) is ''not'' allowed.</td></tr> | <td>A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, ''string'' may contain an ampersand (&amp;) in the following cases:<ul><li>As the substring ''''&amp;amp;''''. This substring is converted to a single ampersand (<code>'&'</code>) character.<li>At the start of a hexadecimal character reference (for example, the eight characters <code>'&amp;#x201C;'</code> for the Unicode "Left double quotation mark" <code>'&#x201C;'</code>). The character reference is converted to the referenced character.<li>As of <var class=product>Sirius Mods</var> Version 7.6, an XHTML entity reference (for example, the six characters '&amp;nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.</ul>A decimal character reference (for example, &amp;#172;) is ''not'' allowed.</td></tr> | ||
</table> | </table> | ||
Line 17: | Line 17: | ||
<li><var>U</var> is a compile-time-only equivalent of the <var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var> method of the intrinsic <var>[[String class|string]]</var> class (with its CharacterDecode argument implicitly set to <code>'True'</code>). | <li><var>U</var> is a compile-time-only equivalent of the <var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var> method of the intrinsic <var>[[String class|string]]</var> class (with its CharacterDecode argument implicitly set to <code>'True'</code>). | ||
<li>Using the <var>U</var> method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a Unicode variable without character reference and without conversion by <var>U</var>. | <li>Using the <var>U</var> method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a Unicode variable without character reference and without conversion by <var>U</var>. | ||
<li>The <var>U</var> method is available as of <var class="product">[[Sirius Mods| | <li>The <var>U</var> method is available as of <var class="product">[[Sirius Mods|Sirius Mods]]</var> Version 7.3.</ul> | ||
==Examples== | ==Examples== | ||
Line 24: | Line 24: | ||
print %p | print %p | ||
</p> | </p> | ||
<li>The second Print displays a copyright sign <code>©</code>): | <li>The second Print displays a copyright sign <code>&copy;</code>): | ||
<p class="code">%copy Unicode Initial('&copy;':U) | <p class="code">%copy Unicode Initial('&amp;copy;':U) | ||
print %copy | print %copy | ||
</p><li>The third displays <code>2122</code>: | </p><li>The third displays <code>2122</code>: | ||
<p class="code">%tm Unicode Initial('&#x2122;':U) | <p class="code">%tm Unicode Initial('&amp;#x2122;':U) | ||
print %tm:UnicodeToUtf16:[[StringToHex (String_function)|StringToHex]] | print %tm:UnicodeToUtf16:[[StringToHex (String_function)|StringToHex]] | ||
</p> | </p> |
Revision as of 17:46, 4 May 2011
Convert EBCDIC string to Unicode constant, including character encoding (String class)
The U intrinsic method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. It also converts XML-style hexadecimal character references, XHTML entity references, and '&amp;' references to the Unicode characters they represent. Because the method in practice acts like a Unicode constant, it is also documented with the "Constant methods".
Syntax
%unicode = string:U
Syntax terms
%unicode | A Unicode string variable to receive the Unicode encoding of the method object string. |
---|---|
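string | A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, string may contain an ampersand (&) in the following cases: |
- As the substring '&amp;'. This substring is converted to a single ampersand ('&') character.
- At the start of a hexadecimal character reference (for example, the eight characters '&#x201C;' for the Unicode "Left double quotation mark" '“'). The character reference is converted to the referenced character.
- As of Sirius Mods Version 7.6, at the start of an XHTML entity reference (for example, the six characters '&nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.
A decimal character reference (for example, &#172;) is not allowed.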
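The ampersand rules for string (the '&amp;' substring, hexadecimal character references, and XHTML entity references are decoded; decimal references are rejected) can be modeled outside the product. The following Python sketch is only an illustration of those rules, not the U method itself: the function name decode_u is hypothetical, Python's HTML 4 entity table stands in for the XHTML entity list, and raising ValueError stands in for whatever compile-time error the product actually reports.

```python
import re
from html.entities import name2codepoint  # HTML 4 / XHTML 1.0 named entities

# An ampersand, the reference body, and an optional terminating semicolon.
_REFERENCE = re.compile(r"&([^&;]*)(;?)")
_HEX_DIGITS = re.compile(r"[0-9A-Fa-f]+")


def decode_u(text: str) -> str:
    """Hypothetical model of the reference decoding described above."""

    def replace(match: re.Match) -> str:
        body, terminator = match.group(1), match.group(2)
        if not body or not terminator:
            raise ValueError(f"invalid ampersand usage: {match.group(0)!r}")
        if body.startswith("#x"):
            digits = body[2:]
            if not _HEX_DIGITS.fullmatch(digits):
                raise ValueError(f"invalid hexadecimal reference: {match.group(0)!r}")
            return chr(int(digits, 16))
        if body.startswith("#"):
            # Decimal references such as '&#172;' are not allowed.
            raise ValueError(f"only hexadecimal references are allowed: {match.group(0)!r}")
        if body in name2codepoint:
            # Covers 'amp' as well as entities such as 'nbsp' and 'copy'.
            return chr(name2codepoint[body])
        raise ValueError(f"unknown entity reference: {match.group(0)!r}")

    return _REFERENCE.sub(replace, text)


if __name__ == "__main__":
    print(decode_u("&copy; 2011"))
    print(decode_u("AT&amp;T"))
```

In this model, any ampersand that does not begin one of the three accepted forms is treated as an error, which mirrors the restriction stated for string.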
Usage notes
- U is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic string class (with its CharacterDecode argument implicitly set to 'True').
- Using the U method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a Unicode variable without character reference and without conversion by U.
- The U method is available as of Sirius Mods Version 7.3.
Examples
- The first Print statement below displays a plus sign (+):
    %p Unicode Initial('+')
    print %p
- The second Print displays a copyright sign (©):
    %copy Unicode Initial('&copy;':U)
    print %copy
- The third displays 2122:
    %tm Unicode Initial('&#x2122;':U)
    print %tm:UnicodeToUtf16:StringToHex
Note
Simply specifying 'print %tm' in the previous example (or its equivalent, 'print %tm:UnicodeToEbcdic') would attempt to translate to EBCDIC and fail, because the Unicode trademark character does not translate to a valid EBCDIC character. The UnicodeToUtf16 method, however, can convert the Unicode variable to a byte-stream string, which the StringToHex method then converts to its hex representation.
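The behavior in this note can be reproduced in Python as a rough cross-check. This is a sketch under assumptions, not a description of the product's internals: the helper names are hypothetical, code page cp037 is used only as a stand-in for whichever EBCDIC code page is in effect, and big-endian UTF-16 without a byte-order mark is assumed because it matches the 2122 shown in the example.

```python
def utf16_hex(text: str) -> str:
    """Hex of the big-endian UTF-16 encoding (assumed byte order, no BOM)."""
    return text.encode("utf-16-be").hex().upper()


def has_ebcdic_mapping(text: str, codec: str = "cp037") -> bool:
    """True if every character of text has a mapping in the stand-in EBCDIC code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    trademark = "\u2122"
    print(utf16_hex(trademark))           # hex of the UTF-16 bytes
    print(has_ebcdic_mapping(trademark))  # the trademark sign has no cp037 mapping
```

The trademark character has no single-byte EBCDIC equivalent in this stand-in code page, which is why a hex dump of the UTF-16 bytes is used to inspect the value instead of printing it directly.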
See also
- You can find the list of XHTML entities on the Internet at the following URL: