U (String function)
Revision as of 22:46, 5 November 2012
Convert EBCDIC string to Unicode constant, including character encoding (String class)
The U intrinsic method converts an EBCDIC string to a Unicode string. Any XML-style hexadecimal character references and XHTML entity references in the string are converted to the Unicode characters they represent. Because the method acts like a Unicode constant in use, it is also included in the list of Constant methods.
Syntax
%unicode = string:U
Syntax terms
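%unicode | A Unicode variable to receive the Unicode string represented by the method object string. |
---|---|
string | A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. |

The method object string may contain an ampersand (&) only in the following cases:
- At the start of the substring &amp;. This substring is converted to a single ampersand (&) character.
- At the start of a hexadecimal character reference: for example, the eight characters &#x201C; for the Unicode "Left double quotation mark" (“). The character reference is converted to the referenced character.
- As of Sirius Mods Version 7.6, at the start of an XHTML entity reference (for example, the six characters &nbsp; for the "non-breaking-space" character). The entity reference is converted to the referenced character.

An ampersand in any other context is not allowed.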
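The ampersand rules above can be illustrated outside SOUL. The following is a minimal Python sketch, not the Sirius Mods implementation: the helper name decode_references is hypothetical, and Python's html.entities.name2codepoint table (HTML 4.01 entity names) is assumed to be a close stand-in for the XHTML entity list. It converts hexadecimal character references and entity references to the characters they name and rejects any other ampersand.

```python
import re
from html.entities import name2codepoint

# Matches every ampersand. The optional group captures either a hexadecimal
# character reference (&#x...;) or a named entity reference (&name;).
_AMPERSAND = re.compile(r"&(?:#x([0-9A-Fa-f]+);|([A-Za-z][A-Za-z0-9]*);)?")


def decode_references(text: str) -> str:
    """Hypothetical helper that mimics the decoding rules described above."""

    def replace(match: re.Match) -> str:
        hex_digits, name = match.groups()
        if hex_digits is not None:
            # Hexadecimal character reference, for example &#x201C;
            return chr(int(hex_digits, 16))
        if name is not None and name in name2codepoint:
            # Entity reference, for example &amp; or &nbsp;
            return chr(name2codepoint[name])
        # A bare ampersand or an unknown entity name is rejected.
        raise ValueError(f"invalid ampersand usage: {match.group(0)!r}")

    return _AMPERSAND.sub(replace, text)


if __name__ == "__main__":
    print(decode_references("&amp;"))
    print(decode_references("&#x201C;quoted"))
    print(decode_references("&copy; 2012"))
```

In this sketch an invalid ampersand raises ValueError at run time; the U method itself operates on a constant at compile time, so how and when it reports such an error is not modeled here.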
Usage notes
- U is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic string class (with its CharacterDecode argument implicitly set to True).
- Using the U method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a Unicode variable without character reference and without conversion by U.
- The U method is available as of Sirius Mods Version 7.3.
Examples
- The following Print statement displays a plus sign (+):
      %p Unicode Initial('+')
      print %p
- The following Print statement displays a copyright sign (©):
      %copy Unicode Initial('&copy;':U)
      print %copy
- The following Print statement displays 2122:
      %tm Unicode Initial('&trade;':U)
      print %tm:UnicodeToUtf16:StringToHex
  Simply specifying print %tm in the previous example would attempt to convert to EBCDIC. Because the Unicode trademark character does not translate to a valid EBCDIC character, the Print output uses a character reference: &#x2122;.
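The arithmetic behind the last example can be checked with a short Python sketch. It is an illustration only, under two assumptions: big-endian UTF-16 without a byte order mark matches the 2122 shown above, and Python's cp037 codec stands in for whichever EBCDIC code page is in use. The helper name print_form is hypothetical.

```python
trademark = "\u2122"  # the Unicode trademark sign

# UTF-16 (big-endian, no byte order mark) encoding of the character, in hex.
utf16_hex = trademark.encode("utf-16-be").hex().upper()


def print_form(text: str, codec: str = "cp037") -> str:
    """Hypothetical helper: keep characters the EBCDIC codec can represent
    and replace the rest with hexadecimal character references."""
    pieces = []
    for ch in text:
        try:
            ch.encode(codec)  # raises if the code page has no such character
            pieces.append(ch)
        except UnicodeEncodeError:
            pieces.append(f"&#x{ord(ch):04X};")
    return "".join(pieces)


if __name__ == "__main__":
    print(utf16_hex)              # 2122
    print(print_form(trademark))  # &#x2122;
```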
See also
- You can find the list of XHTML entities on the Internet at the following URL: