U (String function): Difference between revisions

From m204wiki
m (1 revision)
m (match syntax diagram to revised template; fix tags.)
Line 1: Line 1:
{{Template:String:U subtitle}}
{{Template:String:U subtitle}}


This [[Intrinsic classes|intrinsic]] function converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML style hexadecimal character references, XHTML entity references, and ''''&'''' references to the represented Unicode character. Since in use the method acts like a Unicode constant, it is also documented with the [[Constant methods]].
The <var>U</var> [[Intrinsic classes|intrinsic]] method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML style hexadecimal character references, XHTML entity references, and <b>'&amp;amp;'</b> references to the represented Unicode character. Since in use the method acts like a Unicode constant, it is also documented with the [[Constant_methods]].


The <var>U</var> method is available as of <var class=product>Sirius Mods</var> version 7.3.
==Syntax==
==Syntax==
{{Template:String:U syntax}}
{{Template:String:U syntax}}
Line 9: Line 8:
<table class="syntaxTable">
<table class="syntaxTable">
<tr><th>%unicode</th>
<tr><th>%unicode</th>
<td>A <var>U</var>nicode string variable to receive the <var>U</var>nicode encoding of the method object string.                   </td></tr>
<td>A Unicode string variable to receive the Unicode encoding of the method object <var class="term">string</var>.</td></tr>
<tr><th>string</th>
<tr><th>string</th>
<td>A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, ''string'' may contain an ampersand (&amp;) in the following cases:       *As the substring ''''&amp;amp;''''. This substring is converted to a single ampersand character.             *At the start of a hexadecimal character reference (for example, the eight characters '&amp;#x201C;' for the <var>U</var>nicode "Left double quotation mark"). The character reference is converted to the referenced character.                                             *As of <var class=product>Sirius Mods</var> version 7.6, an XHTML entity reference (for example, the six characters '&amp;nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.                                                                                                                                                             A decimal character reference (for example, &amp;#172;) is ''not'' allowed.</td></tr>
<td>A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, ''string'' may contain an ampersand (&amp;) in the following cases:<ul><li>As the substring ''''&amp;amp;''''. This substring is converted to a single ampersand (<code>'&'</code>) character.<li>At the start of a hexadecimal character reference (for example, the eight characters <code>'&amp;#x201C;'</code> for the Unicode "Left double quotation mark" <code>'&#x201C;'</code>). The character reference is converted to the referenced character.<li>As of <var class=product>Sirius Mods</var> version 7.6, an XHTML entity reference (for example, the six characters '&amp;nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.</ul>A decimal character reference (for example, &amp;#172;) is ''not'' allowed.</td></tr>
</table>
</table>


==<var>U</var>sage notes==
==Usage notes==
*The <var>U</var> method is a compile-time-only equivalent of the [[EbcdicToUnicode (String function)|EbcdicToUnicode]] method of the intrinsic <var>String</var> class (with its CharacterDecode argument implicitly set to ''''True'''').
<ul><li><var>U</var> is a compile-time-only equivalent of the [[EbcdicToUnicode (String function)|EbcdicToUnicode]] method of the intrinsic <var>String</var> class (with its CharacterDecode argument implicitly set to <code>'True'</code>).
*<var>U</var>sing the <var>U</var> method (or EbcdicTo<var>U</var>nicode) is necessary for converting to type <var>U</var>nicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a <var>U</var>nicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a <var>U</var>nicode variable without character reference and without conversion by <var>U</var>.
<li>Using the <var>U</var> method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a Unicode variable without character reference and without conversion by <var>U</var>.
*You can find the list of XHTML entities on the Internet at the following <var>U</var>RL:
<li>You can find the list of XHTML entities on the Internet at the following URL:
<p class="code">http://www.w3.org/TR/xhtml1/dtds.html#h-A2
<p class="code">http://www.w3.org/TR/xhtml1/dtds.html#h-A2
</p>
</p>
===Example===
<li>The <var>U</var> method is available as of <var class="product">[[Sirius Mods]]</var> version 7.3.</ul>


The first Print statement below displays a plus sign (+); the second Print displays a copyright sign (&copy;); the third displays ''''2122'''':
==Examples==
<p class="code">%p <var>U</var>nicode Initial('+')
<ol><li>The first Print statement below displays a plus sign (+); the second Print displays a copyright sign (&copy;); the third displays ''''2122'''':
<p class="code">%p Unicode Initial('+')
Print %p
Print %p
</p>
<li>Entity for copyright symbol:
<p class="code">%copy Unicode Initial('&amp;copy;':U)
Print %copy
</p><li>Constant for trademark symbol:
<p class="code">%tm Unicode Initial('&amp;#x2122;':U)
Print %tm:UnicodeToUtf16:StringToHex
</p></ol>


* Entity for copyright symbol:
===Note===
%copy <var>U</var>nicode Initial('&amp;copy;':<var>U</var>)
Simply specifying ''''Print %tm'''' in the third Print statement above (or its equivalent ''''Print %tm:UnicodeToEbcdic'''') would attempt to translate to EBCDIC and fail because the Unicode trademark character does not translate to a valid EBCDIC character. But the UnicodeToUtf16 method can convert the Unicode variable to a byte-stream string, which the StringToHex method converts to its hex representation.
Print %copy


* Constant for trademark symbol:
%tm <var>U</var>nicode Initial('&amp;#x2122;':<var>U</var>)
Print %tm:<var>U</var>nicodeTo<var>U</var>tf16:<var>String</var>ToHex
</p>
====Note====
Simply specifying ''''Print %tm'''' in the third Print statement above (or its equivalent ''''Print %tm:<var>U</var>nicodeToEbcdic'''') would
attempt to translate to EBCDIC and fail because the <var>U</var>nicode trademark character does not translate to a valid EBCDIC character. But the <var>U</var>nicodeTo<var>U</var>tf16 method can convert the <var>U</var>nicode variable to a byte-stream <var>Longstring</var>, which the <var>String</var>ToHex method converts to its hex representation.
==See also==
==See also==
{{Template:String:U footer}}
{{Template:String:U footer}}

Revision as of 07:05, 2 February 2011

Convert EBCDIC string to Unicode constant, including character encoding (String class)


The U intrinsic method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. It decodes XML-style hexadecimal character references, XHTML entity references, and '&amp;' references into the Unicode characters they represent. Because the method behaves like a Unicode constant in use, it is also documented with the Constant methods.

Syntax

%unicode = string:U

Syntax terms

%unicode A Unicode string variable to receive the Unicode encoding of the method object string.
string A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, string may contain an ampersand (&) in the following cases:
  • As the substring '&amp;'. This substring is converted to a single ampersand ('&') character.
  • At the start of a hexadecimal character reference (for example, the eight characters '&#x201C;' for the Unicode "Left double quotation mark" '“'). The character reference is converted to the referenced character.
  • As of Sirius Mods version 7.6, an XHTML entity reference (for example, the six characters '&nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.
A decimal character reference (for example, &#172;) is not allowed.
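
The decoding rules above can be modeled outside Model 204. The following is a minimal Python sketch, not the SOUL implementation: `decode_refs` is a hypothetical name, Python's HTML 4 entity table (`html.entities.name2codepoint`) stands in for the XHTML entity list, and rejecting a bare ampersand is an assumption drawn from the statement that an ampersand may appear only in the listed cases. It covers only reference decoding, not the EBCDIC-to-Unicode translation itself.

```python
import re
from html.entities import name2codepoint  # HTML 4 names, used here as a stand-in for the XHTML list

# One pattern for every ampersand: a hex reference, a decimal reference,
# a named entity, or (last alternative) a bare '&'.
_REF = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);|&")


def decode_refs(text: str) -> str:
    """Hypothetical model of the reference decoding described for the U method."""

    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body is None:
            # Assumption: an ampersand outside the listed cases is an error.
            raise ValueError("ampersand is not part of a valid reference")
        if body.startswith("#x"):
            # Hexadecimal character reference, e.g. &#x201C;
            return chr(int(body[2:], 16))
        if body.startswith("#"):
            # Decimal character references such as &#172; are not allowed.
            raise ValueError("decimal character reference is not allowed")
        if body in name2codepoint:
            # Named entity, e.g. &amp; &nbsp; &copy;
            return chr(name2codepoint[body])
        raise ValueError(f"unknown entity reference: &{body};")

    return _REF.sub(replace, text)


if __name__ == "__main__":
    print(decode_refs("&#x201C;") == "\u201c")  # True: left double quotation mark
    print(decode_refs("A&amp;B"))               # A&B
```

Characters that are not part of a reference pass through unchanged, which mirrors the point that ordinary keyboard characters need no reference at all.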

Usage notes

  • U is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic String class (with its CharacterDecode argument implicitly set to 'True').
  • Using the U method (or EbcdicToUnicode) is necessary when the string you want to convert to type Unicode may contain a hexadecimal character reference; such a reference cannot otherwise be meaningfully assigned to a Unicode variable. Keyboard-available characters, by contrast, can be assigned directly to a Unicode variable, with no character reference and no conversion by U.
  • You can find the list of XHTML entities on the Internet at the following URL:

    http://www.w3.org/TR/xhtml1/dtds.html#h-A2

  • The U method is available as of Sirius Mods version 7.3.

Examples

  1. The first Print statement below displays a plus sign (+); the second Print displays a copyright sign (©); the third displays '2122':

    %p Unicode Initial('+')
    Print %p

  2. Entity for copyright symbol:

    %copy Unicode Initial('&copy;':U)
    Print %copy

  3. Constant for trademark symbol:

    %tm Unicode Initial('&#x2122;':U)
    Print %tm:UnicodeToUtf16:StringToHex

Note

Simply specifying 'Print %tm' in the third Print statement above (or its equivalent, 'Print %tm:UnicodeToEbcdic') would attempt to translate to EBCDIC and fail, because the Unicode trademark character does not translate to a valid EBCDIC character. The UnicodeToUtf16 method, however, can convert the Unicode variable to a byte-stream string, which the StringToHex method then converts to its hex representation.
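
The behavior described in this note can be illustrated with a short Python sketch. It rests on two assumptions that the text does not state: Python's `cp037` codec stands in for the EBCDIC code page (the code page actually used by Model 204 may differ), and the UTF-16 byte stream is taken to be big-endian with no byte order mark, which is consistent with the '2122' output. The helper name `to_ebcdic` is hypothetical.

```python
from typing import Optional

TRADEMARK = "\u2122"  # the character produced by '&#x2122;':U
COPYRIGHT = "\u00a9"  # the character produced by '&copy;':U


def to_ebcdic(text: str) -> Optional[bytes]:
    """Return the EBCDIC bytes for text, or None if a character has no mapping.

    Assumption: code page 037 stands in for the EBCDIC code page in use.
    """
    try:
        return text.encode("cp037")
    except UnicodeEncodeError:
        return None


# Rough analogue of %tm:UnicodeToUtf16:StringToHex, assuming big-endian UTF-16.
utf16_hex = TRADEMARK.encode("utf-16-be").hex().upper()

if __name__ == "__main__":
    print(utf16_hex)                         # 2122
    print(to_ebcdic(TRADEMARK) is None)      # True: no EBCDIC equivalent in cp037
    print(to_ebcdic(COPYRIGHT) is not None)  # True: the copyright sign does translate
```

The sketch shows why the copyright example can be printed directly while the trademark example needs a hex rendering: only the former has a counterpart in the single-byte code page.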

See also