U (String function)

From m204wiki
Revision as of 19:33, 6 November 2012

Convert EBCDIC string to Unicode constant, including character encoding (String class)


The U intrinsic method converts an EBCDIC string to a Unicode string, replacing any XML-style hexadecimal character references and XHTML entity references in the string with the Unicode characters they represent. Because in use the method acts like a Unicode constant, it is also included in the list of Constant methods.

Syntax

%unicode = string:U

Syntax terms

%unicode A Unicode variable to receive the Unicode string represented by the method object string.
string A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, string may contain an ampersand (&) in the following cases:
  • At the start of the substring &amp;. This substring is converted to a single ampersand (&) character.
  • At the start of a hexadecimal character reference: for example, the eight characters &#x201C; for the Unicode "Left double quotation mark" (“). The character reference is converted to the referenced character.
  • As of Sirius Mods Version 7.6, an XHTML entity reference (for example, the six characters &nbsp; for the "non-breaking-space" character). The entity reference is converted to the referenced character.
A decimal character reference (for example, &#172;) is not allowed.
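The ampersand rules above can be modeled outside Model 204. The following Python sketch is an illustration only, not the implementation of U: the function name decode_references is hypothetical, and Python's HTML 4 entity table (html.entities.name2codepoint) is assumed as a stand-in for the XHTML entity set.

```python
import re
from html.entities import name2codepoint

# Matches every ampersand; the groups are filled only when the ampersand
# starts a well-formed hexadecimal reference or a named entity reference.
_AMP = re.compile(r"&(?:#x([0-9A-Fa-f]+);|([A-Za-z][A-Za-z0-9]*);)?")

def decode_references(text: str) -> str:
    """Hypothetical model of the reference rules described for U."""
    def replace(match):
        hex_digits, name = match.group(1), match.group(2)
        if hex_digits is not None:
            # Hexadecimal character reference, e.g. &#x201C;
            return chr(int(hex_digits, 16))
        if name is not None and name in name2codepoint:
            # Entity reference, e.g. &amp; or &nbsp; (assumed entity table)
            return chr(name2codepoint[name])
        # Any other ampersand, including a decimal reference such as
        # &#172;, is rejected.
        raise ValueError(f"invalid reference at offset {match.start()}")
    return _AMP.sub(replace, text)
```

Because the substitution is a single pass, the output of one replacement is never decoded again: in this model the text &amp;#x201C; yields the eight characters &#x201C; rather than a quotation mark.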

Usage notes

  • U is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic string class (with its CharacterDecode argument implicitly set to True).
  • The U method (or EbcdicToUnicode) is required for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference, because such a reference cannot otherwise be meaningfully assigned to a Unicode variable. Keyboard-available characters, by contrast, can be assigned directly to a Unicode variable, with no character reference and no conversion by U.
  • The U method is available as of Sirius Mods Version 7.3.

Examples

  1. The following Print statement displays a plus sign (+):

    %p Unicode Initial('+')
    print %p

  2. The following Print statement displays a copyright sign (©):

    %copy Unicode Initial('&copy;':U)
    print %copy

  3. The following Print statement displays 2122:

    %tm Unicode Initial('&#x2122;':U)
    print %tm:UnicodeToUtf16:StringToHex

    Specifying only print %tm in this example would attempt to convert the value to EBCDIC. Because the Unicode trademark character does not translate to a valid EBCDIC character, the Print output would use a character reference instead: &#x2122;.
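The behavior in example 3 can be approximated in Python for readers who want to check the values. This is a sketch under stated assumptions, not Model 204 code: the cp037 codec is assumed as a stand-in for the EBCDIC code page in use, big-endian UTF-16 is assumed for the hex display, and the helper name to_ebcdic_or_reference is hypothetical.

```python
tm = "\u2122"  # the trademark character produced by '&#x2122;':U

# Hex of the UTF-16 (big-endian, assumed) encoding, as in example 3.
utf16_hex = tm.encode("utf-16-be").hex().upper()

def to_ebcdic_or_reference(text: str, codec: str = "cp037") -> str:
    """Keep characters the assumed EBCDIC codec can encode; replace the
    rest with a hexadecimal character reference."""
    out = []
    for ch in text:
        try:
            ch.encode(codec)
            out.append(ch)
        except UnicodeEncodeError:
            out.append(f"&#x{ord(ch):04X};")
    return "".join(out)

print(utf16_hex)                   # 2122
print(to_ebcdic_or_reference(tm))  # &#x2122;
```

The plus sign and the copyright sign from examples 1 and 2 pass through this helper unchanged, since both exist in cp037.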

See also