U (String function)

From m204wiki
Revision as of 22:11, 2 February 2011

Convert EBCDIC string to Unicode constant, including character encoding (String class)


The U intrinsic method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. It converts XML-style hexadecimal character references, XHTML entity references, and '&amp;' references to the Unicode characters they represent. Because the method acts like a Unicode constant in use, it is also documented with the Constant methods.

Syntax

%unicode = string:U

Syntax terms

%unicode A Unicode string variable to receive the Unicode encoding of the method object string.
string A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, string may contain an ampersand (&) in the following cases:
  • As the substring '&amp;'. This substring is converted to a single ampersand ('&') character.
  • At the start of a hexadecimal character reference (for example, the eight characters '&#x201C;' for the Unicode "Left double quotation mark" '“'). The character reference is converted to the referenced character.
  • As of Sirius Mods version 7.6, an XHTML entity reference (for example, the six characters '&nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.
A decimal character reference (for example, &#172;) is not allowed.
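
The ampersand rules above can be modeled outside Model 204. The following is a minimal Python sketch, not SOUL and not the Sirius implementation. The function name `decode_u` is hypothetical, and Python's `html.entities.name2codepoint` table is assumed to stand in for the XHTML entity set that U recognizes. The sketch also assumes that any ampersand outside the three permitted cases is rejected, which is how the rules above read.

```python
import re
from html.entities import name2codepoint  # assumed stand-in for the XHTML entity table

# Matches '&#xHHHH;' (hex reference), '&name;' (entity reference), or a lone '&'.
_REF = re.compile(r"&(?:#x([0-9A-Fa-f]+);|([A-Za-z][A-Za-z0-9]*);|)")


def decode_u(text):
    """Hypothetical model of the U rules: decode hex and entity references."""

    def replace(match):
        hex_digits, name = match.group(1), match.group(2)
        if hex_digits is not None:
            return chr(int(hex_digits, 16))     # '&#x201C;' -> left double quotation mark
        if name is not None and name in name2codepoint:
            return chr(name2codepoint[name])    # '&amp;' -> '&', '&nbsp;' -> U+00A0
        # Lone '&', decimal reference such as '&#172;', or unknown entity name.
        raise ValueError(f"invalid ampersand use at offset {match.start()}")

    return _REF.sub(replace, text)


print(decode_u("A &amp; B"))             # A & B
print(decode_u("&#x201C;") == "\u201c")  # True
```

In this model a decimal reference such as `decode_u("&#172;")` raises `ValueError`, mirroring the restriction stated above.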

Usage notes

  • U is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic String class (with its CharacterDecode argument implicitly set to 'True').
  • The U method (or EbcdicToUnicode) is required when converting to type Unicode a string that may contain a hexadecimal character reference. Such a reference cannot otherwise be meaningfully assigned to a Unicode variable. Keyboard-available characters, by contrast, can be assigned directly to a Unicode variable, with no character reference and no conversion by U.
  • The U method is available as of Sirius Mods version 7.3.

Examples

  1. The first Print statement below displays a plus sign (+):

    %p Unicode Initial('+')
    Print %p

  2. The second Print displays a copyright sign (©):

    %copy Unicode Initial('&copy;':U)
    Print %copy

  3. The third displays '2122':

    %tm Unicode Initial('&#x2122;':U)
    Print %tm:UnicodeToUtf16:StringToHex

    Note

    Simply specifying 'print %tm' in the third Print example above (or its equivalent, 'print %tm:UnicodeToEbcdic') would attempt a translation to EBCDIC and fail, because the Unicode trademark character does not translate to a valid EBCDIC character. The UnicodeToUtf16 method, however, can convert the Unicode variable to a byte-stream string, which the StringToHex method then converts to its hex representation.
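
The behavior this note describes can be reproduced in a short Python sketch (not SOUL). It assumes Python's `cp037` codec as a stand-in EBCDIC code page, which may differ from the code page Model 204 actually uses. It also assumes big-endian UTF-16 with no byte order mark, inferred from the '2122' result in the third example. The helper name `has_ebcdic_form` is hypothetical.

```python
tm = "\u2122"    # trademark sign, the character denoted by '&#x2122;'
copy = "\u00a9"  # copyright sign, the character denoted by '&copy;'

# Analogue of UnicodeToUtf16 followed by StringToHex
# (big-endian, no byte order mark: an assumption).
tm_hex = tm.encode("utf-16-be").hex().upper()
print(tm_hex)  # 2122


def has_ebcdic_form(char):
    """Return True if the character can be encoded in the assumed code page."""
    try:
        char.encode("cp037")  # cp037 is an assumed EBCDIC code page
        return True
    except UnicodeEncodeError:
        return False


print(has_ebcdic_form(copy))  # True
print(has_ebcdic_form(tm))    # False
```

The copyright sign has an EBCDIC form in this code page and the trademark sign does not, which is why the third example prints the hex form of the UTF-16 bytes instead of the character itself.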

See also