U (String function): Difference between revisions

Revision as of 15:32, 19 January 2011

Convert EBCDIC string to Unicode constant, including character encoding (String class)

This intrinsic function converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML-style hexadecimal character references, XHTML entity references, and '&amp;' references to the represented Unicode character. Because the method acts like a Unicode constant in use, it is also documented with the Constant methods.

The U method is available as of Sirius Mods version 7.3.

Syntax

%unicode = string:U

Syntax terms

%unicode

A Unicode string variable to receive the Unicode encoding of the method object string.

string

A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, string may contain an ampersand (&) in the following cases:
* As the substring '&amp;'. This substring is converted to a single ampersand character.
* At the start of a hexadecimal character reference (for example, the eight characters '&#x201C;' for the Unicode "Left double quotation mark"). The character reference is converted to the referenced character.
* As of Sirius Mods version 7.6, at the start of an XHTML entity reference (for example, the six characters '&nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.

A decimal character reference (for example, &#172;) is not allowed.
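The ampersand rules for string can be modeled outside SOUL to check what a given constant should decode to. The following Python sketch is only an illustration of those rules, not the Sirius Mods implementation: the function name decode_references is hypothetical, Python's html.entities table stands in for the XHTML entity list (it may not match that list exactly), and rejecting any other use of an ampersand is an assumption drawn from the cases listed above.

```python
import re
from html.entities import name2codepoint

# One reference: '&', then a hex reference (#x201C), a decimal
# reference (#172), or an entity name (nbsp), then ';'.
_REFERENCE = re.compile(r"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def decode_references(text: str) -> str:
    """Hypothetical model of the reference decoding described for U."""
    # Assumption: an ampersand that does not start a reference is an error.
    if "&" in _REFERENCE.sub("", text):
        raise ValueError("ampersand outside a character or entity reference")

    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body[0] == "#":
            if body[1] in "xX":
                # Hexadecimal character reference, e.g. '&#x201C;'
                return chr(int(body[2:], 16))
            # Decimal character references are documented as not allowed.
            raise ValueError("decimal character reference: " + match.group(0))
        if body in name2codepoint:
            # Entity reference, e.g. '&amp;', '&nbsp;', '&copy;'
            return chr(name2codepoint[body])
        raise ValueError("unknown entity reference: " + match.group(0))

    return _REFERENCE.sub(replace, text)


if __name__ == "__main__":
    for sample in ("&amp;", "&#x201C;", "&nbsp;", "&copy;"):
        decoded = decode_references(sample)
        print(sample, "->", "U+%04X" % ord(decoded))
```

In this model, passing '&#172;' raises ValueError, which mirrors the rule that decimal character references are not allowed.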

Usage notes

The U method is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic String class (with its CharacterDecode argument implicitly set to 'True').
Using the U method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a Unicode variable without character reference and without conversion by U.
You can find the list of XHTML entities on the Internet at the following URL:

   http://www.w3.org/TR/xhtml1/dtds.html#h-A2

Example

The first Print statement below displays a plus sign (+); the second Print displays a copyright sign (©); the third displays '2122':

   %p Unicode Initial('+')
   Print %p

   * Entity for copyright symbol:
   %copy Unicode Initial('&copy;':U)
   Print %copy

   * Constant for trademark symbol:
   %tm Unicode Initial('&#x2122;':U)
   Print %tm:UnicodeToUtf16:StringToHex

Note

Simply specifying 'Print %tm' in the third Print statement above (or its equivalent 'Print %tm:UnicodeToEbcdic') would attempt to translate to EBCDIC and fail because the Unicode trademark character does not translate to a valid EBCDIC character. But the UnicodeToUtf16 method can convert the Unicode variable to a byte-stream Longstring, which the StringToHex method converts to its hex representation.
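The same behavior can be reproduced outside SOUL as a rough cross-check. The Python sketch below is an analogy only: it assumes UnicodeToUtf16 produces big-endian UTF-16 (consistent with the '2122' output described above) and uses Python's cp037 codec as a stand-in EBCDIC code page, which is not necessarily the translation table that Sirius Mods uses. The helper name encode_or_none is hypothetical.

```python
def encode_or_none(text, codec):
    """Return the encoded bytes, or None if a character has no mapping."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


trademark = "\u2122"       # the character that '&#x2122;':U represents
copyright_sign = "\u00a9"  # the character that '&copy;':U represents

# Analog of %tm:UnicodeToUtf16:StringToHex (big-endian is an assumption).
utf16_hex = trademark.encode("utf-16-be").hex().upper()
print(utf16_hex)  # 2122

# cp037 is one common EBCDIC code page; it stands in here for whatever
# EBCDIC translation is actually in effect.
copyright_in_ebcdic = encode_or_none(copyright_sign, "cp037")
trademark_in_ebcdic = encode_or_none(trademark, "cp037")
print("copyright sign has an EBCDIC mapping:", copyright_in_ebcdic is not None)
print("trademark sign has an EBCDIC mapping:", trademark_in_ebcdic is not None)
```

Under these assumptions the copyright sign has a single-byte EBCDIC mapping and the trademark sign has none, which is why the example prints the hex of the UTF-16 bytes instead of the character itself.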

@@ Line 3: / Line 3: @@
 This [[Intrinsic classes|intrinsic]] function converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML style hexadecimal character references, XHTML entity references, and ''''&amp;amp;'''' references to the represented Unicode character. Since in use the method acts like a Unicode constant, it is also documented with the [[Constant methods]].
-The U method is available as of <var class=product>Sirius Mods</var> version 7.3.
+The <var>U</var> method is available as of <var class=product>Sirius Mods</var> version 7.3.
 ==Syntax==
 {{Template:String:U syntax}}
@@ Line 9: / Line 9: @@
 <table class="syntaxTable">
 <tr><th>%unicode                                                                                                  </th>
-<td>A Unicode string variable to receive the Unicode encoding of the method object string.                    </td></tr>
+<td>A <var>U</var>nicode string variable to receive the <var>U</var>nicode encoding of the method object string.                    </td></tr>
 <tr><th>string                                                                                                    </th>
-<td>A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, ''string'' may contain an ampersand (&amp;) in the following cases:       *As the substring ''''&amp;amp;''''. This substring is converted to a single ampersand character.             *At the start of a hexadecimal character reference (for example, the eight characters '&amp;#x201C;' for the Unicode "Left double quotation mark"). The character reference is converted to the referenced character.                                             *As of <var class=product>Sirius Mods</var> version 7.6, an XHTML entity reference (for example, the six characters '&amp;nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.                                                                                                                                                              A decimal character reference (for example, &amp;#172;) is ''not'' allowed.</td></tr>
+<td>A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, ''string'' may contain an ampersand (&amp;) in the following cases:       *As the substring ''''&amp;amp;''''. This substring is converted to a single ampersand character.             *At the start of a hexadecimal character reference (for example, the eight characters '&amp;#x201C;' for the <var>U</var>nicode "Left double quotation mark"). The character reference is converted to the referenced character.                                             *As of <var class=product>Sirius Mods</var> version 7.6, an XHTML entity reference (for example, the six characters '&amp;nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.                                                                                                                                                              A decimal character reference (for example, &amp;#172;) is ''not'' allowed.</td></tr>
 </table>
-==Usage notes==
+==<var>U</var>sage notes==
-*The U method is a compile-time-only equivalent of the [[EbcdicToUnicode (String function)|EbcdicToUnicode]] method of the intrinsic String class (with its CharacterDecode argument implicitly set to ''''True'''').
+*The <var>U</var> method is a compile-time-only equivalent of the [[EbcdicToUnicode (String function)|EbcdicToUnicode]] method of the intrinsic String class (with its CharacterDecode argument implicitly set to ''''True'''').
-*Using the U method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a Unicode variable without character reference and without conversion by U.
+*<var>U</var>sing the <var>U</var> method (or EbcdicTo<var>U</var>nicode) is necessary for converting to type <var>U</var>nicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a <var>U</var>nicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a <var>U</var>nicode variable without character reference and without conversion by <var>U</var>.
-*You can find the list of XHTML entities on the Internet at the following URL:
+*You can find the list of XHTML entities on the Internet at the following <var>U</var>RL:
      http://www.w3.org/TR/xhtml1/dtds.html#h-A2
@@ Line 23: / Line 23: @@
 The first Print statement below displays a plus sign (+); the second Print displays a copyright sign (&copy;); the third displays ''''2122'''':
-     %p Unicode Initial('+')
+     %p <var>U</var>nicode Initial('+')
      Print %p
      * Entity for copyright symbol:
-     %copy Unicode Initial('&amp;copy;':U)
+     %copy <var>U</var>nicode Initial('&amp;copy;':<var>U</var>)
      Print %copy
      * Constant for trademark symbol:
-     %tm Unicode Initial('&amp;#x2122;':U)
+     %tm <var>U</var>nicode Initial('&amp;#x2122;':<var>U</var>)
-     Print %tm:UnicodeToUtf16:StringToHex
+     Print %tm:<var>U</var>nicodeTo<var>U</var>tf16:StringToHex
 ====Note====
-Simply specifying ''''Print %tm''''  in the the third Print statement above (or its equivalent ''''Print %tm:UnicodeToEbcdic'''') would
+Simply specifying ''''Print %tm''''  in the the third Print statement above (or its equivalent ''''Print %tm:<var>U</var>nicodeToEbcdic'''') would
-attempt to translate to EBCDIC and fail because the Unicode trademark character does not translate to a valid EBCDIC character. But the UnicodeToUtf16 method can convert the Unicode variable to a byte-stream Longstring, which the StringToHex method converts to its hex representation.
+attempt to translate to EBCDIC and fail because the <var>U</var>nicode trademark character does not translate to a valid EBCDIC character. But the <var>U</var>nicodeTo<var>U</var>tf16 method can convert the <var>U</var>nicode variable to a byte-stream Longstring, which the StringToHex method converts to its hex representation.
 ==See also==
 [[List of intrinsic String methods]]
