U (String function)
Revision as of 15:32, 19 January 2011
Convert EBCDIC string to Unicode constant, including character encoding (String class)
This intrinsic function converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. It converts XML-style hexadecimal character references, XHTML entity references, and '&amp;' references to the Unicode characters they represent. Because the method behaves in use like a Unicode constant, it is also documented with the Constant methods.
The U method is available as of Sirius Mods version 7.3.
Syntax
%unicode = string:U
Syntax terms
Term | Description |
---|---|
%unicode | A Unicode string variable to receive the Unicode encoding of the method object string. |
string | A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, string may contain an ampersand (&) in the following cases: (1) as the substring '&amp;', which is converted to a single ampersand character; (2) at the start of a hexadecimal character reference (for example, the eight characters '&#x201C;' for the Unicode "Left double quotation mark"), which is converted to the referenced character; (3) as of Sirius Mods version 7.6, at the start of an XHTML entity reference (for example, the six characters '&nbsp;' for the "non-breaking-space" character), which is converted to the referenced character. A decimal character reference (for example, '&#172;') is not allowed. |
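
The decoding rules in the table above can be sketched in Python. This is only an illustration of the described behavior, not the Sirius Mods implementation: the function name `decode_u` is hypothetical, and Python's `html.entities.name2codepoint` table is assumed to be a reasonable stand-in for the XHTML entity list.

```python
import re
from html.entities import name2codepoint

# An ampersand must start either a hexadecimal character reference (&#x...;)
# or a named entity reference (&name;). Decimal references do not match.
_VALID_REF = re.compile(r"&(?:#x([0-9A-Fa-f]+)|([A-Za-z][A-Za-z0-9]*));")


def decode_u(text: str) -> str:
    """Hypothetical model of the reference decoding described for the U method."""
    out = []
    pos = 0
    while True:
        amp = text.find("&", pos)
        if amp < 0:
            # No more ampersands: copy the remainder and finish.
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:amp])
        match = _VALID_REF.match(text, amp)
        if match is None:
            raise ValueError(f"invalid reference at offset {amp}")
        hex_digits, name = match.groups()
        if hex_digits is not None:
            # Hexadecimal character reference, e.g. &#x201C;
            out.append(chr(int(hex_digits, 16)))
        elif name in name2codepoint:
            # Named entity, e.g. &amp; &nbsp; &copy;
            out.append(chr(name2codepoint[name]))
        else:
            raise ValueError(f"unknown entity '&{name};'")
        pos = match.end()
```

In this sketch, a decimal reference such as '&#172;' raises `ValueError`, mirroring the rule that decimal character references are not allowed.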
Usage notes
- The U method is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic String class (with its CharacterDecode argument implicitly set to 'True').
- The U method (or EbcdicToUnicode) is required to produce a Unicode value from a string that may contain a hexadecimal character reference; without it, such a reference cannot be meaningfully assigned to a Unicode variable. Keyboard-available characters, by contrast, can be assigned directly to a Unicode variable, with no character reference and no conversion by U.
- You can find the list of XHTML entities on the Internet at the following URL:
http://www.w3.org/TR/xhtml1/dtds.html#h-A2
Example
The first Print statement below displays a plus sign (+); the second Print displays a copyright sign (©); the third displays '2122':
    %p Unicode Initial('+')
    Print %p
    * Entity for copyright symbol:
    %copy Unicode Initial('&copy;':U)
    Print %copy
    * Constant for trademark symbol:
    %tm Unicode Initial('&#x2122;':U)
    Print %tm:UnicodeToUtf16:StringToHex
Note
Simply specifying 'Print %tm' in the third Print statement above (or its equivalent, 'Print %tm:UnicodeToEbcdic') would attempt to translate to EBCDIC and fail, because the Unicode trademark character does not translate to a valid EBCDIC character. The UnicodeToUtf16 method, however, can convert the Unicode variable to a byte-stream Longstring, which the StringToHex method then converts to its hex representation.
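
The same effect can be reproduced outside SOUL as a rough analogy. The Python sketch below assumes that big-endian UTF-16 without a byte-order mark matches the '2122' output shown above, and it uses Python's `cp037` codec as a stand-in for the EBCDIC code page, which may differ from the one Sirius Mods actually uses. The helper names are hypothetical.

```python
def utf16_hex(value: str) -> str:
    """Encode as big-endian UTF-16 (no BOM) and return uppercase hex digits."""
    return value.encode("utf-16-be").hex().upper()


def fits_in_ebcdic(value: str, codec: str = "cp037") -> bool:
    """Report whether every character has a mapping in the given EBCDIC codec."""
    try:
        value.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


trademark = "\u2122"
print(utf16_hex(trademark))       # hex of the UTF-16 byte stream
print(fits_in_ebcdic(trademark))  # trademark sign has no cp037 mapping
print(fits_in_ebcdic("\u00a9"))   # copyright sign does have one
```

Under these assumptions, the trademark character encodes to the two bytes 21 22 in UTF-16 but has no single-byte EBCDIC equivalent, which is why the example prints its hex form instead of the character itself.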