U (String function): Difference between revisions
Revision as of 22:02, 2 February 2011
Convert EBCDIC string to Unicode constant, including character encoding (String class)
The U intrinsic method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML-style hexadecimal character references, XHTML entity references, and '&amp;' references to the Unicode character they represent. Because the method in effect acts like a Unicode constant, it is also documented with the Constant methods.
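As a rough illustration of the decoding described above, and not the Sirius Mods implementation, the following Python sketch uses the standard library's `html.unescape`, which resolves the same three kinds of reference: hexadecimal character references, named entity references, and `&amp;`. The helper name `decode_references` is hypothetical. Python strings are already Unicode, so the EBCDIC input side is not modeled, and `html.unescape` accepts more forms (for example decimal references) than this page documents for U.

```python
import html


def decode_references(text: str) -> str:
    """Resolve character and entity references to the characters they represent.

    Hypothetical stand-in for the decoding step only; it does not model
    EBCDIC input or compile-time evaluation.
    """
    return html.unescape(text)


# Hexadecimal character reference for the plus sign.
print(decode_references("&#x2B;"))
# Named entity reference for the copyright sign.
print(decode_references("&copy;"))
# An escaped ampersand becomes a literal ampersand.
print(decode_references("A&amp;B"))
```

Text that contains no references passes through unchanged, which loosely mirrors the point that keyboard-available characters need no conversion.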
Syntax
%unicode = string:U
Syntax terms
Term | Description |
---|---|
%unicode | A Unicode string variable to receive the Unicode encoding of the method object string. |
string | A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, string may contain an ampersand (&) in the following cases: |
Usage notes
- U is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic String class (with its CharacterDecode argument implicitly set to 'True').
- Using the U method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a Unicode variable without character reference and without conversion by U.
- The U method is available as of Sirius Mods version 7.3.
Examples
- The first Print statement below displays a plus sign (+); the second Print displays a copyright sign (©); the third displays '2122':
    %p Unicode Initial('+')
    Print %p
- Entity for copyright symbol:
    %copy Unicode Initial('&copy;':U)
    Print %copy
- Constant for trademark symbol:
    %tm Unicode Initial('&#x2122;':U)
    Print %tm:UnicodeToUtf16:StringToHex
Note
Simply specifying 'print %tm' in the third Print example above (or its equivalent, 'print %tm:UnicodeToEbcdic') would attempt to translate to EBCDIC and fail, because the Unicode trademark character does not translate to a valid EBCDIC character. The UnicodeToUtf16 method, however, can convert the Unicode variable to a byte-stream string, which the StringToHex method converts to its hex representation.
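The behavior in the Note above can be approximated in Python, purely as an analogy. The sketch assumes Python's `cp037` codec as a stand-in for the EBCDIC code page in use and big-endian byte order for UTF-16; both are assumptions, not statements about Sirius Mods. The helper names `utf16_hex` and `to_ebcdic` are hypothetical.

```python
def utf16_hex(text: str) -> str:
    """Encode text as big-endian UTF-16 and return the bytes as uppercase hex."""
    return text.encode("utf-16-be").hex().upper()


def to_ebcdic(text: str) -> bytes:
    """Encode text with cp037 (assumed EBCDIC code page); raises if unmappable."""
    return text.encode("cp037")


tm = "\u2122"  # TRADE MARK SIGN

# The hex of the UTF-16 encoding can always be displayed.
print(utf16_hex(tm))

# Direct translation to the assumed EBCDIC code page fails for this character.
try:
    to_ebcdic(tm)
except UnicodeEncodeError as err:
    print("cannot translate to cp037:", err.reason)
```

By contrast, the copyright sign has a cp037 code point, so `to_ebcdic("\u00a9")` succeeds in this sketch.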
See also
- You can find the list of XHTML entities on the Internet at the following URL:
    http://www.w3.org/TR/xhtml1/dtds.html#h-A2