U (String function): Difference between revisions
Revision as of 23:27, 20 October 2010
This intrinsic function converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML-style hexadecimal character references, XHTML entity references, and '&amp;' references to the represented Unicode character. Because the method acts like a Unicode constant in use, it is also documented with the Constant methods.
The U method is available as of Sirius Mods version 7.3.
U syntax
%unicode = string:U
Syntax Terms
- %unicode
- A Unicode string variable to receive the Unicode encoding of the method object string.
- string
- A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, string may contain an ampersand (&) in the following cases:
- As the substring '&amp;'. This substring is converted to a single ampersand character.
- At the start of a hexadecimal character reference (for example, the eight characters '&#x201C;' for the Unicode "Left double quotation mark"). The character reference is converted to the referenced character.
- As of Sirius Mods version 7.6, at the start of an XHTML entity reference (for example, the six characters '&nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.
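The three ampersand cases listed above can be illustrated outside of Sirius Mods. The following is a minimal Python sketch, not SOUL code and not the U method itself: it assumes that Python's standard `html.unescape` is a close enough analogue for these three specific inputs. The helper name `decode_references` is hypothetical, and `html.unescape` is more lenient than the rules described here (for example, it accepts the larger HTML5 entity set).

```python
import html


def decode_references(text: str) -> str:
    """Hypothetical helper: decode '&amp;', hex character references,
    and named entity references, roughly as described for the U method."""
    return html.unescape(text)


# Case 1: the substring '&amp;' becomes a single ampersand.
amp = decode_references("AT&amp;T")

# Case 2: an eight-character hexadecimal character reference
# becomes U+201C (left double quotation mark).
quote = decode_references("&#x201C;")

# Case 3: a six-character XHTML entity reference
# becomes U+00A0 (non-breaking space).
nbsp = decode_references("&nbsp;")

print(repr(amp), repr(quote), repr(nbsp))
```

In each case the whole reference collapses to one Unicode character, which is why such a reference has to go through U (or EbcdicToUnicode) rather than being assigned as-is.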
Usage Notes
- The U method is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic String class (with its CharacterDecode argument implicitly set to 'True').
- Using the U method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a Unicode variable without character reference and without conversion by U.
- You can find the list of XHTML entities on the Internet at the following URL:
http://www.w3.org/TR/xhtml1/dtds.html#h-A2
Example
The first Print statement below displays a plus sign (+); the second Print displays a copyright sign (©); the third displays '2122':
%p Unicode Initial('+')
Print %p
* Entity for copyright symbol:
%copy Unicode Initial('&copy;':U)
Print %copy
* Constant for trademark symbol:
%tm Unicode Initial('&#x2122;':U)
Print %tm:UnicodeToUtf16:StringToHex
Note
Simply specifying 'Print %tm' in the third Print statement above (or its equivalent, 'Print %tm:UnicodeToEbcdic') would attempt to translate to EBCDIC and fail, because the Unicode trademark character does not translate to a valid EBCDIC character. The UnicodeToUtf16 method, however, can convert the Unicode variable to a byte-stream Longstring, which the StringToHex method then converts to its hex representation.
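The behavior described in this note can be reproduced by analogy in Python. This is a sketch under stated assumptions, not SOUL code: Python's `cp037` codec stands in for the EBCDIC translation (the actual Sirius Mods translation tables may differ), and big-endian UTF-16 is assumed as the byte order behind the '2122' result.

```python
# U+2122 is the trademark sign used in the example above.
tm = "\u2122"

# Analogue of %tm:UnicodeToUtf16:StringToHex, assuming big-endian UTF-16.
utf16_hex = tm.encode("utf-16-be").hex().upper()

# Analogue of %tm:UnicodeToEbcdic, assuming code page 037 as the EBCDIC
# stand-in. The trademark sign has no code point there, so encoding fails.
try:
    tm.encode("cp037")
    ebcdic_ok = True
except UnicodeEncodeError:
    ebcdic_ok = False

print(utf16_hex, ebcdic_ok)
```

Encoding to UTF-16 always succeeds because every Unicode character has a UTF-16 form, whereas a single-byte EBCDIC code page can represent at most 256 characters.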