U (String function)

From m204wiki

Revision as of 14:04, 19 January 2011

Convert EBCDIC string to Unicode constant, including character encoding (String class)


This intrinsic function converts an EBCDIC string to a Unicode string. As part of the conversion, it replaces any XML-style hexadecimal character references, XHTML entity references, and '&amp;' substrings with the Unicode characters they represent. Because the method in effect yields a Unicode constant, it is also documented with the Constant methods.

The U method is available as of Sirius Mods version 7.3.

Syntax

%unicode = string:U

Syntax terms

%unicode A Unicode string variable to receive the Unicode encoding of the method object string.
string A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, string may contain an ampersand (&) in the following cases:
  • As the substring '&amp;'. This substring is converted to a single ampersand character.
  • At the start of a hexadecimal character reference (for example, the eight characters '&#x201C;' for the Unicode "Left double quotation mark"). The character reference is converted to the referenced character.
  • As of Sirius Mods version 7.6, an XHTML entity reference (for example, the six characters '&nbsp;' for the "non-breaking space" character). The entity reference is converted to the referenced character.
  A decimal character reference (for example, &#172;) is not allowed.
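The reference rules for string can be modeled outside Model 204 with a short Python sketch. It is an illustration only, not how Sirius Mods implements U. The sketch works on a string that has already been translated from EBCDIC. It uses Python's html.entities.name2codepoint table (the HTML 4 entity set, plus a special case for &apos;) as an approximate stand-in for the XHTML entity list. It also assumes that a stray or malformed ampersand is an error. The function name decode_references is hypothetical.

```python
import html.entities
import re

# "&" optionally followed by a reference body and ";".
# Group 1 is None for a stray or malformed ampersand.
_AMP = re.compile(r"&(?:(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);)?")


def decode_references(text: str) -> str:
    """Hypothetical model of U's reference handling: resolves '&amp;',
    hexadecimal character references, and named entity references;
    rejects decimal references and stray ampersands."""

    def replace(match):
        body = match.group(1)
        if body is None:
            raise ValueError(f"stray ampersand at offset {match.start()}")
        if body.startswith("#x"):
            # Hexadecimal character reference, e.g. &#x201C;
            return chr(int(body[2:], 16))
        if body.startswith("#"):
            # Decimal character reference, e.g. &#172;: not allowed by U
            raise ValueError(f"decimal reference not allowed: {match.group(0)}")
        if body == "apos":
            # XHTML defines &apos;, but Python's HTML 4 table does not
            return "'"
        if body in html.entities.name2codepoint:
            # Covers &amp; as well as entities such as &nbsp; and &copy;
            return chr(html.entities.name2codepoint[body])
        raise ValueError(f"unknown entity reference: {match.group(0)}")

    return _AMP.sub(replace, text)
```

For example, decode_references('&#x201C;quoted&#x201D; &amp; more') returns the text with curly quotation marks and a single ampersand. By contrast, decode_references('&#172;') raises ValueError, which mirrors the restriction on decimal references.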

Usage notes

  • The U method is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic String class (with its CharacterDecode argument implicitly set to 'True').
  • You need the U method (or EbcdicToUnicode) to convert a string to type Unicode if the string contains a hexadecimal character reference. Without that conversion, the reference is not decoded and cannot be meaningfully assigned to a Unicode variable. Characters available on the keyboard, by contrast, can be assigned directly to a Unicode variable, with no character reference and no conversion by U.
  • The XHTML entities are listed at the following URL:
   http://www.w3.org/TR/xhtml1/dtds.html#h-A2
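The second usage note can be illustrated with a Python sketch of the plain EBCDIC-to-Unicode translation step alone, without reference decoding. This is not Model 204 code. It assumes code page 037 (EBCDIC US/Canada) in place of the site's actual translation table, and the helper name ebcdic_to_text is hypothetical.

```python
# Assumption: code page 037 (EBCDIC US/Canada) stands in for the EBCDIC
# translation table that a Model 204 site actually uses.
CODEPAGE = "cp037"


def ebcdic_to_text(data: bytes) -> str:
    """Translate EBCDIC bytes to a Python (Unicode) string byte for byte,
    without decoding any character or entity references."""
    return data.decode(CODEPAGE)


# A keyboard character translates directly: EBCDIC 0x4E is '+'.
plus = ebcdic_to_text(b"\x4e")

# A hexadecimal character reference survives as eight literal characters
# unless something like U decodes it.
ref = ebcdic_to_text("&#x2122;".encode(CODEPAGE))
```

Here plus is '+'. However, ref is still the eight characters '&#x2122;' rather than the trademark sign. U, or EbcdicToUnicode with CharacterDecode set to True, performs the decoding that closes this gap.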

Example

The first Print statement below displays a plus sign (+); the second Print displays a copyright sign (©); the third displays '2122':

   Begin
   %p Unicode Initial('+')
   Print %p
   * Entity reference for copyright symbol:
   %copy Unicode Initial('&copy;':U)
   Print %copy
   * Hexadecimal character reference for trademark symbol:
   %tm Unicode Initial('&#x2122;':U)
   Print %tm:UnicodeToUtf16:StringToHex
   End

Note

Specifying simply 'Print %tm' in the third Print statement above (or its equivalent, 'Print %tm:UnicodeToEbcdic') would attempt to translate the value to EBCDIC and would fail, because the Unicode trademark character has no valid EBCDIC translation. Instead, the UnicodeToUtf16 method converts the Unicode variable to a byte-stream Longstring, which the StringToHex method then converts to its hexadecimal representation.
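The behavior described in this note can be imitated outside Model 204 with a short Python sketch, for illustration only. It assumes that UnicodeToUtf16 produces big-endian UTF-16 with no byte-order mark, which is consistent with the '2122' output above. It also uses code page 037 as a stand-in for the site's EBCDIC translation table. The helper names utf16_hex and fits_in_ebcdic are hypothetical.

```python
TRADE_MARK = "\u2122"  # the character that '&#x2122;':U produces
COPYRIGHT = "\u00a9"   # the character that '&copy;':U produces


def utf16_hex(text: str) -> str:
    """Hex dump of big-endian UTF-16 with no byte-order mark,
    a rough stand-in for UnicodeToUtf16:StringToHex."""
    return text.encode("utf-16-be").hex().upper()


def fits_in_ebcdic(text: str, codepage: str = "cp037") -> bool:
    """In this model, True if every character has a translation in the
    assumed EBCDIC code page, so a plain Print of the value could work."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True
```

Under these assumptions, utf16_hex(TRADE_MARK) returns '2122', matching the example output, and fits_in_ebcdic(TRADE_MARK) is False. By contrast, fits_in_ebcdic(COPYRIGHT) is True, which is why the second Print statement can display the copyright sign directly.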

See also

List of intrinsic String methods