U (String function): Difference between revisions

From m204wiki
m (1 revision)
 
(26 intermediate revisions by 6 users not shown)
Line 1: Line 1:
{{Template:String:U subtitle}}
{{Template:String:U subtitle}}


This [[Intrinsic classes|intrinsic]] function converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML style hexadecimal character references, XHTML entity references, and ''''&'''' references to the represented Unicode character. Since in use the method acts like a Unicode constant, it is also documented with the [[Constant methods]].
The <var>U</var> [[Intrinsic classes|intrinsic]] method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML style hexadecimal character references and XHTML entity references to the represented Unicode character. Since in use the method acts like a Unicode constant, it is also included in the [[System classes and methods#Constant_methods|list of Constant methods]].


The <var>U</var> method is available as of <var class=product>Sirius Mods</var> version 7.3.
==Syntax==
==Syntax==
{{Template:String:U syntax}}
{{Template:String:U syntax}}
===Syntax terms===
===Syntax terms===
<table class="syntaxTable">
<table class="syntaxTable">
<tr><th>%unicode</th>
<tr><th>%unicode</th>
<td>A <var>U</var>nicode string variable to receive the <var>U</var>nicode encoding of the method object string.                   </td></tr>
<td>A <var>Unicode</var> variable to receive the Unicode string represented by the method object <var class="term">string</var>.</td></tr>
 
<tr><th>string</th>
<tr><th>string</th>
<td>A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, ''string'' may contain an ampersand (&amp;) in the following cases:       *As the substring ''''&amp;amp;''''. This substring is converted to a single ampersand character.             *At the start of a hexadecimal character reference (for example, the eight characters '&amp;#x201C;' for the <var>U</var>nicode "Left double quotation mark"). The character reference is converted to the referenced character.                                             *As of <var class=product>Sirius Mods</var> version 7.6, an XHTML entity reference (for example, the six characters '&amp;nbsp;' for the "non-breaking-space" character). The entity reference is converted to the referenced character.                                                                                                                                                             A decimal character reference (for example, &amp;#172;) is ''not'' allowed.</td></tr>
<td>A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, <var class="term">string</var> may contain an ampersand (<tt>&amp;</tt>) in the following cases:
<ul>
<li>At the start of the substring <code>&amp;amp;</code>. This substring is converted to a single ampersand (<code>&amp;</code>) character.
 
<li>At the start of a hexadecimal character reference: for example, the eight characters <code>&amp;#x201C;</code> for the Unicode "Left double quotation mark" (<tt>&#x201C;</tt>). The character reference is converted to the referenced character.
 
<li>As of <var class=product>Sirius Mods</var> Version 7.6, an XHTML entity reference (for example, the six characters <code>&amp;nbsp;</code> for the "non-breaking-space" character). The entity reference is converted to the referenced character.
</ul>
 
A decimal character reference (for example, <code>&amp;#172;</code>) is ''not'' allowed.</td></tr>
</table>
</table>


==<var>U</var>sage notes==
==Usage notes==
*The <var>U</var> method is a compile-time-only equivalent of the [[EbcdicToUnicode (String function)|EbcdicToUnicode]] method of the intrinsic <var>String</var> class (with its CharacterDecode argument implicitly set to ''''True'''').
<ul>
*<var>U</var>sing the <var>U</var> method (or EbcdicTo<var>U</var>nicode) is necessary for converting to type <var>U</var>nicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a <var>U</var>nicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a <var>U</var>nicode variable without character reference and without conversion by <var>U</var>.
<li><var>U</var> is a compile-time-only equivalent of the <var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var> method of the intrinsic <var>[[String class|string]]</var> class (with its <var>CharacterDecode</var> argument implicitly set to <code>True</code>).
*You can find the list of XHTML entities on the Internet at the following <var>U</var>RL:
 
<p class="code">http://www.w3.org/TR/xhtml1/dtds.html#h-A2
<li>Using the <var>U</var> method (or <var>EbcdicToUnicode</var>) is necessary for converting to type <var>Unicode</var> if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a <var>Unicode</var> variable otherwise, whereas keyboard-available characters can simply be assigned directly to a <var>Unicode</var> variable without character reference and without conversion by <var>U</var>.
 
<li>The <var>U</var> method is available as of <var class="product">Sirius Mods</var> Version 7.3.</ul>
 
==Examples==
<ol>
<li>The following <var>Print</var> statement displays a plus sign (<code>+</code>):
<p class="code">%p Unicode Initial('+')
print %p
</p>


<li>The following <var>Print</var> statement displays a copyright sign (<code>&copy;</code>):
<p class="code">%copy Unicode Initial('&amp;copy;':U)
print %copy
</p><li>The following <var>Print</var> statement displays <code>2122</code>:
<p class="code">%tm Unicode Initial('&amp;#x2122;':U)
print %tm:UnicodeToUtf16:[[StringToHex (String function)|StringToHex]]
</p>
</p>
===Example===


The first Print statement below displays a plus sign (+); the second Print displays a copyright sign (&copy;); the third displays ''''2122'''':
<p class="note"><b>Note:</b> Simply specifying <code>print %tm</code> in this example would produce an attempt to convert the Unicode character to EBCDIC for printing, but since the Unicode trademark character does not translate to a valid EBCDIC character, the <var>Print</var> output would be the character reference for the trademark: <code>&amp;#x2122;</code>.</p></li>
<p class="code">%p <var>U</var>nicode Initial('+')
Print %p


* Entity for copyright symbol:
<li id="brackets">Say you want to use the following [[XPath]] expression within a [[Janus SOAP]] method:
%copy <var>U</var>nicode Initial('&amp;copy;':<var>U</var>)
Print %copy
<p class="code"><nowiki>*/pers[@name="Rebecca"]</nowiki></p>
<p>
This selects <code>pers</code> children elements with the <code>name</code> attribute equal to <code>Rebecca</code>. However, simply typing the square brackets (<tt>[</tt> <tt>]</tt>) on your terminal keyboard is vulnerable to an XPath "compilation" error when the method using that expression is executed. The error occurs if you have a mismatch between the 3270 configuration on your PC and the [[Unicode#Code points.2C character set mappings|codepage]] setting (as can be determined by the <var>[[UNICODE command#Display forms of UNICODE|UNICODE]]</var> command) in your Online. </p>
<p>
To avoid this potential error, you can use the <var>U</var> method with the <code>&lsqb;</code> and <code>&rsqb;</code> XHTML entities added in Model&nbsp;204 7.6: </p>
<p class="code">%doc:print('*/pers&lsqb;@name="Rebecca"&rsqb;':u)</p>
<p>
Using the XHTML entities like this specifies the square brackets in a way that is sure to be correct. The <var>U</var> method treats the entities as their character equivalents and creates a <var>Unicode</var> constant string as the argument of the method. </p>
<p>
An alternative bracket substitution is to use <var>[[Using variables and values in computation#Declare statements for %variables|Static]]</var> %variables initialized to the correct values, for example: </p>
<p class="code">%lsq is string len 1 static initial('&amp;#x5b;'):u
%rsq is string len 1 static initial('&amp;#x5d;'):u </p>
<p>
But this approach requires declaring a pair of variables (within the scope of the method/subroutine), and it performs a run-time execution of two concatenations as well as conversion from EBCDIC to Unicode. </p>
<p>
For more information about square bracket translations, see [[Unicode#sqbrackets|this short summary]]. </p></li></ol>


* Constant for trademark symbol:
%tm <var>U</var>nicode Initial('&amp;#x2122;':<var>U</var>)
Print %tm:<var>U</var>nicodeTo<var>U</var>tf16:<var>String</var>ToHex
</p>
====Note====
Simply specifying ''''Print %tm'''' in the third Print statement above (or its equivalent ''''Print %tm:<var>U</var>nicodeToEbcdic'''') would
attempt to translate to EBCDIC and fail because the <var>U</var>nicode trademark character does not translate to a valid EBCDIC character. But the <var>U</var>nicodeTo<var>U</var>tf16 method can convert the <var>U</var>nicode variable to a byte-stream <var>Longstring</var>, which the <var>String</var>ToHex method converts to its hex representation.
==See also==
==See also==
[[List of intrinsic String methods]]
<ul><li>You can find the list of XHTML entities on the Internet at the following <var>U</var>RL:
 
<p class="code">http://www.w3.org/TR/xhtml1/dtds.html#h-A2
[[Category:String methods|U function]]
</p></ul>
[[Category:Intrinsic methods]]
{{Template:String:U footer}}

Latest revision as of 07:40, 3 June 2016

Convert EBCDIC string to Unicode constant, including character encoding (String class)


The U intrinsic method converts an EBCDIC string, which may include XML character and entity references, to a Unicode string. The function also converts XML style hexadecimal character references and XHTML entity references to the represented Unicode character. Since in use the method acts like a Unicode constant, it is also included in the list of Constant methods.

Syntax

%unicode = string:U

Syntax terms

%unicode A Unicode variable to receive the Unicode string represented by the method object string.
string A constant character string value, which may include an XML-style hexadecimal character reference or an XHTML entity reference. That is, string may contain an ampersand (&) in the following cases:
  • At the start of the substring &amp;. This substring is converted to a single ampersand (&) character.
  • At the start of a hexadecimal character reference: for example, the eight characters &#x201C; for the Unicode "Left double quotation mark" (“). The character reference is converted to the referenced character.
  • As of Sirius Mods Version 7.6, an XHTML entity reference (for example, the six characters &nbsp; for the "non-breaking-space" character). The entity reference is converted to the referenced character.
A decimal character reference (for example, &#172;) is not allowed.
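
The decoding rules above can be modeled outside of Model 204. The following is a minimal Python sketch, not the actual implementation of U: the function name decode_refs is hypothetical, Python's html.entities.name2codepoint table stands in for the XHTML entity list, and rejecting every other use of an ampersand is an assumption drawn from the wording "in the following cases".

```python
import re
from html.entities import name2codepoint  # stand-in for the XHTML entity list

# Candidate references: "&name;", "&#xHEX;" or "&#DECIMAL;".
_REF = re.compile(r"&(#[xX][0-9A-Fa-f]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")


def decode_refs(text: str) -> str:
    """Hypothetical model of the reference decoding described above."""
    out = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch != "&":
            out.append(ch)
            pos += 1
            continue
        match = _REF.match(text, pos)
        if match is None:
            # Assumption: an ampersand outside the listed cases is an error.
            raise ValueError(f"stray ampersand at offset {pos}")
        body = match.group(1)
        if body[0] == "#":
            if body[1] not in "xX":
                # Decimal character references are not allowed.
                raise ValueError(f"decimal reference not allowed: {match.group(0)}")
            out.append(chr(int(body[2:], 16)))
        elif body in name2codepoint:
            # Covers &amp; as well as entities such as &nbsp; and &copy;.
            out.append(chr(name2codepoint[body]))
        else:
            raise ValueError(f"unknown entity: {match.group(0)}")
        pos = match.end()
    return "".join(out)
```

For example, decode_refs('&#x201C;') yields the single character U+201C, and decode_refs('&#172;') raises an error because the reference is decimal.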

Usage notes

  • U is a compile-time-only equivalent of the EbcdicToUnicode method of the intrinsic string class (with its CharacterDecode argument implicitly set to True).
  • Using the U method (or EbcdicToUnicode) is necessary for converting to type Unicode if the string you want to convert may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise, whereas keyboard-available characters can simply be assigned directly to a Unicode variable without character reference and without conversion by U.
  • The U method is available as of Sirius Mods Version 7.3.

Examples

  1. The following Print statement displays a plus sign (+):

    %p Unicode Initial('+')
    print %p

  2. The following Print statement displays a copyright sign (©):

    %copy Unicode Initial('&copy;':U)
    print %copy

  3. The following Print statement displays 2122:

    %tm Unicode Initial('&#x2122;':U)
    print %tm:UnicodeToUtf16:StringToHex

    Note: Simply specifying print %tm in this example would produce an attempt to convert the Unicode character to EBCDIC for printing, but since the Unicode trademark character does not translate to a valid EBCDIC character, the Print output would be the character reference for the trademark: &#x2122;.

  4. Say you want to use the following XPath expression within a Janus SOAP method:

    */pers[@name="Rebecca"]

    This selects pers child elements whose name attribute equals Rebecca. However, simply typing the square brackets ([ ]) on your terminal keyboard is vulnerable to an XPath "compilation" error when the method using that expression is executed. The error occurs if you have a mismatch between the 3270 configuration on your PC and the codepage setting (as can be determined by the UNICODE command) in your Online.

    To avoid this potential error, you can use the U method with the &lsqb; and &rsqb; XHTML entities added in Model 204 7.6:

    %doc:print('*/pers&lsqb;@name="Rebecca"&rsqb;':u)

    Using the XHTML entities like this specifies the square brackets in a way that is sure to be correct. The U method treats the entities as their character equivalents and creates a Unicode constant string as the argument of the method.

    An alternative bracket substitution is to use Static %variables initialized to the correct values, for example:

    %lsq is string len 1 static initial('&#x5b;'):u
    %rsq is string len 1 static initial('&#x5d;'):u

    But this approach requires declaring a pair of variables (within the scope of the method/subroutine), and it performs a run-time execution of two concatenations as well as conversion from EBCDIC to Unicode.

    For more information about square bracket translations, see this short summary.
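
The code page mismatch behind the square bracket example can be reproduced in a short Python sketch. It is illustrative only: cp037 and cp500 are two EBCDIC code pages that ship with Python and happen to place the square brackets at different byte values; the code pages actually used by a given 3270 emulator and Online are site-specific, and the helper name as_seen_by_online is hypothetical.

```python
# Assumed code pages, chosen only because they disagree on "[" and "]".
EMULATOR_CODEPAGE = "cp037"  # what the terminal emulator sends
ONLINE_CODEPAGE = "cp500"    # what the receiving side assumes


def as_seen_by_online(typed: str) -> str:
    """Encode with the emulator's code page, then decode with the Online's."""
    return typed.encode(EMULATOR_CODEPAGE).decode(ONLINE_CODEPAGE)


typed_xpath = '*/pers[@name="Rebecca"]'
received_xpath = as_seen_by_online(typed_xpath)

# The brackets do not survive the round trip, so the expression is corrupted.
print(received_xpath == typed_xpath)

# The entity spelling uses only characters on which both code pages agree,
# so it arrives intact and can be decoded to the intended brackets afterward.
entity_xpath = '*/pers&lsqb;@name="Rebecca"&rsqb;'
print(as_seen_by_online(entity_xpath) == entity_xpath)
```

In this model the typed brackets arrive as different characters, while the entity form arrives unchanged, which is the property the U method relies on.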

See also