EbcdicToUnicode (String function): Difference between revisions

From m204wiki
(38 intermediate revisions by 7 users not shown)
Line 1: Line 1:
{{Template:String:EbcdicToUnicode subtitle}}
{{Template:String:EbcdicToUnicode subtitle}}


This [[Intrinsic classes|intrinsic]] function converts an EBCDIC string to Unicode    
<var>EbcdicToUnicode</var> is an [[Intrinsic classes|intrinsic]] function that converts an EBCDIC string to Unicode using the current [[Unicode]] tables. Options are available to control:<ul><li>The conversion to the represented Unicode character of XML-style hexadecimal character references, XHTML entity references, and <code>&amp;amp;</code> references<li>How to handle untranslatable EBCDIC characters</ul>
using the current Unicode tables.                                                    
 
As an option, XML style hexadecimal character references, XHTML entity references,    
and ''''&amp;amp;'''' references are converted to the represented Unicode character. 
An additional option lets you specify how to handle untranslatable EBCDIC characters
                                                                                     
The EbcdicToUnicode function is available as of version 7.3 of the [[Sirius Mods]].   
==Syntax==
==Syntax==
{{Template:String:EbcdicToUnicode syntax}}
{{Template:String:EbcdicToUnicode syntax}}
===Syntax terms===
===Syntax terms===
<dl>                                                                                  
<table class="syntaxTable">
<dt>%unicode                                                                          
<tr><th>%unicode</th>
<dd>A string variable to receive the method object string translated to Unicode.      
<td>A <var>Unicode</var> variable to receive the method object <var class="term">string</var> translated to Unicode.</td></tr>
<dt>string                                                                            
 
<dd>An EBCDIC character string.                                                      
<tr><th>string</th>
<dt>CharacterDecode=bool                                                             
<td>An EBCDIC character string.</td></tr>
<dd>The optional (name required) CharacterDecode argument is a [[Boolean]]:          
 
*If its value is ''''True'''', an ampersand (&) in the input EBCDIC string is allowed ''only'' as the beginning of one of these types of character or entity reference:                                              
<tr><th><var>CharacterDecode</var></th>
**The substring ''''&amp;amp;''''. This substring is converted to a single ''''&'''' character.                          
<td>The optional, [[Notation conventions for methods#Named parameters|name required]], argument <var>CharacterDecode</var> is a <var>[[Boolean enumeration]]</var>:
**A hexadecimal character reference (for example, the eight characters '&amp;#x201C;' for the Unicode ''Left double quotation mark''). The character reference is converted to the referenced character.  
<ul>
**As of [[Sirius Mods]] version 7.6, an XHTML entity reference (for example, the six characters '&amp;copy;' for the "copyright" character). The entity reference is converted to the referenced character.                        
<li>If its value is <var>True</var>, an ampersand (<code>&amp;</code>) in the input EBCDIC string is allowed <b>only</b> as the beginning of one of these types of character or entity reference:
                                                                                     
<ul>
A decimal character reference (for example, &amp;#172;) is ''not'' allowed.                                    
<li>The substring <code>&amp;amp;</code>. This substring is converted to a single <code>&amp;</code> character.
</dd>
<li>A hexadecimal character reference (for example, the eight characters <code>&amp;#x201C;</code> for the Unicode <i>Left double quotation mark</i>, &#x201C;). The character reference is converted to the referenced character.
*If its value is ''''False'''', the default, an ampersand is treated only as a normal character.  
<li>As of <var class=product>Sirius Mods</var> version 7.6, an XHTML entity reference (for example, the six characters <code>&amp;copy;</code> for the <i>copyright</i> character, &copy;). The entity reference is converted to the referenced character.</ul>
<dt>Untranslatable=char                                                                                       
<p> A decimal character reference (for example, <code>&amp;#172;</code>) is <b>not</b> allowed.</p>
<dd>The optional (name required) Untranslatable argument is a single character or a null string that specifies  
<li>If its value is <var>False</var>, the default, an ampersand is treated only as a normal character.
how to handle EBCDIC input characters that are not translatable to Unicode:                                    
</ul></td></tr>
*If the value is a single Unicode character, any untranslatable EBCDIC characters are replaced with that Unicode character.                                                                      
<tr><th><var>Untranslatable</var></th>
*If the value is the null string, any untranslatable EBCDIC characters are removed from the input string.                                                                                        
<td>The optional, name required, argument <var>Untranslatable</var> is a single character or a null string that specifies how to handle EBCDIC input characters that are not translatable to Unicode:
                                                                                                               
<ul>
The ''''Untranslatable'''' parameter is optional. If it is omitted and an EBCDIC character is encountered that is not translatable to Unicode, a [[CharacterTranslationException]] exception is thrown.                                                                                                              
<li>If the value is a single Unicode character, any untranslatable EBCDIC characters are replaced with that Unicode character.
The Untranslatable parameter is available as of [[Sirius Mods]] version 7.5.
<li>If the value is the null string, any untranslatable EBCDIC characters are removed from the input string.
It provides the functionality formerly provided by the EbcdicTranslateNonUnicode and the EbcdicRemoveNonUnicode methods, which are invalid as of [[Sirius Mods]] 7.5.
</ul>
                                                           
The <var>Untranslatable</var> parameter is optional. If it is omitted and an EBCDIC character is encountered that is not translatable to Unicode, a <var>[[CharacterTranslationException class|CharacterTranslationException]]</var> exception is thrown.<p>The <var>Untranslatable</var> parameter is available as of <var class="product">Sirius Mods</var> version 7.5. It provides the functionality formerly provided by the <var>EbcdicTranslateNonUnicode</var> and the <var>EbcdicRemoveNonUnicode</var> methods, which are invalid as of <var class="product">Sirius Mods</var> 7.5.</p></td></tr>
</dl>                                                            
</table>
===Exceptions===                                                                                               
 
This [[Intrinsic classes|intrinsic]] function can throw the following exception:
==Exceptions==
<dl>                                                                                                          
<var>EbcdicToUnicode</var> can throw the following exception:
<dt>CharacterTranslationException                                                                              
<dl>
<dd>If the method encounters a translation problem, properties of the exception object may indicate the location and type of problem.  
<dt><var>[[CharacterTranslationException class|CharacterTranslationException]]</var>
</dl>                                                                                                          
<dd>If the method encounters a translation problem, properties of the exception object may indicate the location and type of problem.
</dl>
 
==Usage notes==
==Usage notes==
*You can find the list of XHTML entities on the Internet at the following URL:
<ul>
    http://www.w3.org/TR/xhtml1/dtds.html#h-A2
<li>Using <var>EbcdicToUnicode</var> with <code>CharacterDecode=True</code> (or using the <var>[[U (String function)|U]]</var> function) is necessary if the string you want to convert to Unicode may contain a hexadecimal or XHTML entity character reference which you want converted to the corresponding Unicode character.
*More information is available about the [[Unicode Tables|Unicode tables]].
<li><var>EbcdicToUnicode</var> is available as of <var class="product">Sirius Mods</var> version 7.3.
*The [[EbcdicToAscii (String function)|EbcdicToAscii]] method converts an EBCDIC string to ASCII.                                                                         
</ul>
*The [[U (String function)|U]] function is a compile-time-only equivalent of the EbcdicToUnicode method (with CharacterDecode argument implicitly set to ''''True'''').
*Using EbcdicToUnicode (or the U function) is necessary if the string you want to convert to Unicode may contain a hexadecimal character reference. Such a reference cannot be meaningfully assigned to a Unicode variable otherwise.


==Examples==
==Examples==
The following fragment shows four calls of EbcdicToUnicode: respectively against translatable EBCDIC characters, a string with a character reference, a string with an entity reference, and a string with an EBCDIC character that cannot be translated to Unicode.  
The following fragment shows four calls of <var>EbcdicToUnicode</var>: respectively against translatable EBCDIC characters, a string with a character reference, a string with an entity reference, and a string with an EBCDIC character that cannot be translated to Unicode. The <var>[[X (String function)|X]]</var> constant function is used in the example.
The [[X (String function)|X]] constant function is used in the example.
<p class="code">%e string Len 20
    %e String Len 20                                        
%u unicode
    %u Unicode                         
%e = '12'
    %e = '12'                      
%u = %e:EbcdicToUnicode
    %u = %e:EbcdicToUnicode                
Print %u
    Print %u                      
Print %u:UnicodeToUtf16:StringToHex
    Print %u:UnicodeToUtf16:StringToHex  
 
                                                                                                           
%e = '1&amp;#x2122;2'
    %e = '1&amp;#x2122;2'
%u = %e:EbcdicToUnicode(CharacterDecode=True)
    %u = %e:EbcdicToUnicode(CharacterDecode=True)  
Print %u:UnicodeToUtf16:StringToHex
    Print %u:UnicodeToUtf16:StringToHex  
 
                                                                                                           
%e = '&amp;copy;'
    %e = '&amp;copy;'                                  
%u = %e:EbcdicToUnicode(CharacterDecode=True)
    %u = %e:EbcdicToUnicode(CharacterDecode=True)  
Print %u
    Print %u  
 
                                                                                                           
%e = 'F1FFF2':X
    %e = 'F1FFF2':X
%u = %e:EbcdicToUnicode
    %u = %e:EbcdicToUnicode                                
                                                                                                       
The result of the above fragment is:     
    12                                                                                                 
    00310032                                                                                           
    003121220032                                                                                       
    &copy;                                                                                         
    CANCELLING REQUEST: MSIR.0751: Class STRING, function
      EBCDICTOUNICODE: CHARACTER TRANSLATIONEXCEPTION
      exception: EBCDIC character X'FF' without valid
      translation to Unicode at byte position 2 ...
====Note====                                                                                                 
The initial ''''Print %u'''' statement in the example above is not very revealing because it is
equivalent to specifying ''''Print %u:[[UnicodeToEbcdic (Unicode function)|UnicodeToEbcdic]]'''' &mdash;
a Unicode string is implicitly converted to EBCDIC
when it is used in an EBCDIC context like a Print statement.
The [[UnicodeToUtf16 (Unicode function)|UnicodeToUtf16]] method, however, converts the Unicode variable to a byte-stream string, which the [[StringToHex (String function)|StringToHex]] method converts to its hex representation.


===See also===                                                                                       
</p>
[[List of intrinsic String methods]]
The result of the above fragment is:
<p class="output">12
00310032
003121220032
&copy;
CANCELLING REQUEST: MSIR.0751: Class STRING, function EBCDICTOUNICODE: CHARACTER TRANSLATIONEXCEPTION
  exception: EBCDIC character X'FF' without valid translation to Unicode at byte position 2 ...
</p><p>
<b>Note:</b> The initial <code>Print %u</code> statement in the example above is not very revealing because it is equivalent to specifying <code>Print %u:[[UnicodeToEbcdic (Unicode function)|UnicodeToEbcdic]]</code> &mdash; a <var>Unicode</var> string is implicitly converted to EBCDIC when it is used in an EBCDIC context like a Print statement.  <var>[[UnicodeToUtf16 (Unicode function)|UnicodeToUtf16]]</var>, however, converts the <var>Unicode</var> variable to a byte-stream string, which <var>[[StringToHex (String function)|StringToHex]]</var> converts to its hex representation.</p>


[[Category:Intrinsic String methods|EbcdicToUnicode function]]
==See also==
[[Category:Intrinsic methods]]
<ul><li><var>[[U (String function)|U]]</var> is a compile-time-only equivalent of the <var>EbcdicToUnicode</var> method (with the <var>CharacterDecode</var> argument implicitly set to <code>True</code>).
<li>You can find the list of XHTML entities on the Internet at the following URL:
<p class="code">http://www.w3.org/TR/xhtml1/dtds.html#h-A2
</p>
<li>More information is available about [[Unicode]].
<li>The <var>[[EbcdicToAscii (String function)|EbcdicToAscii]]</var> method converts an EBCDIC string to ASCII.</ul>
{{Template:String:EbcdicToUnicode footer}}

Latest revision as of 19:16, 6 November 2012

Convert EBCDIC string to Unicode (String class)


EbcdicToUnicode is an intrinsic function that converts an EBCDIC string to Unicode using the current Unicode tables. Options are available to control:

  • Whether XML-style hexadecimal character references, XHTML entity references, and &amp; references are converted to the Unicode characters they represent
  • How to handle untranslatable EBCDIC characters

Syntax

    %unicode = string:EbcdicToUnicode[( [CharacterDecode= boolean], -
                                        [Untranslatable= unicode])]
    Throws CharacterTranslationException

Syntax terms

%unicode A Unicode variable to receive the method object string translated to Unicode.
string An EBCDIC character string.
CharacterDecode The optional, name required, argument CharacterDecode is a Boolean enumeration:
  • If its value is True, an ampersand (&) in the input EBCDIC string is allowed only as the beginning of one of these types of character or entity reference:
    • The substring &amp;. This substring is converted to a single & character.
    • A hexadecimal character reference (for example, the eight characters &#x201C; for the Unicode Left double quotation mark, “). The character reference is converted to the referenced character.
    • As of Sirius Mods version 7.6, an XHTML entity reference (for example, the six characters &copy; for the copyright character, ©). The entity reference is converted to the referenced character.

    A decimal character reference (for example, &#172;) is not allowed.

  • If its value is False, the default, an ampersand is treated only as a normal character.
Untranslatable The optional, name required, argument Untranslatable is a single character or a null string that specifies how to handle EBCDIC input characters that are not translatable to Unicode:
  • If the value is a single Unicode character, any untranslatable EBCDIC characters are replaced with that Unicode character.
  • If the value is the null string, any untranslatable EBCDIC characters are removed from the input string.
The Untranslatable parameter is optional. If it is omitted and an EBCDIC character is encountered that is not translatable to Unicode, a CharacterTranslationException exception is thrown.

The Untranslatable parameter is available as of Sirius Mods version 7.5. It provides the functionality formerly provided by the EbcdicTranslateNonUnicode and the EbcdicRemoveNonUnicode methods, which are invalid as of Sirius Mods 7.5.
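The two settings of Untranslatable described above can be sketched as follows. This is a minimal illustration, not a tested program: it reuses the X'FF' byte from the Examples section as the untranslatable character, and it assumes that an ordinary quoted string is accepted for the Unicode-valued Untranslatable argument through implicit conversion.

```soul
Begin
%e is string len 20
%u is unicode

* X'FF' is the untranslatable byte used in the Examples section
%e = 'F1FFF2':X

* Replace each untranslatable character with a question mark
* (assumption: the quoted string is implicitly converted to Unicode)
%u = %e:EbcdicToUnicode(Untranslatable='?')
Print %u:UnicodeToUtf16:StringToHex

* Remove each untranslatable character by passing the null string
%u = %e:EbcdicToUnicode(Untranslatable='')
Print %u:UnicodeToUtf16:StringToHex
End
```

Neither call should cancel the request, because supplying Untranslatable suppresses the CharacterTranslationException for untranslatable input.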

Exceptions

EbcdicToUnicode can throw the following exception:

CharacterTranslationException
If the method encounters a translation problem, properties of the exception object may indicate the location and type of problem.
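A request can catch the exception instead of being cancelled. The fragment below is a hedged sketch: the Try/Catch block follows the usual SOUL form, and the BytePosition property name is an assumption drawn from the "byte position" wording of the cancellation message shown in the Examples section, so check it against the CharacterTranslationException class description before relying on it.

```soul
Begin
%e is string len 20
%u is unicode
%err is object CharacterTranslationException

* X'FF' is the untranslatable byte used in the Examples section
%e = 'F1FFF2':X

Try
   %u = %e:EbcdicToUnicode
   Print %u
Catch CharacterTranslationException To %err
   * BytePosition is an assumed property name (see the note above)
   Print 'Untranslatable character at byte ' With %err:BytePosition
End Try
End
```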

Usage notes

  • Using EbcdicToUnicode with CharacterDecode=True (or using the U function) is necessary if the string you want to convert to Unicode may contain a hexadecimal character reference or an XHTML entity reference that you want converted to the corresponding Unicode character.
  • EbcdicToUnicode is available as of Sirius Mods version 7.3.
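As a small illustration of the first note, the sketch below converts the same input with and without CharacterDecode. The variable names and the sample string are illustrative only.

```soul
Begin
%e is string len 20
%u is unicode

%e = 'A &amp; B'

* Default (CharacterDecode=False): the ampersand is an ordinary character
%u = %e:EbcdicToUnicode
Print %u

* CharacterDecode=True: the substring '&amp;' becomes a single '&'
%u = %e:EbcdicToUnicode(CharacterDecode=True)
Print %u
End
```

Going by the CharacterDecode description, the first Print should show the five-character reference unchanged and the second should show a single ampersand in its place.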

Examples

The following fragment shows four calls of EbcdicToUnicode: respectively against translatable EBCDIC characters, a string with a character reference, a string with an entity reference, and a string with an EBCDIC character that cannot be translated to Unicode. The X constant function is used in the example.

    %e string Len 20
    %u unicode
    %e = '12'
    %u = %e:EbcdicToUnicode
    Print %u
    Print %u:UnicodeToUtf16:StringToHex

    %e = '1&#x2122;2'
    %u = %e:EbcdicToUnicode(CharacterDecode=True)
    Print %u:UnicodeToUtf16:StringToHex

    %e = '&copy;'
    %u = %e:EbcdicToUnicode(CharacterDecode=True)
    Print %u

    %e = 'F1FFF2':X
    %u = %e:EbcdicToUnicode

The result of the above fragment is:

    12
    00310032
    003121220032
    ©
    CANCELLING REQUEST: MSIR.0751: Class STRING, function EBCDICTOUNICODE: CHARACTER TRANSLATIONEXCEPTION
      exception: EBCDIC character X'FF' without valid translation to Unicode at byte position 2 ...

Note: The initial Print %u statement in the example above is not very revealing because it is equivalent to specifying Print %u:UnicodeToEbcdic — a Unicode string is implicitly converted to EBCDIC when it is used in an EBCDIC context like a Print statement. UnicodeToUtf16, however, converts the Unicode variable to a byte-stream string, which StringToHex converts to its hex representation.

See also

  • U is a compile-time-only equivalent of the EbcdicToUnicode method (with the CharacterDecode argument implicitly set to True).
  • You can find the list of XHTML entities on the Internet at the following URL:

    http://www.w3.org/TR/xhtml1/dtds.html#h-A2

  • More information is available about Unicode.
  • The EbcdicToAscii method converts an EBCDIC string to ASCII.