EbcdicToUnicode (String function)
{{Template:String:EbcdicToUnicode subtitle}}
<var>EbcdicToUnicode</var> is an [[Intrinsic classes|intrinsic]] function that converts an EBCDIC string to <var>Unicode</var> using the current <var>[[Unicode]]</var> tables. Options are available to control:<ul><li>whether XML-style hexadecimal character references, XHTML entity references, and <code>&amp;amp;</code> references are converted to the represented <var>Unicode</var> character;<li>how untranslatable EBCDIC characters are handled.</ul>
==Syntax==
{{Template:String:EbcdicToUnicode syntax}}
<table class="syntaxTable">
<tr><th>%unicode</th>
<td>A <var>Unicode</var> variable to receive the method object <var class="term">string</var> translated to <var>Unicode</var>.</td></tr>
<tr><th><var class="term">string</var></th>
<td>An EBCDIC character string.</td></tr>
<tr><th>CharacterDecode</th>
<td>The optional (name required) <var class="term">CharacterDecode</var> argument is a <var>[[Boolean enumeration]]</var>:
<ul><li>If its value is <code>True</code>, an ampersand (<code>&amp;</code>) in the input EBCDIC string is allowed <b><i>only</i></b> as the beginning of one of these types of character or entity reference:<ul><li>The substring <code>&amp;amp;</code>, which is converted to a single ampersand (<code>&amp;</code>).<li>A hexadecimal character reference (for example, the eight characters <code>&amp;#x201C;</code> for the <var>Unicode</var> <i>Left double quotation mark</i> <code>'“'</code>). The character reference is converted to the referenced character.<li>As of <var class="product">Sirius Mods</var> version 7.6, an XHTML entity reference (for example, the six characters <code>&amp;copy;</code> for the <i>copyright</i> character <code>'©'</code>). The entity reference is converted to the referenced character.</ul><p>A decimal character reference (for example, <code>&amp;#172;</code>) is <b><i>not</i></b> allowed.</p>
<li>If its value is <code>False</code>, the default, an ampersand is treated only as a normal character.</ul></td></tr>
<tr><th>Untranslatable</th>
<td>The optional (name required) <var class="term">Untranslatable</var> argument is a single character or a null string that specifies how to handle EBCDIC input characters that are not translatable to <var>Unicode</var>:<ul><li>If the value is a single <var>Unicode</var> character, any untranslatable EBCDIC characters are replaced with that <var>Unicode</var> character.<li>If the value is the null string, any untranslatable EBCDIC characters are omitted from the result.</ul>If <var class="term">Untranslatable</var> is omitted and an EBCDIC character is encountered that is not translatable to <var>Unicode</var>, a <var>[[CharacterTranslationException class|CharacterTranslationException]]</var> exception is thrown.<p>The <var class="term">Untranslatable</var> parameter is available as of <var class="product">Sirius Mods</var> version 7.5. It provides the functionality formerly provided by the <var>EbcdicTranslateNonUnicode</var> and <var>EbcdicRemoveNonUnicode</var> methods, which are invalid as of <var class="product">Sirius Mods</var> 7.5.</p></td></tr>
</table>
==Exceptions== | |||
<var>EbcdicToUnicode</var> can throw the following exception: | |||
<dl>
<dt><var>[[CharacterTranslationException class|CharacterTranslationException]]</var>
<dd>If the method encounters a translation problem, properties of the exception object may indicate the location and type of problem.
</dl>
==Usage notes==
<ul><li>Using <var>EbcdicToUnicode</var> with <code>CharacterDecode=True</code> (or the <var>[[U (String function)|U]]</var> function) is necessary if the string you want to convert to <var>Unicode</var> may contain a hexadecimal character reference: a plain assignment to a <var>Unicode</var> variable translates the characters of the reference one by one instead of decoding it to the character it represents.
<li><var>EbcdicToUnicode</var> is available as of <var class="product">Sirius Mods</var> version 7.3.</ul>
==Examples==
<ol><li>The following fragment calls <var>EbcdicToUnicode</var> four times: on a string of translatable EBCDIC characters, on a string containing a hexadecimal character reference, on a string containing an entity reference, and on a string containing an EBCDIC character that cannot be translated to <var>Unicode</var>. The <var>[[X (String function)|X]]</var> constant function is used to specify the last string in hexadecimal.
<p class="code">%e string Len 20
%u unicode
%e = '12'
%u = %e:EbcdicToUnicode
Print %u
Print %u:UnicodeToUtf16:StringToHex
%e = '1&amp;#x2122;2'
%u = %e:EbcdicToUnicode(CharacterDecode=True)
Print %u:UnicodeToUtf16:StringToHex
%e = '&amp;copy;'
%u = %e:EbcdicToUnicode(CharacterDecode=True)
Print %u
%e = 'F1FFF2':X
%u = %e:EbcdicToUnicode
</p>
The result of the above fragment is:
<p class="code">12
00310032
003121220032
©
CANCELLING REQUEST: MSIR.0751: Class STRING, function EBCDICTOUNICODE: CHARACTER TRANSLATIONEXCEPTION
exception: EBCDIC character X'FF' without valid
translation to <var>Unicode</var> at byte position 2 ...
</p><p>
<b><i>Note:</i></b> The initial <code>Print %u</code> statement in the example above is not very revealing, because it is equivalent to specifying <code>Print %u:[[UnicodeToEbcdic (Unicode function)|UnicodeToEbcdic]]</code>: a <var>Unicode</var> string is implicitly converted to EBCDIC when it is used in an EBCDIC context like a <code>Print</code> statement. <var>[[UnicodeToUtf16 (Unicode function)|UnicodeToUtf16]]</var>, however, converts the <var>Unicode</var> variable to a byte string holding its UTF-16 encoding, which <var>[[StringToHex (String function)|StringToHex]]</var> then converts to its hexadecimal representation.</p></ol>
==See also==
<ul><li><var>[[U (String function)|U]]</var> is a compile-time-only equivalent of the <var>EbcdicToUnicode</var> method (with the <var class="term">CharacterDecode</var> argument implicitly set to <code>True</code>).
<li>You can find the list of XHTML entities on the Internet at the following URL:
<p class="code">http://www.w3.org/TR/xhtml1/dtds.html#h-A2
</p>
<li>More information is available about the <var>[[Unicode Tables|Unicode tables]]</var>.
<li>The <var>[[EbcdicToAscii (String function)|EbcdicToAscii]]</var> method converts an EBCDIC string to ASCII.</ul>
{{Template:String:EbcdicToUnicode footer}}
Revision as of 06:05, 31 January 2011
Convert EBCDIC string to Unicode (String class)
EbcdicToUnicode is an intrinsic function that converts an EBCDIC string to Unicode using the current Unicode tables. Options are available to control:
- whether XML-style hexadecimal character references, XHTML entity references, and `&amp;` references are converted to the represented Unicode character;
- how untranslatable EBCDIC characters are handled.
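The translation itself is table-driven: each EBCDIC byte maps to one Unicode character. The following Python sketch models only that step. It assumes Python's `cp037` codec (EBCDIC US/Canada) as a stand-in for the current Model 204 Unicode tables, which may map some bytes differently; `ebcdic_to_unicode_basic` is a hypothetical name, not part of any Sirius API.

```python
# Minimal model of table-driven EBCDIC -> Unicode translation.
# Assumption: the cp037 codec stands in for the Model 204 Unicode tables.


def ebcdic_to_unicode_basic(data: bytes) -> str:
    """Translate each EBCDIC byte to one Unicode character via the cp037 table."""
    return data.decode("cp037")


# X'F1F2' is the EBCDIC encoding of the digits "12".
result = ebcdic_to_unicode_basic(bytes.fromhex("F1F2"))
print(result)  # 12
print([f"U+{ord(ch):04X}" for ch in result])  # ['U+0031', 'U+0032']
```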
Syntax
%unicode = string:EbcdicToUnicode[( [CharacterDecode= boolean], -
           [Untranslatable= unicode])] Throws CharacterTranslationException
Syntax terms
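| Term | Description |
|---|---|
| %unicode | A Unicode variable to receive the method object string translated to Unicode. |
| string | An EBCDIC character string. |
| CharacterDecode | The optional (name required) CharacterDecode argument is a Boolean enumeration. If its value is True, an ampersand (&) in the input EBCDIC string is allowed only as the beginning of one of these references: the substring `&amp;`, which is converted to a single ampersand; a hexadecimal character reference (for example, the eight characters `&#x201C;` for the Unicode Left double quotation mark “), which is converted to the referenced character; or, as of Sirius Mods version 7.6, an XHTML entity reference (for example, the six characters `&copy;` for the copyright character ©), which is converted to the referenced character. A decimal character reference (for example, `&#172;`) is not allowed. If its value is False, the default, an ampersand is treated only as a normal character. |
| Untranslatable | The optional (name required) Untranslatable argument is a single character or a null string that specifies how to handle EBCDIC input characters that are not translatable to Unicode. If the value is a single Unicode character, any untranslatable EBCDIC characters are replaced with that character; if it is the null string, they are omitted from the result. If Untranslatable is omitted and an untranslatable EBCDIC character is encountered, a CharacterTranslationException exception is thrown. The Untranslatable parameter is available as of Sirius Mods version 7.5. It provides the functionality formerly provided by the EbcdicTranslateNonUnicode and EbcdicRemoveNonUnicode methods, which are invalid as of Sirius Mods 7.5. |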
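The CharacterDecode=True rules can be pictured as a pass over the already-translated text. This Python sketch approximates them under stated assumptions: `decode_references` is a hypothetical helper, Python's `html.entities.name2codepoint` (the HTML 4 entity list, close to but not guaranteed identical to the XHTML list) stands in for the XHTML entities, and `ValueError` stands in for whatever translation error the real method reports.

```python
import html.entities
import re

# One reference starting at an ampersand: &#xHHHH;  &#DDD;  &name;
_REF = re.compile(r"&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));")


def decode_references(text: str) -> str:
    """Hypothetical model of CharacterDecode=True applied to translated text."""
    out = []
    pos = 0
    while True:
        amp = text.find("&", pos)
        if amp < 0:
            out.append(text[pos:])
            return "".join(out)
        out.append(text[pos:amp])
        m = _REF.match(text, amp)
        if m is None:
            # An ampersand is allowed only at the start of a reference.
            raise ValueError(f"bare ampersand at position {amp + 1}")
        hex_digits, decimal_digits, name = m.groups()
        if hex_digits is not None:
            out.append(chr(int(hex_digits, 16)))  # hexadecimal reference
        elif decimal_digits is not None:
            raise ValueError(f"decimal reference at position {amp + 1} is not allowed")
        elif name == "amp":
            out.append("&")  # &amp; becomes a single ampersand
        elif name in html.entities.name2codepoint:
            out.append(chr(html.entities.name2codepoint[name]))  # entity stand-in
        else:
            raise ValueError(f"unknown entity &{name}; at position {amp + 1}")
        pos = m.end()


print(ascii(decode_references("1&#x2122;2")))  # '1\u21222'
```

Hexadecimal references and named entities are decoded, while a decimal reference or a lone ampersand is rejected, mirroring the table above.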
Exceptions
EbcdicToUnicode can throw the following exception:
- CharacterTranslationException: if the method encounters a translation problem, properties of the exception object may indicate the location and type of problem.
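The following Python sketch models the Untranslatable argument together with an exception that records where translation failed. It is an illustration under assumptions, not the Sirius Mods implementation: `cp037` stands in for the Unicode tables, the hypothetical `UNTRANSLATABLE` set marks the bytes treated as untranslatable (only X'FF', as in the example in this article), and `CharacterTranslationError` with its `byte_value` and `position` attributes is a made-up stand-in for CharacterTranslationException, whose actual properties may differ.

```python
from typing import Optional

# Assumption: these bytes have no Unicode translation in the (hypothetical) tables.
UNTRANSLATABLE = frozenset({0xFF})


class CharacterTranslationError(Exception):
    """Hypothetical stand-in for CharacterTranslationException."""

    def __init__(self, byte_value: int, position: int) -> None:
        super().__init__(
            f"EBCDIC character X'{byte_value:02X}' without valid translation "
            f"to Unicode at byte position {position}"
        )
        self.byte_value = byte_value
        self.position = position  # 1-based, like "byte position 2" in the example


def translate(data: bytes, untranslatable: Optional[str] = None) -> str:
    """Model of Untranslatable: None raises, "" drops, one character replaces."""
    if untranslatable is not None and len(untranslatable) > 1:
        raise ValueError("untranslatable must be a single character or empty")
    out = []
    for index, byte in enumerate(data):
        if byte in UNTRANSLATABLE:
            if untranslatable is None:
                raise CharacterTranslationError(byte, index + 1)
            out.append(untranslatable)  # the empty string removes the character
        else:
            out.append(bytes([byte]).decode("cp037"))  # cp037 as the table
    return "".join(out)


print(translate(bytes.fromhex("F1FFF2"), "?"))  # 1?2
```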
Usage notes
- Using EbcdicToUnicode with CharacterDecode=True (or the U function) is necessary if the string you want to convert to Unicode may contain a hexadecimal character reference: a plain assignment to a Unicode variable translates the characters of the reference one by one instead of decoding it to the character it represents.
- EbcdicToUnicode is available as of Sirius Mods version 7.3.
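To see why decoding matters, this Python sketch contrasts plain table translation with reference decoding. It assumes `cp037` as the EBCDIC table and uses the standard `html.unescape`, which is more permissive than CharacterDecode (it also accepts decimal references), so it only illustrates the difference rather than reproducing the method.

```python
import html

# Assumption: cp037 stands in for the Model 204 Unicode tables.
ebcdic = "1&#x2122;2".encode("cp037")

# Plain table translation keeps the reference as eight literal characters.
plain = ebcdic.decode("cp037")
print(plain, len(plain))  # 1&#x2122;2 10

# Decoding the reference yields the single TRADE MARK SIGN character.
decoded = html.unescape(plain)
print(f"U+{ord(decoded[1]):04X}", len(decoded))  # U+2122 3
```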
Examples
- The following fragment calls EbcdicToUnicode four times: on a string of translatable EBCDIC characters, on a string containing a hexadecimal character reference, on a string containing an entity reference, and on a string containing an EBCDIC character that cannot be translated to Unicode. The X constant function is used to specify the last string in hexadecimal.

    %e string Len 20
    %u unicode
    %e = '12'
    %u = %e:EbcdicToUnicode
    Print %u
    Print %u:UnicodeToUtf16:StringToHex
    %e = '1&#x2122;2'
    %u = %e:EbcdicToUnicode(CharacterDecode=True)
    Print %u:UnicodeToUtf16:StringToHex
    %e = '&copy;'
    %u = %e:EbcdicToUnicode(CharacterDecode=True)
    Print %u
    %e = 'F1FFF2':X
    %u = %e:EbcdicToUnicode
The result of the above fragment is:
    12
    00310032
    003121220032
    ©
    CANCELLING REQUEST: MSIR.0751: Class STRING, function EBCDICTOUNICODE: CHARACTER TRANSLATIONEXCEPTION
    exception: EBCDIC character X'FF' without valid
    translation to Unicode at byte position 2 ...
Note: The initial Print %u statement in the example above is not very revealing, because it is equivalent to specifying Print %u:UnicodeToEbcdic: a Unicode string is implicitly converted to EBCDIC when it is used in an EBCDIC context like a Print statement. UnicodeToUtf16, however, converts the Unicode variable to a byte string holding its UTF-16 encoding, which StringToHex then converts to its hexadecimal representation.
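The same distinction can be reproduced in Python, assuming `cp037` for the EBCDIC side. `utf16_hex` is a hypothetical helper that approximates UnicodeToUtf16 followed by StringToHex, using big-endian UTF-16 without a byte-order mark, which is consistent with the hex output shown above.

```python
# Python analogue of the note: what an EBCDIC Print shows versus the UTF-16 hex.
# Assumption: cp037 stands in for the EBCDIC code page used by Print.


def utf16_hex(text: str) -> str:
    """Approximate UnicodeToUtf16 + StringToHex (big-endian, no byte-order mark)."""
    return text.encode("utf-16-be").hex().upper()


for value in ["12", "1\u21222"]:
    print(utf16_hex(value))
# 00310032
# 003121220032

# Converting "12" back to EBCDIC gives the bytes X'F1F2', which display as "12"
# and reveal nothing about the Unicode code points.
print("12".encode("cp037").hex().upper())  # F1F2
```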
See also
- U is a compile-time-only equivalent of the EbcdicToUnicode method (with the CharacterDecode argument implicitly set to True).
- You can find the list of XHTML entities on the Internet at the following URL: http://www.w3.org/TR/xhtml1/dtds.html#h-A2
- More information is available about the Unicode tables.
- The EbcdicToAscii method converts an EBCDIC string to ASCII.