UrlDecodeUnicode and FormUrlDecodeUnicode (String functions): Difference between revisions

Latest revision as of 23:05, 5 November 2012

Decode URL-encoded characters to Unicode (String class)

[Introduced in Sirius Mods 7.9]

UrlDecodeUnicode and FormUrlDecodeUnicode decode URL-encoded (percent-encoded) data to a Unicode string.

Syntax

%unicode = string:UrlDecodeUnicode[( [AllowUntranslatable= boolean])] Throws CharacterTranslationException

%unicode = string:FormUrlDecodeUnicode[( [AllowUntranslatable= boolean])] Throws CharacterTranslationException

Syntax terms

%unicode	The `Unicode` variable that results when `string` is decoded.
string	The UTF-8 encoded string that contains a URL-encoded representation of a Unicode string.
`AllowUntranslatable`	This optional, name required, argument indicates whether Unicode characters that cannot be translated back to EBCDIC are to be allowed. If `boolean` is set to `False`, a Unicode character that cannot be translated back to EBCDIC results in request cancellation.

Exceptions

UrlDecodeUnicode and FormUrlDecodeUnicode can throw the following exception:

CharacterTranslationException: If the method encounters a translation problem, properties of the exception object may indicate the location and type of problem.

Usage notes

If an EBCDIC version of URL-encoded data needs to be converted to Unicode, the EBCDIC string must be:
1. Assigned to a Unicode variable (either by implicit EBCDIC-to-Unicode translation or explicitly with EbcdicToUnicode)
2. Converted to UTF-8 using UnicodeToUtf8
3. Translated to Unicode using UrlDecodeUnicode
An example below illustrates this process.
The difference between FormUrlDecodeUnicode and UrlDecodeUnicode is that FormUrlDecodeUnicode converts plus signs (+) to spaces, while UrlDecodeUnicode expects spaces to be percent-encoded as %20. UrlDecodeUnicode throws a CharacterTranslationException if it encounters a plus sign. Form URL encoding is typically used only in HTML form posts, when the form is posted using the application/x-www-form-urlencoded content type.
URL encoding is mostly used in web applications or for encoding a URI, possibly in an XML namespace declaration.
The inverses of UrlDecodeUnicode and FormUrlDecodeUnicode are UnicodeUrlEncode and UnicodeFormUrlEncode, respectively.
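
The plus-sign difference described in these notes can be sketched outside SOUL. The following is a Python analog, not the Sirius implementation: the function names url_decode and form_url_decode are hypothetical, and ValueError stands in for CharacterTranslationException. Python's own unquote leaves a plus sign untouched rather than rejecting it, so the strict variant adds the check explicitly.

```python
from urllib.parse import unquote, unquote_plus


def url_decode(text: str) -> str:
    """Strict percent-decoding: a literal '+' is rejected, spaces must be %20."""
    if "+" in text:
        # Assumption: mimic the documented UrlDecodeUnicode behavior by failing.
        # Python's unquote() would silently keep the '+' instead.
        raise ValueError(f"unexpected '+' at position {text.index('+') + 1}")
    return unquote(text, encoding="utf-8", errors="strict")


def form_url_decode(text: str) -> str:
    """Form decoding: '+' becomes a space, then percent escapes are decoded."""
    return unquote_plus(text, encoding="utf-8", errors="strict")


if __name__ == "__main__":
    print(form_url_decode("apple+%CF%80"))   # '+' is treated as a space
    print(url_decode("apple%20%CF%80"))      # the space is written as %20
```

Both calls in the usage section yield the same decoded text; only the form variant accepts the plus sign as a space.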

Examples

URL decoding an EBCDIC string

The following example decodes a URL-encoded EBCDIC string to Unicode:

b
%ebcdic   is longstring
%unicode  is unicode
%ebcdic = 'I%20like%20apple%20%CF%80%20and%20eat%20it%204%20%C3%97%20a%20day.'
%unicode = %ebcdic:ebcdicToUnicode:unicodeToUtf8:urlDecodeUnicode
printText {~} = {%unicode}
end

and outputs:

%unicode = I like apple π and eat it 4 × a day.

Catching an exception when URL decoding

The following is an example of catching a URL decoding exception. In the example, the supposedly URL-encoded string contains %2x, which is not a valid percent-encoded hexadecimal value. This results in an exception:

begin
%url   is longstring
%u     is unicode
%err   is object characterTranslationException
%url = 'What%20the%20deuce%2xis%20going%20on%20here?':u:unicodeToUtf8
printText ************** {~}="{%url:utf8ToUnicode}"
try %u = %url:urlDecodeUnicode
   printText It worked! {~=%u}
catch CharacterTranslationException to %err
   printText Caught CharacterTranslationException:
   printText    Reason:   {%err:reason}
   printText    HexValue: {%err:hexValue}
   printText    Position: {%err:bytePosition}
   printText    Problem:  {%err:description}
end try
end

This outputs:

************** %url:utf8ToUnicode="What%20the%20deuce%2xis%20going%20on%20here?"
Caught CharacterTranslationException:
   Reason:   InvalidUrlEncoding
   HexValue: 78
   Position: 21
   Problem:  invalid URL encoding X'78' at byte position 21:   Hexadecimal digit expected
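
The HexValue and Position fields in this output can be checked by hand: the x in %2x is the 21st byte of the UTF-8 input, counting from 1, and its UTF-8 byte value is X'78'. The Python sketch below reproduces that bookkeeping. It is an illustration only, not the Sirius implementation: UrlDecodeError and strict_url_decode are hypothetical names, and the sketch does not treat a plus sign as an error.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


class UrlDecodeError(ValueError):
    """Hypothetical stand-in for CharacterTranslationException."""

    def __init__(self, hex_value: str, byte_position: int):
        super().__init__(
            f"invalid URL encoding X'{hex_value}' at byte position {byte_position}"
        )
        self.hex_value = hex_value
        self.byte_position = byte_position


def strict_url_decode(data: bytes) -> str:
    """Percent-decode UTF-8 bytes, failing on the first malformed escape."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != ord("%"):
            out.append(data[i])      # ordinary byte, copied through unchanged
            i += 1
            continue
        # The two bytes after '%' must both be hexadecimal digits.
        for j in (i + 1, i + 2):
            if j >= len(data) or data[j] not in HEX_DIGITS:
                bad = f"{data[j]:02X}" if j < len(data) else ""
                raise UrlDecodeError(bad, j + 1)   # report a 1-based position
        out.append(int(data[i + 1:i + 3], 16))
        i += 3
    return out.decode("utf-8")


if __name__ == "__main__":
    try:
        strict_url_decode(b"What%20the%20deuce%2xis%20going%20on%20here?")
    except UrlDecodeError as err:
        print(err.hex_value, err.byte_position)
```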

@@ Line 5: / Line 5: @@
 {{Template:String:UrlDecodeUnicode syntax}}
 {{Template:String:FormUrlDecodeUnicode syntax}}
 ===Syntax terms===
 <table class="syntaxTable">
 <tr><th>%unicode</th><td>The <var>Unicode</var> variable that results when <var class="term">string</var> is decoded.</td></tr>
-<tr><th>string</th><td>The [http://en.wikipedia.org/wiki/Utf-8 UTF-8 encoded] string that contains a URL encoded representation of a unicode string.</td></tr>
-<tr><th>boolean</th>
+<tr><th>string</th>
-<td>Indicates whether unicode characters that cannot be translated back to EBCDIC are to be allowed. If <var class="term">boolean</var> is set to <code>False</code> a unicode character that cannot be translated back to EBCDIC results in request cancellation.</td></tr>
+<td>The [http://en.wikipedia.org/wiki/Utf-8 UTF-8 encoded] string that contains a URL-encoded representation of a Unicode string.</td></tr>
+<tr><th><var>AllowUntranslatable</var></th>
+<td>This optional, [[Notation conventions for methods#Named parameters|name required]], argument indicates whether Unicode characters that cannot be translated back to EBCDIC are to be allowed. If <var class="term">boolean</var> is set to <code>False</code>, a Unicode character that cannot be translated back to EBCDIC results in request cancellation.</td></tr>
 </table>
 ==Exceptions==
 <var>UrlDecodeUnicode</var> and  <var>FormUrlDecodeUnicode</var> can throw the following exception:
 <dl>
@@ Line 21: / Line 25: @@
 ==Usage notes==
+<ul>
+<li>If an EBCDIC version of URL-encoded data needs to be converted to Unicode, the EBCDIC string must be:
+<ol>
+<li>Assigned to a Unicode variable (using either implicit EBCDIC-to-Unicode translation or doing it explicitly with <var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var>)
+<li>Converted to UTF-8 using <var>[[UnicodeToUtf8 (Unicode function)|UnicodeToUtf8]]</var>
+<li>Translated to Unicode using <var>UrlDecodeUnicode</var>
+</ol>
+An example below illustrates this process.
+<li>The difference between <var>FormUrlDecodeUnicode</var> and <var>UrlDecodeUnicode</var> is that <var>FormUrlDecodeUnicode</var> converts pluses (<tt>+</tt>) to spaces, while <var>UrlDecodeUnicode</var> expects spaces to be percent encoded as <code>%20</code>. <var>UrlDecodeUnicode</var> will throw a <var>[[CharacterTranslationException class|CharacterTranslationException]]</var> exception if it encounters a plus character. Typically form URL encoding is only used in HTML form posts when the form is posted using the <code>x-www-form-urlencoded</code> content type.
-<ul>
-<li>If an EBCDIC version of URL encoded data needs to be converted to unicode, the EBCDIC string must first be assigned to a unicode variable using either implicit EBCDIC to Unicode translation or doing it explicitly with <var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var>, converted to UTF-8 using <var>[[UnicodeToUtf8 (Unicode function)|UnicodeToUtf8]]</var>, and finally translated to unicode using <var>UrlDecodeUnicode. And example below illustrates this process.<li>The difference between <var>FormUrlDecodeUnicode</var> and <var>UrlDecodeUnicode</var> is that<var>FormUrlDecodeUnicode</var> converts pluses (<code>+</code>) to spaces while <var>UrlDecodeUnicode</var> expects spaces to be percent encoded as <code>%20</code>. <var>UrlDecodeUnicode</var> will throw a [[CharacterTranslationException]] exception if it encounters a plus (<code>+</code>). Typically form URL encoding is only used in HTML form posts when the form is posted using the x-www-form-urlencoded content type.
 <li>URL encoding is mostly used in web applications or for encoding a URI, possibly in an XML namespace declaration.
-<li>The inverse of <var>UrlDecodeUnicode</var> and <var>FormUrlDecodeUnicode</var> are <var>[[UnicodeUrlEncode (Unicode function)|UnicodeUrlEncode]]</var> and <var>[[UnicodeFormUrlEncode (Unicode function)|UnicodeUrlEncode]]</var>, respectively.</ul>
+<li>The inverses of <var>UrlDecodeUnicode</var> and <var>FormUrlDecodeUnicode</var> are <var>[[UnicodeUrlEncode (Unicode function)|UnicodeUrlEncode]]</var> and <var>[[UnicodeFormUrlEncode (Unicode function)|UnicodeFormUrlEncode]]</var>, respectively.
+</ul>
 ==Examples==
-===URL decoding an EBCDIC string===
+====URL decoding an EBCDIC string====
+The following example URL decodes a URL encoded EBCDIC string to <var>Unicode</var>:
-The following example URL decodes a URL encoded EBCDIC string to unicode:
 <p class="code">b
 %ebcdic   is longstring
@@ Line 44: / Line 57: @@
 end
 </p>
+and outputs:
+<p class="output">%unicode = I like apple &amp;#x03C0; and eat it 4 × a day.
+</p>
+====Catching an exception when URL decoding====
+The following is an example of catching a URL decoding exception. In the example, the supposedly URL encoded string contains a <code>%2x</code>, which is not a valid percent encoded hexadecimal value. This results in an exception:
+<p class="code">begin
+%url   is longstring
+%u     is unicode
+%err   is object characterTranslationException
-and outputs
+%url = 'What%20the%20deuce%2xis%20going%20on%20here?':u:unicodeToUtf8
+printText ************** {~}="{%url:utf8ToUnicode}"
+try %u = %url:urlDecodeUnicode
+    printText It worked! {~=%u}
+catch CharacterTranslationException to %err
+   printText Caught CharacterTranslationException:
+   printText    Reason:   {%err:reason}
+   printText    HexValue: {%err:hexValue}
+   printText    Position: {%err:bytePosition}
+   printText    Problem:  {%err:description}
+end try
-<p class="output">%unicode = I like apple &amp;#x03C0; and eat it 4 × a day.
+end
+</p>
+This outputs:
+<p class="output">************** %url:utf8ToUnicode="What%20the%20deuce%2xis%20going%20on%20here?"
+Caught CharacterTranslationException:
+   Reason:   InvalidUrlEncoding
+   HexValue: 78
+   Position: 21
+   Problem:  invalid URL encoding X'78' at byte position 21:   Hexadecimal digit expected
 </p>
 ==See also==
 {{Template:Unicode:UnicodeUrlEncode and UnicodeFormUrlEncode footer}}
