ParseXml (HttpResponse function): Difference between revisions

mNo edit summary
m (match syntax table to syntax template; edits, tags and links (part 1))
Line 1: Line 1:
{{Template:HttpResponse:ParseXml subtitle}}
{{Template:HttpResponse:ParseXml subtitle}}
The <var>ParseXml</var> method helps you access the document/content from <var>[[HttpRequest_class|HttpRequest]]</var> <var>[[Get (HttpRequest function)|Get]]</var>, <var>[[Post (HttpRequest function)|Post]]</var>, and <var>[[Send (HttpRequest function)|Send]]</var> operations.  ParseXml deserializes returned XML data into the <var class="product">[[Janus SOAP]]</var> <var>[[XmlDoc_class#The_XmlDoc_class|XmlDoc]]</var> object you specify.


This method helps you access the document/content from <var>[[Get (HttpRequest function)|Get]]</var>, <var>[[Post (HttpRequest function)|Post]]</var>, and <var>[[Send (HttpRequest function)|Send]]</var>
operations.
ParseXml deserializes returned XML data into the <var class="product">[[Janus SOAP]]</var> <var>XmlDoc</var>
object you specify.
Working with <var class="product">Janus SOAP</var> <var>XmlDoc</var> objects is described in detail in [[XML processing in Janus SOAP|"XML processing in Janus SOAP"]].
==Syntax==
==Syntax==
{{Template:HttpResponse:ParseXml syntax}}
{{Template:HttpResponse:ParseXml syntax}}
===Syntax terms===
===Syntax terms===
<table class="syntaxTable">
<table class="syntaxTable">
<tr><th>%rc</th>
<tr><th>%number</th>
<td>A numeric variable. Successful deserialization returns 0; failure returns a non-zero value.
<td>A numeric variable. Successful deserialization returns 0; failure returns a non-zero value.
</td></tr>
</td></tr>
<tr><th>%httpresp</th>
<tr><th>%httpResponse</th>
<td>A reference to an HTTPResponse object that was returned by a Get, Post, or Send method of an HTTPRequest object.
<td>A reference to an <var>[[HttpResponse_class|HttpResponse]]</var> object that was returned by a <var>Get</var>, <var>Post</var>, or <var>Send</var> method of an <var>HttpRequest</var> object.</td></tr>
<tr><th>doc</th>
<td>A previously instantiated <var>XmlDoc</var> object which must be EMPTY.
</td></tr>
</td></tr>
<tr><th>%xmldoc</th>
<tr><th>string</th>
<td>An object of type [[Janus SOAP]] XmlDoc that contains an instantiated XmlDoc object, which must be EMPTY.
<td>Any valid combination of the following (which are the same as the options of the <var>[[LoadXml_(XmlDoc/XmlNode_function)|LoadXml]]</var> method):
</td></tr>
<tr><th>xmloptions</th>
<td>Any valid combination of the following (which are the same as the options of the LoadXML method of the [[Janus SOAP]] XmlDoc object):  
<ul>  
<ul>  
<li><b>AllowUntranslatable</b>
<li><b>AllowUntranslatable</b>
Allows all valid Unicode strings into the XML document. When this option is not specified, Unicode strings that are not translatable to EBCDIC are not allowed.
Allows all valid Unicode strings into the XML document. When this option is not specified, Unicode strings that are not translatable to EBCDIC are not allowed. As described in [[#Deserializing Unicode strings|"Deserializing Unicode strings"]], it is recommended that you use <var>AllowUntranslatable</var> only if the application checks for translatability when accessing parts of the <var>XmlDoc</var> that may have untranslatable Unicode content.
As described in [[#Deserializing Unicode strings|"Deserializing Unicode strings"]], it is recommended that you use <var>AllowUntranslatable</var> only if the application checks for translatability when accessing parts of the <var>XmlDoc</var> that may have untranslatable Unicode content.
<p>The <var>AllowUntranslatable</var> option is available as of <var class="product">[[Sirius Mods]]</var> Version 7.6.</p>
The <var>AllowUntranslatable</var> option is available as of version 7.6 of the <var class="product">Sirius Mods</var>. <li><b>CrPreserve</b>
<li><b>CrPreserve</b>
All whitespace characters in Element content are preserved, including carriage return. Unlike all other deserialization options, a carriage return in Element content does ''not'' undergo the normalization specified in the XML standard (and described in [[XML processing in Janus SOAP#Normalizing whitespace characters|"Normalizing whitespace characters"]]).
All whitespace characters in Element content are preserved, including carriage return. Unlike all other deserialization options, a carriage return in Element content does ''not'' undergo the normalization specified in the XML standard (and described in [[XML processing in Janus SOAP#Normalizing whitespace characters|"Normalizing whitespace characters"]]). <var>CrPreserve</var> is mutually exclusive with the <var>WspNewline</var>, <var>WspToken</var>, and <var>WspPreserve</var> options, and with the <var>LinefeedNoTrailingTabs</var> option.
<var>CrPreserve</var> is mutually exclusive with the <var>WspNewline</var>, <var>WspToken</var>, and <var>WspPreserve</var> options, and with the <var>LinefeedNoTrailingTabs</var> option.
<p>The <var>CrPreserve</var> option was added in <var class="product">Sirius Mods</var> Version 7.5, and it was also made available by a maintenance zap to <var class="product">Sirius Mods</var> Version 7.4.</p>
The <var>CrPreserve</var> option was added in version 7.5 of the <var class="product">Sirius Mods</var>, as well as implemented with a maintenance zap to the 7.4 <var class="product">Sirius Mods</var>.  
<li><b>DTDIgnore</b>
<li><b>DTDIgnore</b>
A &amp;ldquo;<!DOCTYPE&amp;thinsp....>&amp;rdquo; clause may be present in the document, and it should be ignored. In any case, the DTD is not processed. The default behavior, that is, if <var>DTDIgnore</var> is not present, is to treat &amp;ldquo;<!DOCTYPE&amp;thinsp....>&amp;rdquo; as a syntax error.
A &ldquo;<!DOCTYPE&amp;thinsp....>&rdquo; clause may be present in the document, and it should be ignored. In any case, the DTD is not processed. The default behavior, that is, if <var>DTDIgnore</var> is not present, is to treat &ldquo;<!DOCTYPE&amp;thinsp....>&rdquo; as a syntax error.
<var>DTD_Ignore</var> is a synonym for <var>DTDIgnore</var>.  
<p><var>DTD_Ignore</var> is a synonym for <var>DTDIgnore</var>.</p>
<li><b>ErrRet</b>
<li><b>ErrRet</b>
Errors during deserialization are tolerated, the method object is not updated (retains its pre-call state), and the request continues. If <var>ErrRet</var> is not present, any error will cancel the request. Note that some errors still cancel the request; errors tolerated when <var>ErrRet</var> is specified are:  
Errors during deserialization are tolerated, the method object is not updated (retains its pre-call state), and the request continues. If <var>ErrRet</var> is not present, any error will cancel the request. Note that some errors still cancel the request; errors tolerated when <var>ErrRet</var> is specified are:
<ul>  
<ul>  
<li>A syntax error in the text string representation of the XML document  
<li>A syntax error in the text string representation of the XML document  
Line 39: Line 33:
<li><b>LinefeedNoTrailingTabs</b>
<li><b>LinefeedNoTrailingTabs</b>
For a Text node that consists of an initial line-end character and one or more tab characters, this option normalizes the content so the result is a single line-end character. The initial line-end (also called "newline") character can be a linefeed character (LF) or a carriage-return (CR) by itself, or a carriage-return followed by a linefeed (CRLF), since (within Text nodes) all of these are normalized by the XML specification into a single line-end character.
For a Text node that consists of an initial line-end character and one or more tab characters, this option normalizes the content so the result is a single line-end character. The initial line-end (also called "newline") character can be a linefeed character (LF) or a carriage-return (CR) by itself, or a carriage-return followed by a linefeed (CRLF), since (within Text nodes) all of these are normalized by the XML specification into a single line-end character.
This option, added in <var class="product">Sirius Mods</var> version 7.0, is compatible with, but takes precedence over, any of the other whitespace-handling options (<var>WspNewline</var>, <var>WspToken</var>, <var>WspPreserve</var>) except <var>CrPreserve</var>.
<p>This option, added in <var class="product">Sirius Mods</var> Version 7.0, is compatible with, but takes precedence over, any of the other whitespace-handling options (<var>WspNewline</var>, <var>WspToken</var>, <var>WspPreserve</var>) except <var>CrPreserve</var>.</p>
See [[#Whitespace handling|"Whitespace handling"]], below, for more information about this option and about whitespace handling. <li><b>ReplaceUnicode</b>
<p>See [[#Whitespace handling|"Whitespace handling"]], below, for more information about this option and about whitespace handling.</p>
<li><b>ReplaceUnicode</b>
Converts Unicode characters using the replacements (if any) specified at your site by <var>[[Unicode#The UNICODE command|UNICODE]]</var> updating commands that use the <var>Rep</var> subcommand (for example, <code>UNICODE Table Standard Rep U=2122 '(TM)'</code>).
Converts Unicode characters using the replacements (if any) specified at your site by <var>[[Unicode#The UNICODE command|UNICODE]]</var> updating commands that use the <var>Rep</var> subcommand (for example, <code>UNICODE Table Standard Rep U=2122 '(TM)'</code>).
The replacement is performed on all names, element and attribute values, comments, and PI &amp;ldquo;values&amp;rdquo; in the document, after any entity and character references have been converted to characters.
<p>The replacement is performed on all names, element and attribute values, comments, and PI &ldquo;values&rdquo; in the document, after any entity and character references have been converted to characters.</p>
For further discussion and examples, see the <var>ReplaceUnicode</var> discussion in [[#Usage Notes|"Usage Notes"]], below.
<p>For further discussion and examples, see the <var>ReplaceUnicode</var> discussion in [[#Usage Notes|"Usage Notes"]], below.</p>
 
<li><b>WspNewline</b>
<li><b>WspNewline</b>
This option is designed to remove any whitespace inserted to make the structure of an XML document easier (for a person) to read. <var>WspNewline</var> removes the leading or trailing whitespace in the value of a Text node, if the whitespace sequence contains a newline (carriage return or linefeed) character. <br>'''Note:''' This handling, the default whitespace option for this method, applies to the "physical value" of the representation of a Text node. In particular, markup such as a character reference (even of whitespace, for example, <code>&amp;amp;#32;</code>), a CDATA section, or any non-whitespace character delimits leading or trailing whitespace and is not affected.
This option is designed to remove any whitespace inserted to make the structure of an XML document easier (for a person) to read. <var>WspNewline</var> removes the leading or trailing whitespace in the value of a Text node, if the whitespace sequence contains a newline (carriage return or linefeed) character.
See [[#Whitespace handling|"Whitespace handling"]], below, for more information about whitespace handling.
<p>'''Note:''' This handling, the default whitespace option for this method, applies to the "physical value" of the representation of a Text node. In particular, markup such as a character reference (even of whitespace, for example, <code>&amp;amp;#32;</code>), a CDATA section, or any non-whitespace character delimits leading or trailing whitespace and is not affected.</p>
 
<p>See [[#Whitespace handling|"Whitespace handling"]], below, for more information about whitespace handling.</p>
<li><b>WspPreserve</b>
<li><b>WspPreserve</b>
All whitespace characters in element content are preserved (after end-of-line normalization), as described below in [[#Whitespace handling|"Whitespace handling"]]. <br>'''Note:''' <var>Wsp_Preserve</var> is a synonym for <var>WspPreserve</var>. <li><b>WspToken</b>
All whitespace characters in element content are preserved (after end-of-line normalization), as described below in [[#Whitespace handling|"Whitespace handling"]].
<p>'''Note:''' <var>Wsp_Preserve</var> is a synonym for <var>WspPreserve</var>.</p>
<li><b>WspToken</b>
Whitespace in element content is normalized using the XPath <code>normalize-space()</code> function (leading and trailing whitespace removed, intermediate strings of whitespace replaced by a single blank character). <var>WspToken</var> is a good substitute for <var>WspNewline</var> to remove leading and trailing whitespace in cases where blanks (or tabs) and not line-end characters were used to make the document structure more readable, provided it is tolerable to collapse intermediate whitespace sequences to single space characters.
Whitespace in element content is normalized using the XPath <code>normalize-space()</code> function (leading and trailing whitespace removed, intermediate strings of whitespace replaced by a single blank character). <var>WspToken</var> is a good substitute for <var>WspNewline</var> to remove leading and trailing whitespace in cases where blanks (or tabs) and not line-end characters were used to make the document structure more readable, provided it is tolerable to collapse intermediate whitespace sequences to single space characters.
See [[#Whitespace handling|"Whitespace handling"]] for more information about whitespace handling. </ul>
<p>See [[#Whitespace handling|"Whitespace handling"]] for more information about whitespace handling.</p></ul>
</td></tr></table>
</td></tr></table>
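
As a minimal sketch of how several of these options might be combined, the following fragment assumes that multiple options are passed in a single blank-separated string (as with <var>LoadXml</var>) and that <code>%httpresp</code> already references a response. The variable names are illustrative only:
<p class="code"> %rc is float
  %doc is object xmlDoc
  %doc = new
  * WspToken is compatible with LinefeedNoTrailingTabs;
  * ErrRet lets the request continue after a tolerated parse error
  %rc = %httpresp:parseXml(%doc, 'WspToken LinefeedNoTrailingTabs ErrRet')
</p>
Adding <code>CrPreserve</code> or a second &ldquo;Wsp&rdquo; option to this string would violate the mutual-exclusivity rules given in the option descriptions above.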


==Usage notes==
==Usage notes==
<ul>
<ul>
<li><var>ParseXml</var> is equivalent to, but faster than, issuing the <var class="product">Janus SOAP</var> XML
<li><var>ParseXml</var> is equivalent to, but faster than, issuing the <var class="product">Janus SOAP</var> XML text-conversion method <var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var> against the longstring returned by <var>[[Content (HttpResponse function)|Content]]</var>.
text-conversion
<li>If the content type header indicates HTML, XML, or plain text, ASCII-to-EBCDIC translation is performed. If the <var>Clsock</var> port definition specifies a translation table for ASCII-to-EBCDIC, that table is used instead of the default.
method <var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var> against the longstring returned
by <var>[[Content (HttpResponse function)|Content]]</var>.
<li>If the content type header indicates HTML, XML, or plain text,
ASCII-to-EBCDIC translation is performed.
If the <var>Clsock</var> port definition specifies a translation table for ASCII-to-EBCDIC,
that table is used instead of the default.
<li>None of the ''options'' may be specified twice.
<li>None of the ''options'' may be specified twice.
<li>The ''options'' may be specified in any case.
<li>The ''options'' may be specified in any case. For example, you can use <code>WspPreserve</code> and <code>wsppreserve</code>, interchangeably.
For example, you can use
<li>Whitespace handling:
<code>WspPreserve</code> and <code>wsppreserve</code>, interchangeably.
<li>Whitespace handling
<ul>
<ul>
<li>The &amp;ldquo;Wsp&amp;rdquo; whitespace-handling ''options''
<li>The &ldquo;Wsp&rdquo; whitespace-handling ''options'' (<var>WspPreserve</var>, <var>WspNewline</var>, and <var>WspToken</var>) and the <var>CrPreserve</var> whitespace option are mutually exclusive; if none of them is specified, <var>WspNewline</var> is in effect. Although the <var>LinefeedNoTrailingTabs</var> option is also concerned with whitespace, it is distinct from, yet compatible with, any of the three &ldquo;Wsp&rdquo; options, but it is not compatible with the <var>CrPreserve</var> option.
(<var>WspPreserve</var>, <var>WspNewline</var>, and <var>WspToken</var>) and the <var>CrPreserve</var> whitespace option
<li>Except for <var>CrPreserve</var>, the whitespace-handling ''options'' are applied after the XML standard whitespace conversions that <var class="product">[[Janus SOAP]]</var> always applies. As described in [[XML processing in Janus SOAP#Normalizing whitespace characters|"Normalizing whitespace characters"]], the standard specifies that '''all''' carriage return/linefeed sequences and carriage return sequences are to be converted to linefeeds when deserializing. Using the CrPreserve option bypasses this rule.
are mutually exclusive; if none of them is specified, <var>WspNewline</var> is in effect.
<li>The whitespace-handling ''options'' do '''no''' whitespace conversion (beyond the XML standard conversions) on element content that is &ldquo;protected&rdquo; by the <code>xml:space="preserve"</code> attribute.
Although the <var>LinefeedNoTrailingTabs</var> option is also concerned with whitespace,
it is distinct from, yet compatible with, any of the three &amp;ldquo;Wsp&amp;rdquo; options,
but it is not compatible with the <var>CrPreserve</var> option.
<li>Except for <var>CrPreserve</var>, the whitespace-handling ''options'' are
applied after the XML standard whitespace conversions that <var class="product">[[Janus SOAP]]</var> always applies.
As described in [[XML processing in Janus SOAP#Normalizing whitespace characters|"Normalizing whitespace characters"]],
the standard specifies that '''all''' carriage return/linefeed
sequences and carriage return sequences are to be converted to linefeeds
when deserializing.
Using the CrPreserve option bypasses this rule.
<li>The whitespace-handling ''options'' do '''no''' whitespace
conversion (beyond the XML standard conversions) on element content that is
&amp;ldquo;protected&amp;rdquo; by the <code>xml:space="preserve"</code> attribute.


&amp;ldquo;Protected&amp;rdquo; by the xml:space="preserve" attribute
&ldquo;Protected&rdquo; by the xml:space="preserve" attribute means an element <i><b>E</b></i> that either:
means an element <i><b>E</b></i> that either:
<ul>
<ul>
<li>has the <code>xml:space</code> attribute with the value <code>preserve</code>
<li>has the <code>xml:space</code> attribute with the value <code>preserve</code>
<li>is contained in
<li>is contained in an element <i><b>A</b></i> with that attribute and value, and there is no element that is a descendant of <i><b>A</b></i> and an ancestor of <i><b>E</b></i> with the <code>xml:space</code> attribute with the value <code>default</code>
an element <i><b>A</b></i> with that attribute and value, and there is no
element that is a descendant of <i><b>A</b></i> and an ancestor of <i><b>E</b></i>
with the <code>xml:space</code> attribute with the value <code>default</code>
</ul>
</ul>


Elements that are
Elements that are not protected by the <code>xml:space="preserve"</code> attribute have whitespace handled according to the option in effect for the deserialization.
not protected by the <code>xml:space="preserve"</code> attribute
<li>Using <var>WspNewline</var> or <var>WspToken</var> reduces the space consumed by individual Text nodes, and in some cases collapses all whitespace content between markup to the null string, so it is not stored as a <var>Text</var> node. This reduces the storage required by the <var>XmlDoc</var>, speeds up XPath and node access processing, and makes the output of the <var>[[Print (HttpResponse subroutine)|Print]]</var> subroutine easier to read.
have whitespace handled
<li>The <var>LinefeedNoTrailingTabs</var> option only affects Text nodes that contain an initial line-end character followed by any number of tabs and nothing else. The <var>LinefeedNoTrailingTabs</var> effect on such a Text node, whether it is specified with or without any of the &ldquo;Wsp&rdquo; options, is to store the value of the node as a single line-end character.
according to the option in effect for the deserialization.
<li>Using <var>WspNewline</var> or <var>WspToken</var>
reduces the space consumed by individual Text nodes,
and in some cases
collapses all whitespace content between markup to the null
string, so it is not stored as a <var>Text</var> node.
This reduces the storage required by the <var>XmlDoc</var>, speeds up
XPath and node access processing, and makes the output of the
<var>[[Print (HttpResponse subroutine)|Print]]</var> subroutine easier to read.
<li>The <var>LinefeedNoTrailingTabs</var> option only affects Text nodes that contain
an initial line-end character followed by any number of tabs and nothing else.
The <var>LinefeedNoTrailingTabs</var> effect on such a Text node,
whether it is specified with or without any of the &amp;ldquo;Wsp&amp;rdquo; options,
is to store the value of the node as a single line-end character.


One example of the use of the <var>LinefeedNoTrailingTabs</var> option is
One example of the use of the <var>LinefeedNoTrailingTabs</var> option is an input XML document to be deserialized for which both of the following are true:
an input XML document to be deserialized for which
both of the following are true:
<ul>
<ul>
<li>A digital signature is needed of a subtree in the document.
<li>A digital signature is needed of a subtree in the document.
<li>The input subtree contains a linefeed and one or more tabs that
<li>The input subtree contains a linefeed and one or more tabs that separate markup, and the linefeed must be kept but the tabs discarded for the signature.
separate markup, and the linefeed must be kept but the tabs discarded
for the signature.
</ul>
</ul>


For information about exclusive canonicalization,
For information about exclusive canonicalization, serialization expressly designed for digital signatures, see [[XmlDoc API serialization options#Canonicalization|"Canonicalization"]].
serialization expressly designed for digital signatures,
see [[XmlDoc API serialization options#Canonicalization|"Canonicalization"]].


<li>Deserializing Unicode strings
<li>Deserializing Unicode strings
<ul>
<ul>
<li>The <var>ParseXml</var> <var>AllowUntranslatable</var> option
<li>The <var>ParseXml</var> <var>AllowUntranslatable</var> option lets you deserialize Unicode strings that contain characters that are not translatable to EBCDIC. Otherwise, such characters in an input XML document are detected, an <var>XmlParseError</var> exception with reason <code>UntranslatableUnicode</code> is thrown, and the request is canceled.
lets you deserialize Unicode strings that contain characters
that are not translatable to EBCDIC.
Otherwise, such characters in an input XML document are detected,
an <var>XmlParseError</var> exception
with reason <code>UntranslatableUnicode</code> is thrown, and
the request is canceled.


This default detection of non-translatable characters may suit your purposes.
This default detection of non-translatable characters may suit your purposes. That is, it ensures that subsequent access to the deserialized content is performed without any Unicode to EBCDIC translation errors. For example:
That is, it ensures that subsequent
access to the deserialized content is performed
without any Unicode to EBCDIC translation errors.
For example:
<p class="code"> %doc:ParseXml
<p class="code"> %doc:ParseXml
   ...
   ...
Line 151: Line 92:
</p>
</p>


The assignment to the EBCDIC string <code>%val</code> will not fail due
The assignment to the EBCDIC string <code>%val</code> will not fail due to a Unicode translation problem: if there is any untranslatable Unicode (including, of course, strings in the XML document which your application never accesses), the <var>ParseXml</var> operation fails.
to a Unicode translation problem: if there is any untranslatable
Unicode (including, of course, strings in the XML document which your
application never accesses), the <var>ParseXml</var> operation fails.


If you use <var>AllowUntranslatable</var>,
If you use <var>AllowUntranslatable</var>, all Unicode characters in a serialized input XML document are allowed and stored in the <var>XmlDoc</var>. Your stored data may contain content that is not translatable to
all Unicode characters in a serialized input XML document are
EBCDIC, however. A subsequent attempt to access such content that performs Unicode to EBCDIC translation might cause request cancellation. You should therefore use <var>AllowUntranslatable</var> only if there is also a check for translatability when parts of the <var>XmlDoc</var> that may have non-translatable Unicode content are accessed.
allowed and stored in the <var>XmlDoc</var>.
Your stored data may contain content that is not translatable to
EBCDIC, however.
A subsequent attempt to access such content that
performs Unicode to EBCDIC translation might cause
request cancellation.
You should therefore use <var>AllowUntranslatable</var>
only if there is also a check for translatability
when parts of the <var>XmlDoc</var> that may
have non-translatable Unicode content are accessed.


The code below shows a way to get the benefit of specifying <var>AllowUntranslatable</var>
The code below shows a way to get the benefit of specifying <var>AllowUntranslatable</var> while limiting the risk of request cancellation. In the example, it is believed that only the element <code>comments</code> might contain untranslatable Unicode among all the data accessed from the XML document:
while limiting the risk of request cancellation.
In the example, it is believed that only the element <code>comments</code> might
contain untranslatable Unicode among all the data accessed from the XML document:
<p class="code"> %resp:ParseXml(%doc, 'AllowUntranslatable')
<p class="code"> %resp:ParseXml(%doc, 'AllowUntranslatable')
   ...
   ...
Line 186: Line 111:
</p>
</p>
</ul>
</ul>
'''Note:'''
'''Note:''' Unicode values, untranslatable or not, are always allowed when they are added to an <var>XmlDoc</var> using one of the <var class="product">Janus SOAP</var> XML Add or Insert methods which &ldquo;directly store&rdquo; into an <var>XmlDoc</var>. For example, the following fragment adds an <var>Element</var> node with a value that is the Unicode trademark sign:
Unicode values, untranslatable or not, are always allowed
when they are added to an <var>XmlDoc</var> using one of the <var class="product">Janus SOAP</var> XML
Add or Insert methods which &amp;ldquo;directly store&amp;rdquo; into an <var>XmlDoc</var>.
For example, the following fragment adds an <var>Element</var> node with
a value that is the Unicode trademark sign:
<p class="code"> %node:AddElement('notation', '&amp;amp;#x2122;':U)
<p class="code"> %node:AddElement('notation', '&amp;amp;#x2122;':U)
</p>
</p>
<li>The <var>ReplaceUnicode</var> option lets you replace certain
<li>The <var>ReplaceUnicode</var> option lets you replace certain Unicode input characters with those characters you have explicitly specified (by <var>[[Unicode#The UNICODE command|UNICODE]]</var> commands in your site's <var class="product">Model 204</var> CCAIN stream).
Unicode input characters
with those characters you have explicitly specified (by <var>[[Unicode#The UNICODE command|UNICODE]]</var> commands
in your site's <var class="product">Model 204</var> CCAIN stream).


For example, assume the following command is in CCAIN:
For example, assume the following command is in CCAIN:
Line 213: Line 130:
</p>
</p>


In the preceding example, the stream of input characters to <var>LoadXml</var>
In the preceding example, the stream of input characters to <var>LoadXml</var> contains the Unicode character U+2122. Since the <var>ReplaceUnicode</var> option applies to both the stream of input characters and to the character value of character references, consider the following fragment (assuming the same CCAIN line as above):
contains the Unicode character U+2122.
Since the <var>ReplaceUnicode</var> option applies to both the stream of input characters
and to the character value of character references, consider the following fragment
(assuming the same CCAIN line as above):
<p class="code"> %d:LoadXml('<a>&amp;amp;#x2122;</a>', 'ReplaceUnicode')
<p class="code"> %d:LoadXml('<a>&amp;amp;#x2122;</a>', 'ReplaceUnicode')
</p>
</p>
Line 225: Line 138:
</p>
</p>


In this case, U+2122 does not occur in the input character stream, but it is
In this case, U+2122 does not occur in the input character stream, but it is the value of the character reference.
the value of the character reference.


Notes:
Notes:
<ul>
<ul>
<li>It is an error to be processing a replacement string within a character
<li>It is an error to be processing a replacement string within a character reference. For example, assume the following two lines are in CCAIN:  
reference.
For example, assume the following two lines are in CCAIN:
<p class="code"> * Replace superscript 2 with digit '2':
<p class="code"> * Replace superscript 2 with digit '2':
  UNICODE Table Standard Rep U=00B2 '2'
  UNICODE Table Standard Rep U=00B2 '2'
</p>
</p>
Given the above command, the following fragment gets a parse error,
Given the above command, the following fragment gets a parse error, because the replacement string is being used as part of a character reference:
because the replacement string is being used as part of a
character reference:
<p class="code"> %d:LoadXml('<a>&amp;amp;#x' With '&amp;amp;#xB2;':U With ';</a>', -
<p class="code"> %d:LoadXml('<a>&amp;amp;#x' With '&amp;amp;#xB2;':U With ';</a>', -
     'ReplaceUnicode')
     'ReplaceUnicode')
</p>
</p>
As a consequence of this rule, a replacement string should not
As a consequence of this rule, a replacement string should not contain an ampersand character (assuming that the <var>ReplaceUnicode</var> option will be used).
contain an ampersand character (assuming that the <var>ReplaceUnicode</var> option will
<li>Replacement of a Unicode character due to the <var>ReplaceUnicode</var> option is only done while processing names and values in the XML document. It is an error if the end of the name or value occurs and the replacement string has not been exhausted. In other words (again assuming that the <var>ReplaceUnicode</var> option will be used), a replacement string should not have &ldquo;XML markup&rdquo; that might end a string, such as a quotation mark or a left angle bracket (<code><</code>). For example, assume the following line is in CCAIN:
be used).
<li>Replacement of a Unicode character due to the
<var>ReplaceUnicode</var> option is only done while processing names and
values in the XML document.
It is an error if the end of the
name or value occurs and the replacement string has not been exhausted.
In other words (again assuming that the <var>ReplaceUnicode</var> option will be
used), a replacement string should not have &amp;ldquo;XML markup&amp;rdquo; that might
end a string, such as a quotation mark or a left angle bracket (<tt><</tt>).
For example, assume the following line is in CCAIN:
<p class="code"> UNICODE Table Standard Rep U=2122 '(trademark)<tm>'
<p class="code"> UNICODE Table Standard Rep U=2122 '(trademark)<tm>'
</p>
</p>
Given the above command, the following fragment gets a parsing error,
Given the above command, the following fragment gets a parsing error, because the '<' that is encountered in the replacement string ends the element content:
because the '<' that is encountered in the replacement string
ends the element content:
<p class="code"> %d:LoadXml('<a>&amp;amp;#x2122;</a>':U, 'ReplaceUnicode')
<p class="code"> %d:LoadXml('<a>&amp;amp;#x2122;</a>':U, 'ReplaceUnicode')
</p>
</p>
<li>If a parsing error occurs after processing a Unicode character that
<li>If a parsing error occurs after processing a Unicode character that has been replaced, the error display of the input stream will contain the replacement string, and the replaced character will not be displayed.
has been replaced, the error display of the input stream will contain the
However, if the character being replaced was introduced as a character reference, the character reference remains in the display of the input stream.
replacement string, and the replaced character will not be displayed.
However, if the character being replaced was introduced as a character
reference, the character reference remains in the display of the input
stream.
</ul>
</ul>
</ul>
</ul>
</ul>
</ul>
==Example==
==Example==
 
<ol><li>In the following example, <var>ParseXml</var> deserializes into <code>%doc</code> the XML data (stored in <code>%httpresp</code>) returned from a <var>Post</var> call:
In the following example, <var>ParseXml</var> deserializes into <code>%doc</code>
<p class="code"> %httpreq is object httpRequest
the XML data (stored in <code>%httpresp</code>) returned from a Post call:
  %httpreq = [[New_(HttpRequest_constructor)|new]]
<p class="code"> %httpreq is Object HttpRequest
  %httpreq = new
   ...
   ...
  %httpresp = %httpreq:Post('HTTPCLI')
  %httpresp = %httpreq:post('HTTPCLI')
  %doc is Object XmlDoc
  %doc is object xmlDoc
  %doc = New
  %doc = [[New_(XmlDoc_constructor)|new]]
  %httpresp:ParseXml(%doc, 'ErrRet')
  %httpresp:parseXml(%doc, 'ErrRet')
  %doc:Print
  %doc:[[Print_(XmlDoc/XmlNode_subroutine)|print]]
   ...
   ...
</p>
</p></ol>
 
==See also==
==See also==
<ul>
<li>Working with <var class="product">Janus SOAP</var> <var>XmlDoc</var> objects is described in detail in [[XML processing in Janus SOAP|"XML processing in Janus SOAP"]].
</ul>
{{Template:HttpResponse:ParseXml footer}}
{{Template:HttpResponse:ParseXml footer}}

Revision as of 06:42, 19 June 2011

Deserialize response data to XmlDoc (HttpResponse class)

[Requires Janus SOAP]

The ParseXml method helps you access the document/content from HttpRequest Get, Post, and Send operations. ParseXml deserializes returned XML data into the Janus SOAP XmlDoc object you specify.

Syntax

[%number =] httpResponse:ParseXml( doc, [options]) Throws XmlParseError

Syntax terms

%number A numeric variable. Successful deserialization returns 0; failure returns a non-zero value.
%httpResponse A reference to an HttpResponse object that was returned by a Get, Post, or Send method of an HttpRequest object.
doc A previously instantiated XmlDoc object which must be EMPTY.
options Any valid combination of the following (which are the same as the options of the LoadXml method):
  • AllowUntranslatable Allows all valid Unicode strings into the XML document. When this option is not specified, Unicode strings that are not translatable to EBCDIC are not allowed. As described in "Deserializing Unicode strings", it is recommended that you use AllowUntranslatable only if the application checks for translatability when accessing parts of the XmlDoc that may have untranslatable Unicode content.

    The AllowUntranslatable option is available as of Sirius Mods Version 7.6.

  • CrPreserve All whitespace characters in Element content are preserved, including carriage return. Unlike all other deserialization options, a carriage return in Element content does not undergo the normalization specified in the XML standard (and described in "Normalizing whitespace characters"). CrPreserve is mutually exclusive with the WspNewline, WspToken, and WspPreserve options, and with the LinefeedNoTrailingTabs option.

    The CrPreserve option was added in Sirius Mods Version 7.5, and it was also made available in Sirius Mods Version 7.4 by a maintenance zap.

  • DTDIgnore A “<!DOCTYPE ...>” clause may be present in the document, and it should be ignored. In any case, the DTD is not processed. The default behavior (that is, if DTDIgnore is not present) is to treat “<!DOCTYPE ...>” as a syntax error.

    DTD_Ignore is a synonym for DTDIgnore.

  • ErrRet Errors during deserialization are tolerated, the method object is not updated (retains its pre-call state), and the request continues. If ErrRet is not present, any error will cancel the request. Note that some errors still cancel the request; errors tolerated when ErrRet is specified are:
    • A syntax error in the text string representation of the XML document
  • LinefeedNoTrailingTabs For a Text node that consists of an initial line-end character and one or more tab characters, this option normalizes the content so the result is a single line-end character. The initial line-end (also called "newline") character can be a linefeed character (LF) or a carriage-return (CR) by itself, or a carriage-return followed by a linefeed (CRLF), since (within Text nodes) all of these are normalized by the XML specification into a single line-end character.

    This option, added in Sirius Mods Version 7.0, is compatible with, but takes precedence over, any of the other whitespace-handling options (WspNewline, WspToken, WspPreserve) except CrPreserve.

    See "Whitespace handling", below, for more information about this option and about whitespace handling.

  • ReplaceUnicode Converts Unicode characters using the replacements (if any) specified at your site by UNICODE updating commands that use the Rep subcommand (for example, UNICODE Table Standard Rep U=2122 '(TM)').

    The replacement is performed on all names, element and attribute values, comments, and PI “values” in the document, after any entity and character references have been converted to characters.

    For further discussion and examples, see the ReplaceUnicode discussion in "Usage Notes", below.

  • WspNewline This option is designed to remove any whitespace inserted to make the structure of an XML document easier (for a person) to read. WspNewline removes the leading or trailing whitespace in the value of a Text node, if the whitespace sequence contains a newline (carriage return or linefeed) character.

    Note: This handling, the default whitespace option for this method, applies to the "physical value" of the representation of a Text node. In particular, markup such as a character reference (even of whitespace, for example, &#32;), a CDATA section, or any non-whitespace character delimits leading or trailing whitespace and is not affected.

    See "Whitespace handling", below, for more information about whitespace handling.

  • WspPreserve All whitespace characters in element content are preserved (after end-of-line normalization), as described below in "Whitespace handling".

    Note: Wsp_Preserve is a synonym for WspPreserve.

  • WspToken Whitespace in element content is normalized in the manner of the XPath normalize-space() function (leading and trailing whitespace removed, intermediate strings of whitespace replaced by a single blank character). WspToken is a good substitute for WspNewline to remove leading and trailing whitespace in cases where blanks (or tabs) and not line-end characters were used to make the document structure more readable, provided that it is tolerable to collapse intermediate whitespace sequences to single space characters.

    See "Whitespace handling" for more information about whitespace handling.
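
    As a minimal sketch of how the return value and the ErrRet option described above might be used together (the variable %rc and the message text are illustrative; %httpresp and %doc are assumed to be set up as in the example at the end of this page):

      %rc is float
      %rc = %httpresp:ParseXml(%doc, 'ErrRet')
      if %rc ne 0 then
         print 'ParseXml failed, return code ' with %rc
      end if

    Without ErrRet, the same syntax error cancels the request rather than returning a non-zero value.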

Usage notes

  • ParseXml is equivalent to, but faster than, issuing the Janus SOAP XML text-conversion method LoadXml against the longstring returned by Content.
  • If the content type header indicates HTML, XML, or plain text, ASCII-to-EBCDIC translation is performed. If the Clsock port definition specifies a translation table for ASCII-to-EBCDIC, that table is used instead of the default.
  • None of the options may be specified twice.
  • The options may be specified in any case. For example, you can use WspPreserve and wsppreserve, interchangeably.
  • Whitespace handling:
    • The “Wsp” whitespace-handling options (WspPreserve, WspNewline, and WspToken) and the CrPreserve whitespace option are mutually exclusive; if none of them is specified, WspNewline is in effect. Although the LinefeedNoTrailingTabs option is also concerned with whitespace, it is distinct from, yet compatible with, any of the three “Wsp” options, but it is not compatible with the CrPreserve option.
    • Except for CrPreserve, the whitespace-handling options are applied after the XML standard whitespace conversions that Janus SOAP always applies. As described in "Normalizing whitespace characters", the standard specifies that all carriage return/linefeed sequences and carriage return sequences are to be converted to linefeeds when deserializing. Using the CrPreserve option bypasses this rule.
    • The whitespace-handling options do no whitespace conversion (beyond the XML standard conversions) on element content that is “protected” by the xml:space="preserve" attribute. “Protected” by the xml:space="preserve" attribute means an element E that either:
      • has the xml:space attribute with the value preserve
      • is contained in an element A with that attribute and value, and there is no element that is a descendant of A and an ancestor of E with the xml:space attribute with the value default

      Elements that are not protected by the xml:space="preserve" attribute have whitespace handled according to the option in effect for the deserialization.

    • Using WspNewline or WspToken reduces the space consumed by individual Text nodes, and in some cases collapses all whitespace content between markup to the null string, so it is not stored as a Text node. This reduces the storage required by the XmlDoc, speeds up XPath and node access processing, and makes the output of the Print subroutine easier to read.
    • The LinefeedNoTrailingTabs option only affects Text nodes that contain an initial line-end character followed by any number of tabs and nothing else. The LinefeedNoTrailingTabs effect on such a Text node, whether it is specified with or without any of the “Wsp” options, is to store the value of the node as a single line-end character. One example of the use of the LinefeedNoTrailingTabs option is an input XML document to be deserialized for which both of the following are true:
      • A digital signature is needed of a subtree in the document.
      • The input subtree contains a linefeed and one or more tabs that separate markup, and the linefeed must be kept but the tabs discarded for the signature.

      For information about exclusive canonicalization, serialization expressly designed for digital signatures, see "Canonicalization".

    • Deserializing Unicode strings
      • The ParseXml AllowUntranslatable option lets you deserialize Unicode strings that contain characters that are not translatable to EBCDIC. Otherwise, such characters in an input XML document are detected, an XmlParseError exception with reason UntranslatableUnicode is thrown, and the request is canceled. This default detection of non-translatable characters may suit your purposes. That is, it ensures that subsequent access to the deserialized content is performed without any Unicode to EBCDIC translation errors. For example:

        %doc:ParseXml ...
        %val Longstring
        %val = %doc:Value(%xpath)

        The assignment to the EBCDIC string %val will not fail due to a Unicode translation problem: if there is any untranslatable Unicode (including, of course, strings in the XML document which your application never accesses), the ParseXml operation fails.

        If you use AllowUntranslatable, all Unicode characters in a serialized input XML document are allowed and stored in the XmlDoc. Your stored data may contain content that is not translatable to EBCDIC, however. A subsequent attempt to access such content that performs Unicode to EBCDIC translation might cause request cancellation. You should therefore use AllowUntranslatable only if there is also a check for translatability when parts of the XmlDoc that may have non-translatable Unicode content are accessed.

        The code below shows a way to get the benefit of specifying AllowUntranslatable while limiting the risk of request cancellation. In the example, it is believed that only the element comments might contain untranslatable Unicode among all the data accessed from the XML document:

        %resp:ParseXml(%doc, 'AllowUntranslatable')
         ...
        %uVal Unicode
        %val Longstring
        %uVal = %node:Value('comments')
        Try
           %val = %uVal:UnicodeToEbcdic
        Catch CharacterTranslationException
           %val = %uVal:UnicodeToEbcdic(CharacterEncode=True)
           Print 'Untranslatable Unicode, character encoded:' -
                 And %val
        End Try

      Note: Unicode values, untranslatable or not, are always allowed when they are added to an XmlDoc using one of the Janus SOAP XML Add or Insert methods which “directly store” into an XmlDoc. For example, the following fragment adds an Element node with a value that is the Unicode trademark sign:

      %node:AddElement('notation', '&#x2122;':U)

    • The ReplaceUnicode option lets you replace certain Unicode input characters with those characters you have explicitly specified (by UNICODE commands in your site's Model 204 CCAIN stream). For example, assume the following command is in CCAIN:

      UNICODE Table Standard Rep U=2122 '(tm)'

      Given the above command, the ReplaceUnicode option for LoadXml is shown in the following fragment:

      %u Unicode Initial('<a>')
      %u = %u:UnicodeWith('2122':X:Utf16ToUnicode)
      %u = %u:UnicodeWith('</a>':U)
      %d:LoadXml(%u, 'ReplaceUnicode')
      %d:Print

      The result is:

      <a>(tm)</a>

      In the preceding example, the stream of input characters to LoadXml contains the Unicode character U+2122. Since the ReplaceUnicode option applies to both the stream of input characters and to the character value of character references, consider the following fragment (assuming the same CCAIN line as above):

      %d:LoadXml('<a>&#x2122;</a>', 'ReplaceUnicode')

      The result is also:

      <a>(tm)</a>

      In this case, U+2122 does not occur in the input character stream, but it is the value of the character reference.

      Notes:

      • It is an error to be processing a replacement string within a character reference. For example, assume the following two lines are in CCAIN:

        * Replace superscript 2 with digit '2':
        UNICODE Table Standard Rep U=00B2 '2'

        Given the above command, the following fragment gets a parse error, because the replacement string is being used as part of a character reference:

        %d:LoadXml('<a>&#x' With '&#xB2;':U With ';</a>', -
                   'ReplaceUnicode')

        As a consequence of this rule, a replacement string should not contain an ampersand character (assuming that the ReplaceUnicode option will be used).

      • Replacement of a Unicode character due to the ReplaceUnicode option is only done while processing names and values in the XML document. It is an error if the end of the name or value occurs and the replacement string has not been exhausted. In other words (again assuming that the ReplaceUnicode option will be used), a replacement string should not have “XML markup” that might end a string, such as a quotation mark or a left angle bracket (<). For example, assume the following line is in CCAIN:

        UNICODE Table Standard Rep U=2122 '(trademark)<tm>'

        Given the above command, the following fragment gets a parsing error, because the '<' that is encountered in the replacement string ends the element content:

        %d:LoadXml('<a>&#x2122;</a>':U, 'ReplaceUnicode')

      • If a parsing error occurs after processing a Unicode character that has been replaced, the error display of the input stream will contain the replacement string, and the replaced character will not be displayed. However, if the character being replaced was introduced as a character reference, the character reference remains in the display of the input stream.
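
    To illustrate the first usage note above, the two fragments below are intended to leave the same content in %doc. This is only a sketch: it assumes that %httpresp holds a response with XML content and that %doc is an empty XmlDoc in each case.

      * Deserialize directly from the response:
      %httpresp:ParseXml(%doc)

      * Equivalent but slower: load the longstring returned by Content:
      %doc:LoadXml(%httpresp:Content)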

Example

  1. In the following example, ParseXml deserializes into %doc the XML data (stored in %httpresp) returned from a Post call:

    %httpreq is object httpRequest
    %httpreq = new
     ...
    %httpresp = %httpreq:post('HTTPCLI')
    %doc is object xmlDoc
    %doc = new
    %httpresp:parseXml(%doc, 'ErrRet')
    %doc:print
     ...
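
    As a variation on the example above, the following sketch relies on the XmlParseError exception shown in the syntax rather than on ErrRet. It assumes the same %httpresp and %doc variables, and the message text is illustrative:

      try
         %httpresp:parseXml(%doc)
         %doc:print
      catch xmlParseError
         print 'Response could not be parsed as XML'
      end try

    Catching the exception keeps the request running after a parse failure, which may be preferable to testing a return code when several statements depend on the parsed document.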

See also