LoadXml (XmlDoc/XmlNode function): Difference between revisions
m (1 revision) |
m (1 revision) |
||
Line 4: | Line 4: | ||
converts a text string representation of an | converts a text string representation of an | ||
XML document | XML document | ||
into an empty XmlDoc, or of an XML fragment as one or more | into an empty <var>XmlDoc</var>, or of an XML fragment as one or more | ||
children of an Element XmlNode. | children of an Element <var>XmlNode</var>. | ||
This process is called '''deserialization''', | This process is called '''deserialization''', | ||
because the text representation of a document is called the '''serial''' | because the text representation of a document is called the '''serial''' | ||
form. | form. | ||
LoadXml returns a zero value if the | <var>LoadXml</var> returns a zero value if the | ||
deserialization is successful; it returns a non-zero value if deserialization is | deserialization is successful; it returns a non-zero value if deserialization is | ||
unsuccessful, the ErrRet option is used, and the particular error is tolerated. | unsuccessful, the ErrRet option is used, and the particular error is tolerated. | ||
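The return-value behavior described above can be sketched as follows. This is a minimal illustration, not taken from the original page: the variable names and the malformed input string are hypothetical, and it assumes that a mismatched end tag is among the errors that <tt>ErrRet</tt> tolerates.

```
%d  Object XmlDoc Auto New
%rc is Float

* Hypothetical malformed input: <inner> is never closed.
%rc = %d:LoadXml('<top><inner></top>', 'ErrRet')

* Zero means success; with ErrRet, a non-zero value is the
* position within the input at which the error was found.
If %rc Ne 0 Then
   Print 'LoadXml failed near input position ' With %rc
End If
```

Without <tt>ErrRet</tt>, the same error would instead surface as an exception or a request cancellation, so a return-code check of this kind is only meaningful when the option is specified.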
Line 20: | Line 20: | ||
<td>A %variable set to 0 if the deserialization is successful. If the ErrRet option is used, this %variable is set to the position within <i>input</i> at which an error is found. </td></tr> | <td>A %variable set to 0 if the deserialization is successful. If the ErrRet option is used, this %variable is set to the position within <i>input</i> at which an error is found. </td></tr> | ||
<tr><th>nr</th> | <tr><th>nr</th> | ||
<td>An expression that points to the XmlDoc or XmlNode to contain the deserialized representation of the XML document or fragment, respectively. | <td>An expression that points to the <var>XmlDoc</var> or <var>XmlNode</var> to contain the deserialized representation of the XML document or fragment, respectively. | ||
<p class="code">If an XmlDoc, it must be EMPTY (see [[??]] refid=dstates.) prior to invoking LoadXml. If an XmlNode that is the root node of an XmlDoc, the XmlDoc must be EMPTY. | <p class="code">If an <var>XmlDoc</var>, it must be EMPTY (see [[??]] refid=dstates.) prior to invoking <var>LoadXml</var>. If an <var>XmlNode</var> that is the root node of an <var>XmlDoc</var>, the <var>XmlDoc</var> must be EMPTY. | ||
Prior to ''Sirius Mods'' version 6.8, this method object had to be an XmlDoc; as of version 6.8, LoadXml is also available in the XmlNode class. </td></tr> | Prior to ''Sirius Mods'' version 6.8, this method object had to be an <var>XmlDoc</var>; as of version 6.8, <var>LoadXml</var> is also available in the <var>XmlNode</var> class. </td></tr> | ||
</p> | </p> | ||
<tr><th>input</th> | <tr><th>input</th> | ||
<td>The text string, Longstring, or (as of ''Sirius Mods'' version 6.8) Stringlist to be deserialized. If a Stringlist, ''input'' consists of the concatenation of the Stringlist items with no insertion of line-end characters at the end of each item. | <td>The text string, <var>Longstring</var>, or (as of ''Sirius Mods'' version 6.8) <var>Stringlist</var> to be deserialized. If a <var>Stringlist</var>, ''input'' consists of the concatenation of the <var>Stringlist</var> items with no insertion of line-end characters at the end of each item. | ||
<p class="code">If the ''nr'' method object is an XmlDoc or the root node of an XmlDoc, ''input'' must be valid as an entire XML document (for example, only one top-level element). If ''nr'' is a non-root XmlNode, ''input'' must be an '''XML fragment''', that is, a substring of a serialized XML document, such that: <ul> <li>The fragment may contain undeclared prefixes. Any such prefixes must have declarations which are in effect at the Element node referred to by the method object of LoadXml; these declarations (along with that Element's default namespace) are inherited by the inserted fragment. <li>In all other respects, the fragment, if "wrapped" within a simple element start tag and end tag (such as <tt><w></tt> and <tt></w></tt>, respectively), is a legal XML document. The fragment can contain leading and/or trailing character content and/or multiple "top-level" elements; all of these become children of the method object XmlNode. </ul> </td></tr> | <p class="code">If the ''nr'' method object is an <var>XmlDoc</var> or the root node of an <var>XmlDoc</var>, ''input'' must be valid as an entire XML document (for example, only one top-level element). If ''nr'' is a non-root <var>XmlNode</var>, ''input'' must be an '''XML fragment''', that is, a substring of a serialized XML document, such that: <ul> <li>The fragment may contain undeclared prefixes. Any such prefixes must have declarations which are in effect at the Element node referred to by the method object of <var>LoadXml</var>; these declarations (along with that Element's default namespace) are inherited by the inserted fragment. <li>In all other respects, the fragment, if "wrapped" within a simple element start tag and end tag (such as <tt><w></tt> and <tt></w></tt>, respectively), is a legal XML document. The fragment can contain leading and/or trailing character content and/or multiple "top-level" elements; all of these become children of the method object <var>XmlNode</var>. </ul> </td></tr> | ||
</p> | </p> | ||
<tr><th>options</th> | <tr><th>options</th> | ||
<td>Any valid combination of the following terms: <ul> <li><b>AllowUntranslatable</b> | <td>Any valid combination of the following terms: <ul> <li><b>AllowUntranslatable</b> | ||
<p class="code">Allows all valid Unicode strings into the XML document. When this option is not specified, only Unicode strings that are not translatable to EBCDIC are disallowed. | <p class="code">Allows all valid <var>Unicode</var> strings into the XML document. When this option is not specified, only <var>Unicode</var> strings that are not translatable to EBCDIC are disallowed. | ||
AllowUntranslatable allows untranslatable Unicode characters, but it does not affect untranslatable EBCDIC characters. | AllowUntranslatable allows untranslatable <var>Unicode</var> characters, but it does not affect untranslatable EBCDIC characters. | ||
As described in the [[??]] refid=ununic., it is recommended that you use AllowUntranslatable only if the application checks for translatability when accessing parts of the XmlDoc that may have untranslatable Unicode content. | As described in the [[??]] refid=ununic., it is recommended that you use AllowUntranslatable only if the application checks for translatability when accessing parts of the <var>XmlDoc</var> that may have untranslatable <var>Unicode</var> content. | ||
The AllowUntranslatable option is available as of version 7.6 of the ''Sirius Mods''. <li><b>CrPreserve</b> | The AllowUntranslatable option is available as of version 7.6 of the ''Sirius Mods''. <li><b>CrPreserve</b> | ||
All whitespace characters in Element content are preserved, including carriage return. Unlike all other deserialization options, a carriage return in Element content does ''not'' undergo the normalization specified in the XML standard (and referred to in [[??]] refid=unlxml.). | All whitespace characters in Element content are preserved, including carriage return. Unlike all other deserialization options, a carriage return in Element content does ''not'' undergo the normalization specified in the XML standard (and referred to in [[??]] refid=unlxml.). | ||
Line 43: | Line 43: | ||
For a Text node that consists of an initial line-end character and one or more tab characters, this option normalizes the content so the result is a single line-end character. The initial line-end (also called "newline") character can be a linefeed character (LF) or a carriage-return (CR) by itself, or a carriage-return followed by a linefeed (CRLF), since (within Text nodes) all of these are normalized by the XML specification into a single line-end character. | For a Text node that consists of an initial line-end character and one or more tab characters, this option normalizes the content so the result is a single line-end character. The initial line-end (also called "newline") character can be a linefeed character (LF) or a carriage-return (CR) by itself, or a carriage-return followed by a linefeed (CRLF), since (within Text nodes) all of these are normalized by the XML specification into a single line-end character. | ||
This option, added in ''Sirius Mods'' version 7.0, is compatible with, but takes precedence over, any of the other whitespace-handling options (WspNewline, WspToken, WspPreserve) except CrPreserve. | This option, added in ''Sirius Mods'' version 7.0, is compatible with, but takes precedence over, any of the other whitespace-handling options (WspNewline, WspToken, WspPreserve) except CrPreserve. | ||
See [[??]] refid=unwhite. for more information about this option and about whitespace handling. <li><b>ReplaceUnicode</b> | See [[??]] refid=unwhite. for more information about this option and about whitespace handling. <li><b>Replace<var>Unicode</var></b> | ||
Converts Unicode characters using the replacements (if any) specified at your site by UNICODE updating commands that use the <tt>Rep</tt> subcommand (for example, <tt>UNICODE Table Standard Rep U=2122 '(TM)'</tt>). | Converts <var>Unicode</var> characters using the replacements (if any) specified at your site by UNICODE updating commands that use the <tt>Rep</tt> subcommand (for example, <tt>UNICODE Table Standard Rep U=2122 '(TM)'</tt>). | ||
The replacement is performed on all names, element and attribute values, comments, and PI "values" in the document, after any entity and character references have been converted to characters. | The replacement is performed on all names, element and attribute values, comments, and PI "values" in the document, after any entity and character references have been converted to characters. | ||
For further discussion and examples, see the ReplaceUnicode discussion in the "Usage Notes" [[??]] reftxt=* refid=unreplu.. | For further discussion and examples, see the Replace<var>Unicode</var> discussion in the "Usage Notes" [[??]] reftxt=* refid=unreplu.. | ||
For more information about the UNICODE command, see [[??]] refid=ucmd.. | For more information about the UNICODE command, see [[??]] refid=ucmd.. | ||
<li><b>WspNewline</b> | <li><b>WspNewline</b> | ||
Line 62: | Line 62: | ||
This function can throw the following exception: | This function can throw the following exception: | ||
<dl> | <dl> | ||
<dt>XmlParseError | <dt><var>XmlParseError</var> | ||
<dd>If the method encounters a parsing error, | <dd>If the method encounters a parsing error, | ||
properties of the exception object may indicate the location and type of problem. | properties of the exception object may indicate the location and type of problem. | ||
Line 77: | Line 77: | ||
For example, you can use | For example, you can use | ||
<tt>WspPreserve</tt> and <tt>wsppreserve</tt>, interchangeably. | <tt>WspPreserve</tt> and <tt>wsppreserve</tt>, interchangeably. | ||
<li>If the LoadXml method object is an XmlDoc or a root XmlNode, | <li>If the <var>LoadXml</var> method object is an <var>XmlDoc</var> or a root <var>XmlNode</var>, | ||
LoadXml will accept any of the input character sets specified below. | <var>LoadXml</var> will accept any of the input character sets specified below. | ||
The correspondence between these input character sets and the | The correspondence between these input character sets and the | ||
value of <tt>encoding</tt> in the XML declaration is explained | value of <tt>encoding</tt> in the XML declaration is explained | ||
in the description of the <tt>Encoding</tt> property of an XmlDoc, | in the description of the <tt>Encoding</tt> property of an <var>XmlDoc</var>, | ||
under its [[??]] refid=enctyp., and is also shown in the following two tables. | under its [[??]] refid=enctyp., and is also shown in the following two tables. | ||
Line 103: | Line 103: | ||
<dt>EBCDIC | <dt>EBCDIC | ||
<dd>UTF-8, ISO-8859-<i>n</i>, or <i>none</i> | <dd>UTF-8, ISO-8859-<i>n</i>, or <i>none</i> | ||
<dt>Unicode (User Language) | <dt><var>Unicode</var> (User Language) | ||
<dd>UTF-8, ISO-8859-<i>n</i>, or <i>none</i> | <dd>UTF-8, ISO-8859-<i>n</i>, or <i>none</i> | ||
</dl> | </dl> | ||
Line 110: | Line 110: | ||
<dd>Input bytestream | <dd>Input bytestream | ||
<dt>UTF-8 | <dt>UTF-8 | ||
<dd>UTF-8 (which includes ASCII codes below X'80'), EBCDIC, or Unicode | <dd>UTF-8 (which includes ASCII codes below X'80'), EBCDIC, or <var>Unicode</var> | ||
<dt>UTF-16 | <dt>UTF-16 | ||
<dd>UTF-16 (including a two-byte byte order mark as a preamble to the XML document | <dd>UTF-16 (including a two-byte byte order mark as a preamble to the XML document | ||
input stream) | input stream) | ||
<dt>ISO-8859-<i><b>n</b></i> | <dt>ISO-8859-<i><b>n</b></i> | ||
<dd>ASCII codes up to X'FF', EBCDIC, or Unicode | <dd>ASCII codes up to X'FF', EBCDIC, or <var>Unicode</var> | ||
<dt><i><b>none</b></i> | <dt><i><b>none</b></i> | ||
<dd>UTF-8, UTF-16, ASCII, EBCDIC, or Unicode | <dd>UTF-8, UTF-16, ASCII, EBCDIC, or <var>Unicode</var> | ||
</dl> | </dl> | ||
In certain LoadXml error cases (for example, an input containing an | In certain <var>LoadXml</var> error cases (for example, an input containing an | ||
ASCII code above X'7F' without an XML declaration containing ISO-8859-<i>n</i>), | ASCII code above X'7F' without an XML declaration containing ISO-8859-<i>n</i>), | ||
a group of error messages is issued to display: | a group of error messages is issued to display: | ||
Line 140: | Line 140: | ||
MSIR.0665: | | MSIR.0665: | | ||
</p> | </p> | ||
<li>If the LoadXml method object is a non-root XmlNode, LoadXml accepts | <li>If the <var>LoadXml</var> method object is a non-root <var>XmlNode</var>, <var>LoadXml</var> accepts | ||
a Unicode string or a bytestream it treats as EBCDIC. | a <var>Unicode</var> string or a bytestream it treats as EBCDIC. | ||
<li>As described in [[??]] refid=u80., deserializing with LoadXml | <li>As described in [[??]] refid=u80., deserializing with <var>LoadXml</var> | ||
may require translation of the input document using the Unicode tables. | may require translation of the input document using the <var>Unicode</var> tables. | ||
This depends on the version of the ''Sirius Mods'' (that is, whether | This depends on the version of the ''Sirius Mods'' (that is, whether <var>XmlDoc</var>s | ||
are maintained in EBCDIC or Unicode) and on which | are maintained in EBCDIC or <var>Unicode</var>) and on which | ||
of the input character sets described above is used. | of the input character sets described above is used. | ||
<li>An XML fragment '''does not''' provide for inserting an Attribute | <li>An XML fragment '''does not''' provide for inserting an Attribute | ||
into an Element node. | into an Element node. | ||
For example, the following does not achieve it: | For example, the following does not achieve it: | ||
<p class="code">%d Object XmlDoc Auto New | <p class="code">%d <var>Object</var> <var>XmlDoc</var> Auto New | ||
%n Object XmlNode | %n <var>Object</var> <var>XmlNode</var> | ||
%n = %d:AddElement('top') | %n = %d:AddElement('top') | ||
%n:LoadXml('foo="bar"') | %n:<var>LoadXml</var>('foo="bar"') | ||
%d:Print | %d:Print | ||
</p> | </p> | ||
The input to LoadXml above is simply stored as the character content | The input to <var>LoadXml</var> above is simply stored as the character content | ||
of the Element containing the fragment, so the result is: | of the Element containing the fragment, so the result is: | ||
<p class="output"><top>foo="bar"</top> | <p class="output"><top>foo="bar"</top> | ||
Line 162: | Line 162: | ||
To add the Attribute <tt>foo</tt> with the value "<tt>bar</tt>", | To add the Attribute <tt>foo</tt> with the value "<tt>bar</tt>", | ||
replace LoadXml in the example above with the | replace <var>LoadXml</var> in the example above with the | ||
[[AddAttribute (XmlNode function)|AddAttribute]] method. | [[AddAttribute (XmlNode function)|AddAttribute]] method. | ||
<br> | <br> | ||
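A minimal sketch of the <tt>AddAttribute</tt> alternative mentioned above, reusing the variables from the preceding example. It assumes the two-argument (name, value) form of <tt>AddAttribute</tt>; the expected output is intentionally not shown.

```
%d Object XmlDoc Auto New
%n Object XmlNode
%n = %d:AddElement('top')

* Attach foo="bar" as an Attribute of <top>,
* rather than loading it as character content.
%n:AddAttribute('foo', 'bar')
%d:Print
```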
<li>If the method object refers to the Root node of an XmlDoc, the | <li>If the method object refers to the Root node of an <var>XmlDoc</var>, the | ||
LoadXml method in the XmlNode class behaves | <var>LoadXml</var> method in the <var>XmlNode</var> class behaves | ||
exactly as the LoadXml method in the XmlDoc class. | exactly as the <var>LoadXml</var> method in the <var>XmlDoc</var> class. | ||
For example: | For example: | ||
<p class="code">%d Object XmlDoc Auto New | <p class="code">%d <var>Object</var> <var>XmlDoc</var> Auto New | ||
%n Object XmlNode | %n <var>Object</var> <var>XmlNode</var> | ||
%n = %d:SelectSingleNode | %n = %d:SelectSingleNode | ||
%n:LoadXml('<?xml version="1.0"?><top><inner/></top>') | %n:<var>LoadXml</var>('<?xml version="1.0"?><top><inner/></top>') | ||
</p> | </p> | ||
When the Root node is the method object, the serialized input must | When the Root node is the method object, the serialized input must | ||
be a legal XML document (for example, the XmlDoc must be Empty, and the | be a legal XML document (for example, the <var>XmlDoc</var> must be Empty, and the | ||
serialized input must contain exactly one top-level element). | serialized input must contain exactly one top-level element). | ||
<li>Whitespace handling | <li>Whitespace handling | ||
Line 212: | Line 212: | ||
according to the option in effect for the deserialization. | according to the option in effect for the deserialization. | ||
<li>There is no whitespace normalization | <li>There is no whitespace normalization | ||
comparable to the LoadXml whitespace-handling ''options'' | comparable to the <var>LoadXml</var> whitespace-handling ''options'' | ||
for the Add and | for the Add and | ||
Insert...Before functions that create a Text node | Insert...Before functions that create a Text node | ||
Line 223: | Line 223: | ||
In [[??]] refid=exload., | In [[??]] refid=exload., | ||
see the fourth example (which contains <tt>&#x09;</tt>). | see the fourth example (which contains <tt>&#x09;</tt>). | ||
<li>If ''input'' is a Stringlist, LoadXml inserts a linefeed character | <li>If ''input'' is a <var>Stringlist</var>, <var>LoadXml</var> inserts a linefeed character | ||
after each item in the Stringlist as part of concatenation prior to | after each item in the <var>Stringlist</var> as part of concatenation prior to | ||
deserialization. | deserialization. | ||
The linefeed is then subject to the method's whitespace handling options, | The linefeed is then subject to the method's whitespace handling options, | ||
Line 233: | Line 233: | ||
collapses all whitespace content between markup to the null | collapses all whitespace content between markup to the null | ||
string, so it is not stored as a Text node. | string, so it is not stored as a Text node. | ||
This reduces the storage required by the XmlDoc, speeds up | This reduces the storage required by the <var>XmlDoc</var>, speeds up | ||
XPath and node access processing, and makes the output of, say, the | XPath and node access processing, and makes the output of, say, the | ||
Print subroutine easier to read. | Print subroutine easier to read. | ||
Line 257: | Line 257: | ||
</ul> | </ul> | ||
<br> | <br> | ||
<li>Deserializing Unicode strings | <li>Deserializing <var>Unicode</var> strings | ||
<ul> | <ul> | ||
<li>The LoadXml <tt>AllowUntranslatable</tt> option | <li>The <var>LoadXml</var> <tt>AllowUntranslatable</tt> option | ||
lets you deserialize Unicode strings that contain characters | lets you deserialize <var>Unicode</var> strings that contain characters | ||
that are not translatable to EBCDIC. | that are not translatable to EBCDIC. | ||
For example, LoadXml accepts the Unicode trademark character (U+2122) | For example, <var>LoadXml</var> accepts the <var>Unicode</var> trademark character (U+2122) | ||
only if you specify <tt>allowUntranslatable</tt>, as in the following. | only if you specify <tt>allowUntranslatable</tt>, as in the following. | ||
The U function below is described [[??]] refid=umeth.. | The U function below is described [[??]] refid=umeth.. | ||
<p class="code">%u is unicode Initial('&#x2122;':U) | <p class="code">%u is unicode Initial('&#x2122;':U) | ||
%nod:LoadXml(%u, 'allowUntranslatable') | %nod:<var>LoadXml</var>(%u, 'allowUntranslatable') | ||
</p> | </p> | ||
If you remove <tt>allowUntranslatable</tt>, this LoadXml statement fails, | If you remove <tt>allowUntranslatable</tt>, this <var>LoadXml</var> statement fails, | ||
because the Unicode trademark character does not translate to an EBCDIC character. | because the <var>Unicode</var> trademark character does not translate to an EBCDIC character. | ||
By default, the method detects any untranslatable characters in the serialized | By default, the method detects any untranslatable characters in the serialized | ||
input XML document; it also throws an XmlParseError exception | input XML document; it also throws an <var>XmlParseError</var> exception | ||
([[??]] refid=xmlpars.) with reason <tt>UntranslatableUnicode</tt> | ([[??]] refid=xmlpars.) with reason <tt>Untranslatable<var>Unicode</var></tt> | ||
(unless the <tt>ErrRet</tt> option is specified). | (unless the <tt>ErrRet</tt> option is specified). | ||
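The default behavior described above might be handled as in the following sketch. It is illustrative only: the variable names and message texts are hypothetical, and the <tt>Catch</tt> clause here does not inspect the exception's properties.

```
%d Object XmlDoc Auto New
%u Unicode Initial('<a>&#x2122;</a>':U)

* No AllowUntranslatable and no ErrRet: the trademark character
* (U+2122) has no EBCDIC translation, so LoadXml is expected to
* throw XmlParseError instead of storing the character.
Try
   %d:LoadXml(%u)
   Print 'Document loaded'
Catch XmlParseError
   Print 'LoadXml rejected the input (possibly untranslatable Unicode)'
End Try
```

Catching the exception keeps the request running, which may be preferable to <tt>ErrRet</tt> when the application needs to distinguish parse failures from other errors.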
Line 279: | Line 279: | ||
That is, it ensures that subsequent | That is, it ensures that subsequent | ||
access to the deserialized content is performed | access to the deserialized content is performed | ||
without any Unicode to EBCDIC translation errors. | without any <var>Unicode</var> to EBCDIC translation errors. | ||
For example: | For example: | ||
<p class="code">%doc:LoadXml | <p class="code">%doc:<var>LoadXml</var> | ||
... | ... | ||
%val Longstring | %val <var>Longstring</var> | ||
%val = %doc:Value(%xpath) | %val = %doc:Value(%xpath) | ||
</p> | </p> | ||
Although the Value method returns a Unicode string, | Although the Value method returns a <var>Unicode</var> string, | ||
the assignment to the EBCDIC string <tt>%val</tt> will not fail due | the assignment to the EBCDIC string <tt>%val</tt> will not fail due | ||
to a Unicode translation problem: if there is any untranslatable | to a <var>Unicode</var> translation problem: if there is any untranslatable | ||
Unicode (including, of course, strings in the XML document which your | <var>Unicode</var> (including, of course, strings in the XML document which your | ||
application never accesses), the LoadXml operation fails. | application never accesses), the <var>LoadXml</var> operation fails. | ||
If you use AllowUntranslatable, | If you use AllowUntranslatable, | ||
all Unicode characters in a serialized input XML document are | all <var>Unicode</var> characters in a serialized input XML document are | ||
allowed and stored in the XmlDoc. | allowed and stored in the <var>XmlDoc</var>. | ||
Your stored data may contain content that is not translatable to | Your stored data may contain content that is not translatable to | ||
EBCDIC, however. | EBCDIC, however. | ||
A subsequent attempt to access such content that | A subsequent attempt to access such content that | ||
performs Unicode to EBCDIC translation (like the Value method statement above) | performs <var>Unicode</var> to EBCDIC translation (like the Value method statement above) | ||
might cause request cancellation. | might cause request cancellation. | ||
You should therefore use AllowUntranslatable | You should therefore use AllowUntranslatable | ||
only if there is also a check for translatability | only if there is also a check for translatability | ||
when parts of the XmlDoc that may | when parts of the <var>XmlDoc</var> that may | ||
have non-translatable Unicode content are accessed. | have non-translatable <var>Unicode</var> content are accessed. | ||
The code below, for example, | The code below, for example, | ||
shows a way to get the benefit of specifying AllowUntranslatable | shows a way to get the benefit of specifying AllowUntranslatable | ||
Line 312: | Line 312: | ||
In the following example, it is believed that only the | In the following example, it is believed that only the | ||
element <tt>comments</tt> might | element <tt>comments</tt> might | ||
contain untranslatable Unicode among all the data accessed from the XML document: | contain untranslatable <var>Unicode</var> among all the data accessed from the XML document: | ||
<p class="code">%resp:LoadXml(%doc, 'AllowUntranslatable') | <p class="code">%resp:<var>LoadXml</var>(%doc, 'AllowUntranslatable') | ||
... | ... | ||
%uVal Unicode | %uVal <var>Unicode</var> | ||
%val Longstring | %val <var>Longstring</var> | ||
%uVal = %node:Value('comments') | %uVal = %node:Value('comments') | ||
Try %val = %uVal:UnicodeToEbcdic | Try %val = %uVal:<var>Unicode</var>ToEbcdic | ||
Catch CharacterTranslationException | Catch <var>CharacterTranslationException</var> | ||
%val = %uVal:UnicodeToEbcdic(CharacterEncode=True) | %val = %uVal:<var>Unicode</var>ToEbcdic(CharacterEncode=True) | ||
Print 'Untranslatable Unicode, character encoded:' - | Print 'Untranslatable <var>Unicode</var>, character encoded:' - | ||
And %val | And %val | ||
End Try | End Try | ||
</p> | </p> | ||
'''Note:''' | '''Note:''' | ||
Unicode values, untranslatable or not, are always allowed | <var>Unicode</var> values, untranslatable or not, are always allowed | ||
when they are added to an XmlDoc using one of the Add or Insert methods which | when they are added to an <var>XmlDoc</var> using one of the Add or Insert methods which | ||
"directly store" into an XmlDoc. | "directly store" into an <var>XmlDoc</var>. | ||
For example, the following fragment adds an Element node with | For example, the following fragment adds an Element node with | ||
a value that is the Unicode trademark sign: | a value that is the <var>Unicode</var> trademark sign: | ||
<p class="code">%node:AddElement('notation', '&#x2122;':U) | <p class="code">%node:AddElement('notation', '&#x2122;':U) | ||
</p> | </p> | ||
'''Note:''' | '''Note:''' | ||
LoadXml accepts input that contains XML hex character references | <var>LoadXml</var> accepts input that contains XML hex character references | ||
but not input that contains XHTML entity references (other than | but not input that contains XHTML entity references (other than | ||
the five predefined entities described in [[??]] refid=entrefs.). | the five predefined entities described in [[??]] refid=entrefs.). | ||
For example, this statement successfully loads a copyright character: | For example, this statement successfully loads a copyright character: | ||
<p class="code">%d:LoadXml('<a>&#xA9;</a>') | <p class="code">%d:<var>LoadXml</var>('<a>&#xA9;</a>') | ||
</p> | </p> | ||
And this statement successfully loads a greater-than sign: | And this statement successfully loads a greater-than sign: | ||
<p class="code">%d:LoadXml('<a>&gt;</a>') | <p class="code">%d:<var>LoadXml</var>('<a>&gt;</a>') | ||
</p> | </p> | ||
But this statement with a copyright character entity fails: | But this statement with a copyright character entity fails: | ||
<p class="code">%d:LoadXml('<a>&copy;</a>') | <p class="code">%d:<var>LoadXml</var>('<a>&copy;</a>') | ||
</p> | </p> | ||
You can load a copyright character, however, if you | You can load a copyright character, however, if you | ||
decode the reference and convert to Unicode before the deserialization. | decode the reference and convert to <var>Unicode</var> before the deserialization. | ||
For example: | For example: | ||
<p class="code">%d:LoadXml('<a>&copy;</a>':U) | <p class="code">%d:<var>LoadXml</var>('<a>&copy;</a>':U) | ||
</p> | </p> | ||
For more information about working with Unicode characters, | For more information about working with <var>Unicode</var> characters, | ||
see [[Strings and Unicode]]. | see [[Strings and Unicode]]. | ||
<li>The ReplaceUnicode option lets you replace certain | <li>The Replace<var>Unicode</var> option lets you replace certain | ||
Unicode input characters | <var>Unicode</var> input characters | ||
with those characters you have explicitly specified (by UNICODE commands | with those characters you have explicitly specified (by UNICODE commands | ||
in your site's ''Model 204'' CCAIN stream). | in your site's ''Model 204'' CCAIN stream). | ||
Line 367: | Line 367: | ||
<p class="code">UNICODE Table Standard Rep U=2122 '(tm)' | <p class="code">UNICODE Table Standard Rep U=2122 '(tm)' | ||
</p> | </p> | ||
Given the above command, the ReplaceUnicode option for LoadXml is shown | Given the above command, the Replace<var>Unicode</var> option for <var>LoadXml</var> is shown | ||
in the following fragment (the UnicodeWith method is | in the following fragment (the <var>Unicode</var>With method is | ||
described [[??]] reftxt=* refid=uniwith., and | described [[??]] reftxt=* refid=uniwith., and | ||
Utf16ToUnicode [[??]] reftxt=* refid=fu162u.): | Utf16To<var>Unicode</var> [[??]] reftxt=* refid=fu162u.): | ||
<p class="code">%u Unicode Initial('<a>') | <p class="code">%u <var>Unicode</var> Initial('<a>') | ||
%u = %u:UnicodeWith('2122':X:Utf16ToUnicode) | %u = %u:<var>Unicode</var>With('2122':X:Utf16To<var>Unicode</var>) | ||
%u = %u:UnicodeWith('</a>':U) | %u = %u:<var>Unicode</var>With('</a>':U) | ||
%d:LoadXml(%u, 'ReplaceUnicode') | %d:<var>LoadXml</var>(%u, 'Replace<var>Unicode</var>') | ||
%d:Print | %d:Print | ||
</p> | </p> | ||
Line 381: | Line 381: | ||
</p> | </p> | ||
In the preceding example, the stream of input characters to LoadXml | In the preceding example, the stream of input characters to <var>LoadXml</var> | ||
contains the Unicode character U+2122. | contains the <var>Unicode</var> character U+2122. | ||
Since the ReplaceUnicode option applies to both the stream of input characters | Since the Replace<var>Unicode</var> option applies to both the stream of input characters | ||
and to the character value of character references, consider the following fragment | and to the character value of character references, consider the following fragment | ||
(assuming the same CCAIN line as above): | (assuming the same CCAIN line as above): | ||
<p class="code">%d:LoadXml('<a>&#x2122;</a>', 'ReplaceUnicode') | <p class="code">%d:<var>LoadXml</var>('<a>&#x2122;</a>', 'Replace<var>Unicode</var>') | ||
</p> | </p> | ||
The result is also: | The result is also: | ||
Line 407: | Line 407: | ||
because the replacement string is being used as part of a | because the replacement string is being used as part of a | ||
character reference: | character reference: | ||
<p class="code">%d:LoadXml('<a>&#x' With '&#xB2;':U With ';</a>', - | <p class="code">%d:<var>LoadXml</var>('<a>&#x' With '&#xB2;':U With ';</a>', - | ||
'ReplaceUnicode') | 'Replace<var>Unicode</var>') | ||
</p> | </p> | ||
As a consequence of this rule, a replacement string should not | As a consequence of this rule, a replacement string should not | ||
contain an ampersand character (assuming that the ReplaceUnicode option will | contain an ampersand character (assuming that the Replace<var>Unicode</var> option will | ||
be used). | be used). | ||
<li>Replacement of a Unicode character due to the | <li>Replacement of a <var>Unicode</var> character due to the | ||
ReplaceUnicode option is only done while processing names and | Replace<var>Unicode</var> option is only done while processing names and | ||
values in the XML document. | values in the XML document. | ||
It is an error if the end of the | It is an error if the end of the | ||
name or value occurs and the replacement string has not been exhausted. | name or value occurs and the replacement string has not been exhausted. | ||
In other words (again assuming that the ReplaceUnicode option will be | In other words (again assuming that the Replace<var>Unicode</var> option will be | ||
used), a replacement string should not have "XML markup" that might | used), a replacement string should not have "XML markup" that might | ||
end a string, such as a quotation mark or a left angle bracket (<tt><</tt>). | end a string, such as a quotation mark or a left angle bracket (<tt><</tt>). | ||
Line 427: | Line 427: | ||
because the '<' that is encountered in the replacement string | because the '<' that is encountered in the replacement string | ||
ends the element content: | ends the element content: | ||
<p class="code">%d:LoadXml('<a>&#x2122;</a>':U, 'ReplaceUnicode') | <p class="code">%d:<var>LoadXml</var>('<a>&#x2122;</a>':U, 'Replace<var>Unicode</var>') | ||
</p> | </p> | ||
<li>If a parsing error occurs after processing a Unicode character that | <li>If a parsing error occurs after processing a <var>Unicode</var> character that | ||
has been replaced, the error display of the input stream will contain the | has been replaced, the error display of the input stream will contain the | ||
replacement string, and the replaced character will not be displayed. | replacement string, and the replaced character will not be displayed. | ||
Line 440: | Line 440: | ||
==Examples== | ==Examples== | ||
<ul> | <ul> | ||
<li>The following code creates the XmlDoc representation of the | <li>The following code creates the <var>XmlDoc</var> representation of the | ||
indicated XML document: | indicated XML document: | ||
<p class="code">%d Object XmlDoc | <p class="code">%d <var>Object</var> <var>XmlDoc</var> | ||
%d = New | %d = New | ||
%d:LoadXml('<zen>The Buddha dog says</zen>') | %d:<var>LoadXml</var>('<zen>The Buddha dog says</zen>') | ||
</p> | </p> | ||
<li>The following code | <li>The following code | ||
creates an XML document as plain text for a test (or for some other | creates an XML document as plain text for a test (or for some other | ||
application): | application): | ||
<p class="code">%d Object XmlDoc | <p class="code">%d <var>Object</var> <var>XmlDoc</var> | ||
%d = New | %d = New | ||
%sl Object Stringlist | %sl <var>Object</var> <var>Stringlist</var> | ||
%sl = New | %sl = New | ||
Text to %sl | Text to %sl | ||
Line 460: | Line 460: | ||
</test> | </test> | ||
End Text | End Text | ||
%d:LoadXml(%sl) | %d:<var>LoadXml</var>(%sl) | ||
</p> | </p> | ||
<li> | <li> | ||
The following code calls a subroutine which uses the ErrRet option: | The following code calls a subroutine which uses the ErrRet option: | ||
<p class="code">%d Object XmlDoc | <p class="code">%d <var>Object</var> <var>XmlDoc</var> | ||
%d = New | %d = New | ||
%s Longstring | %s <var>Longstring</var> | ||
... setup the (serialized) document in %S | ... setup the (serialized) document in %S | ||
%d = New | %d = New | ||
Call IntoXML(%d, %s) | Call IntoXML(%d, %s) | ||
... do interesting things with the XmlDoc | ... do interesting things with the <var>XmlDoc</var> | ||
... setup another document in %S | ... setup another document in %S | ||
Call IntoXML(%d, %s) | Call IntoXML(%d, %s) | ||
... | ... | ||
Subroutine IntoXML(%d Object XmlDoc, %S Longstring) | Subroutine IntoXML(%d <var>Object</var> <var>XmlDoc</var>, %S <var>Longstring</var>) | ||
If %d:LoadXml(%S, 'ErrRet') Then | If %d:<var>LoadXml</var>(%S, 'ErrRet') Then | ||
... error handling code ... | ... error handling code ... | ||
End If | End If | ||
End Subroutine | End Subroutine | ||
</p> | </p> | ||
<li>As stated for the <i>options</i> argument in the LoadXml syntax, | <li>As stated for the <i>options</i> argument in the <var>LoadXml</var> syntax, | ||
whitespace normalization applies to the characters in the input | whitespace normalization applies to the characters in the input | ||
serialized string, not the values after entity substitution. | serialized string, not the values after entity substitution. | ||
Therefore the values of elements | Therefore the values of elements | ||
"foo1" and "foo2" created by the following two | "foo1" and "foo2" created by the following two | ||
LoadXml invocations are different: | <var>LoadXml</var> invocations are different: | ||
<p class="code">%t = $X2C('05') | <p class="code">%t = $X2C('05') | ||
%d:LoadXml('<foo1>' With %t With %t With '</foo1>', - | %d:<var>LoadXml</var>('<foo1>' With %t With %t With '</foo1>', - | ||
'wsptoken') | 'wsptoken') | ||
%d2:LoadXml('<foo2>&#x09;&#x09;' With '</foo2>', - | %d2:<var>LoadXml</var>('<foo2>&#x09;&#x09;' With '</foo2>', - | ||
'wsptoken') | 'wsptoken') | ||
</p> | </p> | ||
Line 498: | Line 498: | ||
is deserialized with the WspToken option, then printed: | is deserialized with the WspToken option, then printed: | ||
<p class="code">... | <p class="code">... | ||
%d Object XmlDoc auto new | %d <var>Object</var> <var>XmlDoc</var> auto new | ||
%le string len 16 | %le string len 16 | ||
%le = $X2C('0D25') | %le = $X2C('0D25') | ||
%d:LoadXml('<a foo=" bar " > x' With %le - | %d:<var>LoadXml</var>('<a foo=" bar " > x' With %le - | ||
With 'y</a>', 'WspToken') | With 'y</a>', 'WspToken') | ||
Print %d:Serial('.', 'EBCDIC') | Print %d:Serial('.', 'EBCDIC') | ||
Line 532: | Line 532: | ||
%le = $X2C('0D25') | %le = $X2C('0D25') | ||
%tb = $X2C('05') | %tb = $X2C('05') | ||
%d:LoadXml('<a foo=" bar " >' With %le With %tb - | %d:<var>LoadXml</var>('<a foo=" bar " >' With %le With %tb - | ||
With %tb With '</a>', 'LinefeedNoTrailingTabs') | With %tb With '</a>', 'LinefeedNoTrailingTabs') | ||
Print %d:Serial('.', 'EBCDIC') | Print %d:Serial('.', 'EBCDIC') | ||
Line 543: | Line 543: | ||
<li>The [[Janus Sockets]]R documents the HttpResponse object, whose | <li>The [[Janus Sockets]]R documents the HttpResponse object, whose | ||
ParseXml method has the same options as LoadXml. | ParseXml method has the same options as <var>LoadXml</var>. | ||
The following fragment requests, receives, and | The following fragment requests, receives, and | ||
deserializes an XML document from a Web server: | deserializes an XML document from a Web server: | ||
<p class="code">%httpreq object HttpRequest | <p class="code">%httpreq object HttpRequest | ||
%httpresp object HttpResponse | %httpresp object HttpResponse | ||
%doc object XmlDoc | %doc object <var>XmlDoc</var> | ||
%httpreq = New | %httpreq = New | ||
%doc = New | %doc = New | ||
Line 560: | Line 560: | ||
</p> | </p> | ||
'''Note:''' | '''Note:''' | ||
If you use $Sock_Recv and LoadXml directly instead of using an HTTP Helper object, | If you use $Sock_Recv and <var>LoadXml</var> directly instead of using an HTTP Helper object, | ||
always use the BINARY option of $Sock_Recv, so that | always use the BINARY option of $Sock_Recv, so that | ||
LoadXml can recognize the character encoding inherent in the | <var>LoadXml</var> can recognize the character encoding inherent in the | ||
serialized XML document. | serialized XML document. | ||
</ul> | </ul> | ||
Line 584: | Line 584: | ||
use [[WebReceive (XmlDoc function)|WebReceive]]. | use [[WebReceive (XmlDoc function)|WebReceive]]. | ||
<li>For other transport APIs, such as [[Janus Sockets]] or ''Model 204'' MQ Series, | <li>For other transport APIs, such as [[Janus Sockets]] or ''Model 204'' MQ Series, | ||
LoadXml can be used to deserialize a document that has been received with | <var>LoadXml</var> can be used to deserialize a document that has been received with | ||
the transport API. | the transport API. | ||
As mentioned in the example above, [[Janus Sockets]] has a convenient ParseXml | As mentioned in the example above, [[Janus Sockets]] has a convenient ParseXml | ||
method for deserializing an HTTP response. | method for deserializing an HTTP response. | ||
<li>The function that serializes an XmlDoc as a UTF-8 or EBCDIC | <li>The function that serializes an <var>XmlDoc</var> as a UTF-8 or EBCDIC | ||
string is [[Serial (XmlDoc/XmlNode function)|Serial]]. | string is [[Serial (XmlDoc/XmlNode function)|Serial]]. | ||
<li>For more information about normalization, see [[??]] refid=normize.. | <li>For more information about normalization, see [[??]] refid=normize.. | ||
</ul> | </ul> |
Revision as of 17:46, 25 January 2011
Deserialize XML document or fragment into XmlDoc Root or into Element XmlNode (XmlDoc and XmlNode classes)
[Requires Janus SOAP]
This callable function
converts a text string representation of an
XML document
into an empty XmlDoc, or of an XML fragment as one or more
children of an Element XmlNode.
This process is called deserialization,
because the text representation of a document is called the serial
form.
LoadXml returns a zero value if the deserialization is successful; it returns a non-zero value if deserialization is unsuccessful, the ErrRet option is used, and the particular error is tolerated.
Syntax
[%errorPosition =] nr:LoadXml( input, [options]) Throws XmlParseError
Syntax terms
%errorPosition | A %variable set to 0 if the deserialization is successful. If the ErrRet option is used, this %variable is set to the position within input at which an error is found. |
---|---|
nr | An expression that points to the XmlDoc or XmlNode to contain the deserialized representation of the XML document or fragment, respectively.
If an XmlDoc, it must be EMPTY (see ?? refid=dstates.) prior to invoking LoadXml. If an XmlNode that is the root node of an XmlDoc, the XmlDoc must be EMPTY. Prior to Sirius Mods version 6.8, this method object had to be an XmlDoc; as of version 6.8, LoadXml is also available in the XmlNode class. |
input | The text string, Longstring, or (as of Sirius Mods version 6.8) Stringlist to be deserialized. If a Stringlist, input consists of the concatenation of the Stringlist items with no insertion of line-end characters at the end of each item.
If the nr method object is an XmlDoc or the root node of an XmlDoc, input must be valid as an entire XML document (for example, only one top-level element). If nr is a non-root XmlNode, input must be an XML fragment, that is, a substring of a serialized XML document, such that:
|
options | Any valid combination of the following terms:
|
Exceptions
This function can throw the following exception:
- XmlParseError
- If the method encounters a parsing error, properties of the exception object may indicate the location and type of problem. See ?? refid=xmlpars..
Usage notes
- As of Sirius Mods version 7.5, version="1.1" is accepted in the input to be deserialized. Formerly, only 1.0 was accepted.
- None of the options terms may be specified twice.
- The options terms may be specified in any case. For example, you can use WspPreserve and wsppreserve, interchangeably.
- If the LoadXml method object is an XmlDoc or a root XmlNode,
LoadXml will accept any of the input character sets specified below.
The correspondence between these input character sets and the
value of encoding in the XML declaration is explained
in the description of the Encoding property of an XmlDoc,
under its ?? refid=enctyp., and is also shown in the following two tables.
In both tables below,
all of the values of encoding in the XML declaration
must be specified in all-uppercase letters, and
n is a digit from 1 to 9 in the encoding value
ISO-8859-n:
- Input bytestream
- XML declaration
- ASCII codes below X'80'
- UTF-8, ISO-8859-n, or none
- ASCII codes up to X'FF'
- ISO-8859-n
- UTF-8 (with characters above X'7F')
- UTF-8 or none
- UTF-16
- UTF-16 or none
- EBCDIC
- UTF-8, ISO-8859-n, or none
- Unicode (User Language)
- UTF-8, ISO-8859-n, or none
- XML declaration
- Input bytestream
- UTF-8
- UTF-8 (which includes ASCII codes below X'80'), EBCDIC, or Unicode
- UTF-16
- UTF-16 (including a two-byte byte order mark as a preamble to the XML document input stream)
- ISO-8859-n
- ASCII codes up to X'FF', EBCDIC, or Unicode
- none
- UTF-8, UTF-16, ASCII, EBCDIC, or Unicode
In certain LoadXml error cases (for example, an input containing an ASCII code above X'7F' without an XML declaration containing ISO-8859-n), a group of error messages is issued to display:
- A line that contains the erroneous string as received
- A line with an ASCII-to-EBCDIC translation of the string
- Additional lines that display the input byte halves in base 16
For example, the following messages are output after an 8-bit ASCII input fails because no accompanying ISO-8859-n encoding was specified:
MSIR.0668: XML doc parse error: invalid first byte of UTF-8 encoding near or before position 14
MSIR.0708: C: ????????????????
MSIR.0708: E: copyright(?)</A> (ASCII to EBCDIC)
MSIR.0708: X: 6677766672A23243
MSIR.0708: X: 3F0929784899CF1E
MSIR.0665: |
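The error case above can be reproduced outside Janus SOAP. The following Python sketch is only a model: the sample bytes and the helper name `first_bad_utf8_offset` are made up for illustration. It shows why an ASCII code above X'7F' fails as UTF-8 but is acceptable once the input is treated as ISO-8859-1.

```python
# Python model only; this is not Janus SOAP code.
# The sample input is hypothetical: X'A9' is the ISO-8859-1 copyright sign.
data = b"<a>copyright\xa9</a>"


def first_bad_utf8_offset(raw: bytes) -> int:
    """Return the 0-based offset of the first byte that is not valid UTF-8, or -1."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return -1


# X'A9' cannot start a UTF-8 sequence, so decoding as UTF-8 fails at that byte.
bad_offset = first_bad_utf8_offset(data)

# The same bytes are valid when the encoding is declared as ISO-8859-1.
as_latin1 = data.decode("iso-8859-1")
```

Note that the offset here is zero-based, whereas the position reported in the MSIR.0668 message follows the product's own counting.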
- If the LoadXml method object is a non-root XmlNode, LoadXml accepts a Unicode string or a bytestream it treats as EBCDIC.
- As described in ?? refid=u80., deserializing with LoadXml may require translation of the input document using the Unicode tables. This depends on the version of the Sirius Mods (that is, whether XmlDocs are maintained in EBCDIC or Unicode) and on which of the input character sets described above is used.
- An XML fragment does not provide for inserting an Attribute
into an Element node.
For example, the following does not achieve it:
%d Object XmlDoc Auto New
%n Object XmlNode
%n = %d:AddElement('top')
%n:LoadXml('foo="bar"')
%d:Print
The input to LoadXml above is simply stored as the character content of the Element containing the fragment, so the result is:
<top>foo="bar"</top>
To add the Attribute foo with the value "bar", replace LoadXml in the example above with the AddAttribute method.
- If the method object refers to the Root node of an XmlDoc, the
LoadXml method in the XmlNode class behaves
exactly as the LoadXml method in the XmlDoc class.
For example:
%d Object XmlDoc Auto New
%n Object XmlNode
%n = %d:SelectSingleNode
%n:LoadXml('<?xml version="1.0"?><top><inner/></top>')
When the Root node is the method object, the serialized input must be a legal XML document (for example, the XmlDoc must be Empty, and the serialized input must contain exactly one top-level element).
- Whitespace handling
- The "Wsp" whitespace-handling options (WspPreserve, WspNewline, and WspToken) and the CrPreserve whitespace option are mutually exclusive; if none of them is specified, WspNewline is in effect. Although the LinefeedNoTrailingTabs option is also concerned with whitespace, it is distinct from, yet compatible with, any of the three "Wsp" options, but it is not compatible with the CrPreserve option.
- Except for CrPreserve, the whitespace-handling options are applied after the XML standard whitespace conversions that Janus SOAP applies in all other cases. As described in ?? refid=nornl., the standard specifies that all carriage return/linefeed sequences and carriage return sequences are to be converted to linefeeds when deserializing. Using the CrPreserve option bypasses this rule.
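The standard line-end conversion described in this item can be modeled in a few lines. This Python sketch is an illustration only: it works on Unicode CR and LF rather than the EBCDIC X'0D' and X'25' used elsewhere on this page, and the function name and the `cr_preserve` flag are hypothetical stand-ins for the CrPreserve option.

```python
import re


def normalize_line_ends(text: str, cr_preserve: bool = False) -> str:
    """Model of the XML standard rule: CR LF pairs and lone CRs become LF.

    cr_preserve is a hypothetical flag standing in for the CrPreserve
    option, which bypasses the conversion.
    """
    if cr_preserve:
        return text
    # "\r\n?" matches a CR optionally followed by an LF.
    return re.sub(r"\r\n?", "\n", text)
```

For example, `normalize_line_ends(" x\r\ny")` yields `" x\ny"`, which is the string the "Wsp" options then operate on.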
- The whitespace-handling options do no whitespace
conversion (beyond the XML standard conversions) on Element content that is
"protected" by the xml:space="preserve" attribute.
"Protected" by the xml:space="preserve" attribute
means an element E that either:
- has the xml:space attribute with the value preserve
- is contained in an element A with that attribute and value, and there is no element that is a descendant of A and an ancestor of E with the xml:space attribute with the value default
Elements that are not protected by the xml:space="preserve" attribute have whitespace handled according to the option in effect for the deserialization.
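The definition of "protected" above amounts to a walk up the ancestor chain. The Python sketch below models that walk with the standard `xml.etree.ElementTree` module; the function name `is_protected`, the parent map, and the sample document are assumptions made for illustration. It follows the wording of the definition literally and makes no claim about how Janus SOAP implements the check.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the predefined XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def is_protected(elem, parents):
    """Return True if elem is protected by xml:space="preserve"."""
    # Case 1: the element itself carries xml:space="preserve".
    if elem.get(XML_SPACE) == "preserve":
        return True
    # Case 2: the nearest ancestor that carries xml:space decides.
    # If it says "default", it sits between elem and any outer "preserve".
    node = parents.get(elem)
    while node is not None:
        value = node.get(XML_SPACE)
        if value == "preserve":
            return True
        if value == "default":
            return False
        node = parents.get(node)
    return False


# Hypothetical sample document.
doc = ET.fromstring(
    '<top xml:space="preserve"><a><b/></a>'
    '<c xml:space="default"><d/></c></top>'
)
# ElementTree has no parent pointers, so build a child-to-parent map.
parents = {child: parent for parent in doc.iter() for child in parent}
```

In this sample, element b is protected through top, while element d is not, because c intervenes with the value default.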
- There is no whitespace normalization comparable to the LoadXml whitespace-handling options for the Add and Insert...Before functions that create a Text node (AddElement, InsertElementBefore, AddText, and InsertTextBefore).
- Whitespace normalization applies to the characters in the input serialized string, not to the values after entity substitution. In ?? refid=exload., see the fourth example (which contains &#x09;).
- If input is a Stringlist, LoadXml inserts a linefeed character after each item in the Stringlist as part of concatenation prior to deserialization. The linefeed is then subject to the method's whitespace handling options, so it is usually removed (as leading or trailing whitespace).
- Using WspNewline or WspToken reduces the space consumed by individual Text nodes, and in some cases collapses all whitespace content between markup to the null string, so it is not stored as a Text node. This reduces the storage required by the XmlDoc, speeds up XPath and node access processing, and makes the output of, say, the Print subroutine easier to read.
- The LinefeedNoTrailingTabs option only affects Text nodes that contain
an initial line-end character followed by any number of tabs and nothing else.
The LinefeedNoTrailingTabs effect on such a Text node,
whether it is specified with or without any of the "Wsp" options,
is to store the value of the node as a single line-end character.
One example of the use of the LinefeedNoTrailingTabs option is
an input XML document to be deserialized for which
both of the following are true:
- A digital signature is needed of a subtree in the document.
- The input subtree contains a linefeed and one or more tabs that separate markup, and the linefeed must be kept but the tabs discarded for the signature.
For information about exclusive canonicalization, serialization expressly designed for digital signatures, see Serial.
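The LinefeedNoTrailingTabs rule described above can be stated as a single pattern match. This Python sketch is a model only, with a hypothetical function name, and it uses the Unicode linefeed and tab characters in place of their EBCDIC equivalents.

```python
import re


def linefeed_no_trailing_tabs(text: str) -> str:
    """Model of the rule: a Text node that is one line-end character
    followed only by tabs is stored as the single line-end character."""
    if re.fullmatch(r"\n\t*", text):
        return "\n"
    # Any other Text node is left for the "Wsp" option in effect.
    return text
```

A value such as `"\n\t\t"` is reduced to `"\n"`, while `"\n\t x"` is returned unchanged because it contains something other than tabs after the line-end.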
- Deserializing Unicode strings
- The LoadXml AllowUntranslatable option
lets you deserialize Unicode strings that contain characters
that are not translatable to EBCDIC.
For example, LoadXml accepts the Unicode trademark character (U+2122)
only if you specify allowUntranslatable, as in the following.
The U function below is described ?? refid=umeth..
%u is unicode Initial('&#x2122;':U)
%nod:LoadXml(%u, 'allowUntranslatable')
If you remove allowUntranslatable, this LoadXml statement fails, because the Unicode trademark character does not translate to an EBCDIC character. By default, the method detects any untranslatable characters in the serialized input XML document; it also throws an XmlParseError exception (?? refid=xmlpars.) with reason UntranslatableUnicode (unless the ErrRet option is specified).
This default detection of non-translatable characters may suit your purposes. That is, it ensures that subsequent access to the deserialized content is performed without any Unicode to EBCDIC translation errors. For example:
%doc:LoadXml ...
%val Longstring
%val = %doc:Value(%xpath)
Although the Value method returns a Unicode string, the assignment to the EBCDIC string %val will not fail due to a Unicode translation problem: if there is any untranslatable Unicode (including, of course, strings in the XML document which your application never accesses), the LoadXml operation fails.
If you use AllowUntranslatable, all Unicode characters in a serialized input XML document are allowed and stored in the XmlDoc. Your stored data may contain content that is not translatable to EBCDIC, however. A subsequent attempt to access such content that performs Unicode to EBCDIC translation (like the Value method statement above) might cause request cancellation.
You should therefore use AllowUntranslatable only if there is also a check for translatability when parts of the XmlDoc that may have non-translatable Unicode content are accessed. The code below, for example, shows a way to get the benefit of specifying AllowUntranslatable while limiting the risk of request cancellation.
In the following example, it is believed that only the element comments might contain untranslatable Unicode among all the data accessed from the XML document:
%resp:LoadXml(%doc, 'AllowUntranslatable')
...
%uVal Unicode
%val Longstring
%uVal = %node:Value('comments')
Try
   %val = %uVal:UnicodeToEbcdic
Catch CharacterTranslationException
   %val = %uVal:UnicodeToEbcdic(CharacterEncode=True)
   Print 'Untranslatable Unicode, character encoded:' -
      And %val
End Try
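The same try-then-fall-back pattern can be tried out in Python, shown here purely as an analogy. The sketch assumes the `cp037` codec as a stand-in for the site's EBCDIC translation tables, and the helper name `to_ebcdic` is hypothetical. Python's `xmlcharrefreplace` handler emits a decimal character reference; the exact form produced by UnicodeToEbcdic with CharacterEncode=True is not stated in this section.

```python
def to_ebcdic(text: str, codec: str = "cp037"):
    """Translate to EBCDIC; on failure, retry with character encoding.

    Returns (bytes, encoded_flag). cp037 is an assumed stand-in for the
    installation's Unicode tables.
    """
    try:
        # Strict translation: raises if any character is untranslatable.
        return text.encode(codec), False
    except UnicodeEncodeError:
        # Fallback: untranslatable characters become character references.
        return text.encode(codec, errors="xmlcharrefreplace"), True


raw, was_encoded = to_ebcdic("Acme\u2122")
```

As in the User Language fragment, the strict translation is attempted first so that the fallback (and any warning about it) happens only for content that really is untranslatable.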
Note: Unicode values, untranslatable or not, are always allowed when they are added to an XmlDoc using one of the Add or Insert methods which "directly store" into an XmlDoc. For example, the following fragment adds an Element node with a value that is the Unicode trademark sign:
%node:AddElement('notation', '&#x2122;':U)
Note: LoadXml accepts input that contains XML hex character references but not input that contains XHTML entity references (other than the five predefined entities described in ?? refid=entrefs.). For example, this statement successfully loads a copyright character:
%d:LoadXml('<a>&#xA9;</a>')
And this statement successfully loads a greater-than sign:
%d:LoadXml('<a>&gt;</a>')
But this statement with a copyright character entity fails:
%d:LoadXml('<a>&copy;</a>')
You can load a copyright character, however, if you decode the reference and convert to Unicode before the deserialization. For example:
%d:LoadXml('<a>&copy;</a>':U)
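The distinction between character references, the predefined entities, and XHTML entity references is general XML behavior, so it can be observed with any conforming parser. The Python sketch below uses the standard `xml.etree.ElementTree` parser as a stand-in for LoadXml and `html.unescape` as a stand-in for the decoding done by the U function; the helper name `loads` is hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def loads(serialized: str):
    """Return the text of the top-level element, or None on a parse error."""
    try:
        return ET.fromstring(serialized).text
    except ET.ParseError:
        return None


hex_ref = loads("<a>&#xA9;</a>")      # hex character reference: accepted
predefined = loads("<a>&gt;</a>")     # predefined entity: accepted
named = loads("<a>&copy;</a>")        # XHTML entity, no DTD: parse error
# Decoding the reference before parsing makes the same input acceptable.
decoded_first = loads(html.unescape("<a>&copy;</a>"))
```

One caution about this analogy: `html.unescape` also decodes references such as `&lt;`, so applying it to a whole document can change the markup; it is used here only on input known to contain no such references.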
For more information about working with Unicode characters, see Strings and Unicode.
- The ReplaceUnicode option lets you replace certain
Unicode input characters
with those characters you have explicitly specified (by UNICODE commands
in your site's Model 204 CCAIN stream).
For example, assume the following command is in CCAIN:
UNICODE Table Standard Rep U=2122 '(tm)'
Given the above command, the ReplaceUnicode option for LoadXml is shown in the following fragment (the UnicodeWith method is described ?? reftxt=* refid=uniwith., and Utf16ToUnicode ?? reftxt=* refid=fu162u.):
%u Unicode Initial('<a>')
%u = %u:UnicodeWith('2122':X:Utf16ToUnicode)
%u = %u:UnicodeWith('</a>':U)
%d:LoadXml(%u, 'ReplaceUnicode')
%d:Print
The result is:
<a>(tm)</a>
In the preceding example, the stream of input characters to LoadXml contains the Unicode character U+2122. Since the ReplaceUnicode option applies to both the stream of input characters and to the character value of character references, consider the following fragment (assuming the same CCAIN line as above):
%d:LoadXml('<a>&#x2122;</a>', 'ReplaceUnicode')
The result is also:
<a>(tm)</a>
In this case, U+2122 does not occur in the input character stream, but it is the value of the character reference.
Notes:
- It is an error to be processing a replacement string within a character
reference.
For example, assume the following two lines are in CCAIN:
UNICODE Table Standard Rep U=00B2 '2'
Given the above command, the following fragment gets a parse error, because the replacement string is being used as part of a character reference:
%d:LoadXml('<a>&#x' With '&#xB2;':U With ';</a>', -
   'ReplaceUnicode')
As a consequence of this rule, a replacement string should not contain an ampersand character (assuming that the ReplaceUnicode option will be used).
- Replacement of a Unicode character due to the
ReplaceUnicode option is only done while processing names and
values in the XML document.
It is an error if the end of the
name or value occurs and the replacement string has not been exhausted.
In other words (again assuming that the ReplaceUnicode option will be
used), a replacement string should not have "XML markup" that might
end a string, such as a quotation mark or a left angle bracket (<).
For example, assume the following line is in CCAIN:
UNICODE Table Standard Rep U=2122 '(trademark)<tm>'
Given the above command, the following fragment gets a parsing error, because the '<' that is encountered in the replacement string ends the element content:
%d:LoadXml('<a>&#x2122;</a>':U, 'ReplaceUnicode')
- If a parsing error occurs after processing a Unicode character that has been replaced, the error display of the input stream will contain the replacement string, and the replaced character will not be displayed. However, if the character being replaced was introduced as a character reference, the character reference remains in the display of the input stream.
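The notes above imply a simple check that could be run on replacement strings before they are placed in UNICODE commands. The Python sketch below is a hypothetical helper, not part of Janus SOAP. Its character list is taken from the notes (ampersand, left angle bracket, quotation mark), and treating the apostrophe as a quotation mark is an added assumption.

```python
# Characters that the notes say a replacement string should avoid.
# Including the apostrophe is an assumption: attribute values may be
# delimited by either kind of quotation mark.
MARKUP_CHARS = {
    "&": "ampersand",
    "<": "left angle bracket",
    '"': "quotation mark",
    "'": "apostrophe",
}


def risky_chars(replacement: str):
    """Return the characters in a replacement string that could start a
    reference or end a name or value during deserialization."""
    return [ch for ch in replacement if ch in MARKUP_CHARS]
```

With this check, `'(tm)'` produces an empty list, while `'(trademark)<tm>'` reports the left angle bracket that causes the parsing error described above.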
Examples
- The following code creates the XmlDoc representation of the
indicated XML document:
%d Object XmlDoc
%d = New
%d:LoadXml('<zen>The Buddha dog says</zen>')
- The following code
creates an XML document as plain text for a test (or for some other
application):
%d Object XmlDoc
%d = New
%sl Object Stringlist
%sl = New
Text to %sl
<test>
<test2>
supercalifragilisticexpailodocious
</test2>
</test>
End Text
%d:LoadXml(%sl)
-
The following code calls a subroutine which uses the ErrRet option:
%d Object XmlDoc
%d = New
%s Longstring
... setup the (serialized) document in %S
%d = New
Call IntoXML(%d, %s)
... do interesting things with the XmlDoc
... setup another document in %S
Call IntoXML(%d, %s)
...
Subroutine IntoXML(%d Object XmlDoc, %S Longstring)
   If %d:LoadXml(%S, 'ErrRet') Then
      ... error handling code ...
   End If
End Subroutine
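The ErrRet contract (zero on success, a non-zero error position when the error is tolerated, an exception otherwise) can be modeled with any parser that reports error locations. The Python sketch below is an analogy only: `load_xml` and its `err_ret` flag are hypothetical, the standard `xml.etree.ElementTree` parser stands in for LoadXml, and the position it computes is that parser's notion of where the error is, not the value Janus SOAP would return.

```python
import xml.etree.ElementTree as ET


def load_xml(serialized: str, err_ret: bool = False) -> int:
    """Return 0 on success; with err_ret, return a 1-based error position.

    Without err_ret, the parse error propagates to the caller, in the
    same spirit as the XmlParseError exception.
    """
    try:
        ET.fromstring(serialized)
    except ET.ParseError as err:
        if not err_ret:
            raise
        line, column = err.position  # line is 1-based, column is 0-based
        lines = serialized.split("\n")
        # Characters on earlier lines (plus their newlines), then the column.
        return sum(len(item) + 1 for item in lines[:line - 1]) + column + 1
    return 0
```

Because the success value is zero, the result can be tested directly in a condition, which is the idiom used by the IntoXML subroutine above.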
- As stated for the options argument in the LoadXml syntax,
whitespace normalization applies to the characters in the input
serialized string, not the values after entity substitution.
Therefore the values of elements
"foo1" and "foo2" created by the following two
LoadXml invocations are different:
%t = $X2C('05')
%d:LoadXml('<foo1>' With %t With %t With '</foo1>', -
   'wsptoken')
%d2:LoadXml('<foo2>&#x09;&#x09;' With '</foo2>', -
   'wsptoken')
- In the following fragment, element a, which contains
leading and intermediate whitespace,
is deserialized with the WspToken option, then printed:
...
%d Object XmlDoc auto new
%le string len 16
%le = $X2C('0D25')
%d:LoadXml('<a foo=" bar " > x' With %le -
   With 'y</a>', 'WspToken')
Print %d:Serial('.', 'EBCDIC')
...
The result shows that WspToken removes the leading whitespace — in the Element content, not in the Attribute value — and replaces the intermediate linefeed (the initial carriage return removed as the XML standard normalization dictates) with a single blank:
<a foo=" bar ">x y</a>
If WspNewline is used instead, the leading whitespace remains (it contains no line-end characters), and the intermediate linefeed (represented below by a question mark) also remains (it is not leading or trailing):
<a foo=" bar "> x?y</a>
In this example, using the WspPreserve option gives the same result as WspNewline: no whitespace is removed, except for the initial carriage return due to the XML standard normalization.
If the example is changed slightly so the Text node includes only tab characters and a leading line-end character, and LinefeedNoTrailingTabs is specified:
...
%le = $X2C('0D25')
%tb = $X2C('05')
%d:LoadXml('<a foo=" bar " >' With %le With %tb -
   With %tb With '</a>', 'LinefeedNoTrailingTabs')
Print %d:Serial('.', 'EBCDIC')
...
The resulting Text node contains only the line-end character:
<a foo=" bar ">?</a>
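The WspToken and WspNewline outcomes described in this example can be checked against a small model. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, Unicode CR, LF, and tab stand in for the EBCDIC characters, and the WspNewline rule (leading or trailing whitespace is dropped only when it contains a line-end) is inferred from the explanation in this example rather than taken from the option's definition.

```python
import re


def wsp_token(text: str) -> str:
    """Model of WspToken: trim the ends, collapse inner runs to one blank."""
    return re.sub(r"[ \t\n\r]+", " ", text).strip(" ")


def wsp_newline(text: str) -> str:
    """Model of WspNewline (inferred rule): drop a leading or trailing
    whitespace run only if that run contains a line-end character."""
    lead = re.match(r"[ \t\n]*", text).group()
    if "\n" in lead:
        text = text[len(lead):]
    trail = re.search(r"[ \t\n]*\Z", text).group()
    if "\n" in trail:
        text = text[:len(text) - len(trail)]
    return text


# Element content from the example, after the standard CR LF conversion.
content = " x\r\ny".replace("\r\n", "\n")
token_result = wsp_token(content)
newline_result = wsp_newline(content)
```

The model reproduces the two results shown: the WspToken value has no leading blank and a single blank between x and y, and the WspNewline value keeps both the leading blank and the inner linefeed.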
- The Janus Sockets documentation describes the HttpResponse object, whose
ParseXml method has the same options as LoadXml.
The following fragment requests, receives, and
deserializes an XML document from a Web server:
%httpreq object HttpRequest
%httpresp object HttpResponse
%doc object XmlDoc
%httpreq = New
%doc = New
%httpreq:URL = 'foo.com/bar'
%httpresp = %httpreq:Get('HTTP_CLIENT')
If %httpresp:ParseXML(%doc, 'ErrRet') Then
   ... invalid document received from Web server
End If
Note: If you use $Sock_Recv and LoadXml directly instead of using an HTTP Helper object, always use the BINARY option of $Sock_Recv, so that LoadXml can recognize the character encoding inherent in the serialized XML document.
Request-Cancellation Errors
- Method object doc is not EMPTY.
- Option is invalid.
- Insufficient free space exists in CCATEMP.
- A syntax error occurred in the representation of the XML document in input (this is tolerated if the ErrRet option is specified).
See also
- To deserialize a document (which has been POSTed or PUT) using Janus Web Server, use WebReceive.
- For other transport APIs, such as Janus Sockets or Model 204 MQ Series, LoadXml can be used to deserialize a document that has been received with the transport API. As mentioned in the example above, Janus Sockets has a convenient ParseXml method for deserializing an HTTP response.
- The function that serializes an XmlDoc as a UTF-8 or EBCDIC string is Serial.
- For more information about normalization, see ?? refid=normize..