LoadXml (XmlDoc/XmlNode function): Difference between revisions

From m204wiki
mNo edit summary
 
(51 intermediate revisions by 6 users not shown)
Line 1: Line 1:
<span style="font-size:120%; color:black"><b>Deserialize text string into XmlDoc Root or into Element XmlNode</b></span>
{{Template:XmlDoc/XmlNode:LoadXml subtitle}}
[[Category:XmlDoc methods|LoadXml function]]
The <var>LoadXml</var> [[Notation conventions for methods|callable]] function converts a text string representation of an XML document into an <var>XmlDoc</var>, or a text string representation of an XML fragment into one or more children of an <var>[[XmlDoc API#XmlDoc node types|Element]]</var> <var>XmlNode</var>.
[[Category:XmlNode methods|LoadXml function]]
This process is called '''deserialization''', because the text representation of a document is called the '''serial''' form.
[[Category:XmlDoc API methods]]
<!--DPL?? Category:XmlDoc methods|LoadXml function: Deserialize text string into XmlDoc Root or into Element XmlNode-->
<var>LoadXml</var> returns a zero value if the deserialization is successful; it returns a non-zero value if deserialization is unsuccessful, the <var>ErrRet</var> option is used, and the particular error is tolerated.
<!--DPL?? Category:XmlNode methods|LoadXml function: Deserialize text string into XmlDoc Root or into Element XmlNode-->
<!--DPL?? Category:XmlDoc API methods|LoadXml (XmlDoc/XmlNode function): Deserialize text string into XmlDoc Root or into Element XmlNode-->
<!--DPL?? Category:System methods|LoadXml (XmlDoc/XmlNode function): Deserialize text string into XmlDoc Root or into Element XmlNode-->
<p>
LoadXml is a member of the [[XmlDoc class|XmlDoc]] and [[XmlNode class|XmlNode]] classes.
</p>
 
This callable function
converts a text string representation of an
XML document
into an empty XmlDoc, or of an XML fragment as one or more
children of an Element XmlNode.
This process is called '''deserialization''',
because the text representation of a document is called the '''serial'''
form.


LoadXml returns a zero value if the
==Syntax==
deserialization is successful; it returns a non-zero value if deserialization is
{{Template:XmlDoc/XmlNode:LoadXml syntax}}
unsuccessful, the ErrRet option is used, and the particular error is tolerated.
===Syntax===
  [%pos =] nr:LoadXml(input, [options])


====Syntax Terms====
===Syntax terms===
<dl>
<table class="syntaxTable">
<tr><th nowrap>%errorPosition</th>
<td>A %variable set to 0 if the deserialization is successful. If <code>ErrRet</code> is one of the <var class="term">options</var> used, <var class="term">%errorPosition</var> is set to the character position within <var class="term">input</var> at which an error is found.</td></tr>


<dt>%pos
<tr><th>nr</th>
<dd>A %variable set to 0 if the deserialization is successful.
<td>An expression that points to the <var>XmlDoc</var> or <var>XmlNode</var> to contain the deserialized representation of the XML document or fragment, respectively.
If the ErrRet option is used, this %variable is set to the position within
<i>input</i> at which an error is found.
If an <var>XmlDoc</var>, <i>it must be</i> <var>EMPTY</var> (see [[XmlDoc_API#XmlDoc states|XmlDoc states]]) prior to invoking <var>LoadXml</var>. If an <var>XmlNode</var> that is the root node of an <var>XmlDoc</var>, the <var>XmlDoc</var> must be <var>EMPTY</var>.
<dt>nr
</td></tr>
<dd>An expression that points to the XmlDoc or XmlNode to contain the
deserialized representation of the XML document or fragment, respectively.


If an XmlDoc, it must be EMPTY (see [[??]] refid=dstates.) prior to
<tr><th>input</th>
invoking LoadXml.
<td>The byte string or <var>Stringlist</var> to be deserialized. If a <var>Stringlist</var>, <var class="term">input</var> consists of the concatenation of the <var>Stringlist</var> items with insertion of line-end characters at the end of each item.
If an XmlNode that is the root node of an XmlDoc, the XmlDoc must be EMPTY.
If the <var class="term">nr</var> method object is an <var>XmlDoc</var> or the root node of an <var>XmlDoc</var>, <var class="term">input</var> must be valid as an entire XML document (for example, only one top-level element). If <var class="term">nr</var> is a non-root <var>XmlNode</var>, <var class="term">input</var> must be an '''XML fragment''', that is, a substring of a serialized XML document, such that:


Prior to ''Sirius Mods'' version 6.8, this method object had to be an XmlDoc;
as of version 6.8, LoadXml is also available in the XmlNode class.
<dt>input
<dd>The text string, Longstring, or (as of ''Sirius Mods'' version 6.8)
Stringlist to be deserialized.
If a Stringlist, ''input'' consists of the
concatenation of the Stringlist items with no insertion of line-end characters
at the end of each item.
If the ''nr'' method object is an XmlDoc or the root node of an XmlDoc,
''input'' must be valid as an entire XML document (for example, only one
top-level element).
If ''nr'' is a non-root XmlNode,
''input'' must be
an '''XML fragment''', that is, a substring of a serialized XML
document, such that:
<ul>
<li>The fragment may contain undeclared prefixes.
Any such prefixes must have
declarations which are in effect at the Element node referred to by
the method object of LoadXml; these declarations
(along with that Element's default namespace) are inherited by the inserted fragment.
<li>In all other respects, the fragment, if &ldquo;wrapped&rdquo; within a simple element
start tag and end tag (such as <tt><w></tt> and <tt></w></tt>, respectively),
is a legal XML document.
The fragment can contain leading and/or trailing
character content and/or multiple &ldquo;top-level&rdquo; elements; all of these become
children of the method object XmlNode.
</ul>
<dt>options
<dd>Any valid combination of the following terms:
<ul>
<ul>
<li><b>AllowUntranslatable</b>
<li>The fragment may contain undeclared prefixes. Any such prefixes must have declarations that are in effect at the <var>Element</var> node referred to by the method object of <var>LoadXml</var>. These declarations (along with that Element's default namespace) are inherited by the inserted fragment.


Allows all valid Unicode strings into the XML document.
<li>In all other respects, the fragment, if "wrapped" within a simple element start tag and end tag (such as <code><w></code> and <code></w></code>, respectively), is a legal XML document. The fragment can contain leading and/or trailing character content and/or multiple "top-level" elements; all of these become children of the method object <var>XmlNode</var>.
When this option is not specified, only Unicode
</ul></td></tr>
strings that are not translatable to EBCDIC are disallowed.


AllowUntranslatable allows untranslatable Unicode characters, but it does not
<tr><th><div id="options"></div>options</th>
affect untranslatable EBCDIC characters.
<td>Any valid combination of the following terms:


As described in the [[??]] refid=ununic., it is recommended that
<ul><div id="allowUntrans"></div>
you use AllowUntranslatable only if the application
<li><b>AllowUntranslatable</b><br>
checks for translatability when accessing parts of the XmlDoc that may
Allows all valid Unicode strings into the XML document. When this option is not specified, Unicode strings that are not translatable to EBCDIC are disallowed.  <var>AllowUntranslatable</var> allows untranslatable Unicode characters, but it does not affect untranslatable EBCDIC characters.  As described in [[#Deserializing Unicode strings|Deserializing Unicode strings]], it is recommended that you use <var>AllowUntranslatable</var> only if the application checks for translatability when accessing parts of the <var>XmlDoc</var> that may have untranslatable Unicode content.  The <var>AllowUntranslatable</var> option is available as of <var class="product">Sirius Mods</var> Version 7.6.
have untranslatable Unicode content.


The AllowUntranslatable option is available as of version 7.6 of the ''Sirius Mods''.
<li><b>CrPreserve</b><br>
<li><b>CrPreserve</b>
All whitespace characters in Element content are preserved, including carriage return. Unlike all other deserialization options, with <var>CrPreserve</var>, a carriage return in Element content does ''not'' undergo the normalization specified in the XML standard (and described in [[XML_processing_in_Janus_SOAP#Normalized_line-end|Normalized line-end]]).
<p>
<var>CrPreserve</var> is mutually exclusive with the <var>WspNewline</var>, <var>WspToken</var>, and <var>WspPreserve</var> options, and with the <var>LinefeedNoTrailingTabs</var> option.</p>
<p>
The <var>CrPreserve</var> option was added in <var class="product">Sirius Mods</var> Version 7.5, as well as being implemented with a maintenance zap to <var class="product">Sirius Mods</var> Version 7.4.</p>
<li><b>DTDIgnore</b><br>
If a <code><!DOCTYPE&nbsp;...></code> clause is present in the document, it should be ignored. In any case, the DTD is not processed. If <var>DTDIgnore</var> is not present, the default behavior is to treat <code><!DOCTYPE&nbsp;...></code> as a syntax error.  <var>DTD_Ignore</var> is a synonym for <var>DTDIgnore</var>.


All whitespace characters in Element content are preserved, including
<li><b>ErrRet</b><br>
carriage return.
Errors during deserialization are tolerated, the method object is not updated (retains its pre-call state), and the request continues. If <var>ErrRet</var> is not present, any error cancels the request. If <var>ErrRet</var> is present, some errors cancel the request and some are tolerated.
Unlike all other deserialization options, a carriage return
<p class="note">'''Note:''' Errors tolerated when <var>ErrRet</var> is specified are explicitly noted below in [[#Request cancellation errors|Request cancellation errors]], with one exception: CCATEMP full conditions always cause a request cancellation. </p>
in Element content does ''not'' undergo the normalization specified in the
XML standard (and referred to in [[??]] refid=unlxml.).


CrPreserve is mutually exclusive with the WspNewline, WspToken, and WspPreserve
<li><b>HtmlCharEnt</b><br>
options, and with the LinefeedNoTrailingTabs option.
Allow the standard XHTML entities for element and attribute content, and convert them to the corresponding Unicode characters. You can find the list of XHTML entities on the Internet at
http://www.w3.org/TR/xhtml1/dtds.html#h-A2.


The CrPreserve option was added in version 7.5 of the ''Sirius Mods'', as well as
<li><b>LinefeedNoTrailingTabs</b><br>
implemented with a maintenance zap to the 7.4 ''Sirius Mods''.
For a <var>Text</var> node that consists of an initial line-end character and one or more tab characters, this option normalizes the content so the result is a single line-end character. The initial line-end (also called "newline") character can be a linefeed character (LF) or a carriage-return (CR) by itself, or a carriage-return followed by a linefeed (CRLF), since (within <var>Text</var> nodes) all of these are normalized by the XML specification into a single line-end character.  This option, added in <var class="product">Sirius Mods</var> version 7.0, is compatible with, but takes precedence over, any of the other whitespace-handling options (<var>WspNewline</var>, <var>WspToken</var>, <var>WspPreserve</var>) except <var>CrPreserve</var>.  See [[#Whitespace handling|Whitespace handling]] below for more information about this option and about whitespace handling.
<li><b>DTDIgnore</b>


If a &ldquo;<!DOCTYPE&thinsp;...>&rdquo;
<li><b>ReplaceUnicode</b><br>
clause is present in the document, it should be ignored.
Converts Unicode characters using the replacements (if any) specified at your site by updating [[Unicode#The UNICODE command|UNICODE]] commands that use the <var>Rep</var> subcommand (for example, <code>UNICODE Table Standard Rep U=2122 '(TM)'</code>).  The replacement is performed on all names, element and attribute values, comments, and PI "values" in the document, after any entity and character references have been converted to characters. For further discussion and examples, see [[#Using the ReplaceUnicode option|Using the ReplaceUnicode option]], below.
In any case, the DTD is not processed.
If DTDIgnore is not present, the default
behavior is to treat
&ldquo;<!DOCTYPE&thinsp;...>&rdquo; as a syntax error.


DTD_Ignore is a synonym for DTDIgnore.
<li><b>WspNewline</b><br>
This option is designed to remove any whitespace inserted to make the structure of an XML document easier (for a person) to read. <var>WspNewline</var> removes the leading or trailing whitespace in the value of a <var>Text</var> node, if the whitespace sequence contains a newline (carriage return or linefeed) character.
<p class="note">'''Note:''' This handling, the default whitespace option for this method, applies to the "physical value" of the representation of a <var>Text</var> node. In particular, markup such as a character reference (even of whitespace, for example, <code>&amp;#32;</code>), a CDATA section, or any non-whitespace character delimits leading or trailing whitespace and is not affected.  See [[#Whitespace handling|Whitespace handling]] below for more information. </p>


:noteh
<li><b>WspToken</b><br>
<li><b>ErrRet</b>
Whitespace in <var>Element</var> content is normalized using the XPath <code>normalize-space()</code> function (leading and trailing whitespace removed, intermediate strings of whitespace replaced by a single blank character). <var>WspToken</var> is a good substitute for <var>WspNewline</var> to remove leading and trailing whitespace in cases where blanks (or tabs) and not line-end characters were used to make the document structure more readable, if it is tolerable to collapse intermediate whitespace sequences to single space characters.  See [[#Whitespace handling|Whitespace handling]] below for more information.


Errors during deserialization are tolerated,
<li><b>WspPreserve</b><br>
the method object is not updated (retains its pre-call state), and the request
All whitespace characters in <var>Element</var> content are preserved (after end-of-line normalization, as described in [[#Whitespace handling|Whitespace handling]] below). <var>Wsp_Preserve</var> is a synonym for <var>WspPreserve</var>.
continues.
</ul></td></tr>
If ErrRet is not present, any error cancels the request.
</table>
If ErrRet is present,
some errors cancel the request and some are tolerated.
'''Note:'''
Errors tolerated when ErrRet is specified are explicitly noted in
[[??]] refid=rclxml., with one exception:
CCATEMP full conditions always cause a request cancellation.
<li><b>LinefeedNoTrailingTabs</b>


For a Text node that consists of an initial line-end character and one or more
==Exceptions==
tab characters, this option
<var>LoadXml</var> can throw the following exception:
normalizes the content so the result is a single line-end character.
<dl>
The initial line-end (also called &ldquo;newline&rdquo;) character can be a
<dt><var>[[XmlParseError class|XmlParseError]]</var>
linefeed character (LF) or a
<dd>If the method encounters a parsing error, properties of the exception object may indicate the location and type of problem.
carriage-return (CR) by itself, or a carriage-return followed by a
</dl>
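<p>
The following is a minimal sketch, not taken from the original page, of how a parsing failure might be trapped. It assumes the standard SOUL <code>Try</code>/<code>Catch</code> block; the <code>%input</code> variable, its content, and the message text are hypothetical, and whether this particular error is thrown as an <var>XmlParseError</var> is an assumption: </p>
<p class="code">%d Object XmlDoc Auto New
%input Is Longstring
* Hypothetical input: the inner element is never closed, so it is not well formed
%input = '<top><inner></top>'
Try
   %d:LoadXml(%input)
Catch XmlParseError
   * Reached only if LoadXml throws the parse exception
   Print 'LoadXml could not parse the input'
End Try
</p>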
linefeed (CRLF), since (within Text nodes) all of these are
normalized by the XML specification into a single line-end character.


This option, added in ''Sirius Mods'' version 7.0, is compatible with,
==Usage notes==
but takes precedence over, any of the
<ul>
other whitespace-handling options (WspNewline, WspToken, WspPreserve)
<li>As of <var class="product">Sirius Mods</var> version 7.5,
except CrPreserve.
<code>version="1.1"</code> is accepted in the input to be deserialized.
Formerly, only <code>1.0</code> was accepted.
<li>None of the <var class="term">options</var> terms may be specified twice.


See [[??]] refid=unwhite.
<li>The <var class="term">options</var> terms may be specified in any case.
for more information about this option and about whitespace handling.
For example, you can use
<li><b>ReplaceUnicode</b>
<code>WspPreserve</code> and <code>wsppreserve</code>, interchangeably.


Converts Unicode characters using the
<li>If the <var>LoadXml</var> method object is an <var>XmlDoc</var> or a root <var>XmlNode</var>,
replacements (if any) specified at your site by UNICODE updating commands
<var>LoadXml</var> will accept any of the input character sets specified below.
that use the <tt>Rep</tt> subcommand (for example,
<p>
<tt>UNICODE Table Standard Rep U=2122 '(TM)'</tt>).
The correspondence between these input character sets and the
 
value of <code>encoding</code> in the XML declaration is explained
The replacement is performed on all names, element and attribute
in the [[Encoding (XmlDoc property)#Usage notes|Usage notes]] for the <var>Encoding</var>
values, comments, and PI &ldquo;values&rdquo; in the document, after any
property of an <var>XmlDoc</var>, and it is also shown in the following two tables. </p>
entity and character references have been converted to characters.
<p>
 
In both tables below,
For further discussion and examples, see the ReplaceUnicode discussion
all of the values of <code>encoding</code> in the XML declaration
in the &ldquo;Usage Notes&rdquo; [[??]] reftxt=* refid=unreplu..
must be specified in all-uppercase letters, and
 
<var class="term">n</var> is a digit from 1 to 9 in the <code>encoding</code> value
For more information about the UNICODE command, see [[??]] refid=ucmd..
<code>ISO-8859-<var class="term">n</var></code>: </p>
 
:noteh.
<table>
<li><b>WspNewline</b>
<tr class="head"><th>Input bytestream
 
</th><th>XML declaration
This option is designed to remove any whitespace inserted
</th></tr>
to make the structure of an XML document easier (for a person) to read.
<tt>WspNewline</tt> removes the leading or trailing whitespace
in the value of a Text node, if the whitespace sequence contains a
newline (carriage return or linefeed) character.
'''Note:'''
This handling, the default whitespace option for this method, applies to the
&ldquo;physical value&rdquo; of the representation of a Text node.
In particular, markup such as a character reference (even of whitespace,
for example, <tt>&amp;#32;</tt>), a
CDATA section, or any non-whitespace character
delimits leading or trailing whitespace and is not affected.


See [[??]] refid=unwhite. for more information.
<tr><td>ASCII codes below X'80'
<li><b>WspToken</b>
</td><td>UTF-8, ISO-8859-<var class="term">n</var>, or <var class="term">none</var>
</td></tr>


Whitespace in Element content
<tr><td>ASCII codes up to X'FF'
is normalized using the XPath <tt>normalize-space()</tt> function (leading and trailing
</td><td>ISO-8859-<var class="term">n</var>
whitespace removed, intermediate strings of whitespace replaced by
</td></tr>
a single blank character).
<tt>WspToken</tt> is a good substitute for WspNewline to remove leading
and trailing whitespace
in cases where blanks (or tabs) and not line-end characters were used
to make the document structure more readable &mdash; if
it is tolerable to collapse intermediate whitespace
sequences to single space characters.


See [[??]] refid=unwhite. for more information.
<tr><td>UTF-8 (with characters above X'7F')
<li><b>WspPreserve</b>
</td><td>UTF-8 or <var class="term">none</var>
</td></tr>


All whitespace characters in Element content are preserved
<tr><td>UTF-16
(after end-of-line normalization,
</td><td>UTF-16 or <var class="term">none</var>
as described in [[??]] refid=unwhite.).
</td></tr>


Wsp_Preserve is a synonym for WspPreserve.
<tr><td>EBCDIC
</ul>
</td><td>UTF-8, ISO-8859-<var class="term">n</var>, or <var class="term">none</var>
</dl>
</td></tr>


===Exceptions===
<tr><td><var>Unicode</var> (SOUL)
</td><td>UTF-8, ISO-8859-<var class="term">n</var>, or <var class="term">none</var>
</td></tr></table>
<table>
<tr class="head"><th>XML declaration
</th><th>Input bytestream
</th></tr>


This function can throw the following exception:
<tr><td>UTF-8
<dl>
</td><td>UTF-8 (which includes ASCII codes below X'80'), EBCDIC, or Unicode
<dt>XmlParseError
</td></tr>
<dd>If the method encounters a parsing error,
properties of the exception object may indicate the location and type of problem.
See [[??]] refid=xmlpars..
</dl>
===Usage Notes===
 
<ul>
<li>As of ''Sirius Mods'' version 7.5,
<tt>version="1.1"</tt> is accepted in the input to be deserialized.
Formerly, only <tt>1.0</tt> was accepted.
<li>None of the ''options'' terms may be specified twice.
<li>The ''options'' terms may be specified in any case.
For example, you can use
<tt>WspPreserve</tt> and <tt>wsppreserve</tt>, interchangeably.
<li>If the LoadXml method object is an XmlDoc or a root XmlNode,
LoadXml will accept any of the input character sets specified below.


The correspondence between these input character sets and the
<tr><td>UTF-16
value of <tt>encoding</tt> in the XML declaration is explained
</td><td>UTF-16 (including a two-byte byte order mark as a preamble to the XML document input stream)
in the description of the <tt>Encoding</tt> property of an XmlDoc,
</td></tr>
under its [[??]] refid=enctyp., and is also shown in the following two tables.


In both tables below,
<tr><td>ISO-8859-<var class="term">n</var>
all of the values of <tt>encoding</tt> in the XML declaration
</td><td>ASCII codes up to X'FF', EBCDIC, or Unicode
must be specified in all-uppercase letters, and
</td></tr>
<i>n</i> is a digit from 1 to 9 in the <tt>encoding</tt> value
ISO-8859-<i>n</i>:
<dl>
<dt>Input bytestream
<dd>XML declaration
<dt>ASCII codes below X'80'
<dd>UTF-8, ISO-8859-<i>n</i>, or <i>none</i>
<dt>ASCII codes up to X'FF'
<dd>ISO-8859-<i>n</i>
<dt>UTF-8 (with characters above X'7F')
<dd>UTF-8 or <i>none</i>
<dt>UTF-16
<dd>UTF-16 or <i>none</i>
<dt>EBCDIC
<dd>UTF-8, ISO-8859-<i>n</i>, or <i>none</i>
<dt>Unicode (User Language)
<dd>UTF-8, ISO-8859-<i>n</i>, or <i>none</i>
</dl>
<dl>
<dt>XML declaration
<dd>Input bytestream
<dt>UTF-8
<dd>UTF-8 (which includes ASCII codes below X'80'), EBCDIC, or Unicode
<dt>UTF-16
<dd>UTF-16 (including a two-byte byte order mark as a preamble to the XML document
input stream)
<dt>ISO-8859-<i><b>n</b></i>
<dd>ASCII codes up to X'FF', EBCDIC, or Unicode
<dt><i><b>none</b></i>
<dd>UTF-8, UTF-16, ASCII, EBCDIC, or Unicode
</dl>


In certain LoadXml error cases (for example, an input containing an
<tr><td><var class="term">none</var>
ASCII code above X'7F' without an XML declaration containing ISO-8859-<i>n</i>),
</td><td>UTF-8, UTF-16, ASCII, EBCDIC, or Unicode
</td></tr></table>
In certain <var>LoadXml</var> error cases (for example, an input containing an
ASCII code above <code>X'7F'</code> without an XML declaration containing ISO-8859-<var class="term">n</var>),
a group of error messages is issued to display:
a group of error messages is issued to display:
<ul>
<ul>
Line 270: Line 163:
<li>Additional lines that display the input byte halves in base 16
<li>Additional lines that display the input byte halves in base 16
</ul>
</ul>
 
For example,
For example,
the following messages are output after an 8-bit ASCII input
the following messages are output after an 8-bit ASCII input
fails because no accompanying ISO-8859-''n'' encoding was specified:
fails because no accompanying ISO-8859-''n'' encoding was specified:
<pre>
<p class="code">MSIR.0668: XML doc parse error: invalid first byte of
    MSIR.0668: XML doc parse error: invalid first byte of
  UTF-8 encoding near or before position 14
                UTF-8 encoding near or before position 14
MSIR.0708: C: ????????????????
    MSIR.0708: C: ????????????????
MSIR.0708: E: copyright(?)</A> (ASCII to EBCDIC)
    MSIR.0708: E: copyright(?)</A> (ASCII to EBCDIC)
MSIR.0708: X: 6677766672A23243
    MSIR.0708: X: 6677766672A23243
MSIR.0708: X: 3F0929784899CF1E
    MSIR.0708: X: 3F0929784899CF1E
MSIR.0665:              |
    MSIR.0665:              |
</p>
</pre>
 
<li>If the LoadXml method object is a non-root XmlNode, LoadXml accepts
<li>If the <var>LoadXml</var> method object is a non-root <var>XmlNode</var>, <var>LoadXml</var> accepts a Unicode string or a bytestream it treats as EBCDIC.
a Unicode string or a bytestream it treats as EBCDIC.
 
<li>As described in [[??]] refid=u80., deserializing with LoadXml
<li>As described in [[Unicode#Support for the ASCII subset of Unicode|Support for the ASCII subset of Unicode]], deserializing with <var>LoadXml</var>
may require translation of the input document using the Unicode tables.
may require translation of the input document using the <var>Unicode</var> tables.
This depends on the version of the ''Sirius Mods'' (that is, whether XmlDocs
This depends on the version of the <var class="product">Sirius Mods</var> (that is, whether <var>XmlDoc</var>s are maintained in EBCDIC or <var>Unicode</var>) and on which
are maintained in EBCDIC or Unicode) and on which
of the input character sets described above is used.
of the input character sets described above is used.
<li>An XML fragment '''does not''' provide for inserting an Attribute
 
into an Element node.
<li><var>LoadXml</var> '''does not''' provide for inserting an Attribute
For example, the following does not achieve it:
into an Element node by using an XML fragment.
<pre>
For example, the following '''does not''' achieve it:
    %d Object XmlDoc Auto New
<p class="code">%d Object XmlDoc Auto New
    %n Object XmlNode
%n Object XmlNode
    %n = %d:AddElement('top')
%n = %d:AddElement('top')
    %n:LoadXml('foo="bar"')
%n:LoadXml('foo="bar"')
    %d:Print
%d:Print
</pre>
</p>
The input to LoadXml above is simply stored as the character content
The input to <var>LoadXml</var> above is simply stored as the character content
of the Element containing the fragment, so the result is:
of the Element containing the fragment, so the result is:
<pre>
<p class="output"><top>foo="bar"</top>
    <top>foo="bar"</top>
</p>
</pre>
<p>
To add the Attribute <code>foo</code> with the value <code>bar</code>,
replace <var>LoadXml</var> in the example above with the <var>AddAttribute</var> updating method:
<code>%n:[[AddAttribute (XmlNode function)|AddAttribute]]('foo', 'bar')</code>. </p>
This produces the result:
<p class="output"><top foo="bar"/>  </p>
 
Another situation where an updating method does what <var>LoadXml</var> does not is controlling the formatting of empty XML elements. This is described in [[XmlDoc API#Information form and content|Information form and content]].


To add the Attribute <tt>foo</tt> with the value &ldquo;<tt>bar</tt>&rdquo;,
<li>If the method object refers to the <var>Root</var> node of an <var>XmlDoc</var>, the
replace LoadXml in the example above with the
<var>LoadXml</var> method in the <var>XmlNode</var> class behaves
[[AddAttribute (XmlNode function)|AddAttribute]] method.
exactly as the <var>LoadXml</var> method in the <var>XmlDoc</var> class.
<br>
<li>If the method object refers to the Root node of an XmlDoc, the
LoadXml method in the XmlNode class behaves
exactly as the LoadXml method in the XmlDoc class.
For example:
For example:
<pre>
<p class="code">%d Object XmlDoc Auto New
    %d Object XmlDoc Auto New
%n Object XmlNode
    %n Object XmlNode
%n = %d:SelectSingleNode
    %n = %d:SelectSingleNode
%n:LoadXml('<?xml version="1.0"?><top><inner/></top>')
    %n:LoadXml('<?xml version="1.0"?><top><inner/></top>')
</p>
</pre>
When the <var>Root</var> node is the method object, the serialized input must
When the Root node is the method object, the serialized input must
be a legal XML document (for example, the <var>XmlDoc</var> must be <var>Empty</var>, and the
be a legal XML document (for example, the XmlDoc must be Empty, and the
serialized input must contain exactly one top-level element).
serialized input must contain exactly one top-level element).
<li>Whitespace handling
</ul>
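<p>
The return value and the <var>ErrRet</var> option described under the syntax terms can be sketched as follows. This is an illustration only: the input string and message text are hypothetical, and it assumes that this particular syntax error is among those tolerated when <var>ErrRet</var> is specified: </p>
<p class="code">%d Object XmlDoc Auto New
%pos Is Float
* ErrRet asks LoadXml to return an error position instead of cancelling the request
%pos = %d:LoadXml('<top><inner></top>', 'ErrRet')
If %pos ne 0 Then
   * The XmlDoc retains its pre-call (EMPTY) state here
   Print 'LoadXml error near position ' With %pos
Else
   %d:Print
End If
</p>
<p>
Testing the returned position keeps the error handling local to the call, whereas omitting <var>ErrRet</var> lets any error cancel the request. </p>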
 
===Whitespace handling===
<ul>
<ul>
<li>The &ldquo;Wsp&rdquo; whitespace-handling ''options''
<li>The "Wsp" whitespace-handling <var class="term">options</var>
(WspPreserve, WspNewline, and WspToken) and the CrPreserve whitespace option
(<var>WspPreserve</var>, <var>WspNewline</var>, and <var>WspToken</var>) and the <var>CrPreserve</var> whitespace option
are mutually exclusive; if none of them is specified, WspNewline is in effect.
are mutually exclusive; if none of them is specified, <var>WspNewline</var> is in effect.
Although the LinefeedNoTrailingTabs option is also concerned with whitespace,
Although the <var>LinefeedNoTrailingTabs</var> option is also concerned with whitespace,
it is distinct from, yet compatible with, any of the three &ldquo;Wsp&rdquo; options,
it is distinct from, yet compatible with, any of the three "Wsp" options,
but it is not compatible with the CrPreserve option.
but it is not compatible with the <var>CrPreserve</var> option.
<li>Except for CrPreserve, the whitespace-handling ''options'' are
 
applied after the XML standard whitespace conversions that [[Janus SOAP]] applies
<li>Except for <var>CrPreserve</var>, the whitespace-handling <var class="term">options</var> are applied after the XML standard whitespace conversions that [[Janus SOAP]] applies
in all other cases.
in all other cases.
As described in [[??]] refid=nornl., the standard specifies that
As described in [[XML_processing_in_Janus_SOAP#Normalized_line-end|Normalized line-end]],
'''all''' carriage return/linefeed
the standard specifies that '''all''' carriage return/linefeed
sequences and carriage return sequences are to be converted to linefeeds
sequences and carriage return sequences are to be converted to linefeeds
when deserializing.
when deserializing.
Using the CrPreserve option bypasses this rule.
Using the <var>CrPreserve</var> option bypasses this rule.
<li>The whitespace-handling ''options'' do '''no''' whitespace
 
<li>The whitespace-handling <var class="term">options</var> do '''no''' whitespace
conversion (beyond the XML standard conversions) on Element content that is
conversion (beyond the XML standard conversions) on Element content that is
&ldquo;protected&rdquo; by the <tt>xml:space="preserve"</tt> attribute.
"protected" by the <code>xml:space="preserve"</code> attribute.
 
&ldquo;Protected&rdquo; by the xml:space="preserve" attribute
"Protected" by the <code>xml:space="preserve"</code> attribute
means an element <i><b>E</b></i> that either:
means an element <code><nowiki><b>E</b></nowiki></code> that either:
<ul>
<ul>
<li>has the <tt>xml:space</tt> attribute with the value <tt>preserve</tt>
<li>has the <code>xml:space</code> attribute with the value <code>preserve</code>
<li>is contained in
<li>is contained in
an element <i><b>A</b></i> with that attribute and value, and there is no
an element <code><nowiki><b>A</b></nowiki></code> with that attribute and value, and there is no element that is a descendant of <code><nowiki><b>A</b></nowiki></code> and an ancestor of <code><nowiki><b>E</b></nowiki></code> with the <code>xml:space</code> attribute with the value <code>default</code>
element that is a descendant of <i><b>A</b></i> and an ancestor of <i><b>E</b></i>
with the <tt>xml:space</tt> attribute with the value <tt>default</tt>
</ul>
</ul>
 
Elements that are
Elements that are
not protected by the <tt>xml:space="preserve"</tt> attribute
not protected by the <code>xml:space="preserve"</code> attribute
have whitespace handled
have whitespace handled
according to the option in effect for the deserialization.
according to the option in effect for the deserialization.
<li>There is no whitespace normalization
<li>There is no whitespace normalization
comparable to the LoadXml whitespace-handling ''options''
comparable to the <var>LoadXml</var> whitespace-handling <var class="term">options</var>
for the Add and
for the Add and
Insert...Before functions that create a Text node
Insert...Before functions that create a <var>Text</var> node
([[AddElement (XmlDoc/XmlNode function)|AddElement]],
(<var>[[AddElement (XmlDoc/XmlNode function)|AddElement]]</var>,
[[InsertElementBefore (XmlNode function)|InsertElementBefore]],
<var>[[InsertElementBefore (XmlNode function)|InsertElementBefore]]</var>,
[[AddText (XmlNode function)|AddText]], and
<var>[[AddText (XmlNode function)|AddText]]</var>, and
[[InsertTextBefore (XmlNode function)|InsertTextBefore]]).
<var>[[InsertTextBefore (XmlNode function)|InsertTextBefore]]</var>).
 
<li>Whitespace normalization applies to the characters in the input
<li>Whitespace normalization applies to the characters in the input
serialized string, not to the values after entity substitution.
serialized string, not to the values after entity substitution.
In the [[#Examples|Examples]] section, below,
See [[#xmp4|Example 4]], below.
see the fourth example (which contains <tt>&amp;#x09;</tt>).
 
<li>If ''input'' is a Stringlist, LoadXml inserts a linefeed character
<li>If <var class="term">input</var> is a <var>Stringlist</var>, <var>LoadXml</var> inserts a linefeed character
after each item in the Stringlist as part of concatenation prior to
after each item in the <var>Stringlist</var> as part of concatenation prior to
deserialization.
deserialization.
The linefeed is then subject to the method's whitespace handling options,
The linefeed is then subject to the method's whitespace handling options,
so it is usually removed (as leading or trailing whitespace).
so it is usually removed (as leading or trailing whitespace).
<li>Using WspNewline or WspToken
 
reduces the space consumed by individual Text nodes,
<li>Using <var>WspNewline</var> or <var>WspToken</var>
reduces the space consumed by individual <var>Text</var> nodes,
and in some cases
and in some cases
collapses all whitespace content between markup to the null
collapses all whitespace content between markup to the null
string, so it is not stored as a Text node.
string, so it is not stored as a <var>Text</var> node.
This reduces the storage required by the XmlDoc, speeds up
This reduces the storage required by the <var>XmlDoc</var>, speeds up
XPath and node access processing, and makes the output of, say, the
XPath and node access processing, and makes the output of, say, the
Print subroutine easier to read.
<var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var> subroutine easier to read.
<li>The LinefeedNoTrailingTabs option only affects Text nodes that contain
 
<li>The <var>LinefeedNoTrailingTabs</var> option only affects Text nodes that contain
an initial line-end character followed by any number of tabs and nothing else.
an initial line-end character followed by any number of tabs and nothing else.
The LinefeedNoTrailingTabs effect on such a Text node,
The <var>LinefeedNoTrailingTabs</var> effect on such a Text node,
whether it is specified with or without any of the &ldquo;Wsp&rdquo; options,
whether it is specified with or without any of the "Wsp" options,
is to store the value of the node as a single line-end character.
is to store the value of the node as a single line-end character.
 
<p>
One example of the use of the LinefeedNoTrailingTabs option is
One example of the use of the <var>LinefeedNoTrailingTabs</var> option is
an input XML document to be deserialized for which
an input XML document to be deserialized for which
both of the following are true:
both of the following are true: </p>
<ul>
<ul>
<li>A digital signature is needed of a subtree in the document.
<li>A digital signature is needed of a subtree in the document.
Line 397: Line 296:
for the signature.
for the signature.
</ul>
</ul>
 
<p>
For information about exclusive canonicalization,
For information about exclusive canonicalization,
serialization expressly designed for digital signatures,
serialization expressly designed for digital signatures,
see [[Serial (XmlDoc/XmlNode function)|Serial]].
see the <var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var> function. </p>
</ul>
</ul>
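The definition of "protected" given in the whitespace notes above can be modeled outside of <var>LoadXml</var>. The following is a minimal Python sketch using only the standard library; it is an illustration of the rule, not the <var>LoadXml</var> implementation. The function names <code>protected_elements</code> and <code>protected_tags</code> are hypothetical. The sketch assumes that an element's own <code>xml:space="default"</code> also ends protection for that element, as in the XML Recommendation; the definition above does not spell out that case.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:space attribute under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def protected_elements(root):
    """Return the set of elements protected by xml:space="preserve"."""
    protected = set()

    def walk(elem, inherited):
        # The nearest xml:space setting wins: an element's own attribute
        # overrides whatever it inherited from its ancestors (assumption
        # noted in the lead-in for the element's own "default" value).
        setting = elem.get(XML_SPACE, inherited)
        if setting == "preserve":
            protected.add(elem)
        for child in elem:
            walk(child, setting)

    walk(root, "default")
    return protected


def protected_tags(xml_text):
    """Convenience wrapper: sorted tag names of the protected elements."""
    root = ET.fromstring(xml_text)
    return sorted(elem.tag for elem in protected_elements(root))
```

For the input <code><nowiki><A xml:space="preserve"><B><E/></B><C xml:space="default"><F/></C></A></nowiki></code>, the sketch reports <code>A</code>, <code>B</code>, and <code>E</code> as protected; <code>F</code> is not, because <code>C</code> lies between it and <code>A</code> and resets the attribute to <code>default</code>.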
<br>
<li>Deserializing Unicode strings
===Deserializing Unicode strings===
<ul>
The <var>LoadXml</var> <var>AllowUntranslatable</var> option
<li>The LoadXml <tt>AllowUntranslatable</tt> option
lets you deserialize Unicode strings that contain characters
lets you deserialize Unicode strings that contain characters
that are not translatable to EBCDIC.
that are not translatable to EBCDIC.
For example, LoadXml accepts the Unicode trademark character (U+2122)
For example, <var>LoadXml</var> accepts the Unicode trademark character (U+2122)
only if you specify <tt>allowUntranslatable</tt>, as in the following.
only if you specify <var>AllowUntranslatable</var>, as in the following.
The [[U (String function)|U]] function is used below.
<p class="code">%u is unicode Initial('&amp;#x2122;':[[U (String function)|U]])
<pre>
%nod:LoadXml(%u, 'AllowUntranslatable')
    %u is unicode Initial('&amp;#x2122;':U)
</p>
    %nod:LoadXml(%u, 'allowUntranslatable')
</pre>
If you remove <var>AllowUntranslatable</var>, this <var>LoadXml</var> statement fails,
 
If you remove <tt>allowUntranslatable</tt>, this LoadXml statement fails,
because the Unicode trademark character does not translate to an EBCDIC character.
because the Unicode trademark character does not translate to an EBCDIC character.
By default, the method detects any untranslatable characters in the serialized
By default, the method detects any untranslatable characters in the serialized
input XML document; it also throws an XmlParseError exception
input XML document; it also throws an <var>[[XmlParseError class|XmlParseError]]</var> exception
with reason <tt>UntranslatableUnicode</tt>
with reason <var>UntranslatableUnicode</var>
(unless the <tt>ErrRet</tt> option is specified).
(unless the <var>ErrRet</var> option is specified).
 
This default detection of non-translatable characters may suit your purposes.
This default detection of non-translatable characters may suit your purposes.
That is, it ensures that subsequent
That is, it ensures that subsequent
Line 428: Line 324:
without any Unicode to EBCDIC translation errors.
without any Unicode to EBCDIC translation errors.
For example:
For example:
<pre>
<p class="code">%doc:LoadXml
    %doc:LoadXml
...
    ...
%val Longstring
    %val Longstring
%val = %doc:Value(%xpath)
    %val = %doc:Value(%xpath)
</p>
</pre>
 
Although the <var>[[Value (XmlDoc/XmlNode property)|Value]]</var> property returns a <var>Unicode</var> string,
Although the Value method returns a Unicode string,
the assignment to the EBCDIC string <code>%val</code> will not fail due
the assignment to the EBCDIC string <tt>%val</tt> will not fail due
to a Unicode translation problem: if there is any untranslatable
to a Unicode translation problem: if there is any untranslatable
Unicode (including, of course, strings in the XML document which your
Unicode (including, of course, strings in the XML document which your
application never accesses), the LoadXml operation fails.
application never accesses), the <var>LoadXml</var> operation fails.
 
If you use AllowUntranslatable,
If you use <var>AllowUntranslatable</var>,
all Unicode characters in a serialized input XML document are
all Unicode characters in a serialized input XML document are
allowed and stored in the XmlDoc.
allowed and stored in the <var>XmlDoc</var>.
Your stored data may contain content that is not translatable to
Your stored data may contain content that is not translatable to
EBCDIC, however.
EBCDIC, however.
A subsequent attempt to access such content that
A subsequent attempt to access such content that
performs Unicode to EBCDIC translation (like the Value method statement above)
performs Unicode to EBCDIC translation (like the <var>Value</var> property statement above)
might cause request cancellation.
might cause request cancellation.
 
You should therefore use AllowUntranslatable
You should therefore use <var>AllowUntranslatable</var>
only if there is also a check for translatability
only if there is also a check for translatability
when parts of the XmlDoc that may
when parts of the <var>XmlDoc</var> that may
have non-translatable Unicode content are accessed.
have non-translatable Unicode content are accessed.
The code below, for example,
The code below, for example,
shows a way to get the benefit of specifying AllowUntranslatable
shows a way to get the benefit of specifying <var>AllowUntranslatable</var>
while limiting the risk of request cancellation.
while limiting the risk of request cancellation.
 
In the following example, it is believed that only the
In the following example, it is believed that only the
element <tt>comments</tt> might
element <code>comments</code> might
contain untranslatable Unicode among all the data accessed from the XML document:
contain untranslatable Unicode among all the data accessed from the XML document:
<pre>
<p class="code">%resp:LoadXml(%doc, 'AllowUntranslatable')
    %resp:LoadXml(%doc, 'AllowUntranslatable')
...
    ...
%uVal Unicode
    %uVal Unicode
%val Longstring
    %val Longstring
%uVal = %node:Value('comments')
Try %val = %uVal:UnicodeToEbcdic
Catch CharacterTranslationException
  %val = %uVal:UnicodeToEbcdic(CharacterEncode=True)
  Print 'Untranslatable Unicode, character encoded:' -
      And %val
End Try
</p>
<blockquote class="note">
<p>'''Note:'''
<var>Unicode</var> values, untranslatable or not, are always allowed
when they are added to an <var>XmlDoc</var> using one of the "Add" or "Insert" methods that
"directly store" into an <var>XmlDoc</var>.
For example, the following fragment adds an <var>Element</var> node with
a value that is the Unicode trademark sign: </p>
<p class="code">%node:AddElement('notation', '&amp;#x2122;':U)
</p>
</blockquote>


    %uVal = %node:Value('comments')
<blockquote class="note">
    Try %val = %uVal:UnicodeToEbcdic
<p>'''Note:'''
    Catch CharacterTranslationException
<var>LoadXml</var> accepts input that contains XML hex character references
      %val = %uVal:UnicodeToEbcdic(CharacterEncode=True)
      Print 'Untranslatable Unicode, character encoded:' -
        And %val
    End Try
</pre>
'''Note:'''
Unicode values, untranslatable or not, are always allowed
when they are added to an XmlDoc using one of the Add or Insert methods which
&ldquo;directly store&rdquo; into an XmlDoc.
For example, the following fragment adds an Element node with
a value that is the Unicode trademark sign:
<pre>
    %node:AddElement('notation', '&amp;#x2122;':U)
</pre>
'''Note:'''
LoadXml accepts input that contains XML hex character references
but not input that contains XHTML entity references (other than
but not input that contains XHTML entity references (other than
the five predefined entities described in [[XML processing in Janus SOAP#Entity references|Entity references]]).
the five predefined entities described in [[XML processing in Janus SOAP#Entity references|"Entity references"]].
For example, this statement successfully loads a copyright character:
For example, this statement successfully loads a copyright character: </p>
<pre>
<p class="code">%d:LoadXml('<a>&amp;#xA9;</a>')
    %d:LoadXml('<a>&amp;#xA9;</a>')
</p>
</pre>
 
And this statement successfully loads a greater-than sign:
And this statement successfully loads a greater-than sign:
<pre>
<p class="code">%d:LoadXml('<a>&amp;gt;</a>')
    %d:LoadXml('<a>&amp;gt;</a>')
</p>
</pre>
 
But this statement with a copyright character entity fails:
But this statement with a copyright character entity fails:
<pre>
<p class="code">%d:LoadXml('<a>&amp;copy;</a>')
    %d:LoadXml('<a>&amp;copy;</a>')
</p>
</pre>
 
You can load a copyright character, however, if you
You can load a copyright character, however, if you
decode the reference and convert to Unicode before the deserialization.
decode the reference and convert to <var>Unicode</var> before the deserialization.
For example:
For example:
<pre>
<p class="code">%d:LoadXml('<a>&amp;copy;</a>':U)
    %d:LoadXml('<a>&amp;copy;</a>':U)
</p>
</pre>
</blockquote>
For more information about working with Unicode characters, see [[XmlDoc API#Strings and Unicode with the XmlDoc API|Strings and Unicode with the XmlDoc API]].
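
The guarded-translation pattern shown earlier in this section (attempt the Unicode-to-EBCDIC translation, and fall back to a character-encoded form if it fails) can also be modeled in Python. This is only an analogy, offered under stated assumptions: the hypothetical function <code>to_ebcdic</code> uses Python's <code>cp037</code> codec as a stand-in for EBCDIC, and the translation tables at a <var class="product">Model 204</var> site may differ. Python's <code>xmlcharrefreplace</code> handler emits decimal character references.

```python
def to_ebcdic(text, codec="cp037"):
    """Translate text to EBCDIC bytes.

    Returns (data, translatable). If some character has no EBCDIC
    equivalent in the chosen code page, it is replaced by an XML
    character reference and translatable is False.
    """
    try:
        # Strict translation: raises if any character is untranslatable.
        return text.encode(codec), True
    except UnicodeEncodeError:
        # Fallback: character-encode whatever cannot be translated.
        return text.encode(codec, errors="xmlcharrefreplace"), False
```

With this sketch, a string containing the copyright character (U+00A9) translates cleanly, while a string containing the trademark character (U+2122) comes back with the trademark character replaced by the reference <code>&amp;#8482;</code> and the flag set to false.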


 
===<b id="ucdRep"></b>Using the ReplaceUnicode option===
For more information about working with Unicode characters,
The <var>ReplaceUnicode</var> option lets you replace certain Unicode input characters with those characters you have explicitly specified (by <var>UNICODE</var> commands in your site's <var class="product">Model 204</var> CCAIN stream).
see [[Strings and Unicode]].
<li>The ReplaceUnicode option lets you replace certain
For example, assume the following command is in CCAIN:  
Unicode input characters
<p class="code">UNICODE Table Standard Rep U=2122 '(tm)'
with those characters you have explicitly specified (by UNICODE commands
</p>
in your site's ''Model 204'' CCAIN stream).
Given the above command, the <var>ReplaceUnicode</var> option for <var>LoadXml</var> is shown in the following fragment:
 
<p class="code">%u unicode initial('<a>')
For example, assume the following command is in CCAIN:
%u = %u:[[UnicodeWith_(Unicode_function)|unicodeWith]]('2122':X:[[Utf16ToUnicode_(String_function)|utf16ToUnicode]])
<pre>
%u = %u:unicodeWith('</a>':U)
    UNICODE Table Standard Rep U=2122 '(tm)'
%d:loadXml(%u, 'ReplaceUnicode')
</pre>
%d:print
Given the above command, the ReplaceUnicode option for LoadXml is shown
</p>
in the following fragment, which uses the [[UnicodeWith_(Unicode_function)|UnicodeWith]] and
[[Utf16ToUnicode_(String_function)|Utf16ToUnicode]] methods:
<pre>
    %u Unicode Initial('<a>')
    %u = %u:UnicodeWith('2122':X:Utf16ToUnicode)
    %u = %u:UnicodeWith('</a>':U)
    %d:LoadXml(%u, 'ReplaceUnicode')
    %d:Print
</pre>
The result is:
The result is:
<pre>
<p class="output"><a>(tm)</a>
    <a>(tm)</a>
</p>
</pre>
In the preceding example, the stream of input characters to <var>LoadXml</var> contains the Unicode character U+2122. Since the <var>ReplaceUnicode</var> option applies both to the stream of input characters and to the character value of character references, consider the following fragment (assuming the same CCAIN line as above):
 
<p class="code">%d:LoadXml('<a>&amp;#x2122;</a>', 'ReplaceUnicode')
In the preceding example, the stream of input characters to LoadXml
</p>
contains the Unicode character U+2122.
Since the ReplaceUnicode option applies to both the stream of input characters
and to the character value of character references, consider the following fragment
(assuming the same CCAIN line as above):
<pre>
    %d:LoadXml('<a>&amp;#x2122;</a>', 'ReplaceUnicode')
</pre>
The result is also:
The result is also:
<p class="output"><a>(tm)</a>
</p>
In this case, <code>U+2122</code> does not occur in the input character stream, but it is the value of the character reference.
<blockquote class="note">
<p><b>Notes:</b> </p>
<ul>
<li>It is an error to be processing a replacement string within a character reference.  For example, assume the following line is in CCAIN:
<p class="code">UNICODE Table Standard Rep U=00B2 '2'
</p>
Given the above command, the following fragment gets a parse error, because the replacement string is being used as part of a character reference:
<p class="code">%d:LoadXml('<a>&amp;#x' With '&amp;#xB2;':U With ';</a>', 'ReplaceUnicode')
</p>
As a consequence of this rule, a replacement string should not contain an ampersand character (assuming that the <var>ReplaceUnicode</var> option will be used).


<pre>
<li>Replacement of a Unicode character due to the <var>ReplaceUnicode</var> option is only done while processing names and values in the XML document.  It is an error if the end of the name or value occurs and the replacement string has not been exhausted. In other words (again assuming that the <var>ReplaceUnicode</var> option will be used), a replacement string should not have "XML markup" that might end a string, such as a quotation mark or a left angle bracket (<tt><</tt>).  For example, assume the following line is in CCAIN:
    <a>(tm)</a>
<p class="code">UNICODE Table Standard Rep U=2122 '(trademark)<tm>'
</pre>
</p>
Given the above command, the following fragment gets a parsing error, because the "less than" character (<tt><</tt>) that is encountered in the replacement string ends the element content:
<p class="code">%d:LoadXml('<a>&amp;#x2122;</a>':U, 'ReplaceUnicode')
</p>


In this case, U+2122 does not occur in the input character stream, but it is
<li>If a parsing error occurs after processing a Unicode character that has been replaced, the error display of the input stream will contain the replacement string, and the replaced character will not be displayed. However, if the character being replaced was introduced as a character reference, the character reference remains in the display of the input stream.
the value of the character reference.
 
Notes:
 
<ul>
<li>It is an error to be processing a replacement string within a character
reference.
For example, assume the following line is in CCAIN:
<pre>
    UNICODE Table Standard Rep U=00B2 '2'
</pre>
Given the above command, the following fragment gets a parse error,
because the replacement string is being used as part of a
character reference:
<pre>
    %d:LoadXml('<a>&amp;#x' With '&amp;#xB2;':U With ';</a>', -
      'ReplaceUnicode')
</pre>
As a consequence of this rule, a replacement string should not
contain an ampersand character (assuming that the ReplaceUnicode option will
be used).
<li>Replacement of a Unicode character due to the
ReplaceUnicode option is only done while processing names and
values in the XML document.
It is an error if the end of the
name or value occurs and the replacement string has not been exhausted.
In other words (again assuming that the ReplaceUnicode option will be
used), a replacement string should not have &ldquo;XML markup&rdquo; that might
end a string, such as a quotation mark or a left angle bracket (<tt><</tt>).
For example, assume the following line is in CCAIN:
<pre>
    UNICODE Table Standard Rep U=2122 '(trademark)<tm>'
</pre>
Given the above command, the following fragment gets a parsing error,
because the '<' that is encountered in the replacement string
ends the element content:
<pre>
    %d:LoadXml('<a>&amp;#x2122;</a>':U, 'ReplaceUnicode')
</pre>
<li>If a parsing error occurs after processing a Unicode character that
has been replaced, the error display of the input stream will contain the
replacement string, and the replaced character will not be displayed.
However, if the character being replaced was introduced as a character
reference, the character reference remains in the display of the input
stream.
</ul>
</ul>
</ul>
</ul>
===Examples===
</blockquote>
<ul>
<li>The following code creates the XmlDoc representation of the
indicated XML document:
<pre>
    %d Object XmlDoc
    %d = New
    %d:LoadXml('<zen>The Buddha dog says</zen>')
</pre>
<li>The following code
creates an XML document as plain text for a test (or for some other
application):
<pre>
    %d Object XmlDoc
    %d = New
    %sl Object Stringlist
    %sl = New
    Text to %sl
      <test>
          <test2>
            supercalifragilisticexpailodocious
          </test2>
      </test>
    End Text
    %d:LoadXml(%sl)
</pre>
<li>
The following code calls a subroutine which uses the ErrRet option:
<pre>
    %d Object XmlDoc
    %d = New
    %s Longstring
    ... set up the (serialized) document in %S
    %d = New
    Call IntoXML(%d, %s)
    ... do interesting things with the XmlDoc
    ... set up another document in %S
    Call IntoXML(%d, %s)
    ...
    Subroutine IntoXML(%d Object XmlDoc, %S Longstring)
    If %d:LoadXml(%S, 'ErrRet') Then
      ... error handling code ...
    End If
    End Subroutine
</pre>
<li>As stated for the <i>options</i> argument in the LoadXml syntax,
whitespace normalization applies to the characters in the input
serialized string, not the values after entity substitution.
Therefore the values of elements
&ldquo;foo1&rdquo; and &ldquo;foo2&rdquo; created by the following two
LoadXml invocations are different:
<pre>
    %t = $X2C('05')


    %d:LoadXml('<foo1>' With %t With %t With '</foo1>',  -
==Examples==
                'wsptoken')
<ol>
<li>The following code creates the <var>XmlDoc</var> representation of the indicated XML document:
<p class="code">%d object xmlDoc
%d = new
%d:loadXml('<zen>The Buddha dog says</zen>')
</p>


    %d2:LoadXml('<foo2>&amp;#x09;&amp;#x09;' With '</foo2>',      -
<li>The following code creates an XML document as plain text for a test (or for some other application):
                'wsptoken')
<p class="code">%d object xmlDoc
</pre>
%d = new
<li>In the following fragment, element <tt>a</tt>, which contains
%sl object stringlist
leading and intermediate whitespace,
%sl = new
is deserialized with the WspToken option, then printed:
text to %sl
<pre>
<test>
    ...
  <test2>
    %d Object XmlDoc auto new
      supercalifragilisticexpailodocious
    %le string len 16
  </test2>
    %le = $X2C('0D25')
</test>
    %d:LoadXml('<a foo="  bar  "  >   x' With %le      -
end text
                With 'y</a>', 'WspToken')
%d:loadXml(%sl)
    Print %d:Serial('.', 'EBCDIC')
</p>
    ...
</pre>


The result shows that WspToken removes the leading whitespace &mdash;
<li>The following code calls a subroutine which uses the <var>ErrRet</var> option:
in the Element content, not in the Attribute value &mdash;
<p class="code">%d object xmlDoc
and replaces the intermediate linefeed (the initial carriage return
%d = new
removed as the XML standard normalization dictates)
%s longstring
with a single blank:
... set up the (serialized) document in %s
<pre>
%d = new
    <a foo="  bar  ">x y</a>
call intoXML(%d, %s)
</pre>
... do interesting things with the XmlDoc
... set up another document in %S
call intoXML(%d, %s)
...
subroutine intoXML(%d object xmlDoc, %s longstring)
if %d:loadXml(%s, 'ErrRet') then
  ... error handling code ...
end if
end subroutine
</p>


If WspNewline is used instead, the leading whitespace remains
<div id="xmp4"></div>
(it contains no line-end characters), and the intermediate
<li>As stated for the <var class="term">options</var> argument in the <var>LoadXml</var> syntax, whitespace normalization applies to the characters in the input serialized string, not the values after entity substitution.  Therefore the values of elements <code>foo1</code> and <code>foo2</code> created by the following two <var>LoadXml</var> invocations are different:
linefeed (represented below by a question mark) also remains
<p class="code">%t = $X2C('05')
(it is not leading or trailing):
%d:loadXml('<foo1>' With %t With %t with '</foo1>',  -
<pre>
  'wsptoken')
    <a foo="  bar  ">   x?y</a>
%d2:loadXml('<foo2>&amp;#x09;&amp;#x09;' with '</foo2>',      -
</pre>
  'wsptoken')
</p>


In this example, using the WspPreserve option gives the same result as WspNewline:
<li>In the following fragment, element <code>a</code>, which contains leading and intermediate whitespace, is deserialized with the <var>WspToken</var> option, then printed:
no whitespace is removed, except for
<p class="code">...
the initial carriage return due to the XML standard normalization.
%d  object xmlDoc auto new
%le string len 16
%le = $X2C('0D25')
%d:loadXml('<a foo="  bar  "  >  x' with %le      -
  with 'y</a>', 'WspToken')
print %d:serial('.', 'EBCDIC')
...
</p>
The result shows that <var>WspToken</var> removes the leading whitespace in the <var>Element</var> content (but not in the <var>Attribute</var> value) and replaces the intermediate linefeed with a single blank. The carriage return that preceded the linefeed has already been removed, as the XML standard line-end normalization dictates:
<p class="code"><a foo="  bar  ">x y</a>
</p>
If <var>WspNewline</var> is used instead, the leading whitespace remains (it contains no line-end characters), and the intermediate linefeed (represented below by a question mark) also remains (it is not leading or trailing):
<p class="code"><a foo="  bar  ">  x?y</a>
</p>
In this example, using the <var>WspPreserve</var> option gives the same result as <var>WspNewline</var>: no whitespace is removed, except for the initial carriage return due to the XML standard normalization.
<p>
If the example is changed slightly so the Text node includes only tab characters and a leading line-end character, and <var>LinefeedNoTrailingTabs</var> is specified: </p>
<p class="code">...
%le = $x2c('0D25')
%tb = $x2c('05')
%d:loadXml('<a foo="  bar  "  >' With %le with %tb  -
  with %tb with '</a>', 'LinefeedNoTrailingTabs')
print %d:serial('.', 'EBCDIC')
...
</p>
The resulting <var>Text</var> node contains only the line-end character:
<p class="code"><a foo="  bar  ">?</a>
</p>


If the example is changed slightly so the Text node includes only tab
<li>The <var>[[ParseXml (HttpResponse function)|ParseXml]]</var> function in the <var>[[HttpResponse class|HttpResponse]]</var> class has the same options as <var>LoadXml</var>. The following fragment requests, receives, and deserializes an XML document from a Web server:
characters and a leading line-end character, and LinefeedNoTrailingTabs
<p class="code">%httpreq object httpRequest
is specified:
%httpresp object httpResponse
<pre>
%doc object xmlDoc
    ...
%httpreq = new
    %le = $X2C('0D25')
%doc = new
    %tb = $X2C('05')
%httpreq:URL = 'foo.com/bar'
    %d:LoadXml('<a foo="  bar  "  >' With %le With %tb  -
              With %tb With '</a>', 'LinefeedNoTrailingTabs')
%httpresp = %httpreq:Get('HTTP_CLIENT')
    Print %d:Serial('.', 'EBCDIC')
    ...
if %httpresp:ParseXML(%doc, 'ErrRet') then
</pre>
... invalid document received from Web server
 
end If
The resulting Text node contains only the line-end character:
</p>
<pre>
<p class="note">'''Note:'''
    <a foo="  bar  ">?</a>
If you use <var>$Sock_Recv</var> and <var>LoadXml</var> directly instead of using an HTTP Helper object, always use the <var>BINARY</var> option of <var>$Sock_Recv</var>, so that <var>LoadXml</var> can recognize the character encoding inherent in the serialized XML document. </p>
</pre>
</ol>
 
<li>The [[Janus Sockets]] documentation describes the HttpResponse object, whose
ParseXml method has the same options as LoadXml.
The following fragment requests, receives, and
deserializes an XML document from a Web server:
<pre>
    %httpreq object HttpRequest
    %httpresp object HttpResponse
    %doc object XmlDoc
    %httpreq = New
    %doc = New
    %httpreq:URL = 'foo.com/bar'
 
    %httpresp = %httpreq:Get('HTTP_CLIENT')


    If %httpresp:ParseXML(%doc, 'ErrRet') Then
==Request cancellation errors==
      ... invalid document received from Web server
This list is not exhaustive: it does <i>not</i> include all the errors that cancel the request.
    End If
</pre>
'''Note:'''
If you use $Sock_Recv and LoadXml directly instead of using an HTTP Helper object,
always use the BINARY option of $Sock_Recv, so that
LoadXml can recognize the character encoding inherent in the
serialized XML document.
</ul>
===Request-Cancellation Errors===
<ul>
<ul>
<li>Method object ''doc'' is not EMPTY.
<li>Method object <var class="term">doc</var> is not <var>EMPTY</var>.
<li><i>Option</i> is invalid.
<li>An <var class="term">options</var> argument is invalid.
<li>Insufficient free space exists in CCATEMP.
<li>Insufficient free space exists in CCATEMP.
<li>A syntax error occurred in
<li>A syntax error occurred in the representation of the XML document in <var class="term">input</var> (this is tolerated if the <var>ErrRet</var> <var class="term">options</var> value is specified).
the representation of the XML document in ''input''
(this is tolerated if the ErrRet option is specified).
</ul>
</ul>
==See also==
<ul>
<li>To deserialize a document (which has been POSTed or PUT) using <var class="product">[[Janus Web Server]]</var>, use <var>[[WebReceive (XmlDoc function)|WebReceive]]</var>. </li>


<li>For other transport APIs, such as <var class="product">Janus Sockets</var> or <var class="product">Model 204</var> MQ Series, <var>LoadXml</var> can be used to deserialize a document that has been received with the transport API.  As mentioned in the example above, <var class="product">Janus Sockets</var> has a convenient <var>[[ParseXml (HttpResponse function)|ParseXml]]</var> method for deserializing an HTTP response. </li>


===See Also===
<li>The function that serializes an <var>XmlDoc</var> as a UTF-8 or EBCDIC string is <var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var>. </li>


<ul>
<li>For more information about normalization, see [[XML processing in Janus SOAP#Normalization during deserialization|Normalization during deserialization]]. </li>
<li>To deserialize a document (which has been POSTed or PUT)
 
using [[Janus Web Server]],
<li>For additional discussion about deserialization, see [[XmlDoc API#Transport: receiving and sending XML|Transport: receiving and sending XML]]. </li>
use [[WebReceive (XmlDoc function)|WebReceive]].
<li>For other transport APIs, such as [[Janus Sockets]] or ''Model 204'' MQ Series,
LoadXml can be used to deserialize a document that has been received with
the transport API.
As mentioned in the example above, [[Janus Sockets]] has a convenient ParseXml
method for deserializing an HTTP response.
<li>The function that serializes an XmlDoc as a UTF-8 or EBCDIC
string is [[Serial (XmlDoc/XmlNode function)|Serial]].
<li>For more information about normalization, see [[XML processing in Janus SOAP#Normalization during deserialization|Normalization during deserialization]].
</ul>
</ul>
{{Template:XmlDoc/XmlNode:LoadXml footer}}

Latest revision as of 20:30, 29 November 2018

Deserialize XML document or fragment into XmlDoc Root or into Element XmlNode (XmlDoc and XmlNode classes)

[Requires Janus SOAP]

The LoadXml callable function converts a text string representation of an XML document into an XmlDoc, or a text string representation of an XML fragment into one or more children of an Element XmlNode. This process is called deserialization, because the text representation of a document is called the serial form.

LoadXml returns a zero value if the deserialization is successful; it returns a non-zero value if deserialization is unsuccessful, the ErrRet option is used, and the particular error is tolerated.

Syntax

[%errorPosition =] nr:LoadXml( input, [options]) Throws XmlParseError

Syntax terms

%errorPosition A %variable set to 0 if the deserialization is successful. If ErrRet is one of the options used, %errorPosition is set to the character position within input at which an error is found.
nr An expression that points to the XmlDoc or XmlNode to contain the deserialized representation of the XML document or fragment, respectively.

If an XmlDoc, it must be EMPTY (see XmlDoc states) prior to invoking LoadXml. If an XmlNode that is the root node of an XmlDoc, the XmlDoc must be EMPTY.

input The byte string or Stringlist to be deserialized. If a Stringlist, input consists of the concatenation of the Stringlist items with insertion of line-end characters at the end of each item.

If the nr method object is an XmlDoc or the root node of an XmlDoc, input must be valid as an entire XML document (for example, only one top-level element). If nr is a non-root XmlNode, input must be an XML fragment, that is, a substring of a serialized XML document, such that:

  • The fragment may contain undeclared prefixes. Any such prefixes must have declarations that are in effect at the Element node referred to by the method object of LoadXml. These declarations (along with that Element's default namespace) are inherited by the inserted fragment.
  • In all other respects, the fragment, if "wrapped" within a simple element start tag and end tag (such as <w> and </w>, respectively), is a legal XML document. The fragment can contain leading and/or trailing character content and/or multiple "top-level" elements; all of these become children of the method object XmlNode.
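
The "wrap the fragment in a simple element" test described above can be sketched in Python with the standard library. This is a rough model, not what LoadXml does internally: the function name is_wellformed_fragment and its inherited_namespaces parameter are hypothetical, and the parameter stands in for the prefix declarations that are in effect at the target Element node. The sketch assumes namespace URIs that contain no quotation marks.

```python
import xml.etree.ElementTree as ET


def is_wellformed_fragment(fragment, inherited_namespaces=None):
    """True if the fragment parses once wrapped in <w>...</w>.

    inherited_namespaces maps prefix -> URI for declarations that the
    fragment would inherit from the element it is loaded into.
    """
    declarations = "".join(
        f' xmlns:{prefix}="{uri}"'
        for prefix, uri in (inherited_namespaces or {}).items()
    )
    try:
        ET.fromstring(f"<w{declarations}>{fragment}</w>")
    except ET.ParseError:
        return False
    return True
```

A fragment with leading text, trailing text, and several top-level elements passes this check; a fragment with mismatched tags does not. A fragment that uses a prefix passes only when that prefix is supplied in inherited_namespaces, which mirrors the requirement that undeclared prefixes be declared at the target Element.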
options
Any valid combination of the following terms:
  • AllowUntranslatable
    Allows all valid Unicode strings into the XML document. When this option is not specified, Unicode strings that are not translatable to EBCDIC are disallowed. AllowUntranslatable allows untranslatable Unicode characters, but it does not affect untranslatable EBCDIC characters. As described in Deserializing Unicode strings, it is recommended that you use AllowUntranslatable only if the application checks for translatability when accessing parts of the XmlDoc that may have untranslatable Unicode content. The AllowUntranslatable option is available as of Sirius Mods Version 7.6.
  • CrPreserve
    All whitespace characters in Element content are preserved, including carriage return. Unlike all other deserialization options, with CrPreserve, a carriage return in Element content does not undergo the normalization specified in the XML standard (and described in Normalized line-end).

    CrPreserve is mutually exclusive with the WspNewline, WspToken, and WspPreserve options, and with the LinefeedNoTrailingTabs option.

    The CrPreserve option was added in Sirius Mods Version 7.5, as well as being implemented with a maintenance zap to Sirius Mods Version 7.4.

  • DTDIgnore
    If a <!DOCTYPE ...> clause is present in the document, it should be ignored. In any case, the DTD is not processed. If DTDIgnore is not present, the default behavior is to treat <!DOCTYPE ...> as a syntax error. DTD_Ignore is a synonym for DTDIgnore.
  • ErrRet
    Some errors during deserialization are tolerated: the method object is not updated (it retains its pre-call state), LoadXml returns a non-zero value, and the request continues. If ErrRet is not present, any error cancels the request. If ErrRet is present, some errors still cancel the request.

    Note: Errors tolerated when ErrRet is specified are explicitly noted below in Request cancellation errors, with one exception: CCATEMP full conditions always cause a request cancellation.

  • HtmlCharEnt
    Allow the standard XHTML entities for element and attribute content, and convert them to the corresponding Unicode characters. You can find the list of XHTML entities on the Internet at http://www.w3.org/TR/xhtml1/dtds.html#h-A2.
  • LinefeedNoTrailingTabs
    For a Text node that consists of an initial line-end character and one or more tab characters, this option normalizes the content so the result is a single line-end character. The initial line-end (also called "newline") character can be a linefeed character (LF) or a carriage-return (CR) by itself, or a carriage-return followed by a linefeed (CRLF), since (within Text nodes) all of these are normalized by the XML specification into a single line-end character. This option, added in Sirius Mods version 7.0, is compatible with, but takes precedence over, any of the other whitespace-handling options (WspNewline, WspToken, WspPreserve) except CrPreserve. See Whitespace handling below for more information about this option and about whitespace handling.
  • ReplaceUnicode
    Converts Unicode characters using the replacements (if any) specified at your site by updating UNICODE commands that use the Rep subcommand (for example, UNICODE Table Standard Rep U=2122 '(TM)'). The replacement is performed on all names, element and attribute values, comments, and PI "values" in the document, after any entity and character references have been converted to characters. For further discussion and examples, see Using the ReplaceUnicode option, below.
  • WspNewline
    This option is designed to remove any whitespace inserted to make the structure of an XML document easier (for a person) to read. WspNewline removes the leading or trailing whitespace in the value of a Text node, if the whitespace sequence contains a newline (carriage return or linefeed) character.

    Note: This handling, the default whitespace option for this method, applies to the "physical value" of the representation of a Text node. In particular, markup such as a character reference (even of whitespace, for example, &#32;), a CDATA section, or any non-whitespace character delimits leading or trailing whitespace and is not affected. See Whitespace handling below for more information.

  • WspToken
    Whitespace in Element content is normalized using the XPath normalize-space() function (leading and trailing whitespace removed, intermediate strings of whitespace replaced by a single blank character). WspToken is a good substitute for WspNewline to remove leading and trailing whitespace in cases where blanks (or tabs) and not line-end characters were used to make the document structure more readable, provided it is tolerable to collapse intermediate whitespace sequences to single space characters. See Whitespace handling below for more information.
  • WspPreserve
    All whitespace characters in Element content are preserved (after end-of-line normalization, as described in Whitespace handling below). Wsp_Preserve is a synonym for WspPreserve.
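
A minimal sketch of combining several of these terms in one call. It assumes that multiple terms are passed in a single string separated by blanks, which this section does not state explicitly; the variable name %rc and the input string are hypothetical.

```
* Sketch only: assumes option terms are blank-separated in one string
%d Object XmlDoc Auto New
%rc Float
%rc = %d:LoadXml('<a> x  y </a>', 'WspToken ErrRet')
If %rc Then
   Print 'Deserialization failed'
Else
   %d:Print
End If
```

WspToken and ErrRet are chosen here because they are not listed as mutually exclusive; a combination such as WspToken with WspPreserve would be an invalid options argument.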

Exceptions

LoadXml can throw the following exception:

XmlParseError
If the method encounters a parsing error, properties of the exception object may indicate the location and type of problem.
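
A minimal sketch of catching this exception, modeled on the Try/Catch block shown under Deserializing Unicode strings. The input string is a hypothetical example of malformed XML, and the sketch assumes ErrRet is not specified, so the parse error is thrown rather than reported through the return value.

```
* Sketch only: the malformed input below is a hypothetical example
%d Object XmlDoc Auto New
Try
   %d:LoadXml('<a><b></a>')
Catch XmlParseError
   Print 'Input could not be parsed as XML'
End Try
```

If the exception is not caught, the parse error cancels the request, as described under Request cancellation errors.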

Usage notes

  • As of Sirius Mods version 7.5, version="1.1" is accepted in the input to be deserialized. Formerly, only 1.0 was accepted.
  • None of the options terms may be specified twice.
  • The options terms may be specified in any case. For example, you can use WspPreserve and wsppreserve, interchangeably.
  • If the LoadXml method object is an XmlDoc or a root XmlNode, LoadXml will accept any of the input character sets specified below.

    The correspondence between these input character sets and the value of encoding in the XML declaration is explained in the Usage notes for the Encoding property of an XmlDoc, and it is also shown in the following two tables.

    In both tables below, all of the values of encoding in the XML declaration must be specified in all-uppercase letters, and n is a digit from 1 to 9 in the encoding value ISO-8859-n:

    Input bytestream                       XML declaration
    ASCII codes below X'80'                UTF-8, ISO-8859-n, or none
    ASCII codes up to X'FF'                ISO-8859-n
    UTF-8 (with characters above X'7F')    UTF-8 or none
    UTF-16                                 UTF-16 or none
    EBCDIC                                 UTF-8, ISO-8859-n, or none
    Unicode (SOUL)                         UTF-8, ISO-8859-n, or none

    XML declaration    Input bytestream
    UTF-8              UTF-8 (which includes ASCII codes below X'80'), EBCDIC, or Unicode
    UTF-16             UTF-16 (including the two byte order mark bytes as a preamble to the XML document input stream)
    ISO-8859-n         ASCII codes up to X'FF', EBCDIC, or Unicode
    none               UTF-8, UTF-16, ASCII, EBCDIC, or Unicode

    In certain LoadXml error cases (for example, an input containing an ASCII code above X'7F' without an XML declaration containing ISO-8859-n), a group of error messages is issued to display:

    • A line that contains the erroneous string as received
    • A line with an ASCII-to-EBCDIC translation of the string
    • Additional lines that display the input byte halves in base 16

    For example, the following messages are output after an 8-bit ASCII input fails because no accompanying ISO-8859-n encoding was specified:

    MSIR.0668: XML doc parse error: invalid first byte of UTF-8 encoding near or before position 14
    MSIR.0708: C: ????????????????
    MSIR.0708: E: copyright(?)</A> (ASCII to EBCDIC)
    MSIR.0708: X: 6677766672A23243
    MSIR.0708: X: 3F0929784899CF1E
    MSIR.0665: |

  • If the LoadXml method object is a non-root XmlNode, LoadXml accepts a Unicode string or a bytestream it treats as EBCDIC.
  • As described in Support for the ASCII subset of Unicode, deserializing with LoadXml may require translation of the input document using the Unicode tables. This depends on the version of the Sirius Mods (that is, whether XmlDocs are maintained in EBCDIC or Unicode) and on which of the input character sets described above is used.
  • LoadXml does not provide for inserting an Attribute into an Element node by using an XML fragment. For example, the following does not achieve it:

    %d Object XmlDoc Auto New
    %n Object XmlNode
    %n = %d:AddElement('top')
    %n:LoadXml('foo="bar"')
    %d:Print

    The input to LoadXml above is simply stored as the character content of the Element containing the fragment, so the result is:

    <top>foo="bar"</top>

    To add the Attribute foo with the value bar, replace LoadXml in the example above with the AddAttribute updating method: %n:AddAttribute('foo', 'bar').

    This produces the result:

    <top foo="bar"/>

    Another situation where an updating method does what LoadXml does not is controlling the formatting of empty XML elements. This is described in Information form and content.

  • If the method object refers to the Root node of an XmlDoc, the LoadXml method in the XmlNode class behaves exactly as the LoadXml method in the XmlDoc class. For example:

    %d Object XmlDoc Auto New
    %n Object XmlNode
    %n = %d:SelectSingleNode
    %n:LoadXml('<?xml version="1.0"?><top><inner/></top>')

    When the Root node is the method object, the XmlDoc must be EMPTY, and the serialized input must be a legal XML document (for example, it must contain exactly one top-level element).
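
By contrast, when the method object is a non-root Element, the input is a fragment rather than a whole document. The following is a minimal sketch under that assumption; the element names and character content are hypothetical. According to the description of the input argument, the leading text, both elements, and the trailing text should all become children of top.

```
* Sketch only: loads a fragment under a non-root Element
%d Object XmlDoc Auto New
%n Object XmlNode
%n = %d:AddElement('top')
%n:LoadXml('before<x/><y>text</y>after')
%d:Print
```

The same input would not be accepted with the XmlDoc or its Root node as the method object, because it has more than one top-level element and character content outside any element.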

Whitespace handling

  • The "Wsp" whitespace-handling options (WspPreserve, WspNewline, and WspToken) and the CrPreserve whitespace option are mutually exclusive; if none of them is specified, WspNewline is in effect. Although the LinefeedNoTrailingTabs option is also concerned with whitespace, it is distinct from, yet compatible with, any of the three "Wsp" options, but it is not compatible with the CrPreserve option.
  • Except for CrPreserve, the whitespace-handling options are applied after the XML standard whitespace conversions that Janus SOAP applies in all other cases. As described in Normalized line-end, the standard specifies that all carriage return/linefeed sequences and carriage return sequences are to be converted to linefeeds when deserializing. Using the CrPreserve option bypasses this rule.
  • The whitespace-handling options do no whitespace conversion (beyond the XML standard conversions) on Element content that is "protected" by the xml:space="preserve" attribute. An element E is protected in this sense if either of the following is true:
    • E has the xml:space attribute with the value preserve
    • E is contained in an element A with that attribute and value, and no element that is a descendant of A and an ancestor of E has the xml:space attribute with the value default

    Elements that are not protected by the xml:space="preserve" attribute have whitespace handled according to the option in effect for the deserialization.

  • There is no whitespace normalization comparable to the LoadXml whitespace-handling options for the Add and Insert...Before functions that create a Text node (AddElement, InsertElementBefore, AddText, and InsertTextBefore).
  • Whitespace normalization applies to the characters in the input serialized string, not to the values after entity substitution. See Example 4, below.
  • If input is a Stringlist, LoadXml inserts a linefeed character after each item in the Stringlist as part of concatenation prior to deserialization. The linefeed is then subject to the method's whitespace handling options, so it is usually removed (as leading or trailing whitespace).
  • Using WspNewline or WspToken reduces the space consumed by individual Text nodes, and in some cases collapses all whitespace content between markup to the null string, so it is not stored as a Text node. This reduces the storage required by the XmlDoc, speeds up XPath and node access processing, and makes the output of, say, the Print subroutine easier to read.
  • The LinefeedNoTrailingTabs option only affects Text nodes that contain an initial line-end character followed by any number of tabs and nothing else. The LinefeedNoTrailingTabs effect on such a Text node, whether it is specified with or without any of the "Wsp" options, is to store the value of the node as a single line-end character.

    One example of the use of the LinefeedNoTrailingTabs option is an input XML document to be deserialized for which both of the following are true:

    • A digital signature is needed of a subtree in the document.
    • The input subtree contains a linefeed and one or more tabs that separate markup, and the linefeed must be kept but the tabs discarded for the signature.

    For information about exclusive canonicalization, serialization expressly designed for digital signatures, see the Serial function.
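
The xml:space rule described in this section can be sketched as follows. This is a minimal illustration with hypothetical element names: element b is protected by xml:space="preserve" and element c is not, and both have the same content.

```
* Sketch only: b is protected by xml:space="preserve", c is not
%d Object XmlDoc Auto New
%d:LoadXml('<a><b xml:space="preserve"> x  y </b><c> x  y </c></a>', -
           'WspToken')
%d:Print
```

Under the rules above, the content of b should keep its leading, intermediate, and trailing blanks, while WspToken should reduce the content of c to x y with a single intermediate blank.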

Deserializing Unicode strings

The LoadXml AllowUntranslatable option lets you deserialize Unicode strings that contain characters that are not translatable to EBCDIC. For example, LoadXml accepts the Unicode trademark character (U+2122) only if you specify AllowUntranslatable, as in the following.

%u is unicode Initial('&#x2122;':U)
%nod:LoadXml(%u, 'AllowUntranslatable')

If you remove AllowUntranslatable, this LoadXml statement fails, because the Unicode trademark character does not translate to an EBCDIC character. By default, the method detects any untranslatable characters in the serialized input XML document; it also throws an XmlParseError exception with reason UntranslatableUnicode (unless the ErrRet option is specified).

This default detection of non-translatable characters may suit your purposes. That is, it ensures that subsequent access to the deserialized content is performed without any Unicode to EBCDIC translation errors. For example:

%doc:LoadXml ...
%val Longstring
%val = %doc:Value(%xpath)

Although the Value property returns a Unicode string, the assignment to the EBCDIC string %val will not fail due to a Unicode translation problem: if there is any untranslatable Unicode (including, of course, strings in the XML document which your application never accesses), the LoadXml operation fails.

If you use AllowUntranslatable, all Unicode characters in a serialized input XML document are allowed and stored in the XmlDoc. Your stored data may contain content that is not translatable to EBCDIC, however. A subsequent attempt to access such content that performs Unicode to EBCDIC translation (like the Value property statement above) might cause request cancellation.

You should therefore use AllowUntranslatable only if there is also a check for translatability when parts of the XmlDoc that may have non-translatable Unicode content are accessed. The code below, for example, shows a way to get the benefit of specifying AllowUntranslatable while limiting the risk of request cancellation.

In the following example, it is believed that only the element comments might contain untranslatable Unicode among all the data accessed from the XML document:

%resp:LoadXml(%doc, 'AllowUntranslatable')
...
%uVal Unicode
%val Longstring
%uVal = %node:Value('comments')
Try
   %val = %uVal:UnicodeToEbcdic
Catch CharacterTranslationException
   %val = %uVal:UnicodeToEbcdic(CharacterEncode=True)
   Print 'Untranslatable Unicode, character encoded:' -
      And %val
End Try

Note: Unicode values, untranslatable or not, are always allowed when they are added to an XmlDoc using one of the "Add" or "Insert" methods that "directly store" into an XmlDoc. For example, the following fragment adds an Element node with a value that is the Unicode trademark sign:

%node:AddElement('notation', '&#x2122;':U)

Note: LoadXml accepts input that contains XML hex character references but not input that contains XHTML entity references (other than the five predefined entities described in "Entity references"). For example, this statement successfully loads a copyright character:

%d:LoadXml('<a>&#xA9;</a>')

And this statement successfully loads a greater-than sign:

%d:LoadXml('<a>&gt;</a>')

But this statement with a copyright character entity fails:

%d:LoadXml('<a>&copy;</a>')

You can load a copyright character, however, if you decode the reference and convert to Unicode before the deserialization. For example:

%d:LoadXml('<a>&copy;</a>':U)
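
The HtmlCharEnt option described in the options list offers another route for the same input. The following is a minimal sketch, assuming the option behaves as described there: with HtmlCharEnt specified, the XHTML entity &copy; should be accepted and converted to the corresponding Unicode character, with no prior decoding of the string.

```
* Sketch only: relies on the HtmlCharEnt option as described above
%d Object XmlDoc Auto New
%d:LoadXml('<a>&copy;</a>', 'HtmlCharEnt')
%d:Print
```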

For more information about working with Unicode characters, see Strings and Unicode with the XmlDoc API.

Using the ReplaceUnicode option

The ReplaceUnicode option lets you replace certain Unicode input characters with those characters you have explicitly specified (by UNICODE commands in your site's Model 204 CCAIN stream).

For example, assume the following command is in CCAIN:

UNICODE Table Standard Rep U=2122 '(tm)'

Given the above command, the ReplaceUnicode option for LoadXml is shown in the following fragment:

%u unicode initial('<a>')
%u = %u:unicodeWith('2122':X:utf16ToUnicode)
%u = %u:unicodeWith('</a>':U)
%d:loadXml(%u, 'ReplaceUnicode')
%d:print

The result is:

<a>(tm)</a>

In the preceding example, the stream of input characters to LoadXml contains the Unicode character U+2122. Since the ReplaceUnicode option applies to both the stream of input characters and to the character value of character references, consider the following fragment (assuming the same CCAIN line as above):

%d:LoadXml('<a>&#x2122;</a>', 'ReplaceUnicode')

The result is also:

<a>(tm)</a>

In this case, U+2122 does not occur in the input character stream, but it is the value of the character reference.

Notes:

  • It is an error for a replacement string to be processed within a character reference. For example, assume the following line is in CCAIN:

    UNICODE Table Standard Rep U=00B2 '2'

    Given the above command, the following fragment gets a parse error, because the replacement string is being used as part of a character reference:

    %d:LoadXml('<a>&#x' With '&#xB2;':U With ';</a>', 'ReplaceUnicode')

    As a consequence of this rule, a replacement string should not contain an ampersand character (assuming that the ReplaceUnicode option will be used).

  • Replacement of a Unicode character due to the ReplaceUnicode option is only done while processing names and values in the XML document. It is an error if the end of the name or value occurs and the replacement string has not been exhausted. In other words (again assuming that the ReplaceUnicode option will be used), a replacement string should not have "XML markup" that might end a string, such as a quotation mark or a left angle bracket (<). For example, assume the following line is in CCAIN:

    UNICODE Table Standard Rep U=2122 '(trademark)<tm>'

    Given the above command, the following fragment gets a parsing error, because the "less than" character (<) that is encountered in the replacement string ends the element content:

    %d:LoadXml('<a>&#x2122;</a>':U, 'ReplaceUnicode')

  • If a parsing error occurs after processing a Unicode character that has been replaced, the error display of the input stream will contain the replacement string, and the replaced character will not be displayed. However, if the character being replaced was introduced as a character reference, the character reference remains in the display of the input stream.

Examples

  1. The following code creates the XmlDoc representation of the indicated XML document:

    %d object xmlDoc
    %d = new
    %d:loadXml('<zen>The Buddha dog says</zen>')

  2. The following code creates an XML document as plain text for a test (or for some other application):

    %d object xmlDoc
    %d = new
    %sl object stringlist
    %sl = new
    text to %sl
    <test>
      <test2>
        supercalifragilisticexpailodocious
      </test2>
    </test>
    end text
    %d:loadXml(%sl)

  3. The following code calls a subroutine which uses the ErrRet option:

    %d object xmlDoc
    %d = new
    %s longstring
    ... setup the (serialized) document in %s
    %d = new
    call intoXML(%d, %s)
    ... do interesting things with the XmlDoc
    ... setup another document in %s
    call intoXML(%d, %s)
    ...
    subroutine intoXML(%d object xmlDoc, %s longstring)
       if %d:loadXml(%s, 'ErrRet') then
          ... error handling code ...
       end if
    end subroutine

  4. As stated for the options argument in the LoadXml syntax, whitespace normalization applies to the characters in the input serialized string, not the values after entity substitution. Therefore the values of elements foo1 and foo2 created by the following two LoadXml invocations are different:

    %t = $X2C('05')
    %d:loadXml('<foo1>' With %t With %t with '</foo1>', -
               'wsptoken')
    %d2:loadXml('<foo2>&#x09;&#x09;' with '</foo2>', -
                'wsptoken')

  5. In the following fragment, element a, which contains leading and intermediate whitespace, is deserialized with the WspToken option, then printed:

    ...
    %d object xmlDoc auto new
    %le string len 16
    %le = $X2C('0D25')
    %d:loadXml('<a foo=" bar " > x' with %le -
               with 'y</a>', 'WspToken')
    print %d:serial('.', 'EBCDIC')
    ...

    The result shows that WspToken removes the leading whitespace — in the Element content, not in the Attribute value — and replaces the intermediate linefeed (the initial carriage return removed as the XML standard normalization dictates) with a single blank:

    <a foo=" bar ">x y</a>

    If WspNewline is used instead, the leading whitespace remains (it contains no line-end characters), and the intermediate linefeed (represented below by a question mark) also remains (it is not leading or trailing):

    <a foo=" bar "> x?y</a>

    In this example, using the WspPreserve option gives the same result as WspNewline: no whitespace is removed, except for the initial carriage return due to the XML standard normalization.

    If the example is changed slightly so the Text node includes only tab characters and a leading line-end character, and LinefeedNoTrailingTabs is specified:

    ...
    %le = $x2c('0D25')
    %tb = $x2c('05')
    %d:loadXml('<a foo=" bar " >' With %le with %tb -
               with %tb with '</a>', 'LinefeedNoTrailingTabs')
    print %d:serial('.', 'EBCDIC')
    ...

    The resulting Text node contains only the line-end character:

    <a foo=" bar ">?</a>

  6. The ParseXml function in the HttpResponse class has the same options as LoadXml. The following fragment requests, receives, and deserializes an XML document from a Web server:

    %httpreq object httpRequest
    %httpresp object httpResponse
    %doc object xmlDoc
    %httpreq = new
    %doc = new
    %httpreq:URL = 'foo.com/bar'
    %httpresp = %httpreq:Get('HTTP_CLIENT')
    if %httpresp:ParseXML(%doc, 'ErrRet') then
       ... invalid document received from Web server
    end If

    Note: If you use $Sock_Recv and LoadXml directly instead of using an HTTP Helper object, always use the BINARY option of $Sock_Recv, so that LoadXml can recognize the character encoding inherent in the serialized XML document.

Request cancellation errors

This list is not exhaustive: it does not include all the errors that are request cancelling.

  • Method object doc is not EMPTY.
  • An options argument is invalid.
  • Insufficient free space exists in CCATEMP.
  • A syntax error occurred in the representation of the XML document in input (this is tolerated if the ErrRet options value is specified).

See also

  • To deserialize a document (which has been POSTed or PUT) using Janus Web Server, use WebReceive.
  • For other transport APIs, such as Janus Sockets or Model 204 MQ Series, LoadXml can be used to deserialize a document that has been received with the transport API. As mentioned in the example above, Janus Sockets has a convenient ParseXml method for deserializing an HTTP response.
  • The function that serializes an XmlDoc as a UTF-8 or EBCDIC string is Serial.
  • For more information about normalization, see Normalization during deserialization.
  • For additional discussion about deserialization, see Transport: receiving and sending XML.