XmlDoc API: Difference between revisions

From m204wiki
m (add word)
 
(47 intermediate revisions by 4 users not shown)
Line 1: Line 1:
<div class="toclimit-3">
<!-- The XmlDoc API -->
<!-- The XmlDoc API -->
<span style="font-size:120%; color:black"><b>XmlDoc API concepts and data structures</b></span>
<span style="font-size:120%; color:black"><b>XmlDoc API concepts and data structures</b></span>
Line 7: Line 8:
<li>[[List of XmlNode methods]]
<li>[[List of XmlNode methods]]
<li>[[List of XmlNodelist methods]]
<li>[[List of XmlNodelist methods]]
<li>[[List of XmlDoc API methods]]
</ul></td>
</ul></td>
<td valign="top"><ul style="margin-top:0px;">
<td valign="top"><ul style="margin-top:0px;">
Line 12: Line 14:
<li>[[XmlNode methods syntax]]
<li>[[XmlNode methods syntax]]
<li>[[XmlNodelist methods syntax]]
<li>[[XmlNodelist methods syntax]]
<li>[[XML processing in Janus SOAP]]
</ul></td>
</ul></td>
</tr>
</tr>
</table>
</table>
The <var>XmlDoc</var> API is based on the use of XML documents.
The <var>XmlDoc</var> API is based on the use of XML documents.
"[[XML processing in Janus SOAP]]" and various XML references
[[XML processing in Janus SOAP]] and various XML references
explain that an XML document can contain any type of
explain that an XML document can contain any type of
data, so an XML document may not be primarily intended for human reading.
data, so an XML document may not be primarily intended for human reading.
Line 26: Line 29:
converted to an <var>XmlDoc</var> object.
converted to an <var>XmlDoc</var> object.
   
   
Only a few categories of operations are needed
Only a few categories of operations are needed on XML documents; one way to structure them is:
on XML documents; one way to structure them is:
<table class="thJustBold">
<table class="list">
<tr><th>Receive</th>
<tr><th>Receive</th>
<td>Receive the transmitted text of a document and convert it into an <var>XmlDoc</var>, containing nodes to represent the hierarchy of the XML document.</td></tr>
<td>Receive the transmitted text of a document and convert it into an <var>XmlDoc</var>, which uses nodes to represent the hierarchy of the XML document.</td></tr>
 
<tr><th>Update</th>
<tr><th>Update</th>
<td>Update or create an <var>XmlDoc</var>, by adding, deleting, copying, or replacing nodes.</td></tr>
<td>Update or create an <var>XmlDoc</var>, by adding, deleting, copying, or replacing nodes.</td></tr>
<tr><th>Access</th>
<tr><th>Access</th>
<td>Access nodes in an <var>XmlDoc</var>, and data contained within them.</td></tr>
<td>Access nodes in an <var>XmlDoc</var>, and data contained within them.</td></tr>
<tr><th>Send</th>
<tr><th>Send</th>
<td>Convert an <var>XmlDoc</var> into a textual representation, and transmit it.</td></tr>
<td>Convert an <var>XmlDoc</var> into a textual representation, and transmit it.</td></tr>
<tr><th>Other</th>
<tr><th>Other</th>
<td>There are other operations, such as <var>XmlDoc</var> properties to control certain operations, data structure housekeeping, and debugging facilities.</td></tr>
<td>There are other operations, such as <var>XmlDoc</var> properties to control certain operations, data structure housekeeping, and debugging facilities.</td></tr>
</table>
</table>
   
   
The remainder of this article describes the [[Janus SOAP User Language Interface]] objects used to operate
The remainder of this article describes the objects used to operate on XML documents.
on XML documents.
It reviews the above categories of operations, showing how they are accommodated by
It reviews the above categories of operations, showing
the <var>XmlDoc</var> API classes: <var>XmlDoc</var>, <var>XmlNode</var>, and <var>XmlNodelist</var>.
how they are accommodated by
The objects are operated upon by methods that are members of these classes.
the <var>XmlDoc</var>
 
API classes: <var>XmlDoc</var>, <var>XmlNode</var>, and <var>XmlNodelist</var>.
==Typical operations on an XML document==
The objects are operated upon by methods
which are members of these classes.
===Typical operations on an XML document===
This section lists the categories of operations on an XML document, and
This section lists the categories of operations on an XML document, and
provides the motivation for objects in the <var>XmlDoc</var>
provides the motivation for objects in the <var>XmlDoc</var>
API: <var>XmlDoc</var>s, <var>XmlNode</var>s, and <var>XmlNodelist</var>s.
API: <var>XmlDoc</var>s, <var>XmlNode</var>s, and <var>XmlNodelist</var>s.
====Receive====
 
<div id="Receive"></div>
===Receive or Load===
The process of receiving a document actually consists of two
The process of receiving a document actually consists of two
steps:
steps:
<ol>
<ol>
<li>Receiving the document text
<li>Receiving the document text
using some &ldquo;transport&rdquo; mechanism, such as [[Janus Web Server]] (HTTP, as server),
using some &ldquo;transport&rdquo; mechanism, such as <var class="product">[[Janus Web Server]]</var> (HTTP, as server),
<var class="product">[[Janus Sockets]]</var> (usually, HTTP, as client), <var class="product">Model 204</var> MQ Series, access from a file, etc.
<var class="product">[[Janus Sockets]]</var> (usually, HTTP, as client), <var class="product">Model 204</var> MQ Series, access from a file, etc.
<li>Converting the XML document (deserialization) into its internal representation (an <var>XmlDoc</var>)
<li>Converting the XML document (deserialization) into its internal representation (an <var>XmlDoc</var>)
so that other operations can be performed on it.
so that other operations can be performed on it.
</ol>
</ol>
If the XML document is received by ''Janus Web Server'',
 
If the XML document is received by <var class="product">Janus Web Server</var>,
these steps are performed together by the
these steps are performed together by the
<var>[[WebReceive (XmlDoc function)|WebReceive]]</var> function.
<var>[[WebReceive (XmlDoc function)|WebReceive]]</var> function.
For the [[HTTP Helper]], the document text is received by the <var>[[HttpRequest class|HttpRequest]]</var> <var>[[Get (HttpRequest function)|Get]]</var>,
For the <var class="product">[[HTTP Helper]]</var>, the document text is received by the <var>[[HttpRequest class|HttpRequest]]</var> <var>[[Get (HttpRequest function)|Get]]</var>,
<var>[[Post (HttpRequest function)|Post]]</var> or <var>[[Send (HttpRequest function)|Send]]</var> function, and the
<var>[[Post (HttpRequest function)|Post]]</var>, or <var>[[Send (HttpRequest function)|Send]]</var> function, and the
deserialization is done by the <var>[[HttpResponse class|HttpResponse]]</var> <var>[[ParseXml (HttpResponse function)|ParseXml]]</var> function.
deserialization is done by the <var>[[HttpResponse class|HttpResponse]]</var> <var>[[ParseXml (HttpResponse function)|ParseXml]]</var> function.
For other forms of transport, the steps are performed separately:
For other forms of transport, the steps are performed separately:
the text form of the document is received into a longstring, and
the text form of the document is received into a <var>[[Longstrings|Longstring]]</var>, and
the longstring contents are converted into internal form by the
the <var>Longstring</var> contents are converted into internal form by the
<var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var> function.
<var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var> function.
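As a minimal sketch of the separate-steps case, the fragment below assumes the document text has already arrived in a <var>Longstring</var> by some transport. A literal stands in for that text, the variable names are illustrative only, and <var>LoadXml</var> is shown with just its string argument (any options it accepts are omitted):
<p class="code">%doc Object XmlDoc
%text Is Longstring
&#42; Stand-in for text received by some transport (an assumption):
%text = '<purchase_order><date>25 July, 2001</date></purchase_order>'
&#42; Create an XmlDoc and deserialize the text into it:
%doc = New
%doc:LoadXml(%text)
&#42; Display the resulting document, for debugging:
%doc:Print
</p>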
 
====Update====
See the discussion about [[#recvSendDetail|sending and receiving]], which mentions other details about the operations ("Receive", "Load", etc.) that create the initial content of an <var>XmlDoc</var>.
You modify an <var>XmlDoc</var> using various <var>XmlDoc</var> [[List of XmlDoc API methods|API methods]].
 
===Update===
You modify an <var>XmlDoc</var> using various [[List of XmlDoc API methods|XmlDoc API methods]].
If you start with an empty <var>XmlDoc</var>, some methods (the
If you start with an empty <var>XmlDoc</var>, some methods (the
<var>Add</var>* and <var>Insert</var>* methods for various node types)
<var>Add</var>* and <var>Insert</var>* methods for various node types)
Line 81: Line 90:
representing it in the serial text form.
representing it in the serial text form.
You can also update an <var>XmlDoc</var> into which you have received a document.
You can also update an <var>XmlDoc</var> into which you have received a document.
 
====Access====
See the [[#Updating|section on updating]] for a more detailed overview of <var>XmlDoc</var> updating operations.
 
===Access===
Since an XML document is a hierarchical structure, your application
Since an XML document is a hierarchical structure, your application
will need to select some part of the hierarchy to operate upon, for example,
will need to select some part of the hierarchy to operate upon, for example,
Line 91: Line 102:
where in the hierarchy an update is performed.
where in the hierarchy an update is performed.
   
   
Selecting nodes from an XmlDoc is performed using the XPath language
Selecting nodes from an <var>XmlDoc</var> is performed using the XPath language,
(introduced in [[XML processing in Janus SOAP#XML Path Language (XPath): used in the XmlDoc API|XML Path Language (XPath)]]).
introduced in [[XML processing in Janus SOAP#XML Path Language (XPath): used in the XmlDoc API|XML Path Language (XPath)]].
XPath can be used for accessing a single node in the document, for
XPath can be used for accessing a single node in the document, for
example, getting an element node's string value using the
example, getting an element node's string value using the
Line 102: Line 113:
work with them, including the <var>[[Item (XmlNodelist function)|Item]]</var> function, which gets a single
work with them, including the <var>[[Item (XmlNodelist function)|Item]]</var> function, which gets a single
<var>XmlNode</var> from an <var>XmlNodelist</var>.
<var>XmlNode</var> from an <var>XmlNodelist</var>.
<var>[[SelectSingleNode (XmlDoc/XmlNode function)|SelectSingleNode]]</var> returns an <var>XmlNode</var>, as do most of the <var>Add</var>* and <var>Insert</var>*<var>Before</var>
<var>[[SelectSingleNode (XmlDoc/XmlNode function)|SelectSingleNode]]</var> returns an <var>XmlNode</var>, as do most of the <var>Add</var>* and <var>Insert</var>*<var>Before</var> methods.
methods.
 
====Send====
===Send===
The process of sending a document actually consists of two
The process of sending a document actually consists of two
steps:
steps:
<ol>
<ol>
<li>Converting the <var>XmlDoc</var> into its serial text representation.
<li>[[#About converting|Converting the XmlDoc]] into its serial text representation.
<li>Sending the document text
 
using some &ldquo;transport&rdquo; mechanism, such as ''Janus Web Server'' (HTTP, as server),
<li>Sending the document text using some "transport" mechanism, such as <var class="product">Janus Web Server</var> (HTTP, as server),
<var class="product">[[Janus Sockets]]</var> (usually, HTTP, as client), <var class="product">Model 204</var> MQ Series, access from a file, etc.
<var class="product">Janus Sockets</var> (usually, HTTP, as client), <var class="product">Model 204</var> MQ Series, access from a file, etc.
</ol>
</ol>
If the XML document is sent by ''Janus Web Server'', the steps can be
 
performed together by the
If the XML document is sent by <var class="product">Janus Web Server</var>, the steps can be performed together by the
<var>[[WebSend (XmlDoc subroutine)|WebSend]]</var> subroutine.
<var>[[WebSend (XmlDoc subroutine)|WebSend]]</var> subroutine.
For the [[HTTP Helper]], the document is serialized by the <var>[[HttpRequest class|HttpRequest]]</var> <var>[[AddXml (HttpRequest subroutine)|AddXml]]</var> subroutine,
For the <var class="product">HTTP Helper</var>, the document is serialized by the <var>[[HttpRequest class|HttpRequest]]</var> <var>[[AddXml (HttpRequest subroutine)|AddXml]]</var> subroutine, and the document is sent by
and the document is sent by
the <var>HttpRequest</var> <var>Get</var>, <var>Post</var>, or <var>Send</var> function.
the <var>HttpRequest</var> <var>Get</var>, <var>Post</var>, or <var>Send</var> function.
For other forms of transport, the steps are performed separately:
For other forms of transport, the steps are performed separately:
the <var>XmlDoc</var> is converted into external form by the
the <var>XmlDoc</var> is converted into external form by the
<var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var> function
<var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var> function,
and the converted result is sent using the appropriate transport.
and the converted result is sent using the appropriate transport.
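A minimal sketch of the separate-steps case follows. It assumes the result of <var>Serial</var> can be assigned to a <var>Longstring</var>; the character encoding of that result and any <var>Serial</var> options are not covered here, and the variable names are illustrative only:
<p class="code">%doc Object XmlDoc
%text Is Longstring
&#42; Build a one-element document to serialize:
%doc = New
%doc:AddElement('purchase_order')
&#42; Convert the XmlDoc into its serial text form:
%text = %doc:Serial
&#42; %text can now be handed to whatever transport is in use
</p>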
   
   
====Other operations====
===Other operations===
Some other operations the <var>XmlDoc</var> API methods perform include:
Some other operations the <var>XmlDoc</var> API methods perform include:
<ul>
<ul>
<li>Creating and initializing an <var>XmlDoc</var> or <var>XmlNodelist</var>.
<li>Creating and initializing an <var>XmlDoc</var> or <var>XmlNodelist</var>.
<li>Setting or retrieving
<li>Setting or retrieving
some property of an <var>XmlDoc</var>, for example, the URI associated with
some property of an <var>XmlDoc</var>, for example, the URI associated with
a prefix to be used in an XPath expression
a prefix to be used in an XPath expression
(see <var>[[SelectionNamespace (XmlDoc property)|SelectionNamespace]]</var>).
(see <var>[[SelectionNamespace (XmlDoc property)|SelectionNamespace]]</var>).
<li>Displaying a document, or some part of it, usually for debugging purposes
<li>Displaying a document, or some part of it, usually for debugging purposes
(see <var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var>).
(see <var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var>).
</ul>
</ul>
===The XmlDoc class===
 
==The XmlDoc class==
<!-- on XmlDoc class page, #REDIRECT [[XmlDoc API#The XmlDoc class]] -->
<!-- on XmlDoc class page, #REDIRECT [[XmlDoc API#The XmlDoc class]] -->
An <b><var>XmlDoc</var></b> object is the internal
 
representation of an XML document; creating one is usually done
An <b><var>XmlDoc</var></b> object is the internal representation of an XML document; creating one is usually done
by invoking the <var>XmlDoc</var> [[New (XmlDoc/XmlNodelist constructor)|New]] constructor,
by invoking the <var>XmlDoc</var> [[New (XmlDoc/XmlNodelist constructor)|New]] constructor,
which returns an <var>XmlDoc</var> instance.
which returns an <var>XmlDoc</var> instance.
An <var>XmlDoc</var> is a tree structure of nodes.
An <var>XmlDoc</var> is a tree structure of nodes.
The types of nodes that an <var>XmlDoc</var> may contain are the following (which are enumeration return values of the
The types of nodes that an <var>XmlDoc</var> may contain are shown in the [[#nodeTypes|following subsection]].
 
<div id="nodeTypes"></div>
 
===XmlDoc node types===
An <var>XmlDoc</var> is a tree structure of nodes. The possible node types are listed here (these are the enumeration return values of the
<var>[[Type (XmlDoc/XmlNode function)|Type]]</var> function):
<var>[[Type (XmlDoc/XmlNode function)|Type]]</var> function):
<dl>
<dl>
<dt><var>Attribute</var>
<dt><var>Attribute</var>
<dd>This type of node is used to represent an attribute of an XML element.
<dd>This type of node is used to represent an attribute of an XML element.
<dt><var>Comment</var>
<dt><var>Comment</var>
<dd>This type of node is used to represent a comment
<dd>This type of node is used to represent a comment
(serialized in the form: &lt;!--comment-->) in an XML document.
(serialized in the form: <code>&lt;!--comment--></code>) in an XML document.
 
<dt><var>Root</var>
<dt><var>Root</var>
<dd>This type of node is the root of the <var>XmlDoc</var> tree.
<dd>This type of node is the root of the <var>XmlDoc</var> tree.
Line 155: Line 176:
child nodes.
child nodes.
<var>Root</var> and <var>Element</var> nodes are the only nodes that can have child nodes.
<var>Root</var> and <var>Element</var> nodes are the only nodes that can have child nodes.
<dt><var>Element</var>
<dt><var>Element</var>
<dd>This type of node is used to represent an element in an XML document.
<dd>This type of node is used to represent an element in an XML document.
<var>Element</var> and <var>Root</var> node ares the only nodes that can have child nodes.
<var>Element</var> and <var>Root</var> nodes are the only nodes that can have child nodes.
 
<dt><var>Pi</var>
<dt><var>Pi</var>
<dd>This type of node is used to represent a processing instruction
<dd>This type of node is used to represent a processing instruction
(&lt;?target ...?>) in an XML document.
(<code>&lt;?target ...?></code>) in an XML document.  
<blockquote class="note">
'''Note:'''
<p>'''Note:'''
Although the &ldquo;XML
Although the "XML declaration" (<code>&lt;?xml version=...?></code>) has the same appearance
declaration&rdquo; (<code>&lt;?xml version=...?></code>) has the same appearance
as a processing instruction, it is not a <var>Pi</var>. </p>
as a processing instruction, it is not a <var>Pi</var>.
<p>
Also, note that the values of an <var>XmlDoc</var>'s XML declaration can be obtained
Also, note that the values of an <var>XmlDoc</var>'s XML declaration can be obtained
and set with these properties:
and set with these properties:
<var>[[Version (XmlDoc property)|Version]]</var>,
<var>[[Version (XmlDoc property)|Version]]</var>,
<var>[[Encoding (XmlDoc property)|Encoding]]</var>, and
<var>[[Encoding (XmlDoc property)|Encoding]]</var>, and
<var>[[Standalone (XmlDoc property)|Standalone]]</var>.
<var>[[Standalone (XmlDoc property)|Standalone]]</var>. </p>
</blockquote>
 
<dt><var>Text</var>
<dt><var>Text</var>
<dd>This type of node is used to represent character content within
<dd>This type of node is used to represent character content within an XML element.
an XML element.
Note that a <var>Text</var> node will never contain the null string, and that two <var>Text</var> nodes
Note that a <var>Text</var> node will never contain the null string, and that two <var>Text</var> nodes
can be adjacent only if the <var>[[AdjacentText (XmlDoc property)|AdjacentText]]</var> property
can be adjacent only if the <var>[[AdjacentText (XmlDoc property)|AdjacentText]]</var> property
is set to allow it.
is set to allow it.
</dl>
</dl>
The <var>XmlDoc</var> node types listed above correspond almost exactly with the structures
The <var>XmlDoc</var> node types listed above correspond almost exactly with the structures
contained in an XML document (see [[XML processing in Janus SOAP#XML|XML]] and [[XML processing in Janus SOAP#XML example|XML example]]).
contained in an XML document (see [[XML processing in Janus SOAP#XML|XML]] and [[XML processing in Janus SOAP#XML example|XML example]]).
The <var>Root</var> node is always present and
The <var>Root</var> node, always present, corresponds to the node that
corresponds to the node that
contains the document as a whole.
contains the document as a whole.
You can insert additional nodes, either by deserializing a character stream
You can insert additional nodes, either by deserializing a character stream
containing an XML document instance (for example, with <var>WebReceive</var>),
containing an XML document instance (for example, with <var>WebReceive</var>),
or by using <var>Add</var>*/<var>Insert</var>*<var>Before</var> methods to insert nodes.
or by using <var>Add</var>*/<var>Insert</var>*<var>Before</var> methods to insert nodes.
The children of the <var>Root</var> node are the &ldquo;top-level&rdquo; element and any top-level
The children of the <var>Root</var> node are the "top-level" element and any top-level
processing instructions and/or comments that precede or follow it.
processing instructions and/or comments that precede or follow it.
Do not confuse the <var>Root</var> node,  which is the root
Do not confuse the <var>Root</var> node,  which is the root of the <var>XmlDoc</var> tree, with the top-level element of the XML document.
of the XmlDoc tree, with the top-level element of the XML document.
 
===XmlDoc states===
====XmlDoc states====
An <var>XmlDoc</var> can have one of the following three states:
An <var>XmlDoc</var> can have one of the following three states:
<dl>
<dl>
<dt>EMPTY
<dt>EMPTY
<dd>An <var>XmlDoc</var> in this state has no nodes other than the <var>Root</var> node.
<dd>An <var>XmlDoc</var> in this state has no nodes other than the <var>Root</var> node.
This is the state of an <var>XmlDoc</var> as returned by the <var>XmlDoc</var>
This is the state of an <var>XmlDoc</var> as returned by the <var>XmlDoc</var> <var>New</var> method.
<var>New</var> method.
 
<dt>WELL-FORMED
<dt>WELL-FORMED
<dd>An <var>XmlDoc</var> in this state contains at least the top-level <var>Element</var> node.
<dd>An <var>XmlDoc</var> in this state contains at least the top-level <var>Element</var> node.
<dt>Non-EMPTY not WELL-FORMED
<dt>Non-EMPTY not WELL-FORMED
<dd>An <var>XmlDoc</var> in this state contains at least one <var>Comment</var> or <var>Pi</var>
<dd>An <var>XmlDoc</var> in this state contains at least one <var>Comment</var> or <var>Pi</var>
Line 208: Line 232:
you can only use an EMPTY <var>XmlDoc</var> as the target of &ldquo;deserializing&rdquo; the
you can only use an EMPTY <var>XmlDoc</var> as the target of &ldquo;deserializing&rdquo; the
text representation of an XML document.
text representation of an XML document.
 
===The XmlNode and XmlNodelist classes, and XPath===
==The XmlNode and XmlNodelist classes, and XPath==
<!-- on XmlNode/list class page, #REDIRECT [[XmlDoc API#The XmlNode and XmlNodelist classes, and XPath]] -->
<!-- on XmlNode/list class page, #REDIRECT [[XmlDoc API#The XmlNode and XmlNodelist classes, and XPath]] -->
In addition to using an <var>XmlDoc</var> directly,
In addition to using an <var>XmlDoc</var> directly, you can access an <var>XmlDoc</var> with either of the following objects:
you can access an XmlDoc with either of the following objects:
<ul>
<ul>
<li>An <b><var>XmlNode</var></b>, which is a single pointer to a node in an <var>XmlDoc</var>
<li>An <b><var>XmlNode</var></b>, which is a single pointer to a node in an <var>XmlDoc</var>
<li>An <b><var>XmlNodelist</var></b>, which contains a
<li>An <b><var>XmlNodelist</var></b>, which contains a
list of pointers to nodes selected from an <var>XmlDoc</var>
list of pointers to nodes selected from an <var>XmlDoc</var>
</ul>
</ul>
Instances of both of these objects are created by and returned as the
Instances of both of these objects are created by and returned as the
value of several <var>XmlDoc</var> API functions.
value of several <var>XmlDoc</var> API functions.
Line 229: Line 254:
A single <var>XmlDoc</var> can have any number of <var>XmlNodelist</var>s and <var>XmlNode</var>s associated with it.
A single <var>XmlDoc</var> can have any number of <var>XmlNodelist</var>s and <var>XmlNode</var>s associated with it.
   
   
Most operations on the &ldquo;contents&rdquo; of an <var>XmlDoc</var>
Most operations on the "contents" of an <var>XmlDoc</var>
select one or more nodes using '''XPath''' expressions
select one or more nodes using XPath expressions ("PathExpr" is the XPath syntax term, as explained in
(&ldquo;PathExpr&rdquo; is the XPath syntax term, as explained in
[[XML processing in Janus SOAP#XPath syntax|XPath syntax]]).
"[[XML processing in Janus SOAP#XPath syntax|XPath syntax]]").
All methods that accept an XPath LocationPath expression argument are
All methods that accept an XPath LocationPath expression argument are
members of both the <var>XmlDoc</var> and the <var>XmlNode</var> classes.
members of both the <var>XmlDoc</var> and the <var>XmlNode</var> classes.
Line 242: Line 266:
the <var>Root</var> node.
the <var>Root</var> node.
The syntax of an absolute XPath expression begins with a forward slash (<code>/</code>).
The syntax of an absolute XPath expression begins with a forward slash (<code>/</code>).
<dt>Relative XPath expression
<dt>Relative XPath expression
<dd>A relative XPath expression selects nodes from an <var>XmlDoc</var>, starting from
<dd>A relative XPath expression selects nodes from an <var>XmlDoc</var>, starting from
a '''context
a '''context node''' which is determined when the expression is used.
node''' which is determined when the expression is used.
The syntax of a relative XPath expression begins with a character other than a slash.
The syntax of a relative XPath expression begins with a character other than a slash.
When you use a relative XPath expression,
When you use a relative XPath expression, the context node depends on the '''method object'''
the context node depends on the '''method object'''
(the type of object on which the method operates) of the invocation:
(the type of object on which the method operates) of the invocation:
<ol>
<ol>
<li>If the method object is an <var>XmlDoc</var>, the context node is the <var>Root</var> node.
<li>If the method object is an <var>XmlDoc</var>, the context node is the <var>Root</var> node.
<li>If the method object is an <var>XmlNode</var>, the context node is the node which it
<li>If the method object is an <var>XmlNode</var>, the context node is the node which it
points to.
points to.
Line 258: Line 282:
   
   
In addition to operating on the contents of an <var>XmlDoc</var>, there are several methods
In addition to operating on the contents of an <var>XmlDoc</var>, there are several methods
(for example, <var>[[WebReceive (XmlDoc function)|WebReceive]]</var>) that operate on the <var>XmlDoc</var>
(for example, <var>[[WebReceive (XmlDoc function)|WebReceive]]</var>) that operate on the <var>XmlDoc</var> as a whole.
as a whole.
These methods only allow an <var>XmlDoc</var> method object.
These methods only allow an <var>XmlDoc</var> method object.
If you need to obtain the <var>XmlDoc</var> associated with an <var>XmlNode</var> or <var>XmlNodelist</var>,
If you need to obtain the <var>XmlDoc</var> associated with an <var>XmlNode</var> or <var>XmlNodelist</var>,
Line 266: Line 289:
The [[#An example of XmlDoc methods and XPath|following section]] continues the explanation of
The [[#An example of XmlDoc methods and XPath|following section]] continues the explanation of
XPath, <var>XmlNode</var>s, and <var>XmlNodelist</var>s.
XPath, <var>XmlNode</var>s, and <var>XmlNodelist</var>s.
Further information about XPath expressions and
Further information about XPath expressions and node sets is also contained in [[XPath]].
node sets is also contained in [[XPath]].
   
   
===An example of XmlDoc API methods and XPath===
==An example of XmlDoc API methods and XPath==
This section illustrates a small XML document received as a web request,
This section illustrates a small XML document received as a web request,
followed by part of a <var class="product">User Language</var> request that uses some <var>XmlDoc</var>
followed by part of a <var class="product">User Language</var> request that uses some <var>XmlDoc</var>
API methods,
API methods, with particular attention to the methods' XPath arguments.
with particular attention to the methods' XPath arguments.
   
   
Here is the XML document:
Here is the XML document:
<pre>
<p class="code"><purchase_order>
    <purchase_order>
  <date>25 July, 2001</date>
      <date>25 July, 2001</date>
  <pitm>
      <pitm>
    <partnum>1234</partnum>
        <partnum>1234</partnum>
    <qty>3</qty>
        <qty>3</qty>
  </pitm>
      </pitm>
  <pitm>
      <pitm>
    <partnum>5678</partnum>
        <partnum>5678</partnum>
    <qty>2</qty>
        <qty>2</qty>
  </pitm>
      </pitm>
</purchase_order>
    </purchase_order>
</p>
</pre>
   
   
Here is some <var class="product">User Language</var> which could be used to receive and process this
Here is some <var class="product">User Language</var> which could be used to receive and process this request:
request:
<p class="code">%doc Object XmlDoc
<pre>
%nl Object XmlNodelist
    %doc Object XmlDoc
    %nl Object XmlNodelist
   
   
    * Create XmlDoc, get web request as contents:
&#42; Create XmlDoc, get web request as contents:
    %doc = New
%doc = New
    %doc:WebReceive
%doc:WebReceive
   
   
    * Create work nodelist with all pitm elements:
&#42; Create work nodelist with all pitm elements:
    %nl = %doc:SelectNodes('/purchase_order/pitm')
%nl = %doc:SelectNodes('/purchase_order/pitm')
   
   
    * Process each pitm:
&#42; Process each pitm:
    For %j From 1 To %nl:Count
For %j From 1 To %nl:Count
      %partnum = %nl(%j):Value('partnum')
  %partnum = %nl(%j):Value('partnum')
      %qty    = %nl(%j):Value('qty')
  %qty    = %nl(%j):Value('qty')
      ...
  ...
    End For
End For
</pre>
</p>
   
   
<var>Value</var> and <var>SelectNodes</var>,
<var>Value</var> and <var>SelectNodes</var>,
Line 317: Line 335:
object points.
object points.
   
   
The optional argument shown above is an XPath
The optional argument shown above is an XPath expression (for
expression (for
<var>SelectNodes</var>, <code>/purchase_order/pitm</code>; for <var>Value</var>, <code>partnum</code> and <code>qty</code>).
<var>SelectNodes</var>, <code>/purchase_order/pitm</code>; for <var>Value</var>, <code>partnum</code> and
<code>qty</code>).
An XPath expression selects a list of nodes,
An XPath expression selects a list of nodes,
starting either from the <var>XmlDoc</var> <var>Root</var> (when an '''absolute'''
starting either from the <var>XmlDoc</var> <var>Root</var> (when an '''absolute'''
Line 329: Line 345:
(<code>/</code>) is absolute.
(<code>/</code>) is absolute.
   
   
<var>SelectNodes</var> returns the entire result of its XPath expression
<var>SelectNodes</var> returns the entire result of its XPath expression argument.
argument.
Many other <var>XmlDoc</var> API
Many other <var>XmlDoc</var> API
methods, however, operate on the first of the nodes resulting from the
methods, however, operate on the first of the nodes resulting from the
argument's XPath expression.
argument's XPath expression.
That first node is called the
That first node is called the '''head of the argument XPath result'''.
'''head of the argument XPath result'''.
Note that '''first''' is defined in terms of "document order" (see [[XPath#Order of nodes: node sets versus nodelists|Order of nodes: node sets versus nodelists]]).
Note that '''first''' is defined in terms of &ldquo;document
order&rdquo; (see
"[[XPath#Order of nodes: node sets versus nodelists|Order of nodes: node sets versus nodelists]]").
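To illustrate the difference, the fragment below is a sketch that assumes <code>%doc</code> and <code>%nl</code> are declared as in the example above and that <code>%doc</code> still holds the purchase order. The XPath expression selects both <code>partnum</code> elements; in document order, the first is the one containing 1234. <code>%first</code> is a hypothetical work variable:
<p class="code">&#42; Value operates only on the head of the XPath result:
%first = %doc:Value('/purchase_order/pitm/partnum')
&#42; SelectNodes, by contrast, returns every selected node:
%nl = %doc:SelectNodes('/purchase_order/pitm/partnum')
Print %first And %nl:Count
</p>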
   
   
===Updating===
==Updating==
Updating an <var>XmlDoc</var> generally refers to the addition and deletion
Updating an <var>XmlDoc</var> generally refers to the addition and deletion
of the nodes of the <var>XmlDoc</var> tree, which includes the generation
of the nodes of the <var>XmlDoc</var> tree, which includes the generation
of the document's initial contents.
of the document's initial contents.
The initial
The initial contents of an <var>XmlDoc</var> can be established by one of the deserialization methods:
contents of an <var>XmlDoc</var> can be established by one of the deserialization methods:
<var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var>, <var>[[WebReceive (XmlDoc function)|WebReceive]]</var>, or
<var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var>, <var>[[WebReceive (XmlDoc function)|WebReceive]]</var>, or
<var>[[ParseXml (HttpResponse function)|ParseXml]]</var>.
<var>[[ParseXml (HttpResponse function)|ParseXml]]</var>.
Whether you use a method to set the &ldquo;initial&rdquo; contents
Whether you use a method to set the "initial" contents
of an <var>XmlDoc</var> or whether you start with
of an <var>XmlDoc</var> or whether you start with
an EMPTY <var>XmlDoc</var>, you can then insert nodes into it, using one or more of the
an EMPTY <var>XmlDoc</var>, you can then insert nodes into it, using one or more of the
methods whose
methods whose
name begins with &ldquo;<var>Add</var>,&rdquo; such as <var>[[AddElement (XmlDoc/XmlNode function)|AddElement]]</var>
name begins with "Add", such as <var>[[AddElement (XmlDoc/XmlNode function)|AddElement]]</var>,
or whose name begins with &ldquo;<var>Insert</var>,&rdquo;
or whose name begins with "Insert", such as <var>[[InsertSubtreeBefore (XmlNode function)|InsertSubtreeBefore]]</var>.
such as <var>[[InsertSubtreeBefore (XmlNode function)|InsertSubtreeBefore]]</var>.
   
   
Once an <var>XmlDoc</var> has one or more nodes in addition to the <var>Root</var> node, you
Once an <var>XmlDoc</var> has one or more nodes in addition to the <var>Root</var> node, you
can modify the <var>[[Value (XmlDoc/XmlNode property)|Value]]</var> of <var>Text</var> and other nodes, and you
can modify the <var>[[Value (XmlDoc/XmlNode property)|Value]]</var> of <var>Text</var> and other nodes, and you
can delete nodes from it using <var>[[DeleteSubtree (XmlDoc/XmlNode subroutine)|DeleteSubtree]]</var>.
can delete nodes from it using <var>[[DeleteSubtree (XmlDoc/XmlNode subroutine)|DeleteSubtree]]</var>.
====Inserting nodes and copying subtrees====
 
The <var>Add</var>* methods are designed to make it easy to &ldquo;append&rdquo; nodes to an
====16M limit to number of XmlDoc items====
<var>XmlDoc</var> in a &ldquo;depth-first, left-to-right&rdquo; order in the simple case.
Internally, an <var>XmlDoc</var> object is maintained in a data structure that has a maximum of 16M items. Each [[#nodeTypes|node]] requires an item, as does each unique string.  A string item is used, for example, as the name of an element, attribute, PI node, or namespace, or as the value of a comment, PI, attribute, or text node, or as a namespace URI.
These methods insert a node as the '''last''' child of the
Items are also used to maintain the <var>[[SelectionNamespace (XmlDoc property)|SelectionNamespace]]</var> property and for other
node pointed to by the method object.
internal purposes.
 
Exceeding the <var>XmlDoc</var> item limit (16M) in an updating operation causes request cancellation. Such operations include, for example, the Receive, Add, and Insert families of methods, as well as changing the <var>[[Value (XmlDoc property)|Value]]</var> of an <var>XmlNode</var>, and so on.
Update operations that cause deletion of items from an <var>XmlDoc</var> (for example, <var>[[DeleteSubtree (XmlDoc subroutine)|DeleteSubtree]]</var>, and replacing the <var>Value</var> of an <var>XmlNode</var>)
do '''not''', in general, make those items
available for reuse, at least in <var class="product">Model 204</var> version 7.5 and earlier.
 
===Inserting nodes and copying subtrees===
The <var>Add</var>* methods are designed to make it easy to "append" nodes to an
<var>XmlDoc</var> in a "depth-first, left-to-right" order in the simple case.
These methods insert a node as the '''last''' child of the node pointed to by the method object.
   
   
Most of the <var>Add</var>* methods (for example <var>AddElement</var>) have <var>Insert</var>*<var>Before</var>
Most of the <var>Add</var>* methods (for example <var>AddElement</var>) have <var>Insert</var>*<var>Before</var>
counterparts (for example, <var>InsertElementBefore</var>)
counterparts (for example, <var>InsertElementBefore</var>) which
which
insert a node in a position other than the last child of an <var>Element</var> or the <var>Root</var>.
insert a node in a position other than the last child of an <var>Element</var> or the <var>Root</var>.
<var>AddAttribute</var> and <var>AddNamespace</var> are the exceptions, without an <var>Insert</var>*<var>Before</var> counterpart.
<var>AddAttribute</var> and <var>AddNamespace</var> are the exceptions, without an <var>Insert</var>*<var>Before</var> counterpart.
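
The difference between appending and inserting can be sketched as follows. This is a minimal illustration, not taken from the original text: the variable names (<code>%top</code>, <code>%c</code>) and the element names are assumptions chosen for the example.
<p class="code">%doc is object XmlDoc
%top is object XmlNode
%c   is object XmlNode
%doc = new

&#42; AddElement appends: 'c' becomes the last child of 'top'
%top = %doc:AddElement('top')
%c = %top:AddElement('c')

&#42; InsertElementBefore inserts 'b' as a sibling that precedes 'c'
%c:InsertElementBefore('b')
%doc:Print
</p>
In the resulting document, <code>b</code> precedes <code>c</code> in document order, even though <code>c</code> was added first.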
Line 376: Line 396:
   
   
Here is an example of the updating methods in the <var>XmlDoc</var> API:
Here is an example of the updating methods in the <var>XmlDoc</var> API:
<pre>
<p class="code">%doc Object XmlDoc
    %doc Object XmlDoc
%doc = New
    %doc = New
%story Object XmlNode
    %story Object XmlNode
%paragraph Object XmlNode
    %paragraph Object XmlNode
%story = %doc:AddElement('story')
    %story = %doc:AddElement('story')
  %story:AddComment('My first XML document')
      %story:AddComment('My first XML document')
  %story:AddElement('greeting', 'Hello, world')
      %story:AddElement('greeting', 'Hello, world')
  %paragraph = %story:AddElement('paragraph')
      %paragraph = %story:AddElement('paragraph')
  %paragraph:AddElement('line', 'Ask not what')
        %paragraph:AddElement('line', 'Ask not what')
  %paragraph:AddElement('line', 'Hear no evil')
        %paragraph:AddElement('line', 'Hear no evil')
</p>
</pre>
   
   
This creates the following XML document:
This creates the following XML document:
<pre>
<p class="code"><story>
    <story>
  <!--My first XML document-->
      <!--My first XML document-->
  <greeting>Hello, world</greeting>
      <greeting>Hello, world</greeting>
  <paragraph>
      <paragraph>
    <line>Ask not what</line>
        <line>Ask not what</line>
    <line>Hear no evil</line>
        <line>Hear no evil</line>
  </paragraph>
      </paragraph>
</story>
    </story>
</p>
</pre>
 
====Namespaces with Add* and Insert* methods====
As discussed above in the [[#Updating|introduction to the section on updating]], the number of items in an <var>XmlDoc</var> is limited to 16M items; exceeding this number in any updating operation causes request cancellation.
 
===Namespaces with Add* and Insert* methods===
When an <var>XmlDoc</var> is deserialized, the namespace declarations (and the use of
When an <var>XmlDoc</var> is deserialized, the namespace declarations (and the use of
those declarations by names in the document) follow the scope rules outlined
those declarations by names in the document) follow the scope rules outlined
in "[[XML processing in Janus SOAP#Name and namespace syntax|Name and namespace syntax]]".
in [[XML processing in Janus SOAP#Name and namespace syntax|Name and namespace syntax]].
This section explains the namespace structure that results from the updating
The namespace structure that results from the updating (adding and removing nodes) of an <var>XmlDoc</var> is explained here.
(adding and removing nodes) of an <var>XmlDoc</var>.
Most of these updating methods are the <var>Add</var>* and <var>Insert</var>*<var>Before</var> methods
Most of these updating methods are the <var>Add</var>* and <var>Insert</var>*<var>Before</var> methods
(described individually in the [[List of XmlDoc API methods]]).
(described individually in the [[List of XmlDoc API methods|List of XmlDoc API methods]]).
   
   
In the following
In the following discussion, all references to an <var>Add</var>* method (for example, <var>AddElement</var>) refer
discussion, all references to an <var>Add</var>* method (for example, <var>AddElement</var>) refer
equally to the corresponding <var>Insert</var>*<var>Before</var> method (for example, <var>InsertElementBefore</var>).
equally to the corresponding <var>Insert</var>*<var>Before</var> method (for example, <var>InsertElementBefore</var>).
<ul>
<ul>
<li>From deserialization through serialization, the <var>XmlDoc</var> API
<li>From deserialization through serialization, the <var>XmlDoc</var> API
enforces the syntax rules of
enforces the syntax rules of [[#Well-formed documents and validation|well-formed documents]].
[[#Well-formed documents and validation|well-formed documents]].
When namespace handling is in effect for an <var>XmlDoc</var>
When namespace handling is in effect for an <var>XmlDoc</var>
(that is, the
(that is, the <var>Namespace</var> property setting is <code>On</code>, the default), the prefixes of names of nodes you add must be declared.
<var>Namespace</var> property setting is <code>On</code>, the default), the prefixes of names of nodes
 
you add must be declared.
<li>Once an attribute or element is added, its URI is fixed and will not change thereafter.
<li>Once an attribute or element is added, its URI is fixed
 
and will not change thereafter.
<li>To allow <var>AddElement</var> to add an element with a
<li>To allow <var>AddElement</var> to add an element with a
prefix/URI declaration other than that which is in scope,
prefix/URI declaration other than that which is in scope,
it has a URI argument, which specifies the URI
it has a URI argument, which specifies the URI of the element.
of the element.
If the URI argument is used and the resulting prefix/URI combination requires
If the URI argument is used and the resulting prefix/URI combination requires
a namespace declaration (because it differs from what is in scope), a
a namespace declaration (because it differs from what is in scope), a
declaration is created at the element.
declaration is created at the element.
<li>The URI argument of <var>AddElement</var> can also be such that the application logic
<li>The URI argument of <var>AddElement</var> can also be such that the application logic
around <var>AddElement</var> does not need to depend on whether the prefix/URI declaration
around <var>AddElement</var> does not need to depend on whether the prefix/URI declaration
Line 435: Line 452:
the creation of a namespace declaration at the <var>Element</var> parent of the
the creation of a namespace declaration at the <var>Element</var> parent of the
inserted <var>Attribute</var> node.
inserted <var>Attribute</var> node.
<li>As an alternative to the URI argument on <var>AddElement</var> and <var>AddAttribute</var>, a
<li>As an alternative to the URI argument on <var>AddElement</var> and <var>AddAttribute</var>, a
namespace declaration is created at an element with the <var>AddNamespace</var> method.
namespace declaration is created at an element with the <var>AddNamespace</var> method.
   
   
<var>AddNamespace</var> also lets you insert a namespace declaration that is not used by
<var>AddNamespace</var> also lets you insert a namespace declaration that is not used by
an element or any of its attributes.
an element or any of its attributes. For example, the following fragment:
For example, the following fragment:
<p class="code">%n = %doc:AddElement('a')
<pre>
%n:AddNamespace('x', 'y:z')
    %n = %doc:AddElement('a')
%n:AddElement('x:b')
    %n:AddNamespace('x', 'y:z')
%n:AddElement('x:c')
    %n:AddElement('x:b')
</p>
    %n:AddElement('x:c')
</pre>
   
   
Creates the following document:
Creates the following document:
<pre>
<p class="code"><a xmlns:x="y:z">
    <a xmlns:x="y:z">
  <x:b/>
      <x:b/>
  <x:c/>
      <x:c/>
</a>
    </a>
</p>
</pre>
 
<li>When the <var>AddSubtree</var> method copies between different <var>XmlDoc</var>s, it does not allow
<li>When the <var>AddSubtree</var> method copies between different <var>XmlDoc</var>s, it does not allow
the source and target <var>XmlDoc</var>s to have different <var>Namespace</var> property
the source and target <var>XmlDoc</var>s to have different <var>Namespace</var> property
settings (doing so could require re-parsing names in some cases).
settings (doing so could require re-parsing names in some cases).
<li>The namespace axis is not allowed in an XPath expression.
<li>The namespace axis is not allowed in an XPath expression.
Without XPath access to a pointer to a namespace node, the namespace
Without XPath access to a pointer to a namespace node, the namespace
Line 465: Line 482:
certain <var>XmlDoc</var> API methods (for example, the <var>Uri</var> property of an <var>XmlNode</var>).
certain <var>XmlDoc</var> API methods (for example, the <var>Uri</var> property of an <var>XmlNode</var>).
</ul>
</ul>
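
The URI argument of <var>AddElement</var> described in the list above might be used as in the following sketch. The prefix <code>p</code> and the URIs <code>urn:example:one</code> and <code>urn:example:two</code> are hypothetical values chosen for illustration, and the value argument is simply omitted.
<p class="code">%doc is object XmlDoc
%a   is object XmlNode
%doc = new

&#42; No declaration for prefix 'p' is in scope, so the URI
&#42; argument causes one to be created at element 'p:a'
%a = %doc:AddElement('p:a', , 'urn:example:one')

&#42; 'p' is now in scope, so no URI argument is needed
%a:AddElement('p:b')

&#42; A URI that differs from the one in scope requires
&#42; a new declaration at element 'p:c'
%a:AddElement('p:c', , 'urn:example:two')
%doc:Print
</p>
Under these assumptions, the <code>p:a</code> and <code>p:c</code> elements each carry their own <code>xmlns:p</code> declaration, while <code>p:b</code> uses the declaration that is in scope from its parent.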
====Deleting nodes====
 
===Deleting nodes===
The <var>XmlDoc</var> API subroutine used to delete individual nodes (and their descendants)
The <var>XmlDoc</var> API subroutine used to delete individual nodes (and their descendants)
from an <var>XmlDoc</var> is <var>[[DeleteSubtree (XmlDoc/XmlNode subroutine)|DeleteSubtree]]</var>.
from an <var>XmlDoc</var> is <var>[[DeleteSubtree (XmlDoc/XmlNode subroutine)|DeleteSubtree]]</var>.
An example is shown below.
An example is shown below.
   
   
If a node you delete
If a node you delete was referenced by an <var>XmlNode</var>, the value of that
was referenced by an <var>XmlNode</var>, the value of that
<var>XmlNode</var> becomes <var>Null</var>.
<var>XmlNode</var> becomes <var>Null</var>.
Similarly,
Similarly, if any deleted nodes were referenced by an item of an <var>XmlNodelist</var>,
if any deleted nodes were referenced by an item of an <var>XmlNodelist</var>,
the value of that item of the <var>XmlNodelist</var> becomes <var>Null</var>.
the value of that item of the <var>XmlNodelist</var> becomes <var>Null</var>.
   
   
If you need to &ldquo;clean up&rdquo; an <var>XmlNodelist</var> that refers to a deleted <var>XmlDoc</var>
If you need to "clean up" an <var>XmlNodelist</var> that refers to a deleted <var>XmlDoc</var>
node, you can use the <var>[[Difference (XmlNodelist function)|Difference]]</var> function,
node, you can use the <var>[[Difference (XmlNodelist function)|Difference]]</var> function,
as shown in the following example.
as shown in the following example.
   
   
In the example, the <var>XmlNodelist</var> object (<code>%nlis</code>) is cleaned up
In the example, the <code>%nlis</code> <var>XmlNodelist</var> is cleaned up
(with <var>Difference</var>) to remove the second item from the
(with <var>Difference</var>) to remove the items which have become null.
nodelist before referencing the nodelist items.
The final reference to items in <code>%nlis</code> uses relative XPath (the <code>'@auth'</code>
The reference to the nodelist items
argument of the <var>Value</var> method invocation, which selects the attribute named <code>auth</code>).
uses relative XPath (the <code>'@auth'</code>
<p class="code">%nlis     is object xmlNodelist
argument of the <var>Value</var> method invocation, selecting attributes named
%removeLis is object xmlNodelist
<code>auth</code>).
<pre>
    %nlis Object XmlNodelist
    %rlis Object XmlNodelist
   
   
    * Get nodelist for chapters, and
&#42; Get nodelist for chapters, and delete all
    * delete Dave's chapter from the XmlDoc:
&#42; chapters by indicated author from the XmlDoc:
    %nlis = %d:Nodes('/book/chapter')
%deleteAuth = 'Dave'
    %rlis = %d:Nodes('/book/chapter[@auth="Dave"]')
%nlis = %d:selectNodes('/book/chapter')
    For %i From 1 to %rlis:Count
%removeLis = %d:selectNodes('/book/chapter[@auth="' %deleteAuth '"]')
      %rlis:Item(%i):DeleteSubtree
for %i from 1 to %removeLis:count
    End For
  %removeLis(%i):deleteSubtree
end for
   
   
    * Clean up the chapter nodelist, show author of
&#42; Clean up the chapter nodelist, show author of
    * remaining chapters, & display the document:
&#42; remaining chapters, & display the document:
    %nlis = %nlis:Difference(%rlis)
%nlis = %nlis:difference(%removeLis)
    For %i From 1 To %nlis:Count
for %i from 1 to %nlis:count
      Print 'Author:' And %nlis:Item(%i):Value('@auth')
  print 'Author:' and %nlis(%i):value('@auth')
    End For
end for
   
   
    %d:Print
%d:print
    ...
...
</pre>
</p>
====Namespace URI for XPath prefixes====
Also note that in the above example, the <var>Item</var> method is implicitly used (<code>%removeLis(%i)</code> and <code>%nlis(%i)</code>) and that [[Implicit concatenation|implicit concatenation]] is used in the second XPath expression (<code>'/book/chapter[@auth="' %deleteAuth '"]'</code>).
 
As discussed in the [[#Updating|introduction to the section on updating]], the items deleted by <var>DeleteSubtree</var> are not available for reuse by later Add, Insert, etc. methods, so <var>DeleteSubtree</var> does not "relieve" the restriction of 16M items in an <var>XmlDoc</var>.
 
===Namespace URI for XPath prefixes===
It is important to realize that the URI associated with a prefix
It is important to realize that the URI associated with a prefix
<i><b>in the XML document</b></i> is controlled by the <code>xmlns</code>
<i><b>in the XML document</b></i> is controlled by the <code>xmlns</code> namespace declarations in the document.
namespace declarations in the document.
However, when a prefix is used in a name in an XPath argument to
However, when a prefix is used in a name in an XPath argument to
an <var>XmlDoc</var> API method, the URI for that prefix must be established
an <var>XmlDoc</var> API method, the URI for that prefix must be established
Line 521: Line 537:
the <var>[[SelectionNamespace (XmlDoc property)|SelectionNamespace]]</var> property.
the <var>[[SelectionNamespace (XmlDoc property)|SelectionNamespace]]</var> property.
   
   
The prefix names
The prefix names used in XPath selection are independent of the prefix names used in the document
used in XPath selection are independent of the prefix names used in the document
serialization and deserialization.
serialization and deserialization.
Since an XML document element prefix may be associated with multiple namespaces,
Since an XML document element prefix may be associated with multiple namespaces,
Line 529: Line 544:
an element name in an XPath location step.
an element name in an XPath location step.
   
   
===Transport: receiving and sending XML===
<div id="recvSendDetail"></div>
The provision for receiving and sending XML is very simple:
 
==Transport: receiving and sending XML==
Distinct sets of methods provide for the receiving and sending of XML:
<ul>
<ul>
<li>Receiving XML involves converting the information from the character,
<li>Receiving XML involves converting the information from the character,
Line 539: Line 556:
<li>The <var>[[WebReceive (XmlDoc function)|WebReceive]]</var> function
<li>The <var>[[WebReceive (XmlDoc function)|WebReceive]]</var> function
is designed to receive an XML document that has arrived as a web request.
is designed to receive an XML document that has arrived as a web request.
<li>The <var>[[ParseXml (HttpResponse function)|ParseXml]]</var>
<li>The <var>[[ParseXml (HttpResponse function)|ParseXml]]</var>
function of the <var>HttpResponse</var> class accomplishes this for
function of the <var>HttpResponse</var> class accomplishes this for
HTTP clients.
HTTP clients.
<li>For other transport mechanisms, such as <var class="product">Model 204</var> MQ Series,
<li>For other transport mechanisms, such as <var class="product">Model 204</var> MQ Series,
the character-format XML document can be placed into a longstring,
the character-format XML document can be placed into a <var>[[Longstrings|Longstring]]</var>,
and the <var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var> function
and the <var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var> function
then places the information into an <var>XmlDoc</var>.
then places the information into an <var>XmlDoc</var>.
</ul>
</ul>
<li>Sending XML involves converting the information in an <var>XmlDoc</var> to a
<li>Sending XML involves converting the information in an <var>XmlDoc</var> to a
character stream, including markup such as element tags;
character stream, including markup such as element tags;
Line 554: Line 574:
<li>The <var>[[WebSend (XmlDoc subroutine)|WebSend]]</var> subroutine
<li>The <var>[[WebSend (XmlDoc subroutine)|WebSend]]</var> subroutine
is designed to send an XML document as a web response.
is designed to send an XML document as a web response.
<li>The <var>[[AddXml (HttpRequest subroutine)|AddXml]]</var> method of the <var>[[HttpRequest class|HttpRequest]]</var> class accomplishes this for
<li>The <var>[[AddXml (HttpRequest subroutine)|AddXml]]</var> method of the <var>[[HttpRequest class|HttpRequest]]</var> class accomplishes this for
HTTP clients.
HTTP clients.
<li>For other transport mechanisms, such as <var class="product">Model 204</var> MQ Series,
<li>For other transport mechanisms, such as <var class="product">Model 204</var> MQ Series,
the <var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var> function is used to
the <var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var> function is used to
Line 561: Line 583:
</ul>
</ul>
</ul>
</ul>
About encoding:
 
===Information form and content===
The <var class="product">Janus SOAP</var> deserialization methods receive and convert the character form of XML document data into an XmlDoc, which precisely represents the <i>information content</i> of that XML data. They do not preserve those aspects of the character form that are incidental to the information content, such as whether a character reference or an empty-element tag was present. The serialization methods reproduce and send the <var>XmlDoc</var> content, and they offer many global options for how that content is represented on output as a string of characters.
 
For example, the following two serializations represent exactly the same information content:
<p class="code"><top><foo/>&amp;#x31;</top>
 
<top><foo></foo>1</top> </p>
 
In the first string, <code>foo</code> is serialized using the empty element tag <code>&lt;foo/></code>, and the number <code>1</code> is serialized using a character reference <code>&amp;#x31;</code>. In the second string, <code>foo</code> is serialized using a start tag (<code>&lt;foo></code>) and an end tag (<code>&lt;/foo></code>), and the number <code>1</code> is serialized using character content. Neither of these outputs, however, is related to how the XML content was obtained, that is, deserialized. 
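
One way to check this equivalence is to deserialize both strings and compare their default serializations, as in the sketch below. The variable names are illustrative, and the use of the <code>EBCDIC</code> option of <var>Serial</var> (to obtain a result that can be compared and printed as an ordinary string) is an assumption of this example rather than something stated above.
<p class="code">%d1 is object XmlDoc
%d2 is object XmlDoc
%d1 = new
%d2 = new

&#42; Same information content, two different character forms
%d1:LoadXml('&lt;top>&lt;foo/>&amp;#x31;&lt;/top>')
%d2:LoadXml('&lt;top>&lt;foo>&lt;/foo>1&lt;/top>')

&#42; The serialized results depend only on the XmlDoc content
&#42; and the options used, not on the original input form
If %d1:Serial(, 'EBCDIC') eq %d2:Serial(, 'EBCDIC') Then
   Print 'Same information content'
End If
</p>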
 
A given piece of information content has many potential serializations, including variations in line ends, CDATA sections, quotation characters used in attributes, attribute and namespace declaration order, etc. And in general, the character form of an XML document before it is deserialized will not be the same after it is re-serialized. It is a user task to determine to what degree to [[#Updating|update]] an XmlDoc or to select serialization options to produce the desired form of character string output of the XML document information.
 
For example, trying to obtain output like the second serialization in the example above (<code><top><foo></foo>1</top></code>), you specify literal start and end tags for <code>foo</code> in a SOUL deserialization method. But you discover that the default XmlDoc serialization result is <code><top><foo/>1</top></code>. One way (now deprecated) to produce the tag presentation you want in this case is to use the serialization method option (<var>[[XmlDoc API serialization options#NoEmptyElt|NoEmptyElt]]</var>) that forces start and end tags for childless elements. But a better way is discussed in the next paragraph, using the <var>NoEmptyElement</var> property.
 
A related conversion issue arises when using SOUL to generate HTML. Because some browsers work correctly for certain childless elements (for example, <code>&lt;br></code> tags) only if they have an empty element tag, and for other childless elements (for example, <code>&lt;div></code> tags) only if they have separate start and end tags, the <var>NoEmptyElt</var> solution just mentioned will produce HTML that works for some elements but not for others. The resolution for this is not in the serialization or deserialization methods but rather in the [[#Updating|updating methods]] described above. You can use the updating methods to build or modify the HTML elements in your <var>XmlDoc</var>, getting information content equivalent to the deserialization methods, but providing better control and access. For the issue in this example, the <var>[[NoEmptyElement (XmlNode property)|NoEmptyElement]]</var> property would let you selectively apply the tag format needed for successful HTML, as shown in the following request fragment:
<p class="code">%html = %doc:addElement('html')
  %body = %html:addElement('body')
    %div = %body:addElement('div')
      %div:<b>noEmptyElement</b> = true
      %div:addAttribute('id', 'topOfBody')
    %body:addText('foo')
    %body:addElement('br')
    %body:addText('bar')
</p>
If you use the <var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var> method to display the fragment above, the result is this <var>XmlDoc</var> (note the <code>&lt;div></code> and <code>&lt;br></code> tags):
<p class="output"><html>
  <body>
      &lt;div id="topOfBody"></div>
foo      &lt;br/> 
bar</body>
</html> </p>
 
Using the updating methods to build HTML content is also superior to using something ostensibly simpler like this SOUL fragment to create output like that above:
<p class="code">Print '<html>'
Print '  <body>'
Print '      &lt;div id="topOfBody"></div>'
Print '      foo'
Print '      &lt;br/>'
Print '      bar'
Print '  </body>'
Print '</html>'
</p>
 
This approach quickly breaks down, primarily because the places at which you compose the content of the XML document can be widely dispersed (the code not all nicely together like this), and because keeping track of the element end tags can become very difficult.
 
For more information about the character transformations that Janus SOAP applies during deserialization, see [[XML processing in Janus SOAP#Normalization during deserialization|Normalization during deserialization]].
 
For a summary of the various output formatting options available to the Janus SOAP serialization methods, see [[XmlDoc API serialization options]].
 
===About encoding===
<ul>
<ul>
<li>When the internal representation of an <var>XmlDoc</var> is EBCDIC (prior to <var class="product">Sirius Mods</var>
<li>When the internal representation of an <var>XmlDoc</var> is EBCDIC (prior to <var class="product">Sirius Mods</var>
Line 572: Line 644:
a Unicode character that is not translatable to EBCDIC.
a Unicode character that is not translatable to EBCDIC.
   
   
See [[#Char and Reference|"Char and Reference"]] for more information about characters in an XML document.
See [[#Char and Reference|Char and Reference]] for more information about characters in an XML document.
 
<li>The encodings that are accepted in the deserialization operations
<li>The encodings that are accepted in the deserialization operations
are <code>UTF-8</code> and <code>UTF-16</code>, and <code>ISO-8859-<i>n</i></code> (where <i>n</i> is a digit
are <code>UTF-8</code> and <code>UTF-16</code>, and <code>ISO-8859-<i>n</i></code> (where <i>n</i> is a digit from 1 to 9).
from 1 to 9).
Prior to <var class="product">Sirius Mods</var> Version 7.6,
Prior to <var class="product">Sirius Mods</var> Version 7.6,
all of the <code>ISO-8859-<i>n</i></code> variants are treated as <code>ISO-8859-1</code>.
all of the <code>ISO-8859-<i>n</i></code> variants are treated as <code>ISO-8859-1</code>.
As of <var class="product">Sirius Mods</var> 7.6, the variants determine ASCII to Unicode conversions
As of <var class="product">Sirius Mods</var> 7.6, the variants determine ASCII to Unicode conversions
according to the specification of the individual variant.
according to the specification of the individual variant.
<p>
<p class="note">'''Note:'''
'''Note:'''
These encoding names must be specified in uppercase letters. </p>
These encoding names must be specified in uppercase letters. </p>
<li>When the document is serialized, the result is either EBCDIC or is in
 
the UTF-8 encoding.
<li>When the document is serialized, the result is EBCDIC or is in the UTF-8 encoding.
Therefore,
Therefore, the only values permitted to be set for the <var>[[Encoding (XmlDoc property)|Encoding]]</var> property are <code>UTF-8</code> and the null string;
the only values permitted to be set for the <var>[[Encoding (XmlDoc property)|Encoding]]</var> property
see <var>Encoding</var>'s [[Encoding (XmlDoc property)#Usage Notes|Usage Notes]]
are <code>UTF-8</code> and the null string;
see the <var>[[Encoding (XmlDoc property)#Usage Notes|Usage Notes]]</var>
for more information about the character sets allowed in
for more information about the character sets allowed in
a serialized input XML document and the value of <code>encoding</code> in an XML
a serialized input XML document and the value of <code>encoding</code> in an XML declaration.
declaration.
</ul>
</ul>
 
===Strings and Unicode with the XmlDoc API===
==Strings and Unicode with the XmlDoc API==
As of <var class="product">Sirius Mods</var> version 7.6, <var>XmlDoc</var>s are maintained in Unicode
As of <var class="product">Sirius Mods</var> version 7.6, <var>XmlDoc</var>s are maintained in Unicode
rather than EBCDIC; this is true for all string values, names, prefixes,
rather than EBCDIC; this is true for all string values, names, prefixes, and URIs.
and URIs.
As a consequence, most of the arguments and results
As a consequence, most of the arguments and results
of the <var>XmlDoc</var> API methods that formerly were strings or longstrings are
of the <var>XmlDoc</var> API methods that formerly were strings or longstrings are
Line 608: Line 675:
For example, the EBCDIC character strings in the arguments in a statement
For example, the EBCDIC character strings in the arguments in a statement
like the following are automatically converted to Unicode:
like the following are automatically converted to Unicode:
<pre>
<p class="code">%d:AddElement('name', 'value')
    %d:AddElement('name', 'value')
</p>
</pre>
   
   
Similarly,
Similarly, if the variable <code>%str</code>, below, was declared as type <var>String</var> or
if the variable <code>%str</code>, below, was declared as type <var>String</var> or
<var>Longstring</var>, then the Unicode result of the <var>Value</var> method is automatically
<var>Longstring</var>, then the Unicode result of the <var>Value</var> method is automatically
converted to EBCDIC when it is stored in <code>%str</code>:
converted to EBCDIC when it is stored in <code>%str</code>:
<pre>
<p class="code">%str = %n:Value
    %str = %n:Value
</p>
</pre>
   
   
The principal benefit of this switch to Unicode is conformance with the W3C XML
The principal benefit of this switch to Unicode is conformance with the W3C XML
standard, which defines &ldquo;characters&rdquo; in terms of Unicode characters
standard, which defines "characters" in terms of Unicode characters (most of which are valid in XML documents).
(most of which are valid in XML documents).
You can now store string values that are not translatable to EBCDIC &mdash;
You can now
<var class="product">Sirius Mods</var> 7.5 allows storage only of (most) non-null EBCDIC characters or of characters
store string values that are not translatable to EBCDIC &mdash;
<var class="product">Sirius Mods</var> 7.5 allows storage
only of (most) non-null EBCDIC characters or of characters
that translate to those EBCDIC characters.
that translate to those EBCDIC characters.
   
   
Line 633: Line 694:
<var class="product">Sirius Mods</var> 7.5.
<var class="product">Sirius Mods</var> 7.5.
But there are other changes to or effects on the <var>XmlDoc</var> API that are due
But there are other changes to or effects on the <var>XmlDoc</var> API that are due
to the switch to Unicode maintenance
to the switch to Unicode maintenance (the <var class="product">[http://www.sirius-software.com/maint/download/modrel76.pdf Sirius Mods Release 7.6 Notes]</var>
(the <var class="product">Sirius Mods</var> Release 7.6 Notes
and the individual method descriptions provide additional details):
and the individual method descriptions
provide additional details):
<ul>
<ul>
<li>The workaround (<var>InvalidChar</var> method) for accommodating nulls and EBCDIC
<li>The workaround (<var>InvalidChar</var> method) for accommodating nulls and EBCDIC
characters that are not allowed by the XML standard is replaced by:
characters that are not allowed by the XML standard is replaced by:
<ul>
<ul>
<li>The <var>[[AllowNull (XmlDoc property)|AllowNull]]</var> property can
<li>The <var>[[AllowNull (XmlDoc property)|AllowNull]]</var> property can let nulls be stored in an <var>XmlDoc</var>.
let nulls be stored in an <var>XmlDoc</var>.
<li>A method argument (<var>AllowUntranslatable</var>) of the deserialization methods that lets you
<li>A method argument (<var>AllowUntranslatable</var>)
of the deserialization methods that lets you
store Unicode characters that do not translate to EBCDIC.
store Unicode characters that do not translate to EBCDIC.
Such characters may also be stored directly by the <var>Add</var>* and <var>Insert</var>* methods
Such characters may also be stored directly by the <var>Add</var>* and <var>Insert</var>* methods
Line 651: Line 708:
EBCDIC characters that do not translate to Unicode
EBCDIC characters that do not translate to Unicode
must be handled before they are passed to an <var>XmlDoc</var> update operation.
must be handled before they are passed to an <var>XmlDoc</var> update operation.
For example, EBCDIC X'04' is the SEL (&ldquo;Select&rdquo;) control character.
For example, EBCDIC X'04' is the SEL ("Select") control character.
Since there is no &ldquo;Select&rdquo; control character in Unicode, there is no mapping
Since there is no "Select" control character in Unicode, there is no mapping
between EBCDIC X'04' and any Unicode character.
between EBCDIC X'04' and any Unicode character.
For this you might use
For this you might use the <var>Untranslatable</var> parameter of the
the <var>Untranslatable</var> parameter of the
<var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var> function.
<var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var> function.
</ul>
</ul>
<li>If you have defined [[Unicode#Invertible_translations|uninvertible translations]],
<li>If you have defined [[Unicode#Invertible_translations|uninvertible translations]],
the implicit translation of EBCDIC string arguments and results to Unicode
the implicit translation of EBCDIC string arguments and results to Unicode
Line 666: Line 723:
and that it also uses the following <var>[[UNICODE command|UNICODE]]</var> commands
and that it also uses the following <var>[[UNICODE command|UNICODE]]</var> commands
to allow for the codepage 1047 square bracket characters:
to allow for the codepage 1047 square bracket characters:
<pre>
<p class="code">UNICODE Table Standard Trans E=AD To U=005B
    UNICODE Table Standard Trans E=AD To U=005B
UNICODE Table Standard Trans E=BD To U=005D
    UNICODE Table Standard Trans E=BD To U=005D
</p>
</pre>
   
   
These <var>UNICODE</var> commands cause uninvertible translations.
These <var>UNICODE</var> commands cause uninvertible translations.
For example, by the first command, EBCDIC X'AD' translates to U+005B, but
For example, by the first command, EBCDIC X'AD' translates to U+005B, but
by the definition of codepage 0037, U+005B translates to EBCDIC X'BA'.
by the definition of codepage 0037, U+005B translates to EBCDIC X'BA'.
Consequently, you can add a X'AD' character to an <var>XmlDoc</var>, but if you display
Consequently, you can add a X'AD' character to an <var>XmlDoc</var>, but if you display its value:
its value:
<p class="code">%nod:AddElement('leftSquare', 'AD':X)
<pre>
Print %nod:Value('leftSquare'):StringToHex
    %nod:AddElement('leftSquare', 'AD':X)
</p>
    Print %nod:Value('leftSquare'):StringToHex
</pre>
   
   
You get the following result:
You get the following result:
<pre>
<p class="output">BA
    BA
</p>
</pre>
   
   
The <var>Value</var> method returns the Unicode character U+005B, which is translated
The <var>Value</var> method returns the Unicode character U+005B, which is translated
Line 691: Line 744:
   
   
In version 7.5, because <var>XmlDoc</var> strings are stored in EBCDIC,
In version 7.5, because <var>XmlDoc</var> strings are stored in EBCDIC,
no implicit translation is performed, and
no implicit translation is performed, and the result of the above two statements is:
the result of the above two statements is:
<p class="output">AD
<pre>
</p>
    AD
<p class="note"><b>Note:</b> Model&nbsp;204 7.6 maintenance added left and right square bracket XHTML entities to reduce your concern about [[Unicode#sqbrackets|codepages and square brackets]]. For an example, see the [[UnicodeAfter (Unicode function)#Examples|UnicodeAfter method]].</p></li>
</pre>
 
<li>The <var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var> subroutine
<li>The <var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var> subroutine
is equipped to display the Unicode values that are stored in <var>XmlDoc</var>s,
is equipped to display the Unicode values that are stored in <var>XmlDoc</var>s,
Line 706: Line 759:
the <var>Print</var> <var>CharacterEncodeAll</var> option is required to display
the <var>Print</var> <var>CharacterEncodeAll</var> option is required to display
a character reference and avoid request cancellation.
a character reference and avoid request cancellation.
<li>As described further in [[Unicode#Implicit Unicode conversions|"Implicit Unicode conversions"]],
 
<li>As described further in [[Unicode#Implicit Unicode conversions|Implicit Unicode conversions]],
the <var class="product">User Language</var> <var>Print</var>
the <var class="product">User Language</var> <var>Print</var>
statement under <var class="product">Sirius Mods</var> 7.6 does not cancel the request
statement under <var class="product">Sirius Mods</var> 7.6 does not cancel the request
Line 718: Line 772:
The following statement succeeds because the <var>Print</var> statement
The following statement succeeds because the <var>Print</var> statement
can handle untranslatable Unicode characters:
can handle untranslatable Unicode characters:
<pre>
<p class="code">Print %nodeY:Value
    Print %nodeY:Value
</p>
</pre>
   
   
The result under <var class="product">Sirius Mods</var> 7.5 is a request cancellation.
The result under <var class="product">Sirius Mods</var> 7.5 is a request cancellation.
The result under <var class="product">Sirius Mods</var> 7.6 is:
The result under <var class="product">Sirius Mods</var> 7.6 is:
<pre>
<p class="code">&amp;#x2122;
    &amp;#x2122;
</p>
</pre>
   
   
However, the following common operation using the <var>StringToHex</var> method
However, the following common operation using the <var>StringToHex</var> method
with <var>Value</var> does '''not''' succeed:
with <var>Value</var> does '''not''' succeed:
<pre>
<p class="code">Print %nodeY:Value:stringToHex
    Print %nodeY:Value:stringToHex
</p>
</pre>
   
   
When <var>StringToHex</var> attempts to implicitly convert to EBCDIC the Unicode character
When <var>StringToHex</var> attempts to implicitly convert to EBCDIC the Unicode character
Line 743: Line 794:
you can use the <var>[[UnicodeToUtf16 (Unicode function)|UnicodeToUtf16]]</var> function
you can use the <var>[[UnicodeToUtf16 (Unicode function)|UnicodeToUtf16]]</var> function
to encode the Unicode character as a UTF-16 string for input to <var>StringToHex</var>:
to encode the Unicode character as a UTF-16 string for input to <var>StringToHex</var>:
<pre>
<p class="code">Print %nodeY:Value:unicodeToUtf16:stringToHex
    Print %nodeY:Value:unicodeToUtf16:stringToHex
</p>
</pre>
</ul>
</ul>
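The square-bracket round trip and the UTF-16 workaround described in the list above can be imitated outside Model 204. The following Python sketch is only an analogy, not User Language: it assumes Python's built-in cp037 codec is an acceptable stand-in for the codepage 0037 table, and the OVERRIDES table and both helper functions are hypothetical names that mimic the effect of the two UNICODE Trans commands.

```python
# Assumption: Python's cp037 codec approximates the codepage 0037 table.
# OVERRIDES mimics "Trans E=AD To U=005B" and "Trans E=BD To U=005D".
OVERRIDES = {0xAD: "\u005B", 0xBD: "\u005D"}


def ebcdic_to_unicode(data: bytes) -> str:
    """Translate EBCDIC bytes to Unicode, applying the overrides first."""
    chars = []
    for byte in data:
        if byte in OVERRIDES:
            chars.append(OVERRIDES[byte])
        else:
            chars.append(bytes([byte]).decode("cp037"))
    return "".join(chars)


def unicode_to_ebcdic(text: str) -> bytes:
    """Translate Unicode back to EBCDIC using the unmodified table."""
    return text.encode("cp037")


# X'AD' is stored as U+005B, which translates back to X'BA', not X'AD'.
stored = ebcdic_to_unicode(b"\xad")
round_trip = unicode_to_ebcdic(stored).hex().upper()

# U+2122 (trade mark sign) has no EBCDIC equivalent in this table ...
try:
    "\u2122".encode("cp037")
    tm_translatable = True
except UnicodeEncodeError:
    tm_translatable = False

# ... so its hex form is taken from a UTF-16 encoding instead.
tm_hex = "\u2122".encode("utf-16-be").hex().upper()

print(round_trip, tm_translatable, tm_hex)
```

The translation is uninvertible because two EBCDIC code points (X'AD' and X'BA') now map to the same Unicode character, while the reverse table can choose only one of them.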
   
   
For more information about the characters that are valid
For more information about the characters that are valid in an <var>XmlDoc</var> API XML document, see
in an <var>XmlDoc</var> API XML document, see
[[XML_processing_in_Janus_SOAP#ISO-10646_and_EBCDIC_characters|XML processing in Janus_SOAP: ISO-10646 and EBCDIC characters]]
[[XML_processing_in_Janus_SOAP#ISO-10646_and_EBCDIC_characters|"XML processing in Janus_SOAP: ISO-10646 and EBCDIC characters"]]
 
====Using Longstrings or Unicode instead of Strings====
===Using Longstrings or Unicode instead of Strings===
Either the <var>[[Unicode#The_User_Language_Unicode_type|Unicode]]</var> or
Either the <var>[[Unicode#The_User_Language_Unicode_type|Unicode]]</var> or
<var>[[Longstrings|Longstring]]</var> datatype
<var>[[Longstrings|Longstring]]</var> datatype provides an atomic type that can contain a string longer than 255 bytes.
provides an atomic type that can contain a string longer than 255 bytes.
The <var>XmlDoc</var> API methods, like all <var class="product">[[Janus SOAP]]</var> methods, accept strings longer than 255
The <var>XmlDoc</var> API methods, like all <var>[[Janus SOAP]]</var> methods, accept strings longer than 255
whenever they have a string argument or result, which is to say:
whenever
they have a string argument or result, which is to say:
<ul>
<ul>
<li>Input values may exceed 255 bytes in length.
<li>Input values may exceed 255 bytes in length.
<li>Various <var>XmlDoc</var> API methods will return a string longer than 255 bytes,
<li>Various <var>XmlDoc</var> API methods will return a string longer than 255 bytes,
if indeed the result value exceeds 255 bytes.
if indeed the result value exceeds 255 bytes.
Line 765: Line 814:
   
   
The following subsections provide some guidelines to determine when
The following subsections provide some guidelines to determine when
you 'must' use a longstring (or <var>Unicode</var>, as of <var class="product">Sirius Mods</var> version 7.6)
you ''must'' use a longstring (or <var>Unicode</var>, as of <var class="product">Sirius Mods</var> version 7.6)
%variable or context for a string argument or for
%variable or context for a string argument or for
the result of a method in the <var>XmlDoc</var> API.
the result of a method in the <var>XmlDoc</var> API.
Line 772: Line 821:
255</code> %variable, it is recommended that you use a <var>Longstring</var> or <var>Unicode</var> in the
255</code> %variable, it is recommended that you use a <var>Longstring</var> or <var>Unicode</var> in the
<var>XmlDoc</var> API methods wherever you might be using a <code>String Len 255</code> %variable.
<var>XmlDoc</var> API methods wherever you might be using a <code>String Len 255</code> %variable.
=====Xml and Serial methods=====
 
====Xml and Serial methods====
You should use a <var>Longstring</var> or <var>Unicode</var> %variable to hold the result
You should use a <var>Longstring</var> or <var>Unicode</var> %variable to hold the result
of the <var>Xml</var> or <var>Serial</var> methods &mdash; the total concatenated length of all
of the <var>Xml</var> or <var>Serial</var> methods &mdash; the total concatenated length of all
Line 779: Line 829:
Thus, the first invocation of the <var>Xml</var> method below will never fail (for length
Thus, the first invocation of the <var>Xml</var> method below will never fail (for length
reasons) but the second will usually cause a request cancellation:
reasons) but the second will usually cause a request cancellation:
<pre>
<p class="code">%ls Longstring
    %ls Longstring
%ls = %doc:Serial
    %ls = %doc:Serial
%ss String Len 255
    %ss String Len 255
%ss = %doc:Serial
    %ss = %doc:Serial
</p>
</pre>
 
=====Value[Default] methods=====
====Value[Default] methods====
Usually you should use a <var>Longstring</var> or <var>Unicode</var> %variable to hold the
Usually you should use a <var>Longstring</var> or <var>Unicode</var> %variable to hold the
result of the <var>Value</var> or <var>ValueDefault</var> methods.
result of the <var>Value</var> or <var>ValueDefault</var> methods.
For example,
For example, the first two invocations of the <var>Value</var> method below will succeed
the first two invocations of the <var>Value</var> method below will succeed
but the third will cause a request cancellation:
but the third will cause a request cancellation:
<pre>
<p class="code">%ss String Len 255
    %ss String Len 255
%ls Longstring
    %ls Longstring
%doc:LoadXml('<top> <big>' With -
    %doc:LoadXml('<top> <big>' With -
  $Lstr_Left('a', 300) With '</big> <little>' -
      $Lstr_Left('a', 300) With '</big> <little>' -
  With 'Less than 256 chars</little> </top>')
      With 'Less than 256 chars</little> </top>')
%ls = %doc:Value('/top/big')
    %ls = %doc:Value('/top/big')
%ss = %doc:Value('/top/little')
    %ss = %doc:Value('/top/little')
%ss = %doc:Value('/top/big')
    %ss = %doc:Value('/top/big')
</p>
</pre>
   
   
As noted above, the best approach here is to use <var>Longstring</var> or <var>Unicode</var> %variables
As noted above, the best approach here is to use <var>Longstring</var> or <var>Unicode</var> %variables
where you might use <code>String Len 255</code> %variables.
where you might use <code>String Len 255</code> %variables.
=====URI-related methods=====
 
Besides the <var>Xml</var> and <var>Value</var> methods,
====URI-related methods====
other <var>XmlDoc</var> API methods either cannot return a value longer
Besides the <var>Xml</var> and <var>Value</var> methods, other <var>XmlDoc</var> API methods either cannot return a value longer
than 255 bytes or, with typical XML documents, are unlikely to do so.
than 255 bytes or, with typical XML documents, are unlikely to do so.
If you have a namespace URI that exceeds 255 bytes,
If you have a namespace URI that exceeds 255 bytes,
it may be necessary to use a <var>Longstring</var> or <var>Unicode</var> %variable.
it may be necessary to use a <var>Longstring</var> or <var>Unicode</var> %variable.
   
   
For example,
For example, the first two invocations of the <var>Uri</var> method below will succeed,
the first two invocations of the <var>Uri</var> method below will succeed,
but the third will cause a request cancellation:
but the third will cause a request cancellation:
<pre>
<p class="code">%ss String Len 255
    %ss String Len 255
%ls Longstring
    %ls Longstring
%doc:LoadXml('<top><a:inner xmlns:a="urn:' With -
    %doc:LoadXml('<top><a:inner xmlns:a="urn:' With -
$Lstr_Left('big', 300, '_') With '"/></top>')
    $Lstr_Left('big', 300, '_') With '"/></top>')
%ss = %doc:URI('*')
    %ss = %doc:URI('*')
%ls = %doc:URI('*/*')
    %ls = %doc:URI('*/*')
%ss = %doc:URI('*/*')
    %ss = %doc:URI('*/*')
</p>
</pre>
   
   
As noted above, the easiest approach here is to use <var>Longstring</var> or <var>Unicode</var>
As noted above, the easiest approach here is to use <var>Longstring</var> or <var>Unicode</var>
%variables where you might use <code>String Len 255</code> %variables.
%variables where you might use <code>String Len 255</code> %variables.
   
   
===Conventions and terminology for XmlDoc API methds===
==Conventions and terminology for XmlDoc API methods==
In addition to those described in [[Notation conventions for methods|"Notation conventions for methods"]],
In addition to those described in [[Notation conventions for methods]],
the following conventions are also used in the individual
the following conventions are also used in the individual <var>XmlDoc</var> API method descriptions:
<var>XmlDoc</var> API method descriptions:
<ul>
<ul>
<li>Symbols used in the syntax include the following.
<li>Symbols used in the syntax include the following.
Usually, they represent method objects; in actual code, they may be
Usually, they represent method objects; in actual code, they may be
replaced by object variables
replaced by object variables of the indicated class or by method invocations that return
of the indicated class or by method invocations that return
such object variables:
such object variables:
<dl>
<dl>
Line 842: Line 886:
that operate on a node and that can be used with either an <var>XmlNode</var> or an <var>XmlDoc</var>.
that operate on a node and that can be used with either an <var>XmlNode</var> or an <var>XmlDoc</var>.
If an <var>XmlDoc</var>, the node for the operation is the root node.
If an <var>XmlDoc</var>, the node for the operation is the root node.
<dt>doc
<dt>doc
<dd>Denotes an object of class <var>XmlDoc</var>.
<dd>Denotes an object of class <var>XmlDoc</var>.
<dt>nod
<dt>nod
<dd>Denotes an object of class <var>XmlNode</var>.
<dd>Denotes an object of class <var>XmlNode</var>.
<dt>nodl
<dt>nodl
<dd>Denotes an object of class <var>XmlNodelist</var>.
<dd>Denotes an object of class <var>XmlNodelist</var>.
</dl>
</dl>
<li>Although the terms &ldquo;XmlNode&rdquo; and &ldquo;node&rdquo; are closely related,
 
effort is made to distinguish them as necessary in the method descriptions.
<li>Although the terms "XmlNode" and "node" are closely related, effort is made to distinguish them as necessary in the method descriptions.
An <var>XmlNode</var> is an object that points to a node in an <var>XmlDoc</var>.
An <var>XmlNode</var> is an object that points to a node in an <var>XmlDoc</var>.
Similarly, an <var>XmlNodelist</var> is an object that contains a set, or list, of
Similarly, an <var>XmlNodelist</var> is an object that contains a set, or list, of
<var>XmlNode</var>s selected from a particular <var>XmlDoc</var>.
<var>XmlNode</var>s selected from a particular <var>XmlDoc</var>.
Strictly speaking, a &ldquo;nodelist&rdquo; does not exist, but
Strictly speaking, a "nodelist" does not exist, but the term is occasionally used as an abbreviation or generalization of <var>XmlNodelist</var>.
the term is occasionally used as an abbreviation or generalization of <var>XmlNodelist</var>.
 
<li>Null objects, null strings, empty results
<li>Null objects, null strings, empty results
<ul>
<ul>
<li>A <var>Null</var> object is one that has been deleted or that has not been
<li>A <var>Null</var> object is one that has been deleted or that has not been
instantiated.
instantiated.
A &ldquo;null&rdquo; string is a zero length string value.
A "null" string is a zero-length string value. The text in the method descriptions distinguishes these two terms.
The text in the method descriptions distinguishes these two terms.
 
<li>Object-type arguments must not be <var>Null</var>, unless that
<li>Object-type arguments must not be <var>Null</var>, unless that argument explicitly allows <var>Null</var>.
argument explicitly allows <var>Null</var>.
Hence, a <var>Null</var> argument typically causes a request cancellation.
Hence, a <var>Null</var> argument typically causes a request cancellation.
Currently, no <var>XmlDoc</var> API methods allow <var>Null</var> object arguments,
Currently, no <var>XmlDoc</var> API methods allow <var>Null</var> object arguments,
and the &ldquo;Request Cancellation Errors&rdquo; section for each method does not
and the "Request Cancellation Errors" section for each method does not include this condition.
include this condition.
 
<li>Some methods that have an XPath argument allow the result of the XPath
<li>Some methods that have an XPath argument allow the result of the XPath
expression to be the empty set of nodes; most, however, will cancel the request
expression to be the empty set of nodes; most, however, will cancel the request if this happens.
if this happens.
Each method that has an XPath argument will either list the empty XPath result
Each method that has an XPath argument will either list the empty XPath result
as a request cancellation error, or will explain the operation of the method
as a request cancellation error, or will explain the operation of the method
Line 876: Line 921:
</ul>
</ul>
</ul>
</ul>
==See also==
<ul>
<li>[[XPath]]
</ul>
<!-- end of toclimit div -->
[[Category:Overviews]]
[[Category:Overviews]]
[[Category:Janus SOAP]]

Latest revision as of 19:39, 13 May 2016

XmlDoc API concepts and data structures

See also:

The XmlDoc API is based on the use of XML documents. XML processing in Janus SOAP and various XML references explain that an XML document can contain any type of data, so an XML document may not be primarily intended for human reading. Nevertheless, an XML document can be simply and meaningfully expressed or represented entirely with readable characters. This character form of an XML document is called the serial form. When operating on an XML document with the XmlDoc API, the serial form is converted to an XmlDoc object.

Only a few categories of operations are needed on XML documents; one way to structure them is:

Receive: Receive the transmitted text of a document and convert it into an XmlDoc, which uses nodes to represent the hierarchy of the XML document.
Update: Update or create an XmlDoc, by adding, deleting, copying, or replacing nodes.
Access: Access nodes in an XmlDoc, and data contained within them.
Send: Convert an XmlDoc into a textual representation, and transmit it.
Other: There are other operations, such as XmlDoc properties to control certain operations, data structure housekeeping, and debugging facilities.

The remainder of this article describes the objects used to operate on XML documents. It reviews the above categories of operations, showing how they are accommodated by the XmlDoc API classes: XmlDoc, XmlNode, and XmlNodelist. The objects are operated upon by methods that are members of these classes.

Typical operations on an XML document

This section lists the categories of operations on an XML document, and provides the motivation for objects in the XmlDoc API: XmlDocs, XmlNodes, and XmlNodelists.

Receive or Load

The process of receiving a document actually consists of two steps:

  1. Receiving the document text using some “transport” mechanism, such as Janus Web Server (HTTP, as server), Janus Sockets (usually, HTTP, as client), Model 204 MQ Series, access from a file, etc.
  2. Converting the XML document (deserialization) into its internal representation (an XmlDoc) so that other operations can be performed on it.

If the XML document is received by Janus Web Server, these steps are performed together by the WebReceive function. For the HTTP Helper, the document text is received by the HttpRequest Get, Post, or Send function, and the deserialization is done by the HttpResponse ParseXml function. For other forms of transport, the steps are performed separately: the text form of the document is received into a Longstring, and the Longstring contents are converted into internal form by the LoadXml function.

See the discussion about sending and receiving, which mentions other details about the operations ("Receive", "Load", etc.) that create the initial content of an XmlDoc.

Update

You modify an XmlDoc using various XmlDoc API methods. If you start with an empty XmlDoc, some methods (the Add* and Insert* methods for various node types) allow you to generate an XmlDoc "directly", without first representing it in the serial text form. You can also update an XmlDoc into which you have received a document.

See the section on updating for a more detailed overview of XmlDoc updating operations.

Access

Since an XML document is a hierarchical structure, your application will need to select some part of the hierarchy to operate upon, for example, to obtain its value. Various XmlDoc API methods do this. In addition, some XmlDoc API updating methods also require that you specify where in the hierarchy an update is performed.

Selecting nodes from an XmlDoc is performed using the XPath language, introduced in XML Path Language (XPath). XPath can be used for accessing a single node in the document, for example, getting an element node's string value using the Value property. You can also work with lists of selected nodes, represented by XmlNodelists. The SelectNodes function produces such a list. Other XmlNodelist methods also work with them, including the Item function, which gets a single XmlNode from an XmlNodelist. SelectSingleNode returns an XmlNode, as do most of the Add* and Insert*Before methods.

Send

The process of sending a document actually consists of two steps:

  1. Converting the XmlDoc into its serial text representation.
  2. Sending the document text using some "transport" mechanism, such as Janus Web Server (HTTP, as server), Janus Sockets (usually, HTTP, as client), Model 204 MQ Series, access from a file, etc.

If the XML document is sent by Janus Web Server, the steps can be performed together by the WebSend subroutine. For the HTTP Helper, the document is serialized by the HttpRequest AddXml subroutine, and the document is sent by the HttpRequest Get, Post, or Send function. For other forms of transport, the steps are performed separately: the XmlDoc is converted into external form by the Serial function, and the converted result is sent using the appropriate transport.

Other operations

Some other operations the XmlDoc API methods perform include:

  • Creating and initializing an XmlDoc or XmlNodelist.
  • Setting or retrieving some property of an XmlDoc, for example, the URI associated with a prefix to be used in an XPath expression (see SelectionNamespace).
  • Displaying a document, or some part of it, usually for debugging purposes (see Print).

The XmlDoc class

An XmlDoc object is the internal representation of an XML document; creating one is usually done by invoking the XmlDoc New constructor, which returns an XmlDoc instance. An XmlDoc is a tree structure of nodes. The types of nodes that an XmlDoc may contain are shown in the following subsection.

XmlDoc node types

An XmlDoc is a tree structure of nodes. The possible node types are listed here (these are the enumeration return values of the Type function):

Attribute
This type of node is used to represent an attribute of an XML element.
Comment
This type of node is used to represent a comment (serialized in the form: <!--comment-->) in an XML document.
Root
This type of node is the root of the XmlDoc tree. It has zero or one Element child nodes and any number of Comment and Pi child nodes. Root and Element nodes are the only nodes that can have child nodes.
Element
This type of node is used to represent an element in an XML document. Element and Root nodes are the only nodes that can have child nodes.
Pi
This type of node is used to represent a processing instruction (<?target ...?>) in an XML document.

Note: Although the "XML declaration" (<?xml version=...?>) has the same appearance as a processing instruction, it is not a Pi.

Also, note that the values of an XmlDoc's XML declaration can be obtained and set with these properties: Version, Encoding, and Standalone.

Text
This type of node is used to represent character content within an XML element. Note that a Text node will never contain the null string, and that two Text nodes can be adjacent only if the AdjacentText property is set to allow it.

The XmlDoc node types listed above correspond almost exactly with the structures contained in an XML document (see XML and XML example). The Root node, always present, corresponds to the node that contains the document as a whole. You can insert additional nodes, either by deserializing a character stream containing an XML document instance (for example, with WebReceive), or by using Add*/Insert*Before methods to insert nodes. The children of the Root node are the "top-level" element and any top-level processing instructions and/or comments that precede or follow it. Do not confuse the Root node, which is the root of the XmlDoc tree, with the top-level element of the XML document.
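The distinction between the Root node and the top-level element can be seen in other XML tree APIs as well. The sketch below uses Python's xml.dom.minidom purely as an analogy (it is not the XmlDoc API, and its Document node only approximates the XmlDoc Root node); the sample document text is made up for illustration.

```python
from xml.dom import minidom, Node

# A hypothetical document: XML declaration, a comment, a processing
# instruction, and then the top-level element.
text = '<?xml version="1.0"?><!--note--><?target data?><top><a>x</a></top>'
doc = minidom.parseString(text)

# The Document node plays the role of the Root: its children are the
# top-level comment, the processing instruction, and the one element.
# The XML declaration does not appear as a child node.
kinds = [child.nodeType for child in doc.childNodes]

# The top-level element is a child of the Document node, not the
# Document node itself.
top_name = doc.documentElement.tagName

print(kinds, top_name)
```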

XmlDoc states

An XmlDoc can have one of the following three states:

EMPTY
An XmlDoc in this state has no nodes other than the Root node. This is the state of an XmlDoc as returned by the XmlDoc New method.
WELL-FORMED
An XmlDoc in this state contains at least the top-level Element node.
Non-EMPTY not WELL-FORMED
An XmlDoc in this state contains at least one Comment or Pi node but no Element nodes.

Note that only an XmlDoc in the WELL-FORMED state may be converted into a complete text representation of an XmlDoc, and that you can only use an EMPTY XmlDoc as the target of “deserializing” the text representation of an XML document.
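The three states amount to a simple test on the children of the Root node. As a rough illustration only, the following Python sketch applies that test to an xml.dom.minidom document; xmldoc_state is a hypothetical helper, and the XmlDoc API itself is not involved.

```python
from xml.dom import minidom, Node


def xmldoc_state(doc):
    """Classify a document using the three states described above."""
    children = doc.childNodes
    if len(children) == 0:
        return "EMPTY"
    if any(child.nodeType == Node.ELEMENT_NODE for child in children):
        return "WELL-FORMED"
    # Only Comment or Pi children, no top-level element yet.
    return "Non-EMPTY not WELL-FORMED"


doc = minidom.Document()
states = [xmldoc_state(doc)]                 # nothing under the root yet

doc.appendChild(doc.createComment("draft"))
states.append(xmldoc_state(doc))             # a comment, but no element

doc.appendChild(doc.createElement("top"))
states.append(xmldoc_state(doc))             # top-level element present

print(states)
```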

The XmlNode and XmlNodelist classes, and XPath

In addition to using an XmlDoc directly, you can access an XmlDoc with either of the following objects:

  • An XmlNode, which is a single pointer to a node in an XmlDoc
  • An XmlNodelist, which contains a list of pointers to nodes selected from an XmlDoc

Instances of both of these objects are created by and returned as the value of several XmlDoc API functions. An XmlNodelist may also be created by an invocation of the XmlNodelist New constructor, which requires the specification of an XmlDoc argument — the XmlDoc with which the XmlNodelist is associated. There is not a New constructor in the XmlNode class.

A single XmlDoc can have any number of XmlNodelists and XmlNodes associated with it.

Most operations on the "contents" of an XmlDoc select one or more nodes using XPath expressions ("PathExpr" is the XPath syntax term, as explained in XPath syntax). All methods that accept an XPath LocationPath expression argument are members of both the XmlDoc and the XmlNode classes.

There are two forms of XPath expressions:

Absolute XPath expression
An absolute XPath expression selects nodes from an XmlDoc, starting at the Root node. The syntax of an absolute XPath expression begins with a forward slash (/).
Relative XPath expression
A relative XPath expression selects nodes from an XmlDoc, starting from a context node which is determined when the expression is used. The syntax of a relative XPath expression begins with a character other than a slash. When you use a relative XPath expression, the context node depends on the method object (the type of object on which the method operates) of the invocation:
  1. If the method object is an XmlDoc, the context node is the Root node.
  2. If the method object is an XmlNode, the context node is the node which it points to.

In addition to operating on the contents of an XmlDoc, there are several methods (for example, WebReceive) that operate on the XmlDoc as a whole. These methods only allow an XmlDoc method object. If you need to obtain the XmlDoc associated with an XmlNode or XmlNodelist, use the XmlDoc function.

The following section continues the explanation of XPath, XmlNodes, and XmlNodelists. Further information about XPath expressions and node sets is also contained in XPath.

An example of XmlDoc API methods and XPath

This section illustrates a small XML document received as a web request, followed by part of a User Language request that uses some XmlDoc API methods, with particular attention to the method's XPath arguments.

Here is the XML document:

    <purchase_order>
      <date>25 July, 2001</date>
      <pitm>
        <partnum>1234</partnum>
        <qty>3</qty>
      </pitm>
      <pitm>
        <partnum>5678</partnum>
        <qty>2</qty>
      </pitm>
    </purchase_order>

Here is some User Language which could be used to receive and process this request:

    %doc Object XmlDoc
    %nl Object XmlNodelist
    * Create XmlDoc, get web request as contents:
    %doc = New
    %doc:WebReceive
    * Create work nodelist with all pitm elements:
    %nl = %doc:SelectNodes('/purchase_order/pitm')
    * Process each pitm:
    For %j From 1 To %nl:Count
       %partnum = %nl(%j):Value('partnum')
       %qty = %nl(%j):Value('qty')
       ...
    End For

Value and SelectNodes, like many methods in the XmlDoc API, have an optional argument that allows you to process any of the nodes in an XmlDoc, rather than the default, which is to process the node to which the method object points.

The optional argument shown above is an XPath expression (for SelectNodes, /purchase_order/pitm; for Value, partnum and qty). An XPath expression selects a list of nodes, starting either from the XmlDoc Root (when an absolute XPath expression is used) or from a particular context node in an XmlDoc (when a relative XPath expression is used). Syntactically, an XPath expression that begins with a slash (/) is absolute.

SelectNodes returns the entire result of its XPath expression argument. Many other XmlDoc API methods, however, operate on the first of the nodes resulting from the argument's XPath expression. That first node is called the head of the argument XPath result. Note that first is defined in terms of "document order" (see Order of nodes: node sets versus nodelists).
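For readers more familiar with other XML toolkits, the same select-then-iterate pattern can be sketched in Python. This is an analogy under stated assumptions, not the XmlDoc API: xml.etree.ElementTree supports only a subset of XPath, its starting point is the top-level element rather than a Root node, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

text = """<purchase_order>
  <date>25 July, 2001</date>
  <pitm><partnum>1234</partnum><qty>3</qty></pitm>
  <pitm><partnum>5678</partnum><qty>2</qty></pitm>
</purchase_order>"""

# fromstring returns the top-level element (purchase_order).
top = ET.fromstring(text)

# A relative path, evaluated with the top-level element as context node.
# Like SelectNodes, findall returns every match, in document order.
items = top.findall("pitm")

orders = []
for item in items:
    # Relative paths again, now with each pitm element as context node.
    orders.append((item.findtext("partnum"), int(item.findtext("qty"))))

# find returns only the first match in document order, comparable to
# the "head" of an XPath result.
head_partnum = top.find("pitm/partnum").text

print(orders, head_partnum)
```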

Updating

Updating an XmlDoc generally refers to the addition and deletion of the nodes of the XmlDoc tree, which includes the generation of the document's initial contents. The initial contents of an XmlDoc can be established by one of the deserialization methods: LoadXml, WebReceive, or ParseXml. Whether you use a method to set the "initial" contents of an XmlDoc or whether you start with an EMPTY XmlDoc, you can then insert nodes into it, using one or more of the methods whose name begins with "Add", such as AddElement, or whose name begins with "Insert", such as InsertSubtreeBefore.

Once an XmlDoc has one or more nodes in addition to the Root node, you can modify the Value of Text and other nodes, and you can delete nodes from it using DeleteSubtree.

16M limit to number of XmlDoc items

Internally, an XmlDoc object is maintained in a data structure that has a maximum of 16M items. Each node requires an item, as does each unique string. A string item is used, for example, as the name of an element, attribute, PI node, or namespace, or as the value of a comment, PI, attribute, or text node, or as a namespace URI. Items are also used to maintain the SelectionNamespace property and for other internal purposes.

Exceeding the XmlDoc item limit (16M) in an updating operation causes request cancellation. Such operations include, for example, the Receive, Add, and Insert families of methods, as well as changing the Value of an XmlNode.

Update operations that cause deletion of items from an XmlDoc (for example, DeleteSubtree, and replacing the Value of an XmlNode) do not, in general, make those items available for reuse, at least in version 7.5 of Model 204 and earlier.

Inserting nodes and copying subtrees

The Add* methods are designed to make it easy to "append" nodes to an XmlDoc in a "depth-first, left-to-right" order in the simple case. These methods insert a node as the last child of the node pointed to by the method object.

Most of the Add* methods (for example AddElement) have Insert*Before counterparts (for example, InsertElementBefore) which insert a node in a position other than the last child of an Element or the Root. AddAttribute and AddNamespace are the exceptions, without an Insert*Before counterpart.

AddElement (like InsertElementBefore) has an optional text value argument, with which you can insert a Text node as a child of the inserted Element node.

Here is an example of the updating methods in the XmlDoc API:

    %doc Object XmlDoc
    %doc = New
    %story Object XmlNode
    %paragraph Object XmlNode
    %story = %doc:AddElement('story')
    %story:AddComment('My first XML document')
    %story:AddElement('greeting', 'Hello, world')
    %paragraph = %story:AddElement('paragraph')
    %paragraph:AddElement('line', 'Ask not what')
    %paragraph:AddElement('line', 'Hear no evil')

This creates the following XML document:

    <story>
       <!--My first XML document-->
       <greeting>Hello, world</greeting>
       <paragraph>
          <line>Ask not what</line>
          <line>Hear no evil</line>
       </paragraph>
    </story>

As discussed above in the introduction to the section on updating, the number of items in an XmlDoc is limited to 16M items; exceeding this number in any updating operation causes request cancellation.
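
The Insert*Before counterparts described above can be sketched as follows. This is an illustrative fragment, not taken from the method reference: it assumes that InsertElementBefore accepts the same name and value arguments as AddElement and that it is invoked on the node before which the new element is placed. The element names are hypothetical.

```soul
Begin
%doc   is object XmlDoc
%story is object XmlNode
%para  is object XmlNode
%doc = new

%story = %doc:AddElement('story')
%para  = %story:AddElement('paragraph')

* AddElement appends as the last child of %story,
* so 'closing' follows 'paragraph'
%story:AddElement('closing', 'The end')

* InsertElementBefore (argument shape assumed) places
* 'greeting' immediately before the node %para points to
%para:InsertElementBefore('greeting', 'Hello, world')

%doc:Print
End
```

If the assumptions hold, the children of story appear in the order greeting, paragraph, closing, even though greeting was added last.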

Namespaces with Add* and Insert* methods

When an XmlDoc is deserialized, the namespace declarations (and the use of those declarations by names in the document) follow the scope rules outlined in Name and namespace syntax. The namespace structure that results from the updating (adding and removing nodes) of an XmlDoc is explained here. Most of these updating methods are the Add* and Insert*Before methods (described individually in the List of XmlDoc API methods).

In the following discussion, all references to an Add* method (for example, AddElement) refer equally to the corresponding Insert*Before method (for example, InsertElementBefore).

  • From deserialization through serialization, the XmlDoc API enforces the syntax rules of well-formed documents. When namespace handling is in effect for an XmlDoc (that is, the Namespace property setting is On, the default), the prefixes of names of nodes you add must be declared.
  • Once an attribute or element is added, its URI is fixed and will not change thereafter.
  • To allow AddElement to add an element with a prefix/URI declaration other than that which is in scope, it has a URI argument, which specifies the URI of the element. If the URI argument is used and the resulting prefix/URI combination requires a namespace declaration (because it differs from what is in scope), a declaration is created at the element.
  • The URI argument of AddElement can also be such that the application logic around AddElement does not need to depend on whether the prefix/URI declaration is already in scope. AddAttribute has a URI argument for the same reason, and it can also result in the creation of a namespace declaration at the Element parent of the inserted Attribute node.
  • As an alternative to the URI argument on AddElement and AddAttribute, you can create a namespace declaration at an element with the AddNamespace method. AddNamespace also lets you insert a namespace declaration that is used neither by the element nor by any of its attributes. For example, the following fragment:

    %n = %doc:AddElement('a')
    %n:AddNamespace('x', 'y:z')
    %n:AddElement('x:b')
    %n:AddElement('x:c')

    creates the following document:

    <a xmlns:x="y:z">
       <x:b/>
       <x:c/>
    </a>

  • When the AddSubtree method copies between different XmlDocs, it does not allow the source and target XmlDocs to have different Namespace property settings (doing so could require re-parsing names in some cases).
  • The namespace axis is not allowed in an XPath expression. Without XPath access to a pointer to a namespace node, a namespace can be neither changed nor deleted, so the URIs associated with nodes cannot be "changed out from under" them. You can obtain the information in a namespace declaration, however, using certain XmlDoc API methods (for example, the Uri property of an XmlNode).
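
The URI argument of AddElement discussed in the list above might be used as in the following sketch. The position of the URI argument (third, after an omitted value argument) is an assumption here, and the prefix, element names, and URI are hypothetical; check the AddElement description before relying on this form.

```soul
Begin
%doc is object XmlDoc
%top is object XmlNode
%kid is object XmlNode
%doc = new

%top = %doc:AddElement('top')

* No declaration for prefix p is in scope, so the URI
* argument should cause one to be created at the new
* element. (Argument position is an assumption; the
* value argument is omitted.)
%kid = %top:AddElement('p:child', , 'urn:example:one')

* Display the URI that is now fixed for the new element
Print %kid:Uri

%doc:Print
End
```

The same call works whether or not a matching declaration for p is already in scope, which is the point of the second use of the URI argument noted above.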

Deleting nodes

The XmlDoc API subroutine used to delete individual nodes (and their descendants) from an XmlDoc is DeleteSubtree. An example is shown below.

If a node you delete was referenced by an XmlNode, the value of that XmlNode becomes Null. Similarly, if any deleted nodes were referenced by an item of an XmlNodelist, the value of that item of the XmlNodelist becomes Null.

If you need to "clean up" an XmlNodelist that refers to a deleted XmlDoc node, you can use the Difference function, as shown in the following example.

In the example, the %nlis XmlNodelist is cleaned up (with Difference) to remove the items which have become null. The final reference to items in %nlis uses relative XPath (the '@auth' argument of the Value method invocation, which selects the attribute named auth).

    %nlis is object xmlNodelist
    %removeLis is object xmlNodelist

    * Get nodelist for chapters, and delete all
    * chapters by indicated author from the XmlDoc:
    %deleteAuth = 'Dave'
    %nlis = %d:selectNodes('/book/chapter')
    %removeLis = %d:selectNodes('/book/chapter[@auth="' %deleteAuth '"]')
    for %i from 1 to %removeLis:count
       %removeLis(%i):deleteSubtree
    end for

    * Cleanup the chapter nodelist, show author of
    * remaining chapters, & display the document:
    %nlis = %nlis:difference(%removeLis)
    for %i from 1 to %nlis:count
       print 'Author:' and %nlis(%i):value('@auth')
    end for
    %d:print
    ...

Also note that in the above example, the Item method is implicitly used (%removeLis(%i) and %nlis(%i)) and that implicit concatenation is used in the second XPath expression ('/book/chapter[@auth="' %deleteAuth '"]').

As discussed in the introduction to the section on updating, the items deleted by DeleteSubtree are not available for reuse by later Add, Insert, etc. methods, so DeleteSubtree does not "relieve" the restriction of 16M items in an XmlDoc.
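
The effect of a deletion on an XmlNode that referenced the deleted node, described earlier in this section, can be checked with a fragment like the following. It is a minimal sketch with a hypothetical one-chapter document and illustrative variable names.

```soul
Begin
%doc  is object XmlDoc
%nl   is object XmlNodelist
%chap is object XmlNode
%doc = new

%doc:LoadXml('<book><chapter auth="Dave"/></book>')
%nl   = %doc:SelectNodes('/book/chapter')
%chap = %nl(1)

* Delete the chapter through the nodelist item
%nl(1):DeleteSubtree

* %chap referenced the deleted node, so it is now Null
If %chap Is Null Then
   Print 'XmlNode %chap is Null'
End If
End
```

Testing for Null in this way avoids applying a method to a reference whose node has been deleted.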

Namespace URI for XPath prefixes

It is important to realize that the URI associated with a prefix in the XML document is controlled by the xmlns namespace declarations in the document. However, when a prefix is used in a name in an XPath argument to an XmlDoc API method, the URI for that prefix must be established so that the full XPath name (local part and URI namespace) can be used to locate a document element. This association of XPath prefixes to URIs is established using the SelectionNamespace property.

The prefix names used in XPath selection are independent of the prefix names used in the document serialization and deserialization. Since an XML document element prefix may be associated with multiple namespaces, or an element may have a namespace and no associated prefix, XPath prefixes stipulate the namespace that fully qualifies an element name in an XPath location step.
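
As an illustration of this independence, the following sketch selects an element whose document prefix is a by using an XPath prefix q. The form of the SelectionNamespace assignment (prefix as an argument, URI as the assigned value) is an assumption, and the document and URI are hypothetical; see the SelectionNamespace description for the exact syntax.

```soul
Begin
%doc is object XmlDoc
%doc = new

* The document uses the prefix a for its namespace
%doc:LoadXml('<a:top xmlns:a="urn:example"><a:kid>x</a:kid></a:top>')

* The XPath prefix q is independent of the document
* prefix a; only the URI has to match.
* (Property syntax shown is an assumption.)
%doc:SelectionNamespace('q') = 'urn:example'

Print %doc:Value('/q:top/q:kid')
End
```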

Transport: receiving and sending XML

Distinct sets of methods provide for the receiving and sending of XML:

  • Receiving XML involves converting the information from the character, marked-up form of an XML document into an XmlDoc; this operation is called deserialization.
    • The WebReceive function is designed to receive an XML document that has arrived as a web request.
    • The ParseXml function of the HttpResponse class accomplishes this for HTTP clients.
    • For other transport mechanisms, such as Model 204 MQ Series, the character-format XML document can be placed into a Longstring, and the LoadXml function then places the information into an XmlDoc.
  • Sending XML involves converting the information in an XmlDoc to a character stream, including markup such as element tags; this operation is called serialization.
    • The WebSend subroutine is designed to send an XML document as a web response.
    • The AddXml method of the HttpRequest class accomplishes this for HTTP clients.
    • For other transport mechanisms, such as Model 204 MQ Series, the Serial function is used to place the serialized form into a Longstring, which can then be sent.
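
For the "other transport mechanisms" case in the list above, the two directions can be sketched together. This is an illustrative fragment only: the transport step itself is left as a comment, and the element name and variable names are hypothetical.

```soul
Begin
%out  is object XmlDoc
%in   is object XmlDoc
%text is Longstring
%out = new
%in  = new

* Build a small document to send
%out:AddElement('msg', 'hello')

* Serialization: XmlDoc to character stream
%text = %out:Serial

* ... %text would be sent and received by the transport ...

* Deserialization: character stream to XmlDoc
%in:LoadXml(%text)
%in:Print
End
```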

Information form and content

The Janus SOAP deserialization methods receive and convert the character form of XML document data into an XmlDoc, which precisely represents the information content of that XML data. They do not preserve those aspects of the character form that are incidental to the information content, such as whether a character reference or an empty-element tag was present. The serialization methods reproduce and send the XmlDoc content, and they offer many global options for how that content is represented on output as a string of characters.

For example, the following two serializations represent exactly the same information content:

    <top><foo/>&#x31;</top>
    <top><foo></foo>1</top>

In the first string, foo is serialized using the empty element tag <foo/>, and the number 1 is serialized using a character reference &#x31;. In the second string, foo is serialized using a start tag (<foo>) and an end tag (</foo>), and the number 1 is serialized using character content. Neither of these outputs, however, is related to how the XML content was obtained, that is, deserialized.

A given piece of information content has many potential serializations, including variations in line ends, CDATA sections, quotation characters used in attributes, attribute and namespace declaration order, etc. And in general, the character form of an XML document before it is deserialized will not be the same after it is re-serialized. It is a user task to determine to what degree to update an XmlDoc or to select serialization options to produce the desired form of character string output of the XML document information.
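
A short way to observe this is to deserialize one character form and display the result. The sketch below uses a hypothetical input string that combines explicit start and end tags with a character reference.

```soul
Begin
%doc is object XmlDoc
%doc = new

* Input uses start and end tags and a character reference
%doc:LoadXml('<top><foo></foo>&#x31;</top>')

* Display the same information content with the
* default serialization choices
%doc:Print
End
```

Because only the information content is kept, the displayed form need not repeat either of the input's choices. By default, foo is expected to appear as an empty-element tag and the character reference as the character 1.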

For example, suppose that, to obtain output like the second serialization in the example above (<top><foo></foo>1</top>), you specify literal start and end tags for foo in the input to a SOUL deserialization method, but you discover that the default XmlDoc serialization result is <top><foo/>1</top>. One way (now deprecated) to produce the tag presentation you want in this case is to use the serialization method option (NoEmptyElt) that forces start and end tags for childless elements. A better way, using the NoEmptyElement property, is discussed in the next paragraph.

A related conversion issue arises when using SOUL to generate HTML. Because some browsers work correctly for certain childless elements (for example, <br> tags) only if they have an empty element tag, and for other childless elements (for example, <div> tags) only if they have separate start and end tags, the NoEmptyElt solution just mentioned will produce HTML that works for some elements but not for others. The resolution for this is not in the serialization or deserialization methods but rather in the updating methods described above. You can use the updating methods to build or modify the HTML elements in your XmlDoc, getting information content equivalent to the deserialization methods, but providing better control and access. For the issue in this example, the NoEmptyElement property would let you selectively apply the tag format needed for successful HTML, as shown in the following request fragment:

    %html = %doc:addElement('html')
    %body = %html:addElement('body')
    %div = %body:addElement('div')
    %div:noEmptyElement = true
    %div:addAttribute('id', 'topOfBody')
    %body:addText('foo')
    %body:addElement('br')
    %body:addText('bar')

If you use the Print method to display the fragment above, the result is this XmlDoc (note the <div> and <br> tags):

    <html>
       <body>
          <div id="topOfBody"></div>
          foo
          <br/>
          bar
       </body>
    </html>

Using the updating methods to build HTML content is also superior to using something ostensibly simpler like this SOUL fragment to create output like that above:

    Print '<html>'
    Print '   <body>'
    Print '      <div id="topOfBody"></div>'
    Print '      foo'
    Print '      <br/>'
    Print '      bar'
    Print '   </body>'
    Print '</html>'

This approach quickly breaks down, primarily because the places at which you compose the content of the XML document can be widely dispersed (the code not all nicely together like this), and because keeping track of the element end tags can become very difficult.

For more information about the character transformations that Janus SOAP applies during deserialization, see Normalization during deserialization.

For a summary of the various output formatting options available to the Janus SOAP serialization methods, see XmlDoc API serialization options.

About encoding

  • When the internal representation of an XmlDoc is EBCDIC (prior to Sirius Mods 7.6), the deserialization methods reject a document if it contains an ISO-10646 (Unicode) character that cannot be represented in EBCDIC. When the internal representation of an XmlDoc is Unicode (Sirius Mods 7.6 and higher), the deserialization methods by default reject a document if it contains a Unicode character that is not translatable to EBCDIC. See Char and Reference for more information about characters in an XML document.
  • The encodings that are accepted in the deserialization operations are UTF-8, UTF-16, and ISO-8859-n (where n is a digit from 1 to 9). Prior to Sirius Mods version 7.6, all of the ISO-8859-n variants are treated as ISO-8859-1. As of Sirius Mods 7.6, the variants determine ASCII to Unicode conversions according to the specification of the individual variant.

    Note: These encoding names must be specified in uppercase letters.

  • When the document is serialized, the result is either EBCDIC or UTF-8. Therefore, the only values permitted to be set for the Encoding property are UTF-8 and the null string. See the Usage Notes for Encoding for more information about the character sets allowed in a serialized input XML document and the value of encoding in an XML declaration.

Strings and Unicode with the XmlDoc API

As of Sirius Mods version 7.6, XmlDocs are maintained in Unicode rather than EBCDIC; this is true for all string values, names, prefixes, and URIs. As a consequence, most of the arguments and results of the XmlDoc API methods that formerly were strings or longstrings are Unicode strings as of version 7.6.

This switch to Unicode requires little or no change to most existing XmlDoc API applications, however: XmlDoc API argument and result variables declared as String or Longstring are automatically converted from EBCDIC to Unicode by the Sirius Mods. For example, the EBCDIC character strings in the arguments in a statement like the following are automatically converted to Unicode:

%d:AddElement('name', 'value')

Similarly, if the variable %str, below, was declared as type String or Longstring, then the Unicode result of the Value method is automatically converted to EBCDIC when it is stored in %str:

%str = %n:Value

The principal benefit of this switch to Unicode is conformance with the W3C XML standard, which defines "characters" in terms of Unicode characters (most of which are valid in XML documents). You can now store string values that are not translatable to EBCDIC, whereas Sirius Mods 7.5 allows storage only of (most) non-null EBCDIC characters or of characters that translate to those EBCDIC characters.

The automatic EBCDIC/Unicode conversions described above will not cause request cancellations in requests that run successfully under Sirius Mods 7.5. But there are other changes to or effects on the XmlDoc API that are due to the switch to Unicode maintenance (the Sirius Mods Release 7.6 Notes and the individual method descriptions provide additional details):

  • The workaround (InvalidChar method) for accommodating nulls and EBCDIC characters that are not allowed by the XML standard is replaced by:
    • The AllowNull property can let nulls be stored in an XmlDoc.
    • A method argument (AllowUntranslatable) of the deserialization methods that lets you store Unicode characters that do not translate to EBCDIC. Such characters may be also stored directly by the Add* and Insert* methods of the XmlDoc API; these methods do not require a special argument. EBCDIC characters that do not translate to Unicode must be handled before they are passed to an XmlDoc update operation. For example, EBCDIC X'04' is the SEL ("Select") control character. Since there is no "Select" control character in Unicode, there is no mapping between EBCDIC X'04' and any Unicode character. For this you might use the Untranslatable parameter of the EbcdicToUnicode function.
  • If you have defined uninvertible translations, the implicit translation of EBCDIC string arguments and results to Unicode as of version 7.6 of the Sirius Mods will change the behavior of the XmlDoc API methods compared to their operation in version 7.5. For example, assume CCAIN establishes codepage 0037 as the base, and that it also uses the following UNICODE commands to allow for the codepage 1047 square bracket characters:

    UNICODE Table Standard Trans E=AD To U=005B
    UNICODE Table Standard Trans E=BD To U=005D

    These UNICODE commands cause uninvertible translations. For example, by the first command, EBCDIC X'AD' translates to U+005B, but by the definition of codepage 0037, U+005B translates to EBCDIC X'BA'. Consequently, you can add an X'AD' character to an XmlDoc, but if you display its value:

    %nod:AddElement('leftSquare', 'AD':X)
    Print %nod:Value('leftSquare'):StringToHex

    You get the following result:

    BA

    The Value method returns the Unicode character U+005B, which is translated implicitly to EBCDIC X'BA' as the string input for StringToHex.

    In version 7.5, because XmlDoc strings are stored in EBCDIC, no implicit translation is performed, and the result of the above two statements is:

    AD

    Note: Model 204 7.6 maintenance added left and right square bracket XHTML entities to reduce your concern about codepages and square brackets. For an example, see the UnicodeAfter method.

  • The Print subroutine is equipped to display the Unicode values that are stored in XmlDocs, even if the Unicode characters are not translatable to EBCDIC. If non-translatable Unicode characters are stored in XmlDoc Attribute or Element values, Print displays their XML hexadecimal character references. If non-translatable Unicode characters are stored in a context other than Element or Attribute (a name, Comment, or PI), the Print CharacterEncodeAll option is required to display a character reference and avoid request cancellation.
  • As described further in Implicit Unicode conversions, the User Language Print statement under Sirius Mods 7.6 does not cancel the request if it is presented with a Unicode character that does not translate to EBCDIC. If it encounters an untranslatable Unicode character, Print will display an EBCDIC string that contains the character's hex encoding. As an example, consider the direct printing of the output of Value. Say the element node assigned to %nodeY contains the Unicode trademark character (U+2122), which does not translate to EBCDIC. The following statement succeeds because the Print statement can handle untranslatable Unicode characters:

    Print %nodeY:Value

    The result under Sirius Mods 7.5 is a request cancellation. The result under Sirius Mods 7.6 is:

    &#x2122;

    However, the following common operation using the StringToHex method with Value does not succeed:

    Print %nodeY:Value:stringToHex

    When StringToHex attempts to implicitly convert to EBCDIC the Unicode character passed to it by the Value function, the conversion fails because the character is not translatable to EBCDIC, and the request is cancelled. Such an implicit conversion, which simply uses the current Unicode translation tables, does not do character encoding.

    To avoid a request cancellation here and view the Value result, you can use the UnicodeToUtf16 function to encode the Unicode character as a UTF-16 string for input to StringToHex:

    Print %nodeY:Value:unicodeToUtf16:stringToHex

For more information about the characters that are valid in an XmlDoc API XML document, see XML processing in Janus SOAP: ISO-10646 and EBCDIC characters.

Using Longstrings or Unicode instead of Strings

Either the Unicode or Longstring datatype provides an atomic type that can contain a string longer than 255 bytes. The XmlDoc API methods, like all Janus SOAP methods, accept strings longer than 255 bytes whenever they have a string argument or result, which is to say:

  • Input values may exceed 255 bytes in length.
  • Various XmlDoc API methods will return a string longer than 255 bytes, if indeed the result value exceeds 255 bytes.

The following subsections provide some guidelines to determine when you must use a longstring (or Unicode, as of Sirius Mods version 7.6) %variable or context for a string argument or for the result of a method in the XmlDoc API. Since the server table requirements and the processing overhead for Longstring or Unicode are just a little more than for a String Len 255 %variable, it is recommended that you use a Longstring or Unicode in the XmlDoc API methods wherever you might be using a String Len 255 %variable.

Xml and Serial methods

You should use a Longstring or Unicode %variable to hold the result of the Xml or Serial methods: the total concatenated length of all markup and character content in a document (or subtree, for Serial) will most likely exceed 255 bytes. Thus, the first invocation of the Serial method below will never fail (for length reasons), but the second will usually cause a request cancellation:

    %ls Longstring
    %ls = %doc:Serial
    %ss String Len 255
    %ss = %doc:Serial

Value[Default] methods

Usually you should use a Longstring or Unicode %variable to hold the result of the Value or ValueDefault methods. For example, the first two invocations of the Value method below will succeed but the third will cause a request cancellation:

    %ss String Len 255
    %ls Longstring
    %doc:LoadXml('<top> <big>' With -
       $Lstr_Left('a', 300) With '</big> <little>' -
       With 'Less than 256 chars</little> </top>')
    %ls = %doc:Value('/top/big')
    %ss = %doc:Value('/top/little')
    %ss = %doc:Value('/top/big')

As noted above, the best approach here is to use Longstring or Unicode %variables where you might use String Len 255 %variables.

URI-related methods

Besides the Xml and Value methods, other XmlDoc API methods either cannot return a value longer than 255 bytes or, with typical XML documents, are unlikely to do so. If you have a namespace URI that exceeds 255 bytes, it may be necessary to use a Longstring or Unicode %variable.

For example, the first two invocations of the Uri method below will succeed, but the third will cause a request cancellation:

    %ss String Len 255
    %ls Longstring
    %doc:LoadXml('<top><a:inner xmlns:a="urn:' With -
       $Lstr_Left('big', 300, '_') With '"/></top>')
    %ss = %doc:URI('*')
    %ls = %doc:URI('*/*')
    %ss = %doc:URI('*/*')

As noted above, the easiest approach here is to use Longstring or Unicode %variables where you might use String Len 255 %variables.

Conventions and terminology for XmlDoc API methods

In addition to those described in Notation conventions for methods, the following conventions are also used in the individual XmlDoc API method descriptions:

  • Symbols used in the syntax include the following. Usually, they represent method objects; in actual code, they may be replaced by object variables of the indicated class or by method invocations that return such object variables:
    nr
    Denotes an abstract class (short for “node reference”) for methods that operate on a node and that can be used with either an XmlNode or an XmlDoc. If an XmlDoc, the node for the operation is the root node.
    doc
    Denotes an object of class XmlDoc.
    nod
    Denotes an object of class XmlNode.
    nodl
    Denotes an object of class XmlNodelist.
  • Although the terms "XmlNode" and "node" are closely related, effort is made to distinguish them as necessary in the method descriptions. An XmlNode is an object that points to a node in an XmlDoc. Similarly, an XmlNodelist is an object that contains a set, or list, of XmlNodes selected from a particular XmlDoc. Strictly speaking, a "nodelist" does not exist, but the term is occasionally used as an abbreviation or generalization of XmlNodelist.
  • Null objects, null strings, empty results
    • A Null object is one that has been deleted or that has not been instantiated. A "null" string is a zero-length string value. The text in the method descriptions distinguishes these two terms.
    • Object-type arguments must not be Null, unless that argument explicitly allows Null. Hence, a Null argument typically causes a request cancellation. Currently, no XmlDoc API methods allow Null object arguments, and the "Request Cancellation Errors" section for each method does not include this condition.
    • Some methods that have an XPath argument allow the result of the XPath expression to be the empty set of nodes; most, however, will cancel the request if this happens. Each method that has an XPath argument will either list the empty XPath result as a request cancellation error, or will explain the operation of the method when the XPath result is the empty nodeset.
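
One defensive pattern for the empty-result case is to select first and test the size of the result before using it. The sketch below assumes that SelectNodes is among the methods that accept an empty XPath result and return an empty XmlNodelist; confirm this in the SelectNodes description. The document and path are hypothetical.

```soul
Begin
%doc is object XmlDoc
%nl  is object XmlNodelist
%doc = new
%doc:LoadXml('<top><a>1</a></top>')

* Select first (assumed to allow an empty result),
* then test the number of nodes selected
%nl = %doc:SelectNodes('/top/missing')
If %nl:Count Eq 0 Then
   Print 'No node selected'
Else
   Print %nl(1):Value
End If
End
```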

See also