XmlDoc API: Difference between revisions

From m204wiki
Revision as of 00:58, 18 March 2014

XmlDoc API concepts and data structures

The XmlDoc API is based on the use of XML documents. XML processing in Janus SOAP and various XML references explain that an XML document can contain any type of data, so an XML document may not be primarily intended for human reading. Nevertheless, an XML document can be simply and meaningfully expressed or represented entirely with readable characters. This character form of an XML document is called the serial form. When operating on an XML document with the XmlDoc API, the serial form is converted to an XmlDoc object.

Only a few categories of operations are needed on XML documents; one way to structure them is:

Receive
Receive the transmitted text of a document and convert it into an XmlDoc, which uses nodes to represent the hierarchy of the XML document.
Update
Update or create an XmlDoc, by adding, deleting, copying, or replacing nodes.
Access
Access nodes in an XmlDoc, and data contained within them.
Send
Convert an XmlDoc into a textual representation, and transmit it.
Other
There are other operations, such as XmlDoc properties to control certain operations, data structure housekeeping, and debugging facilities.

The remainder of this article describes the objects used to operate on XML documents. It reviews the above categories of operations, showing how they are accommodated by the XmlDoc API classes: XmlDoc, XmlNode, and XmlNodelist. The objects are operated upon by methods that are members of these classes.

Typical operations on an XML document

This section lists the categories of operations on an XML document, and provides the motivation for objects in the XmlDoc API: XmlDocs, XmlNodes, and XmlNodelists.

Receive or Load

The process of receiving a document actually consists of two steps:

  1. Receiving the document text using some "transport" mechanism, such as Janus Web Server (HTTP, as server), Janus Sockets (usually, HTTP, as client), Model 204 MQ Series, access from a file, etc.
  2. Converting the XML document (deserialization) into its internal representation (an XmlDoc) so that other operations can be performed on it.

If the XML document is received by Janus Web Server, these steps are performed together by the WebReceive function. For the HTTP Helper, the document text is received by the HttpRequest Get, Post, or Send function, and the deserialization is done by the HttpResponse ParseXml function. For other forms of transport, the steps are performed separately: the text form of the document is received into a Longstring, and the Longstring contents are converted into internal form by the LoadXml function.
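
As a minimal sketch of the separate-steps case, the fragment below assumes the document text is already in a Longstring (here a hypothetical literal standing in for text obtained from some transport) and deserializes it with LoadXml. The variable names and the sample document are illustrative assumptions.

```
%doc  Is Object XmlDoc
%text Is Longstring
* Hypothetical document text; in practice this would
* be received by whatever transport is in use:
%text = '<order><partnum>1234</partnum></order>'
* Deserialize the text into a new (EMPTY) XmlDoc:
%doc = New
%doc:LoadXml(%text)
* Display the resulting tree, typically for debugging:
%doc:Print
```

Keeping the text in a Longstring first makes it possible to log or inspect what was received before it is deserialized.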

See the discussion about sending and receiving, which mentions other details about the operations ("Receive", "Load", etc.) that create the initial content of an XmlDoc.

Update

You modify an XmlDoc using various XmlDoc API methods. If you start with an empty XmlDoc, some methods (the Add* and Insert* methods for various node types) allow you to generate an XmlDoc "directly", without first representing it in the serial text form. You can also update an XmlDoc into which you have received a document.

See the section on updating for a more detailed overview of XmlDoc updating operations.

Access

Since an XML document is a hierarchical structure, your application will need to select some part of the hierarchy to operate upon, for example, to obtain its value. Various XmlDoc API methods do this. In addition, some XmlDoc API updating methods also require that you specify where in the hierarchy an update is performed.

Selecting nodes from an XmlDoc is performed using the XPath language, introduced in XML Path Language (XPath). XPath can be used for accessing a single node in the document, for example, getting an element node's string value using the Value property. You can also work with lists of selected nodes, represented by XmlNodelists. The SelectNodes function produces such a list. Other XmlNodelist methods also work with them, including the Item function, which gets a single XmlNode from an XmlNodelist. SelectSingleNode returns an XmlNode, as do most of the Add* and Insert*Before methods.

Send

The process of sending a document actually consists of two steps:

  1. Converting the XmlDoc into its serial text representation.
  2. Sending the document text using some "transport" mechanism, such as Janus Web Server (HTTP, as server), Janus Sockets (usually, HTTP, as client), Model 204 MQ Series, access from a file, etc.

If the XML document is sent by Janus Web Server, the steps can be performed together by the WebSend subroutine. For the HTTP Helper, the document is serialized by the HttpRequest AddXml subroutine, and the document is sent by the HttpRequest Get, Post, or Send function. For other forms of transport, the steps are performed separately: the XmlDoc is converted into external form by the Serial function, and the converted result is sent using the appropriate transport.
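
The two steps for "other forms of transport" might look like the following sketch. The element name and value are hypothetical, and the Print statement merely stands in for whatever transport would actually carry the text.

```
%doc  Is Object XmlDoc
%text Is Longstring
* Build a small document directly (hypothetical content):
%doc = New
%doc:AddElement('status', 'OK')
* Step 1: convert the XmlDoc into its serial text form:
%text = %doc:Serial
* Step 2: pass the text to the transport; Print is only
* a stand-in for the real sending mechanism:
Print %text
```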

Other operations

Some other operations the XmlDoc API methods perform include:

  • Creating and initializing an XmlDoc or XmlNodelist.
  • Setting or retrieving some property of an XmlDoc, for example, the URI associated with a prefix to be used in an XPath expression (see SelectionNamespace).
  • Displaying a document, or some part of it, usually for debugging purposes (see Print).

The XmlDoc class

An XmlDoc object is the internal representation of an XML document; creating one is usually done by invoking the XmlDoc New constructor, which returns an XmlDoc instance. An XmlDoc is a tree structure of nodes. The types of nodes that an XmlDoc may contain are shown in the following subsection.

XmlDoc node types

An XmlDoc is a tree structure of nodes. The possible node types are listed here (these are the enumeration return values of the Type function):

Attribute
This type of node is used to represent an attribute of an XML element.
Comment
This type of node is used to represent a comment (serialized in the form: <!--comment-->) in an XML document.
Root
This type of node is the root of the XmlDoc tree. It has zero or one Element child nodes and any number of Comment and Pi child nodes. Root and Element nodes are the only nodes that can have child nodes.
Element
This type of node is used to represent an element in an XML document. Element and Root nodes are the only nodes that can have child nodes.
Pi
This type of node is used to represent a processing instruction (<?target ...?>) in an XML document.

Note: Although the "XML declaration" (<?xml version=...?>) has the same appearance as a processing instruction, it is not a Pi.

Also, note that the values of an XmlDoc's XML declaration can be obtained and set with these properties: Version, Encoding, and Standalone.

Text
This type of node is used to represent character content within an XML element. Note that a Text node will never contain the null string, and that two Text nodes can be adjacent only if the AdjacentText property is set to allow it.

The XmlDoc node types listed above correspond almost exactly with the structures contained in an XML document (see XML and XML example). The Root node, always present, corresponds to the node that contains the document as a whole. You can insert additional nodes, either by deserializing a character stream containing an XML document instance (for example, with WebReceive), or by using Add*/Insert*Before methods to insert nodes. The children of the Root node are the "top-level" element and any top-level processing instructions and/or comments that precede or follow it. Do not confuse the Root node, which is the root of the XmlDoc tree, with the top-level element of the XML document.

XmlDoc states

An XmlDoc can have one of the following three states:

EMPTY
An XmlDoc in this state has no nodes other than the Root node. This is the state of an XmlDoc as returned by the XmlDoc New method.
WELL-FORMED
An XmlDoc in this state contains at least the top-level Element node.
Non-EMPTY not WELL-FORMED
An XmlDoc in this state contains at least one Comment or Pi node but no Element nodes.

Note that only an XmlDoc in the WELL-FORMED state may be converted into a complete text representation of an XmlDoc, and that you can only use an EMPTY XmlDoc as the target of “deserializing” the text representation of an XML document.

The XmlNode and XmlNodelist classes, and XPath

In addition to using an XmlDoc directly, you can access an XmlDoc with either of the following objects:

  • An XmlNode, which is a single pointer to a node in an XmlDoc
  • An XmlNodelist, which contains a list of pointers to nodes selected from an XmlDoc

Instances of both of these objects are created by and returned as the value of several XmlDoc API functions. An XmlNodelist may also be created by an invocation of the XmlNodelist New constructor, which requires the specification of an XmlDoc argument — the XmlDoc with which the XmlNodelist is associated. There is not a New constructor in the XmlNode class.

A single XmlDoc can have any number of XmlNodelists and XmlNodes associated with it.

Most operations on the "contents" of an XmlDoc select one or more nodes using XPath expressions ("PathExpr" is the XPath syntax term, as explained in XPath syntax). All methods that accept an XPath LocationPath expression argument are members of both the XmlDoc and the XmlNode classes.

There are two forms of XPath expressions:

Absolute XPath expression
An absolute XPath expression selects nodes from an XmlDoc, starting at the Root node. The syntax of an absolute XPath expression begins with a forward slash (/).
Relative XPath expression
A relative XPath expression selects nodes from an XmlDoc, starting from a context node which is determined when the expression is used. The syntax of a relative XPath expression begins with a character other than a slash. When you use a relative XPath expression, the context node depends on the method object (the type of object on which the method operates) of the invocation:
  1. If the method object is an XmlDoc, the context node is the Root node.
  2. If the method object is an XmlNode, the context node is the node which it points to.
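
The context-node rules above can be sketched as follows. This is an illustrative fragment, not taken from the API reference: the document content and variable names are assumptions, and the document is built from a Longstring only to keep the example self-contained.

```
%doc  Is Object XmlDoc
%item Is Object XmlNode
%text Is Longstring
%text = '<purchase_order><date>25 July, 2001</date>'
%text = %text With '<pitm><partnum>1234</partnum><qty>3</qty></pitm>'
%text = %text With '</purchase_order>'
%doc = New
%doc:LoadXml(%text)
* Absolute expression: selection starts at the Root node
Print %doc:Value('/purchase_order/date')
* Relative expression, XmlDoc method object:
* the context node is the Root node
Print %doc:Value('purchase_order/date')
* Relative expression, XmlNode method object:
* the context node is the pitm element %item points to
%item = %doc:SelectSingleNode('/purchase_order/pitm')
Print %item:Value('qty')
```

The first two Print statements address the same date element; only the starting point of the selection is expressed differently.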

In addition to operating on the contents of an XmlDoc, there are several methods (for example, WebReceive) that operate on the XmlDoc as a whole. These methods only allow an XmlDoc method object. If you need to obtain the XmlDoc associated with an XmlNode or XmlNodelist, use the XmlDoc function.

The following section continues the explanation of XPath, XmlNodes, and XmlNodelists. Further information about XPath expressions and node sets is also contained in XPath.

An example of XmlDoc API methods and XPath

This section illustrates a small XML document received as a web request, followed by part of a User Language request that uses some XmlDoc API methods, with particular attention to the methods' XPath arguments.

Here is the XML document:

<purchase_order>
   <date>25 July, 2001</date>
   <pitm>
      <partnum>1234</partnum>
      <qty>3</qty>
   </pitm>
   <pitm>
      <partnum>5678</partnum>
      <qty>2</qty>
   </pitm>
</purchase_order>

Here is some User Language which could be used to receive and process this request:

%doc Object XmlDoc
%nl Object XmlNodelist
* Create XmlDoc, get web request as contents:
%doc = New
%doc:WebReceive
* Create work nodelist with all pitm elements:
%nl = %doc:SelectNodes('/purchase_order/pitm')
* Process each pitm:
For %j From 1 To %nl:Count
   %partnum = %nl(%j):Value('partnum')
   %qty = %nl(%j):Value('qty')
   ...
End For

Value and SelectNodes, like many methods in the XmlDoc API, have an optional argument that allows you to process any of the nodes in an XmlDoc, rather than the default, which is to process the node to which the method object points.

The optional argument shown above is an XPath expression (for SelectNodes, /purchase_order/pitm; for Value, partnum and qty). An XPath expression selects a list of nodes, starting either from the XmlDoc Root (when an absolute XPath expression is used) or from a particular context node in an XmlDoc (when a relative XPath expression is used). Syntactically, an XPath expression that begins with a slash (/) is absolute.

SelectNodes returns the entire result of its XPath expression argument. Many other XmlDoc API methods, however, operate on the first of the nodes resulting from the argument's XPath expression. That first node is called the head of the argument XPath result. Note that first is defined in terms of "document order" (see Order of nodes: node sets versus nodelists).
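
To make the distinction between the entire result and its head concrete, here is a small sketch under the assumption of a cut-down version of the purchase order above. For this input, SelectNodes should select both partnum elements, while Value should operate only on the first one in document order.

```
%doc  Is Object XmlDoc
%nl   Is Object XmlNodelist
%text Is Longstring
%text = '<purchase_order>'
%text = %text With '<pitm><partnum>1234</partnum></pitm>'
%text = %text With '<pitm><partnum>5678</partnum></pitm>'
%text = %text With '</purchase_order>'
%doc = New
%doc:LoadXml(%text)
* SelectNodes keeps the whole XPath result:
%nl = %doc:SelectNodes('/purchase_order/pitm/partnum')
Print 'Nodes selected:' And %nl:Count
* Value uses only the head of the XPath result:
Print 'Head value:' And %doc:Value('/purchase_order/pitm/partnum')
```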

Updating

Updating an XmlDoc generally refers to the addition and deletion of the nodes of the XmlDoc tree, which includes the generation of the document's initial contents. The initial contents of an XmlDoc can be established by one of the deserialization methods: LoadXml, WebReceive, or ParseXml. Whether you use a method to set the "initial" contents of an XmlDoc or whether you start with an EMPTY XmlDoc, you can then insert nodes into it, using one or more of the methods whose name begins with "Add", such as AddElement, or whose name begins with "Insert", such as InsertSubtreeBefore.

Once an XmlDoc has one or more nodes in addition to the Root node, you can modify the Value of Text and other nodes, and you can delete nodes from it using DeleteSubtree.

16M limit to number of XmlDoc items

Internally, an XmlDoc object is maintained in a data structure that has a maximum of 16M items. Each node requires an item, as does each unique string. A string item is used, for example, as the name of an element, attribute, PI node, or namespace, or as the value of a comment, PI, attribute, or text node, or as a namespace URI. Items are also used to maintain the SelectionNamespace property and for other internal purposes.

Exceeding the XmlDoc item limit (16M) in an updating operation causes request cancellation. Such operations include, for example, the Receive, Add, and Insert families of methods, as well as changing the Value of an XmlNode, and so on.

Update operations that cause deletion of items from an XmlDoc (for example, DeleteSubtree, and replacing the Value of an XmlNode) do not, in general, make those items available for reuse, at least in version 7.5 of Model 204 and earlier.

Inserting nodes and copying subtrees

The Add* methods are designed to make it easy to "append" nodes to an XmlDoc in a "depth-first, left-to-right" order in the simple case. These methods insert a node as the last child of the node pointed to by the method object.

Most of the Add* methods (for example, AddElement) have Insert*Before counterparts (for example, InsertElementBefore) which insert a node in a position other than the last child of an Element or the Root. AddAttribute and AddNamespace are the exceptions, without an Insert*Before counterpart.

AddElement (as does InsertElementBefore) has an optional text value argument, with which you can insert a Text node child of the inserted Element node.

Here is an example of the updating methods in the XmlDoc API:

%doc Object XmlDoc
%doc = New
%story Object XmlNode
%paragraph Object XmlNode
%story = %doc:AddElement('story')
%story:AddComment('My first XML document')
%story:AddElement('greeting', 'Hello, world')
%paragraph = %story:AddElement('paragraph')
%paragraph:AddElement('line', 'Ask not what')
%paragraph:AddElement('line', 'Hear no evil')

This creates the following XML document:

<story>
   <!--My first XML document-->
   <greeting>Hello, world</greeting>
   <paragraph>
      <line>Ask not what</line>
      <line>Hear no evil</line>
   </paragraph>
</story>
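
The Insert*Before counterparts described above can be sketched in the same style. In this illustrative fragment (the title element and its text are hypothetical), the method object of InsertElementBefore is the node ahead of which the new element is placed, so the new element does not become the last child of story.

```
%doc      Is Object XmlDoc
%story    Is Object XmlNode
%greeting Is Object XmlNode
%doc = New
%story = %doc:AddElement('story')
%greeting = %story:AddElement('greeting', 'Hello, world')
* Insert a title element ahead of greeting, rather
* than appending it as the last child of story:
%greeting:InsertElementBefore('title', 'My first story')
%doc:Print
```

Holding on to the XmlNode returned by AddElement is what makes the later insertion possible without a separate XPath selection.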

As discussed above in the introduction to the section on updating, the number of items in an XmlDoc is limited to 16M items; exceeding this number in any updating operation causes request cancellation.

Namespaces with Add* and Insert* methods

When an XmlDoc is deserialized, the namespace declarations (and the use of those declarations by names in the document) follow the scope rules outlined in Name and namespace syntax. The namespace structure that results from the updating (adding and removing nodes) of an XmlDoc is explained here. Most of these updating methods are the Add* and Insert*Before methods (described individually in the List of XmlDoc API methods).

In the following discussion, all references to an Add* method (for example, AddElement) refer equally to the corresponding Insert*Before method (for example, InsertElementBefore).

  • From deserialization through serialization, the XmlDoc API enforces the syntax rules of well-formed documents. When namespace handling is in effect for an XmlDoc (that is, the Namespace property setting is On, the default), the prefixes of names of nodes you add must be declared.
  • Once an attribute or element is added, its URI is fixed and will not change thereafter.
  • To allow AddElement to add an element with a prefix/URI declaration other than that which is in scope, it has a URI argument, which specifies the URI of the element. If the URI argument is used and the resulting prefix/URI combination requires a namespace declaration (because it differs from what is in scope), a declaration is created at the element.
  • The URI argument of AddElement can also be used so that the application logic around AddElement does not need to depend on whether the prefix/URI declaration is already in scope. AddAttribute has a URI argument for the same reason, and it can also result in the creation of a namespace declaration at the Element parent of the inserted Attribute node.
  • As an alternative to the URI argument on AddElement and AddAttribute, you can create a namespace declaration at an element with the AddNamespace method. AddNamespace also lets you insert a namespace declaration that is not used by the element or by any of its attributes. For example, the following fragment:

    %n = %doc:AddElement('a')
    %n:AddNamespace('x', 'y:z')
    %n:AddElement('x:b')
    %n:AddElement('x:c')

    Creates the following document:

    <a xmlns:x="y:z">
      <x:b/>
      <x:c/>
    </a>

  • When the AddSubtree method copies between different XmlDocs, it does not allow the source and target XmlDocs to have different Namespace property settings (doing so could require re-parsing names in some cases).
  • The namespace axis is not allowed in an XPath expression. Without XPath access to a pointer to a namespace node, the namespace clearly cannot be changed or deleted, so the URIs associated with nodes cannot be changed "out from under" them. You can obtain the information in a namespace declaration, however, using certain XmlDoc API methods (for example, the Uri property of an XmlNode).

Deleting nodes

The XmlDoc API subroutine used to delete individual nodes (and their descendants) from an XmlDoc is DeleteSubtree. An example is shown below.

If a node you delete was referenced by an XmlNode, the value of that XmlNode becomes Null. Similarly, if any deleted nodes were referenced by an item of an XmlNodelist, the value of that item of the XmlNodelist becomes Null.

If you need to "clean up" an XmlNodelist that refers to a deleted XmlDoc node, you can use the Difference function, as shown in the following example.

In the example, the %nlis XmlNodelist is cleaned up (with Difference) to remove the items which have become null. The final reference to items in %nlis uses relative XPath (the '@auth' argument of the Value method invocation, which selects the attribute named auth).

%nlis is object xmlNodelist
%removeLis is object xmlNodelist
* Get nodelist for chapters, and delete all
* chapters by indicated author from the XmlDoc:
%deleteAuth = 'Dave'
%nlis = %d:selectNodes('/book/chapter')
%removeLis = %d:selectNodes('/book/chapter[@auth="' %deleteAuth '"]')
for %i from 1 to %removeLis:count
   %removeLis(%i):deleteSubtree
end for
* Cleanup the chapter nodelist, show author of
* remaining chapters, & display the document:
%nlis = %nlis:difference(%removeLis)
for %i from 1 to %nlis:count
   print 'Author:' and %nlis(%i):value('@auth')
end for
%d:print
...

Also note that in the above example, the Item method is implicitly used (%removeLis(%i) and %nlis(%i)) and that implicit concatenation is used in the second XPath expression ('/book/chapter[@auth="' %deleteAuth '"]').

As discussed in the introduction to the section on updating, the items deleted by DeleteSubtree are not available for reuse by later Add, Insert, etc. methods, so DeleteSubtree does not "relieve" the restriction of 16M items in an XmlDoc.

Namespace URI for XPath prefixes

It is important to realize that the URI associated with a prefix in the XML document is controlled by the xmlns namespace declarations in the document. However, when a prefix is used in a name in an XPath argument to an XmlDoc API method, the URI for that prefix must be established so that the full XPath name (local part and URI namespace) can be used to locate a document element. This association of XPath prefixes to URIs is established using the SelectionNamespace property.

The prefix names used in XPath selection are independent of the prefix names used in the document serialization and deserialization. Since an XML document element prefix may be associated with multiple namespaces, or an element may have a namespace and no associated prefix, XPath prefixes stipulate the namespace that fully qualifies an element name in an XPath location step.
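
The independence of selection prefixes from document prefixes can be illustrated outside the XmlDoc API. The following is a minimal sketch in Python, using the standard xml.etree.ElementTree module rather than SOUL. The prefix p and the helper find_by_uri are hypothetical names chosen for the illustration, and the prefix-to-URI dictionary only plays the role that SelectionNamespace plays for an XmlDoc.

```python
import xml.etree.ElementTree as ET

# The document declares prefix "x" for the URI "y:z".
DOC = '<a xmlns:x="y:z"><x:b/><x:c/></a>'

def find_by_uri(xml_text, path, prefix_map):
    """Return the local names of the elements matched by path.

    Prefixes in path are resolved through prefix_map, not through
    the xmlns declarations in the document itself.
    """
    root = ET.fromstring(xml_text)
    return [el.tag.split('}')[-1] for el in root.findall(path, prefix_map)]

# The selection prefix "p" differs from the document prefix "x",
# but both map to the same URI, so the element is found.
found = find_by_uri(DOC, 'p:b', {'p': 'y:z'})

# Reusing the document's prefix with a different URI finds nothing,
# because matching is done on the URI plus the local name.
missed = find_by_uri(DOC, 'x:b', {'x': 'other:uri'})

print(found, missed)
```

The point of the sketch is only that a match depends on the URI and local name; the prefix spelling used in the selection expression is a local convenience.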

Transport: receiving and sending XML

Distinct sets of methods provide for the receiving and sending of XML:

  • Receiving XML involves converting the information from the character, marked-up form of an XML document into an XmlDoc; this operation is called deserialization.
    • The WebReceive function is designed to receive an XML document that has arrived as a web request.
    • The ParseXml function of the HttpResponse class accomplishes this for HTTP clients.
    • For other transport mechanisms, such as Model 204 MQ Series, the character-format XML document can be placed into a Longstring, and the LoadXml function then places the information into an XmlDoc.
  • Sending XML involves converting the information in an XmlDoc to a character stream, including markup such as element tags; this operation is called serialization.
    • The WebSend subroutine is designed to send an XML document as a web response.
    • The AddXml method of the HttpRequest class accomplishes this for HTTP clients.
    • For other transport mechanisms, such as Model 204 MQ Series, the Serial function is used to place the serialized form into a Longstring, which can then be sent.

About encoding:

  • When the internal representation of an XmlDoc is EBCDIC (prior to Sirius Mods 7.6), the deserialization methods reject a document if it contains an ISO-10646 (Unicode) character that cannot be represented in EBCDIC. When the internal representation of an XmlDoc is Unicode (Sirius Mods 7.6 and higher), the deserialization methods by default reject a document if it contains a Unicode character that is not translatable to EBCDIC. See Char and Reference for more information about characters in an XML document.
  • The encodings that are accepted in the deserialization operations are UTF-8, UTF-16, and ISO-8859-n (where n is a digit from 1 to 9). Prior to Sirius Mods Version 7.6, all of the ISO-8859-n variants are treated as ISO-8859-1. As of Sirius Mods 7.6, the variants determine ASCII-to-Unicode conversions according to the specification of the individual variant.

    Note: These encoding names must be specified in uppercase letters.

  • When the document is serialized, the result is EBCDIC or is in the UTF-8 encoding. Therefore, the only values permitted to be set for the Encoding property are UTF-8 and the null string. See Encoding's Usage Notes for more information about the character sets allowed in a serialized input XML document and the value of encoding in an XML declaration.
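
To see why it matters whether the ISO-8859-n variant is honored, consider how a single byte decodes under two variants. This is a small Python sketch using the standard codecs, not XmlDoc API code; the byte value and the helper decode_variants are chosen only for illustration.

```python
# One byte, two ISO-8859 variants: the decoded character depends on n.
RAW = b'\xe1'

def decode_variants(raw, variants=('iso-8859-1', 'iso-8859-7')):
    """Decode the same bytes under each named encoding."""
    return {name: raw.decode(name) for name in variants}

decoded = decode_variants(RAW)
for name, text in decoded.items():
    # Show the Unicode code point that each variant produces.
    print(name, 'U+%04X' % ord(text))
```

Treating every variant as ISO-8859-1 would yield U+00E1 for this byte in both cases, whereas honoring the Greek variant (ISO-8859-7) yields U+03B1.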

Strings and Unicode with the XmlDoc API

As of Sirius Mods version 7.6, XmlDocs are maintained in Unicode rather than EBCDIC; this is true for all string values, names, prefixes, and URIs. As a consequence, most of the arguments and results of the XmlDoc API methods that formerly were strings or longstrings are Unicode strings as of version 7.6.

This switch to Unicode requires little or no change to most existing XmlDoc API applications, however: XmlDoc API argument and result variables declared as String or Longstring are automatically converted from EBCDIC to Unicode by the Sirius Mods. For example, the EBCDIC character strings in the arguments in a statement like the following are automatically converted to Unicode:

%d:AddElement('name', 'value')

Similarly, if the variable %str, below, was declared as type String or Longstring, then the Unicode result of the Value method is automatically converted to EBCDIC when it is stored in %str:

%str = %n:Value

The principal benefit of this switch to Unicode is conformance with the W3C XML standard, which defines "characters" in terms of Unicode characters (most of which are valid in XML documents). You can now store string values that are not translatable to EBCDIC — Sirius Mods 7.5 allows storage only of (most) non-null EBCDIC characters or of characters that translate to those EBCDIC characters.

The automatic EBCDIC/Unicode conversions described above will not cause request cancellations in requests that run successfully under Sirius Mods 7.5. But there are other changes to or effects on the XmlDoc API that are due to the switch to Unicode maintenance (the Sirius Mods Release 7.6 Notes and the individual method descriptions provide additional details):

  • The workaround (InvalidChar method) for accommodating nulls and EBCDIC characters that are not allowed by the XML standard is replaced by:
    • The AllowNull property can let nulls be stored in an XmlDoc.
    • A method argument (AllowUntranslatable) of the deserialization methods that lets you store Unicode characters that do not translate to EBCDIC. Such characters may also be stored directly by the Add* and Insert* methods of the XmlDoc API; these methods do not require a special argument. EBCDIC characters that do not translate to Unicode must be handled before they are passed to an XmlDoc update operation. For example, EBCDIC X'04' is the SEL ("Select") control character. Since there is no "Select" control character in Unicode, there is no mapping between EBCDIC X'04' and any Unicode character. To handle such a character, you might use the Untranslatable parameter of the EbcdicToUnicode function.
  • If you have defined uninvertible translations, the implicit translation of EBCDIC string arguments and results to Unicode as of version 7.6 of the Sirius Mods will change the behavior of the XmlDoc API methods compared to their operation in version 7.5. For example, assume CCAIN establishes codepage 0037 as the base, and that it also uses the following UNICODE commands to allow for the codepage 1047 square bracket characters:

    UNICODE Table Standard Trans E=AD To U=005B
    UNICODE Table Standard Trans E=BD To U=005D

    These UNICODE commands cause uninvertible translations. For example, by the first command, EBCDIC X'AD' translates to U+005B, but by the definition of codepage 0037, U+005B translates to EBCDIC X'BA'. Consequently, you can add a X'AD' character to an XmlDoc, but if you display its value:

    %nod:AddElement('leftSquare', 'AD':X)
    Print %nod:Value('leftSquare'):StringToHex

    You get the following result:

    BA

    The Value method returns the Unicode character U+005B, which is translated implicitly to EBCDIC X'BA' as the string input for StringToHex.

    In version 7.5, because XmlDoc strings are stored in EBCDIC, no implicit translation is performed, and the result of the above two statements is:

    AD

  • The Print subroutine is equipped to display the Unicode values that are stored in XmlDocs, even if the Unicode characters are not translatable to EBCDIC. If non-translatable Unicode characters are stored in XmlDoc Attribute or Element values, Print displays their XML hexadecimal character references. If non-translatable Unicode characters are stored in a context other than Element or Attribute (a name, Comment, or Pi), the Print CharacterEncodeAll option is required to display a character reference and avoid request cancellation.
  • As described further in "Implicit Unicode conversions", the User Language Print statement under Sirius Mods 7.6 does not cancel the request if it is presented with a Unicode character that does not translate to EBCDIC. If it encounters an untranslatable Unicode character, Print will display an EBCDIC string that contains the character's hex encoding. As an example, consider the direct printing of the output of Value. Say the element node assigned to %nodeY contains the Unicode trademark character (U+2122), which does not translate to EBCDIC. The following statement succeeds because the Print statement can handle untranslatable Unicode characters:

    Print %nodeY:Value

    The result under Sirius Mods 7.5 is a request cancellation. The result under Sirius Mods 7.6 is:

    &#x2122;

    However, the following common operation using the StringToHex method with Value does not succeed:

    Print %nodeY:Value:stringToHex

    When StringToHex attempts to implicitly convert to EBCDIC the Unicode character passed to it by the Value function, the conversion fails because the character is not translatable to EBCDIC, and the request is cancelled. Such an implicit conversion, which simply uses the current Unicode translation tables, does not do character encoding.

    To avoid a request cancellation here and view the Value result, you can use the UnicodeToUtf16 function to encode the Unicode character as a UTF-16 string for input to StringToHex:

    Print %nodeY:Value:unicodeToUtf16:stringToHex
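
The three outcomes described above for the trademark character (failed EBCDIC conversion, a hexadecimal character reference, and a UTF-16 encoding) can be mimicked in Python. This is a sketch under stated assumptions, not a description of how the Sirius Mods work internally: Python's cp037 codec stands in for the site's translation tables, the helper names are hypothetical, and big-endian UTF-16 without a byte order mark is an assumption made here (the text above does not state the exact byte layout that UnicodeToUtf16 produces).

```python
TRADEMARK = '\u2122'  # U+2122 has no equivalent in EBCDIC code page 037

def to_ebcdic_hex(text, codec='cp037'):
    """Hex of the EBCDIC bytes, or None if a character is untranslatable."""
    try:
        return text.encode(codec).hex().upper()
    except UnicodeEncodeError:
        # Plain table-driven translation does no character encoding,
        # so an untranslatable character is simply an error.
        return None

def char_reference(ch):
    """XML hexadecimal character reference for a single character."""
    return '&#x%X;' % ord(ch)

def utf16_hex(text):
    """Hex of the UTF-16 code units (assumed big-endian, no BOM)."""
    return text.encode('utf-16-be').hex().upper()

print(to_ebcdic_hex(TRADEMARK))
print(char_reference(TRADEMARK))
print(utf16_hex(TRADEMARK))
```

Encoding to UTF-16 before taking the hex value sidesteps the EBCDIC translation step entirely, which is why that route succeeds where the direct conversion fails.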

For more information about the characters that are valid in an XmlDoc API XML document, see XML processing in Janus_SOAP: ISO-10646 and EBCDIC characters.
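
The uninvertible translation described earlier in this section (EBCDIC X'AD' goes in, X'BA' comes out) can be reproduced with a small table-driven model. The Python sketch below is illustrative only: it assumes Python's cp037 codec as the base codepage 0037 table, models the two UNICODE commands as one-way overrides, and uses hypothetical function names.

```python
# Overrides mirror the two UNICODE commands shown earlier:
#   E=AD -> U+005B and E=BD -> U+005D (EBCDIC-to-Unicode direction only).
OVERRIDES = {0xAD: '\u005b', 0xBD: '\u005d'}

def ebcdic_to_unicode(byte):
    """Translate one EBCDIC byte, applying the one-way overrides first."""
    if byte in OVERRIDES:
        return OVERRIDES[byte]
    return bytes([byte]).decode('cp037')

def unicode_to_ebcdic(ch):
    """Translate back using the unmodified code page 037 table."""
    return ch.encode('cp037')[0]

def round_trip(byte):
    """EBCDIC -> Unicode (store in the document) -> EBCDIC (read back)."""
    return unicode_to_ebcdic(ebcdic_to_unicode(byte))

print('%02X' % round_trip(0xAD))  # overridden byte does not survive
print('%02X' % round_trip(0xBA))  # base-table byte survives unchanged
```

Because two different EBCDIC bytes map to U+005B while only one of them is the target of the reverse mapping, the original byte cannot be recovered once the value has been stored as Unicode.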

Using Longstrings or Unicode instead of Strings

Either the Unicode or Longstring datatype provides an atomic type that can contain a string longer than 255 bytes. The XmlDoc API methods, like all Janus SOAP methods, accept strings longer than 255 bytes whenever they have a string argument or result, which is to say:

  • Input values may exceed 255 bytes in length.
  • Various XmlDoc API methods will return a string longer than 255 bytes, if indeed the result value exceeds 255 bytes.

The following subsections provide some guidelines to determine when you must use a longstring (or Unicode, as of Sirius Mods version 7.6) %variable or context for a string argument or for the result of a method in the XmlDoc API. Since the server table requirements and the processing overhead for Longstring or Unicode are just a little more than for a String Len 255 %variable, it is recommended that you use a Longstring or Unicode in the XmlDoc API methods wherever you might be using a String Len 255 %variable.

Xml and Serial methods

You should use a Longstring or Unicode %variable to hold the result of the Xml or Serial methods. The result contains all the markup and character content of a document (or of a subtree, for Serial), so its total length will most likely exceed 255 bytes. Thus, the first invocation of the Serial method below will never fail (for length reasons), but the second will usually cause a request cancellation:

%ls Longstring
%ls = %doc:Serial
%ss String Len 255
%ss = %doc:Serial

Value[Default] methods

Usually you should use a Longstring or Unicode %variable to hold the result of the Value or ValueDefault methods. For example, the first two invocations of the Value method below will succeed but the third will cause a request cancellation:

%ss String Len 255
%ls Longstring
%doc:LoadXml('<top> <big>' With -
   $Lstr_Left('a', 300) With '</big> <little>' -
   With 'Less than 256 chars</little> </top>')
%ls = %doc:Value('/top/big')
%ss = %doc:Value('/top/little')
%ss = %doc:Value('/top/big')

As noted above, the best approach here is to use Longstring or Unicode %variables where you might use String Len 255 %variables.

URI-related methods

Besides the Xml and Value methods, other XmlDoc API methods either cannot return a value longer than 255 bytes or, with typical XML documents, are unlikely to do so. If you have a namespace URI that exceeds 255 bytes, it may be necessary to use a Longstring or Unicode %variable.

For example, the first two invocations of the Uri method below will succeed, but the third will cause a request cancellation:

%ss String Len 255
%ls Longstring
%doc:LoadXml('<top><a:inner xmlns:a="urn:' With -
   $Lstr_Left('big', 300, '_') With '"/></top>')
%ss = %doc:URI('*')
%ls = %doc:URI('*/*')
%ss = %doc:URI('*/*')

As noted above, the easiest approach here is to use Longstring or Unicode %variables where you might use String Len 255 %variables.

Conventions and terminology for XmlDoc API methods

In addition to those described in Notation conventions for methods, the following conventions are also used in the individual XmlDoc API method descriptions:

  • Symbols used in the syntax include the following. Usually, they represent method objects; in actual code, they may be replaced by object variables of the indicated class or by method invocations that return such object variables:
    nr
    Denotes an abstract class (short for “node reference”) for methods that operate on a node and that can be used with either an XmlNode or an XmlDoc. If an XmlDoc, the node for the operation is the root node.
    doc
    Denotes an object of class XmlDoc.
    nod
    Denotes an object of class XmlNode.
    nodl
    Denotes an object of class XmlNodelist.
  • Although the terms "XmlNode" and "node" are closely related, effort is made to distinguish them as necessary in the method descriptions. An XmlNode is an object that points to a node in an XmlDoc. Similarly, an XmlNodelist is an object that contains a set, or list, of XmlNodes selected from a particular XmlDoc. Strictly speaking, a "nodelist" does not exist, but the term is occasionally used as an abbreviation or generalization of XmlNodelist.
  • Null objects, null strings, empty results
    • A Null object is one that has been deleted or that has not been instantiated. A "null" string is a zero-length string value. The text in the method descriptions distinguishes these two terms.
    • Object-type arguments must not be Null, unless that argument explicitly allows Null. Hence, a Null argument typically causes a request cancellation. Currently, no XmlDoc API methods allow Null object arguments, and the "Request Cancellation Errors" section for each method does not include this condition.
    • Some methods that have an XPath argument allow the result of the XPath expression to be the empty set of nodes; most, however, will cancel the request if this happens. Each method that has an XPath argument will either list the empty XPath result as a request cancellation error, or will explain the operation of the method when the XPath result is the empty nodeset.

See also