XmlDoc API concepts and data structures
The XmlDoc API is based on the use of XML documents. XML processing in Janus SOAP and various XML references explain that an XML document can contain any type of data, so an XML document may not be primarily intended for human reading. Nevertheless, an XML document can be simply and meaningfully expressed or represented entirely with readable characters. This character form of an XML document is called the serial form. When operating on an XML document with the XmlDoc API, the serial form is converted to an XmlDoc object.
Only a few categories of operations are needed on XML documents; one way to structure them is:
|Receive||Receive the transmitted text of a document and convert it into an XmlDoc, which uses nodes to represent the hierarchy of the XML document.|
|Update||Update or create an XmlDoc, by adding, deleting, copying, or replacing nodes.|
|Access||Access nodes in an XmlDoc, and data contained within them.|
|Send||Convert an XmlDoc into a textual representation, and transmit it.|
|Other||There are other operations, such as XmlDoc properties to control certain operations, data structure housekeeping, and debugging facilities.|
The remainder of this article describes the objects used to operate on XML documents. It reviews the above categories of operations, showing how they are accommodated by the XmlDoc API classes: XmlDoc, XmlNode, and XmlNodelist. The objects are operated upon by methods that are members of these classes.
Typical operations on an XML document
This section lists the categories of operations on an XML document, and provides the motivation for objects in the XmlDoc API: XmlDocs, XmlNodes, and XmlNodelists.
Receive or Load
The process of receiving a document actually consists of two steps:
- Receiving the document text using some “transport” mechanism, such as Janus Web Server (HTTP, as server), Janus Sockets (usually, HTTP, as client), Model 204 MQ Series, access from a file, etc.
- Converting the XML document (deserialization) into its internal representation (an XmlDoc) so that other operations can be performed on it.
If the XML document is received by Janus Web Server, these steps are performed together by the WebReceive function. For the HTTP Helper, the document text is received by the HttpRequest Get, Post, or Send function, and the deserialization is done by the HttpResponse ParseXml function. For other forms of transport, the steps are performed separately: the text form of the document is received into a Longstring, and the Longstring contents are converted into internal form by the LoadXml function.
See the discussion about sending and receiving, which mentions other details about the operations ("Receive", "Load", etc.) that create the initial content of an XmlDoc.
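The separate-steps case described above (text in a Longstring, then LoadXml) can be sketched as follows. This is a minimal illustration, not a definitive implementation: the literal document text stands in for whatever the transport actually delivers, and the variable names are made up for the example.

```
begin
%doc is object xmlDoc
%text is longstring

* Step 1: the document text arrives by some transport.
* (A literal is assumed here in place of a real transport.)
%text = '<purchase_order><date>25 July, 2001</date></purchase_order>'

* Step 2: deserialize the text into an EMPTY XmlDoc,
* then display the resulting tree:
%doc = new
%doc:loadXml(%text)
%doc:print
end
```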
You modify an XmlDoc using various XmlDoc API methods. If you start with an empty XmlDoc, some methods (the Add* and Insert* methods for various node types) allow you to generate an XmlDoc "directly", without first representing it in the serial text form. You can also update an XmlDoc into which you have received a document.
See the section on updating for a more detailed overview of XmlDoc updating operations.
Since an XML document is a hierarchical structure, your application will need to select some part of the hierarchy to operate upon, for example, to obtain its value. Various XmlDoc API methods do this. In addition, some XmlDoc API updating methods also require that you specify where in the hierarchy an update is performed.
Selecting nodes from an XmlDoc is performed using the XPath language, introduced in XML Path Language (XPath). XPath can be used for accessing a single node in the document, for example, getting an element node's string value using the Value property. You can also work with lists of selected nodes, represented by XmlNodelists. The SelectNodes function produces such a list. Other XmlNodelist methods also work with them, including the Item function, which gets a single XmlNode from an XmlNodelist. SelectSingleNode returns an XmlNode, as do most of the Add* and Insert*Before methods.
The process of sending a document actually consists of two steps:
- Converting the XmlDoc into its serial text representation.
- Sending the document text using some "transport" mechanism, such as Janus Web Server (HTTP, as server), Janus Sockets (usually, HTTP, as client), Model 204 MQ Series, access from a file, etc.
If the XML document is sent by Janus Web Server, the steps can be performed together by the WebSend subroutine. For the HTTP Helper, the document is serialized by the HttpRequest AddXml subroutine, and the document is sent by the HttpRequest Get, Post, or Send function. For other forms of transport, the steps are performed separately: the XmlDoc is converted into external form by the Serial function, and the converted result is sent using the appropriate transport.
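For the "other forms of transport" case, a minimal sketch of the serialization step might look like this. The element name and text are hypothetical, and the step that actually transmits %out is left out because it depends on the transport in use.

```
begin
%doc is object xmlDoc
%out is longstring

%doc = new
%doc:addElement('greeting', 'Hello, world')

* Convert the XmlDoc to its serial text form; %out can then
* be handed to whatever transport the application uses:
%out = %doc:serial
print %out
end
```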
Some other operations the XmlDoc API methods perform include:
- Creating and initializing an XmlDoc or XmlNodelist.
- Setting or retrieving some property of an XmlDoc, for example, the URI associated with a prefix to be used in an XPath expression (see SelectionNamespace).
- Displaying a document, or some part of it, usually for debugging purposes (see Print).
The XmlDoc class
An XmlDoc object is the internal representation of an XML document; creating one is usually done by invoking the XmlDoc New constructor, which returns an XmlDoc instance. An XmlDoc is a tree structure of nodes. The types of nodes that an XmlDoc may contain are shown in the following subsection.
XmlDoc node types
The possible node types are listed here (these are the enumeration return values of the Type function):
- Attribute: This type of node is used to represent an attribute of an XML element.
- Comment: This type of node is used to represent a comment (serialized in the form: <!--comment-->) in an XML document.
- Root: This type of node is the root of the XmlDoc tree. It has zero or one Element child nodes and any number of Comment and Pi child nodes. Root and Element nodes are the only nodes that can have child nodes.
- Element: This type of node is used to represent an element in an XML document. Element and Root nodes are the only nodes that can have child nodes.
- Pi: This type of node is used to represent a processing instruction (serialized in the form: <?target ...?>) in an XML document.
  Note: Although the "XML declaration" (<?xml version=...?>) has the same appearance as a processing instruction, it is not a Pi.
  Also, note that the values of an XmlDoc's XML declaration can be obtained and set with these properties: Version, Encoding, and Standalone.
- Text: This type of node is used to represent character content within an XML element. Note that a Text node will never contain the null string, and that two Text nodes can be adjacent only if the AdjacentText property is set to allow it.
The XmlDoc node types listed above correspond almost exactly with the structures contained in an XML document (see XML and XML example). The Root node, always present, corresponds to the node that contains the document as a whole. You can insert additional nodes, either by deserializing a character stream containing an XML document instance (for example, with WebReceive), or by using Add*/Insert*Before methods to insert nodes. The children of the Root node are the "top-level" element and any top-level processing instructions and/or comments that precede or follow it. Do not confuse the Root node, which is the root of the XmlDoc tree, with the top-level element of the XML document.
An XmlDoc can have one of the following three states:
- EMPTY: An XmlDoc in this state has no nodes other than the Root node. This is the state of an XmlDoc as returned by the XmlDoc New method.
- WELL-FORMED: An XmlDoc in this state contains at least the top-level Element node.
- Non-EMPTY not WELL-FORMED: An XmlDoc in this state contains at least one Comment or Pi node but no Element nodes.
Note that only an XmlDoc in the WELL-FORMED state may be converted into a complete text representation of an XmlDoc, and that you can only use an EMPTY XmlDoc as the target of “deserializing” the text representation of an XML document.
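The three states can be walked through with a short sketch. The comment text and element name are hypothetical; the state noted in each comment follows the definitions above rather than anything the code itself reports.

```
begin
%doc is object xmlDoc

* EMPTY: only the Root node exists, so %doc could still be
* the target of a deserialization method such as LoadXml.
%doc = new

* Non-EMPTY, not WELL-FORMED: a Comment child of the Root,
* but no Element yet.
%doc:addComment('header comment')

* WELL-FORMED: the top-level Element is now present, so the
* XmlDoc can be converted to a complete text representation.
%doc:addElement('top')
print %doc:serial
end
```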
The XmlNode and XmlNodelist classes, and XPath
In addition to using an XmlDoc directly, you can access an XmlDoc with either of the following objects:
- An XmlNode, which is a single pointer to a node in an XmlDoc
- An XmlNodelist, which contains a list of pointers to nodes selected from an XmlDoc
Instances of both of these objects are created by and returned as the value of several XmlDoc API functions. An XmlNodelist may also be created by an invocation of the XmlNodelist New constructor, which requires the specification of an XmlDoc argument — the XmlDoc with which the XmlNodelist is associated. There is not a New constructor in the XmlNode class.
A single XmlDoc can have any number of XmlNodelists and XmlNodes associated with it.
Most operations on the "contents" of an XmlDoc select one or more nodes using XPath expressions ("PathExpr" is the XPath syntax term, as explained in XPath syntax). All methods that accept an XPath LocationPath expression argument are members of both the XmlDoc and the XmlNode classes.
There are two forms of XPath expressions:
- Absolute XPath expression: An absolute XPath expression selects nodes from an XmlDoc, starting at the Root node. The syntax of an absolute XPath expression begins with a forward slash (/).
- Relative XPath expression: A relative XPath expression selects nodes from an XmlDoc, starting from a context node which is determined when the expression is used. The syntax of a relative XPath expression begins with a character other than a slash.
When you use a relative XPath expression, the context node depends on the method object (the type of object on which the method operates) of the invocation:
- If the method object is an XmlDoc, the context node is the Root node.
- If the method object is an XmlNode, the context node is the node which it points to.
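The effect of the method object on a relative expression can be seen in a small sketch. The document content is a cut-down, assumed version of a purchase order; each of the three Value calls selects by a different starting point.

```
begin
%doc is object xmlDoc
%pitm is object xmlNode
%text is longstring

%text = '<purchase_order><date>25 July, 2001</date>'
%text = %text with '<pitm><partnum>1234</partnum><qty>3</qty></pitm>'
%text = %text with '</purchase_order>'
%doc = new
%doc:loadXml(%text)

* Absolute expression: selection starts at the Root node.
print %doc:value('/purchase_order/date')

* Relative expression, XmlDoc method object: the context
* node is the Root node, so this selects the same element.
print %doc:value('purchase_order/date')

* Relative expression, XmlNode method object: the context
* node is the pitm element that %pitm points to.
%pitm = %doc:selectSingleNode('/purchase_order/pitm')
print %pitm:value('qty')
end
```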
In addition to operating on the contents of an XmlDoc, there are several methods (for example, WebReceive) that operate on the XmlDoc as a whole. These methods only allow an XmlDoc method object. If you need to obtain the XmlDoc associated with an XmlNode or XmlNodelist, use the XmlDoc function.
The following section continues the explanation of XPath, XmlNodes, and XmlNodelists. Further information about XPath expressions and node sets is also contained in XPath.
An example of XmlDoc API methods and XPath
This section illustrates a small XML document received as a web request, followed by part of a User Language request that uses some XmlDoc API methods, with particular attention to the method's XPath arguments.
Here is the XML document:
   <purchase_order>
      <date>25 July, 2001</date>
      <pitm>
         <partnum>1234</partnum>
         <qty>3</qty>
      </pitm>
      <pitm>
         <partnum>5678</partnum>
         <qty>2</qty>
      </pitm>
   </purchase_order>
Here is some User Language which could be used to receive and process this request:
   %doc Object XmlDoc
   %nl Object XmlNodelist

   * Create XmlDoc, get web request as contents:
   %doc = New
   %doc:WebReceive

   * Create work nodelist with all pitm elements:
   %nl = %doc:SelectNodes('/purchase_order/pitm')

   * Process each pitm:
   For %j From 1 To %nl:Count
      %partnum = %nl(%j):Value('partnum')
      %qty = %nl(%j):Value('qty')
      ...
   End For
Value and SelectNodes, like many methods in the XmlDoc API, have an optional argument that allows you to process any of the nodes in an XmlDoc, rather than the default, which is to process the node to which the method object points.
The optional argument shown above is an XPath expression (for SelectNodes, /purchase_order/pitm; for Value, partnum and qty). An XPath expression selects a list of nodes, starting either from the XmlDoc Root (when an absolute XPath expression is used) or from a particular context node in an XmlDoc (when a relative XPath expression is used). Syntactically, an XPath expression that begins with a slash (/) is absolute.
SelectNodes returns the entire result of its XPath expression argument. Many other XmlDoc API methods, however, operate on the first of the nodes resulting from the argument's XPath expression. That first node is called the head of the argument XPath result. Note that first is defined in terms of "document order" (see Order of nodes: node sets versus nodelists).
Updating an XmlDoc generally refers to the addition and deletion of the nodes of the XmlDoc tree, which includes the generation of the document's initial contents. The initial contents of an XmlDoc can be established by one of the deserialization methods: LoadXml, WebReceive, or ParseXml. Whether you use a method to set the "initial" contents of an XmlDoc or whether you start with an EMPTY XmlDoc, you can then insert nodes into it, using one or more of the methods whose name begins with "Add", such as AddElement, or whose name begins with "Insert", such as InsertSubtreeBefore.
Once an XmlDoc has one or more nodes in addition to the Root node, you can modify the Value of Text and other nodes, and you can delete nodes from it using DeleteSubtree.
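As a brief, hedged sketch of these two kinds of change, the fragment below replaces one element's value and deletes another element. The document content and paths are made up for illustration.

```
begin
%doc is object xmlDoc

%doc = new
%doc:loadXml('<pitm><partnum>1234</partnum><qty>3</qty></pitm>')

* Replace the string value of the qty element:
%doc:value('/pitm/qty') = '4'

* Remove the partnum element and its descendants:
%doc:deleteSubtree('/pitm/partnum')

%doc:print
end
```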
16M limit to number of XmlDoc items
Internally, an XmlDoc object is maintained in a data structure that has a maximum of 16M items. Each node requires an item, as does each unique string. A string item is used, for example, as the name of an element, attribute, PI node, or namespace, or as the value of a comment, PI, attribute, or text node, or as a namespace URI. Items are also used to maintain the SelectionNamespace property and for other internal purposes.
Exceeding the XmlDoc item limit (16M) in an updating operation causes request cancellation. Such operations include, for example, the Receive, Add, and Insert families of methods, as well as changing the Value of an XmlNode, and so on.
Update operations that cause deletion of items from an XmlDoc (for example, DeleteSubtree, and replacing the Value of an XmlNode) do not, in general, make those items available for reuse, at least in version 7.5 of Model 204 and earlier.
Inserting nodes and copying subtrees
The Add* methods are designed to make it easy to "append" nodes to an XmlDoc in a "depth-first, left-to-right" order in the simple case. These methods insert a node as the last child of the node pointed to by the method object.
Most of the Add* methods (for example AddElement) have Insert*Before counterparts (for example, InsertElementBefore) which insert a node in a position other than the last child of an Element or the Root. AddAttribute and AddNamespace are the exceptions, without an Insert*Before counterpart.
AddElement (as does InsertElementBefore) has an optional text value argument, with which you can insert a Text node child of the inserted Element node.
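The difference between appending and inserting before an existing node can be sketched as follows. The element names and text values are hypothetical; the method object of InsertElementBefore is the node in front of which the new element is placed.

```
begin
%doc is object xmlDoc
%list is object xmlNode
%second is object xmlNode

%doc = new
%list = %doc:addElement('list')

* AddElement appends as the last child; the second argument
* supplies a Text node child for the new element:
%second = %list:addElement('item', 'second')

* InsertElementBefore places the new element ahead of the
* node that %second points to:
%second:insertElementBefore('item', 'first')

%doc:print
end
```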
Here is an example of the updating methods in the XmlDoc API:
   %doc Object XmlDoc
   %doc = New
   %story Object XmlNode
   %paragraph Object XmlNode
   %story = %doc:AddElement('story')
   %story:AddComment('My first XML document')
   %story:AddElement('greeting', 'Hello, world')
   %paragraph = %story:AddElement('paragraph')
   %paragraph:AddElement('line', 'Ask not what')
   %paragraph:AddElement('line', 'Hear no evil')
This creates the following XML document:
   <story>
      <!--My first XML document-->
      <greeting>Hello, world</greeting>
      <paragraph>
         <line>Ask not what</line>
         <line>Hear no evil</line>
      </paragraph>
   </story>
As discussed above in the introduction to the section on updating, the number of items in an XmlDoc is limited to 16M items; exceeding this number in any updating operation causes request cancellation.
Namespaces with Add* and Insert* methods
When an XmlDoc is deserialized, the namespace declarations (and the use of those declarations by names in the document) follow the scope rules outlined in Name and namespace syntax. The namespace structure that results from the updating (adding and removing nodes) of an XmlDoc is explained here. Most of these updating methods are the Add* and Insert*Before methods (described individually in the List of XmlDoc API methods).
In the following discussion, all references to an Add* method (for example, AddElement) refer equally to the corresponding Insert*Before method (for example, InsertElementBefore).
- From deserialization through serialization, the XmlDoc API enforces the syntax rules of well-formed documents. When namespace handling is in effect for an XmlDoc (that is, the Namespace property setting is On, the default), the prefixes of names of nodes you add must be declared.
- Once an attribute or element is added, its URI is fixed and will not change thereafter.
- To allow AddElement to add an element with a prefix/URI declaration other than that which is in scope, it has a URI argument, which specifies the URI of the element. If the URI argument is used and the resulting prefix/URI combination requires a namespace declaration (because it differs from what is in scope), a declaration is created at the element.
- The URI argument of AddElement can also be such that the application logic around AddElement does not need to depend on whether the prefix/URI declaration is already in scope. AddAttribute has a URI argument for the same reason, and it can also result in the creation of a namespace declaration at the Element parent of the inserted Attribute node.
- As an alternative to the URI argument on AddElement and AddAttribute, a namespace declaration can be created at an element with the AddNamespace method. AddNamespace also lets you insert a namespace declaration that is not used by an element nor any of its attributes. For example, the following fragment:
     %n = %doc:AddElement('a')
     %n:AddNamespace('x', 'y:z')
     %n:AddElement('x:b')
     %n:AddElement('x:c')
  creates the following document:
     <a xmlns:x="y:z">
        <x:b/>
        <x:c/>
     </a>
- When the AddSubtree method copies between different XmlDocs, it does not allow the source and target XmlDocs to have different Namespace property settings (doing so could require re-parsing names in some cases).
- The namespace axis is not allowed in an XPath expression. Without XPath access to a pointer to a namespace node, the namespace clearly cannot be changed nor deleted, so the URIs associated with nodes cannot be "changed out from under". You can obtain the information in a namespace declaration, however, using certain XmlDoc API methods (for example, the Uri property of an XmlNode).
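The URI argument of AddElement mentioned in the list above can be sketched as follows. This is an illustration under assumptions: the names, the text value, and the URI are made up, and the argument order (name, text value, URI) is taken from the description of AddElement's optional arguments in this article.

```
begin
%doc is object xmlDoc
%n is object xmlNode

%doc = new
%n = %doc:addElement('a')

* No declaration for prefix x is in scope at element a, so
* the URI argument supplies one. Per the rules above, the
* declaration should be created at the new x:b element:
%n:addElement('x:b', 'text', 'y:z')

%doc:print
end
```

Compared with AddNamespace, this form keeps the declaration local to the element that needs it, at the cost of repeating the URI on each such call.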
The XmlDoc API subroutine used to delete individual nodes (and their descendants) from an XmlDoc is DeleteSubtree. An example is shown below.
If a node you delete was referenced by an XmlNode, the value of that XmlNode becomes Null. Similarly, if any deleted nodes were referenced by an item of an XmlNodelist, the value of that item of the XmlNodelist becomes Null.
If you need to "clean up" an XmlNodelist that refers to a deleted XmlDoc node, you can use the Difference function, as shown in the following example.
In the example, the %nlis XmlNodelist is cleaned up (with Difference) to remove the items that have become null. The final reference to items in %nlis uses relative XPath (the argument of the Value method invocation, which selects the attribute named auth).
   %nlis is object xmlNodelist
   %removeLis is object xmlNodelist

   * Get nodelist for chapters, and delete all
   * chapters by indicated author from the XmlDoc:
   %deleteAuth = 'Dave'
   %nlis = %d:selectNodes('/book/chapter')
   %removeLis = %d:selectNodes('/book/chapter[@auth="' %deleteAuth '"]')
   for %i from 1 to %removeLis:count
      %removeLis(%i):deleteSubtree
   end for

   * Cleanup the chapter nodelist, show author of
   * remaining chapters, & display the document:
   %nlis = %nlis:difference(%removeLis)
   for %i from 1 to %nlis:count
      print 'Author:' and %nlis(%i):value('@auth')
   end for
   %d:print
   ...
Also note that in the above example, the Item method is implicitly used (%nlis(%i)) and that implicit concatenation is used in the second XPath expression ('/book/chapter[@auth="' %deleteAuth '"]').
As discussed in the introduction to the section on updating, the items deleted by DeleteSubtree are not available for reuse by later Add, Insert, etc. methods, so DeleteSubtree does not "relieve" the restriction of 16M items in an XmlDoc.
Namespace URI for XPath prefixes
It is important to realize that the URI associated with a prefix in the XML document is controlled by the xmlns namespace declarations in the document. However, when a prefix is used in a name in an XPath argument to an XmlDoc API method, the URI for that prefix must be established so that the full XPath name (local part and URI namespace) can be used to locate a document element. This association of XPath prefixes to URIs is established using the SelectionNamespace property.
The prefix names used in XPath selection are independent of the prefix names used in the document serialization and deserialization. Since an XML document element prefix may be associated with multiple namespaces, or an element may have a namespace and no associated prefix, XPath prefixes stipulate the namespace that fully qualifies an element name in an XPath location step.
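A minimal sketch of this independence follows. The document uses prefix x, while the XPath expression uses a different, hypothetical prefix p that is bound to the same URI through SelectionNamespace; the names and URI are made up for the example.

```
begin
%doc is object xmlDoc

%doc = new
%doc:loadXml('<a xmlns:x="y:z"><x:b>1</x:b></a>')

* Associate the XPath prefix p with the URI y:z. The prefix
* need not match the one used in the document itself:
%doc:selectionNamespace('p') = 'y:z'

* The element is located by local name plus URI:
print %doc:value('/a/p:b')
end
```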
Transport: receiving and sending XML
Distinct sets of methods provide for the receiving and sending of XML:
- Receiving XML involves converting the information from the character, marked-up form of an XML document into an XmlDoc; this operation is called deserialization.
  - The WebReceive function is designed to receive an XML document that has arrived as a web request.
  - The ParseXml function of the HttpResponse class accomplishes this for HTTP clients.
  - For other transport mechanisms, such as Model 204 MQ Series, the character-format XML document can be placed into a Longstring, and the LoadXml function then places the information into an XmlDoc.
- Sending XML involves converting the information in an XmlDoc to a character stream, including markup such as element tags; this operation is called serialization.
  - The WebSend subroutine is designed to send an XML document as a web response.
  - The AddXml method of the HttpRequest class accomplishes this for HTTP clients.
  - For other transport mechanisms, such as Model 204 MQ Series, the Serial function is used to place the serialized form into a Longstring, which can then be sent.
Information form and content
The Janus SOAP deserialization methods receive and convert the character form of XML document data into an XmlDoc, which precisely represents the information content of that XML data. They do not preserve those aspects of the character form that are incidental to the information content, such as whether a character reference or an empty-element tag was present. The serialization methods reproduce and send the XmlDoc content, and they offer many global options for how that content is represented on output as a string of characters.
For example, the following two serializations represent exactly the same information content:
   <top><foo/>&#x31;</top>
   <top><foo></foo>1</top>
In the first string, foo is serialized using the empty element tag <foo/>, and the number 1 is serialized using a character reference (&#x31;). In the second string, foo is serialized using a start tag (<foo>) and an end tag (</foo>), and the number 1 is serialized using character content. Neither of these outputs, however, is related to how the XML content was obtained, that is, deserialized.
A given piece of information content has many potential serializations, including variations in line ends, CDATA sections, quotation characters used in attributes, attribute and namespace declaration order, etc. And in general, the character form of an XML document before it is deserialized will not be the same after it is re-serialized. It is a user task to determine to what degree to update an XmlDoc or to select serialization options to produce the desired form of character string output of the XML document information.
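The point that re-serialized text need not match the input can be checked with a short round trip. This is only a sketch: the input string is the start-tag/end-tag form discussed in this section, and no serialization options are specified, so the output shows whatever the default form is (which this article describes as using the empty element tag for a childless element).

```
begin
%doc is object xmlDoc
%in is longstring
%out is longstring

* Input uses separate start and end tags for foo:
%in = '<top><foo></foo>1</top>'
%doc = new
%doc:loadXml(%in)

* Only the information content was kept, so the default
* serialization is free to choose its own character form:
%out = %doc:serial
print 'Input: ' with %in
print 'Output: ' with %out
end
```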
For example, trying to obtain output like the second serialization in the example above (<top><foo></foo>1</top>), you specify literal start and end tags for foo in a SOUL deserialization method. But you discover that the default XmlDoc serialization result is <top><foo/>1</top>. One way (now deprecated) to produce the tag presentation you want in this case is to use the serialization method option (NoEmptyElt) that forces start and end tags for childless elements. But a better way is discussed in the next paragraph, using the NoEmptyElement property.
A related conversion issue arises when using SOUL to generate HTML. Because some browsers work correctly for certain childless elements (for example, <br> tags) only if they have an empty element tag, and for other childless elements (for example, <div> tags) only if they have separate start and end tags, the NoEmptyElt solution just mentioned will produce HTML that works for some elements but not for others. The resolution for this is not in the serialization or deserialization methods but rather in the updating methods described above. You can use the updating methods to build or modify the HTML elements in your XmlDoc, getting information content equivalent to the deserialization methods, but providing better control and access. For the issue in this example, the NoEmptyElement property would let you selectively apply the tag format needed for successful HTML, as shown in the following request fragment:
   %html = %doc:addElement('html')
   %body = %html:addElement('body')
   %div = %body:addElement('div')
   %div:noEmptyElement = true
   %div:addAttribute('id', 'topOfBody')
   %body:addText('foo')
   %body:addElement('br')
   %body:addText('bar')
If you use the Print method to display the fragment above, the result is this XmlDoc (note the separate start and end tags for div and the empty element tag for br):
   <html>
      <body>
         <div id="topOfBody"></div>
         foo
         <br/>
         bar
      </body>
   </html>
Using the updating methods to build HTML content is also superior to using something ostensibly simpler like this SOUL fragment to create output like that above:
   Print '<html>'
   Print '   <body>'
   Print '      <div id="topOfBody"></div>'
   Print '      foo'
   Print '      <br/>'
   Print '      bar'
   Print '   </body>'
   Print '</html>'
This approach quickly breaks down, primarily because the places at which you compose the content of the XML document can be widely dispersed (the code not all nicely together like this), and because keeping track of the element end tags can become very difficult.
For more information about the character transformations that Janus SOAP applies during deserialization, see Normalization during deserialization.
For a summary of the various output formatting options available to the Janus SOAP serialization methods, see XmlDoc API serialization options.
- When the internal representation of an XmlDoc is EBCDIC (prior to Sirius Mods 7.6), the deserialization methods reject a document if it contains an ISO-10646 (Unicode) character that cannot be represented in EBCDIC. When the internal representation of an XmlDoc is Unicode (Sirius Mods 7.6 and higher), the deserialization methods by default reject a document if it contains a Unicode character that is not translatable to EBCDIC. See Char and Reference for more information about characters in an XML document.
- The encodings that are accepted in the deserialization operations include ISO-8859-n (where n is a digit from 1 to 9). Prior to Sirius Mods Version 7.6, all of the ISO-8859-n variants are treated as ISO-8859-1. As of Sirius Mods 7.6, the variants determine ASCII to Unicode conversions according to the specification of the individual variant.
  Note: These encoding names must be specified in uppercase letters.
- When the document is serialized, the result is EBCDIC or is in the UTF-8 encoding. Therefore, the only values permitted to be set for the Encoding property are UTF-8 and the null string. See Encoding's Usage Notes for more information about the character sets allowed in a serialized input XML document and the value of encoding in an XML declaration.
Strings and Unicode with the XmlDoc API
As of Sirius Mods version 7.6, XmlDocs are maintained in Unicode rather than EBCDIC; this is true for all string values, names, prefixes, and URIs. As a consequence, most of the arguments and results of the XmlDoc API methods that formerly were strings or longstrings are Unicode strings as of version 7.6.
This switch to Unicode requires little or no change to most existing XmlDoc API applications, however: XmlDoc API argument and result variables declared as String or Longstring are automatically converted from EBCDIC to Unicode by the Sirius Mods. For example, the EBCDIC character strings in the arguments in a statement like the following are automatically converted to Unicode:
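As a hypothetical illustration of such a statement (the element name and text here are made up, not taken from the original example), both quoted literals below are EBCDIC strings in the request and are converted to Unicode when they are stored in the XmlDoc:

```
begin
%doc is object xmlDoc
%nod is object xmlNode

%doc = new
%nod = %doc:addElement('top')

* Both string arguments are EBCDIC in the request; they are
* converted to Unicode automatically as they are stored:
%nod:addElement('greeting', 'Hello')
end
```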
Similarly, if the variable %str, below, was declared as type String or Longstring, then the Unicode result of the Value method is automatically converted to EBCDIC when it is stored in %str:
   %str = %n:Value
The principal benefit of this switch to Unicode is conformance with the W3C XML standard, which defines "characters" in terms of Unicode characters (most of which are valid in XML documents). You can now store string values that are not translatable to EBCDIC — Sirius Mods 7.5 allows storage only of (most) non-null EBCDIC characters or of characters that translate to those EBCDIC characters.
The automatic EBCDIC/Unicode conversions described above will not cause request cancellations in requests that run successfully under Sirius Mods 7.5. But there are other changes to or effects on the XmlDoc API that are due to the switch to Unicode maintenance (the Sirius Mods Release 7.6 Notes and the individual method descriptions provide additional details):
- The workaround (InvalidChar method) for accommodating nulls and EBCDIC characters that are not allowed by the XML standard is replaced by:
  - The AllowNull property, which can let nulls be stored in an XmlDoc.
  - A method argument (AllowUntranslatable) of the deserialization methods that lets you store Unicode characters that do not translate to EBCDIC. Such characters may also be stored directly by the Add* and Insert* methods of the XmlDoc API; these methods do not require a special argument. EBCDIC characters that do not translate to Unicode must be handled before they are passed to an XmlDoc update operation. For example, EBCDIC X'04' is the SEL ("Select") control character. Since there is no "Select" control character in Unicode, there is no mapping between EBCDIC X'04' and any Unicode character. For this you might use the Untranslatable parameter of the EbcdicToUnicode function.
- If you have defined uninvertible translations, the implicit translation of EBCDIC string arguments and results to Unicode as of version 7.6 of the Sirius Mods will change the behavior of the XmlDoc API methods compared to their operation in version 7.5. For example, assume CCAIN establishes codepage 0037 as the base, and that it also uses the following UNICODE commands to allow for the codepage 1047 square bracket characters:
     UNICODE Table Standard Trans E=AD To U=005B
     UNICODE Table Standard Trans E=BD To U=005D
  These UNICODE commands cause uninvertible translations. For example, by the first command, EBCDIC X'AD' translates to U+005B, but by the definition of codepage 0037, U+005B translates to EBCDIC X'BA'. Consequently, you can add a X'AD' character to an XmlDoc, but if you display its value:
     %nod:AddElement('leftSquare', 'AD':X)
     Print %nod:Value('leftSquare'):StringToHex
  You get the following result:
     BA
  The Value method returns the Unicode character U+005B, which is translated implicitly to EBCDIC X'BA' as the string input for StringToHex.
  In version 7.5, because XmlDoc strings are stored in EBCDIC, no implicit translation is performed, and the result of the above two statements is:
     AD
  Note: Model 204 7.6 maintenance added left and right square bracket XHTML entities to reduce your concern about codepages and square brackets. For an example, see the UnicodeAfter method.
- The Print subroutine is equipped to display the Unicode values that are stored in XmlDocs, even if the Unicode characters are not translatable to EBCDIC. If non-translatable Unicode characters are stored in XmlDoc Attribute or Element values, Print displays their XML hexadecimal character references. If non-translatable Unicode characters are stored in a context other than Element or Attribute (a name, Comment, or Pi), the Print CharacterEncodeAll option is required to display a character reference and avoid request cancellation.
- As described further in Implicit Unicode conversions,
the User Language Print
statement under Sirius Mods 7.6 does not cancel the request
if it is presented with a Unicode character that does not translate to EBCDIC.
If it encounters an untranslatable Unicode character, Print will display
an EBCDIC string that contains the character's hex encoding.
As an example, consider the direct printing of the output of Value.
Say the element node assigned to %nodeY contains the Unicode trademark character (U+2122), which does not translate to EBCDIC. The following statement succeeds because the Print statement can handle untranslatable Unicode characters:
The result under Sirius Mods 7.5 is a request cancellation. The result under Sirius Mods 7.6 is:
However, the following common operation using the StringToHex method with Value does not succeed:
When StringToHex attempts to implicitly convert to EBCDIC the Unicode character passed to it by the Value function, the conversion fails because the character is not translatable to EBCDIC, and the request is cancelled. Such an implicit conversion, which simply uses the current Unicode translation tables, does not do character encoding.
To avoid a request cancellation here and view the Value result, you can use the UnicodeToUtf16 function to encode the Unicode character as a UTF-16 string for input to StringToHex:
For more information about the characters that are valid in an XmlDoc API XML document, see XML processing in Janus SOAP: ISO-10646 and EBCDIC characters.
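The two translation effects described in the list above can be reproduced outside Model 204. The following Python sketch is an illustration only and is not XmlDoc API code. It assumes that Python's built-in cp037 codec is an adequate stand-in for the codepage 0037 base tables, and the names `OVERRIDES`, `ebcdic_to_unicode`, `unicode_to_ebcdic`, and `char_reference` are hypothetical. It shows the uninvertible X'AD' round trip, and why an untranslatable character such as U+2122 must be shown as a character reference or as an encoded UTF-16 string.

```python
# Illustration only: Python's cp037 codec stands in for the codepage 0037 base.
# OVERRIDES models the two UNICODE commands (E=AD -> U+005B, E=BD -> U+005D).
OVERRIDES = {0xAD: "\u005B", 0xBD: "\u005D"}


def ebcdic_to_unicode(data: bytes) -> str:
    """Translate EBCDIC bytes to Unicode, applying the one-way overrides."""
    return "".join(
        OVERRIDES.get(b) or bytes([b]).decode("cp037") for b in data
    )


def unicode_to_ebcdic(text: str) -> bytes:
    """Translate Unicode to EBCDIC using only the base codepage 0037 table."""
    return text.encode("cp037")  # raises UnicodeEncodeError if untranslatable


def char_reference(text: str) -> str:
    """Replace untranslatable characters with XML hex character references."""
    out = []
    for ch in text:
        try:
            ch.encode("cp037")
            out.append(ch)
        except UnicodeEncodeError:
            out.append("&#x{:X};".format(ord(ch)))
    return "".join(out)


# Uninvertible translation: X'AD' goes in, but a different byte comes back.
stored = ebcdic_to_unicode(b"\xAD")
round_trip_hex = unicode_to_ebcdic(stored).hex().upper()

# Untranslatable character: the trademark sign has no codepage 0037 equivalent.
trademark = "\u2122"
try:
    unicode_to_ebcdic(trademark)
    translatable = True
except UnicodeEncodeError:
    translatable = False

reference = char_reference(trademark)
utf16_hex = trademark.encode("utf-16-be").hex().upper()

print(round_trip_hex, translatable, reference, utf16_hex)
```

The failed `unicode_to_ebcdic` call plays the role of the request cancellation, while `char_reference` and the UTF-16 encoding correspond to the two ways of making the value displayable.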
Using Longstrings or Unicode instead of Strings
Either the Unicode or Longstring datatype provides an atomic type that can contain a string longer than 255 bytes. The XmlDoc API methods, like all Janus SOAP methods, accept strings longer than 255 bytes wherever they have a string argument or result, which is to say:
- Input values may exceed 255 bytes in length.
- Various XmlDoc API methods will return a string longer than 255 bytes, if indeed the result value exceeds 255 bytes.
The following subsections provide some guidelines to determine when
you must use a longstring (or Unicode, as of Sirius Mods version 7.6)
%variable or context for a string argument or for
the result of a method in the XmlDoc API.
Since the server table requirements and the processing overhead
for a Longstring or Unicode %variable are only slightly greater than for a
String Len 255 %variable, it is recommended that you use a Longstring or Unicode
%variable in the XmlDoc API methods wherever you might be using a
String Len 255 %variable.
Xml and Serial methods
You should use a Longstring or Unicode %variable to hold the result of the Xml or Serial methods. The length of that result is the total concatenated length of all markup and character content in a document (or subtree, for Serial), which will most likely exceed 255 bytes. Thus, the first invocation of the Serial method below will never fail (for length reasons), but the second will usually cause a request cancellation:
%ls Longstring
%ls = %doc:Serial
%ss String Len 255
%ss = %doc:Serial
Usually you should use a Longstring or Unicode %variable to hold the result of the Value or ValueDefault methods. For example, the first two invocations of the Value method below will succeed but the third will cause a request cancellation:
%ss String Len 255
%ls Longstring
%doc:LoadXml('<top> <big>' With -
   $Lstr_Left('a', 300) With '</big> <little>' -
   With 'Less than 256 chars</little> </top>')
%ls = %doc:Value('/top/big')
%ss = %doc:Value('/top/little')
%ss = %doc:Value('/top/big')
As noted above, the best approach here is to use Longstring or Unicode %variables
where you might use
String Len 255 %variables.
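The length rule can also be modelled outside User Language. The following Python sketch is only an analogy: `assign_len_255` is a hypothetical helper that rejects values longer than 255 bytes, standing in for assignment to a String Len 255 %variable, and `xml.etree.ElementTree` stands in for the XmlDoc. The real XmlDoc API cancels the request rather than raising a Python exception.

```python
import xml.etree.ElementTree as ET

MAX_STRING_LEN = 255  # capacity of a String Len 255 %variable


def assign_len_255(value: str) -> str:
    """Hypothetical stand-in for assigning to a String Len 255 %variable."""
    if len(value.encode("utf-8")) > MAX_STRING_LEN:
        raise ValueError("value exceeds 255 bytes")
    return value


# A document with one long value and one short value.
doc = ET.fromstring(
    "<top> <big>" + "a".ljust(300) + "</big> "
    "<little>Less than 256 chars</little> </top>"
)

big = doc.findtext("big")        # 300 characters: needs a long string
little = doc.findtext("little")  # short enough for a 255-byte string

ls = big                     # a Python str has no length limit
ss = assign_len_255(little)  # succeeds

try:
    assign_len_255(big)
    big_fits = True
except ValueError:
    big_fits = False

print(len(ls), len(ss), big_fits)
```

Declaring the receiving variable as the unlimited type, as `ls` is here, removes the need to know in advance which values might exceed 255 bytes.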
Besides the Xml and Value methods, other XmlDoc API methods either cannot return a value longer than 255 bytes or, with typical XML documents, are unlikely to do so. If you have a namespace URI that exceeds 255 bytes, it may be necessary to use a Longstring or Unicode %variable.
For example, the first two invocations of the Uri method below will succeed, but the third will cause a request cancellation:
%ss String Len 255
%ls Longstring
%doc:LoadXml('<top><a:inner xmlns:a="urn:' With -
   $Lstr_Left('big', 300, '_') With '"/></top>')
%ss = %doc:URI('*')
%ls = %doc:URI('*/*')
%ss = %doc:URI('*/*')
As noted above, the easiest approach here is to use Longstring or Unicode
%variables where you might use
String Len 255 %variables.
Conventions and terminology for XmlDoc API methods
In addition to those described in Notation conventions for methods, the following conventions are also used in the individual XmlDoc API method descriptions:
- Symbols used in the syntax include the following.
Usually, they represent method objects; in actual code, they may be
replaced by object variables of the indicated class or by method invocations that return
such object variables:
- Denotes an abstract class (short for “node reference”) for methods that operate on a node and that can be used with either an XmlNode or an XmlDoc. If an XmlDoc, the node for the operation is the root node.
- Denotes an object of class XmlDoc.
- Denotes an object of class XmlNode.
- Denotes an object of class XmlNodelist.
- Although the terms "XmlNode" and "node" are closely related, effort is made to distinguish them as necessary in the method descriptions. An XmlNode is an object that points to a node in an XmlDoc. Similarly, an XmlNodelist is an object that contains a set, or list, of XmlNodes selected from a particular XmlDoc. Strictly speaking, a "nodelist" does not exist, but the term is occasionally used as an abbreviation or generalization of XmlNodelist.
- Null objects, null strings, empty results
- A Null object is one that has been deleted or that has not been instantiated. A "null" string is a zero-length string value. The text in the method descriptions distinguishes these two terms.
- Object-type arguments must not be Null, unless that argument explicitly allows Null. Hence, a Null argument typically causes a request cancellation. Currently, no XmlDoc API methods allow Null object arguments, and the "Request Cancellation Errors" section for each method does not include this condition.
- Some methods that have an XPath argument allow the result of the XPath expression to be the empty set of nodes; most, however, will cancel the request if this happens. Each method that has an XPath argument will either list the empty XPath result as a request cancellation error, or will explain the operation of the method when the XPath result is the empty nodeset.
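The two behaviors for an empty XPath result can be sketched as follows. This Python example is an analogy that uses `xml.etree.ElementTree`, not XmlDoc API code. `select_optional` and `select_required` are hypothetical helpers, and a raised `LookupError` stands in for request cancellation.

```python
import xml.etree.ElementTree as ET


def select_optional(root, path):
    """Return the matching nodes; an empty list is a valid result."""
    return root.findall(path)


def select_required(root, path):
    """Return the first matching node, or fail if the result is empty."""
    nodes = root.findall(path)
    if not nodes:
        # Stands in for request cancellation on an empty XPath result.
        raise LookupError("XPath result is empty: " + path)
    return nodes[0]


root = ET.fromstring("<top><little>text</little></top>")

found = select_required(root, "little")
none_found = select_optional(root, "missing")

try:
    select_required(root, "missing")
    cancelled = False
except LookupError:
    cancelled = True

print(found.tag, len(none_found), cancelled)
```

Which of the two behaviors applies is a property of each individual method, so the method's own description remains the authority.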