Serial (XmlDoc/XmlNode function)
Revision as of 17:46, 25 January 2011
Serialize selected subtree as string (XmlDoc and XmlNode classes)
This function converts an XmlDoc subtree to the UTF-8 or EBCDIC
text string representation of the subtree.
(This process is called serialization,
because the text representation of a document is called the serial
form.)
Syntax
%utfOrEbcd = nr:Serial[( [xpath], [options], [AddTrailingDelimiter=boolean])] Throws XPathError
Syntax terms
%utfOrEbcd | A string variable for the string serialization of the subtree, encoded either in UTF-8 or, if the EBCDIC option is used, in EBCDIC.
---|---
nr | An XmlDoc or XmlNode, used as the context node for the XPath expression. If an XmlDoc, the Root node is the context node.
xpath | A Unicode string that is an XPath expression that results in a nodelist, the head of which is the top of the subtree to serialize. This is an optional argument; its default is a period (.), that is, the node referenced by the method object (nr). Prior to Sirius Mods version 7.6, this is an EBCDIC string.
options | A blank-delimited string that can contain one or more Serial options, such as the EBCDIC, line-end (LF, CR, CRLF), ExclCanonical, and WithComments options discussed in the usage notes.
AddTrailingDelimiter=boolean | This Boolean, name-required parameter determines whether a final line-end character is added to the serialization when one of the Serial method line-end options (LF, CR, or CRLF) is specified. The default value of AddTrailingDelimiter is True, so Serial specified with a line-end option adds a trailing line-end by default. If AddTrailingDelimiter is specified as False, no final line-end character is added. Specifying the AddTrailingDelimiter argument without also specifying one of the line-end options has no effect on the resulting serialization. AddTrailingDelimiter is new as of Sirius Mods version 7.0. It may be useful if a digital signature must be created which includes line-end characters between XML tags, but the XmlDoc does not contain those line-end Text nodes.
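The interaction between the line-end options and AddTrailingDelimiter can be modeled with a short sketch. This is Python, not User Language, and it is only an illustration of the rule stated above: the function name `join_serialized_lines` and the idea of passing the serialization as a list of lines are assumptions made for the example, not part of the Serial method.

```python
# Illustrative model only: Serial itself works on an XmlDoc, not on a list of lines.
LINE_ENDS = {"LF": "\n", "CR": "\r", "CRLF": "\r\n"}

def join_serialized_lines(lines, line_end=None, add_trailing_delimiter=True):
    """Join already-serialized lines the way the line-end options are described.

    lines                  -- hypothetical list of serialized fragments
    line_end               -- "LF", "CR", "CRLF" (any case), or None for no line-end option
    add_trailing_delimiter -- models AddTrailingDelimiter (default True)
    """
    if line_end is None:
        # Without a line-end option, AddTrailingDelimiter has no effect.
        return "".join(lines)
    sep = LINE_ENDS[line_end.upper()]
    text = sep.join(lines)
    return text + sep if add_trailing_delimiter else text

print(repr(join_serialized_lines(["<a>", "</a>"], "LF")))
print(repr(join_serialized_lines(["<a>", "</a>"], "LF", add_trailing_delimiter=False)))
```

The second call corresponds to the digital-signature case mentioned above, where the signed text must not end with an extra line-end.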
Usage notes
- To obtain a Longstring that is the UTF-8 serialization of an entire XmlDoc, including the "XML declaration," use the Xml method.
- The options argument values may be specified in any case. For example, XmlDecl and xmldecl are interchangeable.
- Line-end/whitespace characters:
- Using one of the line-end character options (CR, LF, CRLF) produces output that is similar to the BothCompact option of the Print method.
- If one of the line-end (CR, LF, CRLF) options is specified, and an element to be serialized has the xml:space="preserve" attribute, then within the serialization of that element and its descendants, no line-end characters are inserted. In addition, the xml:space="default" attribute has no effect under these options: specified by itself, it does not influence serialization, nor does it cause resumption of readability line-ends or indents if they were suspended by a containing xml:space="preserve".
- As of version 6.7, the Serial method uses the hexadecimal character references specified in the XML Canonicalization specification (http://www.w3.org/TR/xml-c14n) to represent the following whitespace characters:
- For Attribute nodes: tab, carriage return (CR), and linefeed (LF)
- For Text nodes: carriage return
Because the character references are not subject to the standard XML whitespace normalization, a serialized document (or subtree) that is then deserialized retains this whitespace.
These character references are used:
- tab: &#x9;
- CR: &#xD;
- LF: &#xA;
The EBCDIC and corresponding ASCII encodings of these characters are:
Character | EBCDIC | ASCII
---|---|---
tab | X'05' | X'09'
CR | X'0D' | X'0D'
LF | X'25' | X'0A'
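The byte values listed above can be checked outside of Model 204 with a short Python sketch. It assumes that code page 037 is a fair stand-in for "EBCDIC"; the source does not say which EBCDIC code page Serial uses, so treat the codec choice as an assumption.

```python
# Assumption: Python's cp037 codec stands in for the EBCDIC encoding.
WHITESPACE = {"tab": "\t", "CR": "\r", "LF": "\n"}

def encodings(ch):
    """Return (EBCDIC byte, ASCII byte) for a single character."""
    return ch.encode("cp037")[0], ch.encode("ascii")[0]

for name, ch in WHITESPACE.items():
    ebcdic, ascii_ = encodings(ch)
    print(f"{name}: EBCDIC X'{ebcdic:02X}'  ASCII X'{ascii_:02X}'")
```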
- As of Sirius Mods version 7.6, Attribute values are always serialized within double-quotation-mark (") delimiters, and a double-quotation mark character in an attribute value is serialized as &quot;. Prior to version 7.6, this convention was not strictly observed.
- Canonicalization:
Canonicalization refers to
a particular serialization of an XML document that is
unique, yet still a logically equivalent representation
of the document.
Exclusive canonicalization is canonicalization augmented by rules for
preserving or excluding the namespace context (declaration) of nodes when
only a portion of an XML document is serialized.
Therefore, if a portion (subtree) of an XML document is exclusively
canonicalized, it is
serialized uniquely and is "substantially independent of its XML context"
(that is, contains all essential and no extraneous information from its
ancestor nodes).
This independence makes the subtree suitable for working with digital signatures.
Some of the many requirements for canonicalization are met automatically
when the Serial method is specified with no options.
For example, UTF-8 encoding and exclusion of the XML declaration, if any,
are provided by default by Serial.
Specifying ExclCanonical, which is new as of Sirius Mods version 7.0,
adds the following features to the no-option default:
- Sorting of namespace declarations (based on the prefix being declared) and of attributes (based on the namespace URI followed by the local name). The sort order is from lowest to highest, and it uses the Unicode code ordering (for example, numbers are lower than letters).
- For empty elements, serialization with both a start tag and an end tag, instead of using a single "empty element tag."
- The suppression of any Comment nodes that may be present in the subtree. Comment nodes are suppressed unless the WithComments option is specified along with ExclCanonical. For an example, see the "PIs and Comments" example below.
- Special namespace declaration handling: A namespace declaration is produced only if it is utilized by an element or attribute in the subtree. The declaration is produced in the start-tag of an element that uses it (or has an attribute using it), unless the parent of the element is in the subtree and the declaration is in scope at the parent. For examples, see the two namespace examples at the start of the ExclCanonical examples below.
- Attribute values are always serialized within double-quotation-mark (") delimiters, and a double-quotation mark character in an attribute value is serialized as &quot;.
With or without the ExclCanonical option, these special characters in attribute values are serialized as entity and hexadecimal character references:
- The ampersand (&) is serialized as &amp;
- The less-than symbol (<) is serialized as &lt;
- The carriage return (CR) character is serialized as &#xD;
- The linefeed (LF) character is serialized as &#xA;
- The tab character is serialized as &#x9;
For examples, see the "Character references" example below.
- Within Text nodes, the following characters are serialized as entity and hexadecimal character references:
If you specify Serial with no options:
- The less-than symbol (<) is serialized as &lt;
- The ampersand (&) is serialized as &amp;
- The carriage return (CR) character is serialized as &#xD;
If you specify the ExclCanonical option, the following is also true:
- The greater-than symbol (>) is serialized as &gt;
For examples, see the "Character references" example below.
- If serializing the Root of an XmlDoc, a linefeed character is inserted between the children of the Root. This character is represented exactly by X'25' if the EBCDIC option of Serial is used; otherwise it is represented by X'0A'. Note: No linefeed is inserted if the XmlDoc has one PI or Comment node and does not have an Element node. In this case (which is allowed by Janus SOAP), the XML document is not well-formed and therefore the canonicalization specifications ignore it.
- If the subtree to be serialized is a single node that is either of these:
- A PI child of the Root
- A Comment child of the Root, when the WithComments option is specified
then a linefeed character is added after the PI or Comment if there is a following Element sibling, or is added before the PI or Comment if there is a preceding Element sibling. Note: No linefeed is inserted if the XmlDoc does not have an Element node. In this case (which is allowed by Janus SOAP), the XML document is not well-formed and therefore the canonicalization specifications ignore it.
Qualifications/exceptions:
- The canonicalization specifications, especially exclusive canonicalization, include references to the serialization of a subset of a document. The ExclCanonical option is based not on a subset but on a subtree.
- Although the ExclCanonical and SortCanonical options use the "Unicode" sort sequence, this is currently limited to Unicode values less than 256 (as of the current version of Janus SOAP), so it is accomplished with an 8-bit EBCDIC to 8-bit Unicode table, which is (for all intents and purposes) merely an EBCDIC-to-ASCII translation.
- The specifications support an argument to canonicalization that is a list of namespace declarations that are to be "forced" into the serialization. The ExclCanonical option does not provide this support.
A series of examples of the effects of the ExclCanonical option begins with the second example in the "Examples" section below.
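The sort rule that ExclCanonical applies (namespace declarations by prefix, attributes by namespace URI and then local name, both in code-point order) can be sketched in Python. The tuple layouts and function names are assumptions made for this illustration; they are not part of the XmlDoc API. Python compares strings by Unicode code point, which matches the ordering described above for values below 256.

```python
def sort_namespace_decls(decls):
    """Sort (prefix, uri) pairs by the prefix being declared.

    The default namespace is represented here by an empty prefix ''.
    """
    return sorted(decls, key=lambda d: d[0])

def sort_attributes(attrs):
    """Sort (namespace_uri, local_name, value) triples.

    Primary key: namespace URI; secondary key: local name.
    An unqualified attribute has an empty namespace URI and so sorts first.
    """
    return sorted(attrs, key=lambda a: (a[0], a[1]))

decls = [("p3", "urn:p3"), ("p2", "urn:p2"), ("p1", "urn:p1")]
print(sort_namespace_decls(decls))

attrs = [("urn:b", "x", "1"), ("", "z", "2"), ("urn:a", "y", "3"), ("", "a", "4")]
print(sort_attributes(attrs))
```

Because the ordering is by code point, digits sort before uppercase letters, which sort before lowercase letters.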
Examples
- The following example shows the EBCDIC serialization of a document. The EBCDIC option is used because a Print statement display of the default UTF-8 output of Serial is a string that is not readily decipherable.
    Begin
    %doc is Object XmlDoc
    %sl is Object Stringlist
    %doc = New
    %sl = New
    text to %sl
    <top>
      <a>
        05
      </a>
      <c>
        <d att="val"/>
      </c>
    </top>
    end text
    Call %doc:LoadXml(%sl)
    Print 'Serial method output follows:'
    Print %doc:Serial('top', 'ebcdic')
    End
The example results follow:
    Serial method output follows:
    <top><a>05</a><c><d att="val"/></c></top>
- This and the remaining examples show various aspects of
the ExclCanonical option.
The examples use the EBCDIC option to display
the result.
If using ExclCanonical for digital signature processing, you probably
should omit the EBCDIC option and use the default encoding, UTF-8.
Under exclusive canonicalization, a namespace is not serialized if it is not
necessary.
In this example, the subtree to be serialized is the a element
in the request code that follows:
    Begin
    %doc is Object XmlDoc
    %doc = New
    %l is longstring
    %sl is object stringlist
    %sl = New
    text to %sl
    <top>
      <a xmlns:p3="urn:p3" xmlns:p2="urn:p2" xmlns:p1="urn:p1">
        <p1:b/>
        <p2:b/>
      </a>
    </top>
    end text
    Call %doc:LoadXml(%sl)
    Print 'Exclcan via ParseLines:'
    %sl = New
    %l = %doc:Serial('top/a', 'EBCDIC exclcanonical indent 2 lf')
    %sl:Parselines(%l)
    %sl:Print
    End
The exclusive canonical serialization (displayed, after being parsed from string to Stringlist, with line breaks and indent for the sake of clarity) omits the declaration for p3, because it is not utilized in the serialized subtree:
    <a>
      <p1:b xmlns:p1="urn:p1"></p1:b>
      <p2:b xmlns:p2="urn:p2"></p2:b>
    </a>
An element utilizes an in-scope namespace declaration in either of these cases:
- The element is prefixed and the declaration is of that prefix.
- The element is unprefixed and it is a default namespace declaration.
An attribute utilizes an in-scope namespace declaration if the attribute is prefixed and the declaration is of that prefix.
In the preceding example, there was no alternative to removing the non-utilized declaration for p3, but if it were utilized by a descendant element "lower" in the document tree, it would be moved to that element.
Another application of the utilization rule is shown in the next example.
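The utilization rule stated above can be expressed as a small Python sketch. The function names and the dictionary of in-scope declarations are hypothetical, chosen for this illustration; the sketch covers only the "is this declaration utilized by this element" test, not the rule about declarations already in scope at a parent inside the subtree.

```python
def utilized_prefixes(element_prefix, attribute_prefixes):
    """Return the set of prefixes whose declarations an element utilizes.

    element_prefix     -- the element's prefix, or '' if it is unprefixed
                          (an unprefixed element utilizes the default declaration)
    attribute_prefixes -- prefixes of the element's attributes ('' if unprefixed;
                          unprefixed attributes utilize no declaration)
    """
    used = {element_prefix}
    used.update(p for p in attribute_prefixes if p)
    return used

def utilized_decls(in_scope, element_prefix, attribute_prefixes=()):
    """Filter in-scope {prefix: uri} declarations down to the utilized ones."""
    used = utilized_prefixes(element_prefix, attribute_prefixes)
    return {p: uri for p, uri in in_scope.items() if p in used}

in_scope = {"p1": "urn:p1", "p2": "urn:p2", "p3": "urn:p3"}
print(utilized_decls(in_scope, "p1"))        # declarations for a p1:b element
print(utilized_decls(in_scope, ""))          # declarations for the unprefixed a element
```

With the declarations from the preceding example, no element or attribute utilizes p3, which is why its declaration does not appear in the serialization.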
- Under exclusive canonicalization, namespaces are imported to where
they are needed.
Using the same type of request as in the first ExclCanonical example above,
the w element, which is contained in an a element that begins as follows, is the subtree to serialize (display form):
    <a xmlns:p3="urn:p3" xmlns:p2="urn:p2" xmlns:p1="urn:p1">
Exclusive canonical serialization (display form), which gets required namespace declarations from an ancestor of the serialized subtree:
    <w>
      <p1:b xmlns:p1="urn:p1"></p1:b>
      <p2:b xmlns:p2="urn:p2"></p2:b>
    </w>
- PIs and Comments
Using the same type of request as in the first ExclCanonical example above,
this is the subtree to be serialized (display form):
    <a>
      <w>
        <?pi-without-data?>
      </w>
    </a>
Exclusive canonical serialization (display form), which omits the Comment node:
    <a>
      <w>
        <?pi-without-data?>
      </w>
    </a>
Note: To include the Comment node, specify also the WithComments option of Serial.
- Character references
Using the same type of request as in the first ExclCanonical example above,
this is the subtree to serialize (display form):
    <doc>
      <comp>val>"0" val&lt;"10"</comp>
      <comp expr='val>"0"'></comp>
      <norm attr=' &apos; &#xD;&#xA;&#x9; &apos; '/>
      <white>&#x9;&#xD;&#xA;</white>
    </doc>
This is the result from the Serial method with no options specified (display form; the <white> element has a line that wraps to emphasize the non-visible linefeed character it contains):
    <doc>
      <comp>val>"0" val&lt;"10"</comp>
      <comp expr='val>"0"'></comp>
      <norm attr=" ' &#xD;&#xA;&#x9; ' "/>
      <white>	&#xD;
      </white>
    </doc>
The exclusive canonical serialization follows (display form; the wrapped <white> element line has no indent):
    <doc>
      <comp>val&gt;"0" val&lt;"10"</comp>
      <comp expr="val>&quot;0&quot;"></comp>
      <norm attr=" ' &#xD;&#xA;&#x9; ' "></norm>
      <white>	&#xD;
    </white>
    </doc>
The differences from no-option Serial include:
- The greater-than symbol (>) within a text node is serialized as &gt;.
- Attribute values are enclosed in double-quotation marks (").
- A double-quotation mark in an attribute value is serialized as &quot;.
- An empty element is serialized with two tags (a start tag followed by an end tag), not with a single empty-element tag.
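The character-reference rules that this example exercises can be summarized in a Python sketch. The function names `escape_attr` and `escape_text` are hypothetical, and the sketch models only the escaping described in the usage notes (following the XML Canonicalization specification), not the rest of Serial.

```python
def escape_attr(value):
    """Escape an attribute value for serialization inside double quotes."""
    out = value.replace("&", "&amp;")   # ampersand first, so later references are not re-escaped
    out = out.replace("<", "&lt;")
    out = out.replace('"', "&quot;")
    out = out.replace("\t", "&#x9;")
    out = out.replace("\n", "&#xA;")
    out = out.replace("\r", "&#xD;")
    return out

def escape_text(value, excl_canonical=False):
    """Escape Text node content; '>' is escaped only with ExclCanonical."""
    out = value.replace("&", "&amp;")
    out = out.replace("<", "&lt;")
    out = out.replace("\r", "&#xD;")
    if excl_canonical:
        out = out.replace(">", "&gt;")
    return out

print(escape_text('val>"0" val<"10"'))
print(escape_text('val>"0" val<"10"', excl_canonical=True))
print(escape_attr('val>"0"'))
```

Note that tab and linefeed are left as-is in Text nodes, while in attribute values they become character references so that attribute-value normalization does not turn them into spaces when the document is deserialized.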
Request-Cancellation Errors
- XPath is invalid.
- Result of XPath is empty.
- Options are invalid.
- Insufficient free space exists in CCATEMP.
See also
- The subroutine that serializes an XmlDoc and sends it as a Web response is WebSend.
- Additional serializing methods include:
- Xml
- AddXml (HttpRequest class, described in the Janus SocketsR.)
- To deserialize a string, use LoadXml or WebReceive.
- For more information about using XPath expressions, see XPath.