XmlDoc API serialization options: Difference between revisions

Revision as of 00:08, 10 February 2011

Multiple methods in the XmlDoc API serialize (produce the text-string representation of) the contents of an XmlDoc or XmlDoc subtree. These methods include Serial, WebSend, Xml, Print, Audit, and Trace. Each of these methods has an "options" parameter, a blank-delimited string (not case-sensitive) of one or more options that control aspects of the output format. These options are summarized in the table below.
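
As an illustration of the option-string conventions only, here is a minimal Python sketch (not Janus SOAP code) of how a blank-delimited, case-insensitive options string could be tokenized. The function name parse_options is hypothetical. The two checks it performs (an Indent value of at most 254, and AllowXmlDecl not combined with NoXmlDecl) reflect rules stated in the option descriptions on this page; all other validation done by the real methods is deliberately not modeled.

```python
def parse_options(options):
    """Tokenize a blank-delimited, case-insensitive options string.

    Hypothetical helper for illustration; returns a dict keyed by the
    lowercased option name. 'Indent n' consumes the following token.
    """
    tokens = options.split()          # blank-delimited
    parsed = {}
    i = 0
    while i < len(tokens):
        name = tokens[i].lower()      # option names are not case-sensitive
        if name == "indent":
            if i + 1 >= len(tokens) or not tokens[i + 1].isdecimal():
                raise ValueError("Indent requires a non-negative integer")
            n = int(tokens[i + 1])
            if n > 254:
                raise ValueError("Indent value may not exceed 254")
            parsed["indent"] = n
            i += 2
        else:
            parsed[name] = True
            i += 1
    if "allowxmldecl" in parsed and "noxmldecl" in parsed:
        raise ValueError("AllowXmlDecl and NoXmlDecl may not both be specified")
    return parsed


if __name__ == "__main__":
    print(parse_options("EBCDIC exclcanonical indent 2 lf"))
```

The options string used in the demonstration is the one passed to Serial in the "Namespace serialization" example on this page.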

Option    Description
CharacterEncodeAll
In methods:
Print
Audit
Trace
Use character encoding in all contexts to display Unicode characters that do not translate to EBCDIC. If this option is not specified (as of Sirius Mods 7.6), only non-translatable Unicode characters in Attribute or Element values are displayed as character references.

For more information about this option, see "Printing untranslatable Unicode characters", below.

The CharacterEncodeAll option is available as of Sirius Mods version 7.6.
EBCDIC
In methods:
Serial
This indicates that the serialization should be in EBCDIC rather than UTF-8. UTF-8 encoding is provided by default.

Selecting EBCDIC under Sirius Mods 7.6 or higher causes conversion via the Unicode tables of the subtree content, which is stored in Unicode.

Serializing to UTF-8 involves no translation: the stored Unicode characters are merely encoded as UTF-8.
ExclCanonical
In methods:
Serial
This indicates that the output of the serialization will be in exclusive XML canonical form, as defined in the W3C "Exclusive XML Canonicalization" specification (http://www.w3.org/tr/xml-exc-c14n), which is an extension of the "XML Canonicalization" specification (http://www.w3.org/TR/xml-c14n). These specifications constrain serializations to facilitate processing such as digital signatures.

This option, added in Sirius Mods version 7.0, is described in greater detail in "Canonicalization" below. Specifying any of the Serial method CR, LF, CRLF, or Indent options when you also specify ExclCanonical is allowed. Although the resulting output will not be completely canonical, it may be what you require for the purposes of a digital signature, for example. The formatting addressed by those options is defined in the Exclusive Canonicalization specification and covered by the ExclCanonical option.

Similarly, the effect of the XmlDecl option contradicts the Exclusive Canonicalization specification. If you do specify the XmlDecl and ExclCanonical options together, however, the serialized XML Declaration is followed by a linefeed character.
Indent n
In methods:
Serial
WebSend
Xml
Print
Audit
Trace
Inserts space characters (and line-ends, as described for the next option) into the serialized string such that if the string is broken at the line-ends and displayed as a tree, the display of each lower level in the subtree is indented n spaces from the previous level's starting point.

If serialized output with an Indent value of 2 is displayed as a tree, the spacing is as in the following:

<top>
  <leaf1 xx="yy">value</leaf1>
  <sub>
    <leaf2>value</leaf2>
  </sub>
</top>

One of the line-end options, below, must also be specified.

n is a non-negative integer, and its maximum value (as of Sirius Mods version 7.0) is 254.
CR
LF
CRLF
In methods:
Serial
Xml
WebSend
Line-end options for the method output:
CR Insert a carriage-return character.
LF Insert a linefeed character.
CRLF Insert a carriage-return character followed by a linefeed character.
You specify one of the line-end options above to provide line breaks in the output after any of the following is serialized:
  • An element start-tag, if it has any non-text node children
  • An element end tag
  • An empty element tag
  • A processing instruction (PI)
  • A comment
  • A text node, if it has any siblings
Note: If one of these line-end options is specified and an AddTrailingDelimiter=false argument is also specified, no line-end character is added at the end of the serialized subtree.
NoEmptyElt
In methods:
Serial
WebSend
Xml
Print
Audit
Trace
Deprecated as of Sirius Mods version 7.0, this option ensures that all empty elements are serialized with a start tag followed by an end tag. For example:
     <middleName></middleName> 
If NoEmptyElt is not specified, the default is to serialize an empty element with an empty element tag; using the same example as above, this would be:
     <middleName/> 
The ExclCanonical option provides the same empty element serialization as NoEmptyElt.
OmitNullElement
In methods:
Serial
WebSend
Xml
Print
Audit
Trace
An Element node that has no children and no Attributes will not be serialized, unless it is the top level Element in the subtree being serialized. The serialization of a child-less and Attribute-less Element is omitted, even if the Element's serialization would contain Namespace declarations in its start tag.

If an Element node has no Attributes, but has (only) Element children (one or more), and all of its children are Attribute-less and child-less, then that parent Element is serialized, even though its content in the serialization is empty. That parent is serialized with a start tag and an end tag (and an inserted line separator, if called for by the serializing method's parameter options). For example, if the Serial method display of a particular XmlDoc in tree format is the following when OmitNullElement is not specified:

<top>
   <middle>
      <empty/>
      <p:empty2 xmlns:p="uri:stuff"/>
   </middle>
</top>

Here is the display of the XmlDoc with the OmitNullElement option specified:

<top>
   <middle>
   </middle>
</top>

But if you attempt to display only the empty subtree of the XmlDoc using OmitNullElement, the empty node is not suppressed, and the result is:

<empty/>

The OmitNullElement option is available as of Sirius Mods version 7.3.
SortCanonical
In methods:
Serial
WebSend
Xml
Print
Audit
Trace
Deprecated as of Sirius Mods version 7.0, SortCanonical serializes namespace declarations (based on the prefix being declared) and attributes (based on the namespace URI followed by the local name) in sorted order. This can be useful, for instance, when using Serial to serialize a portion of an XML document for a signature.

The sort order for namespace declarations and attributes is from lowest to highest, and it uses the Unicode code ordering (for example, numbers are lower than letters).

Added in Sirius Mods version 6.9, this option is superseded by the ExclCanonical option.
WithComments
In methods:
Serial
This indicates that all Comment nodes in the specified subtree are to be included in the serialized output.

Note: This option, added in Sirius Mods version 7.0, is only a supplement to the ExclCanonical option: specifying WithComments without specifying ExclCanonical has no effect. Specifying ExclCanonical without specifying WithComments causes all Comment nodes to be suppressed from the result.

XmlDecl
In methods:
Serial
This indicates that the serialized XmlDoc will contain the "XML Declaration" (<?xml version=...?>), if the value of the Version property is a non-null string, and if the XmlDoc is not empty. XmlDecl may only be specified if the top of the subtree being serialized is the Root node.
AllowXmlDecl
NoXmlDecl
In methods:
WebSend
Xml
Print
Audit
Trace
These indicate whether or not the serialized XmlDoc will contain the "XML Declaration" (<?xml version=...?>). AllowXmlDecl (the default) may only be specified if the value of the Version property is a non-null string, and if the top of the subtree being serialized is the Root node. AllowXmlDecl and NoXmlDecl may not both be specified.
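
The OmitNullElement rule described in the table above can be modeled with a short Python sketch. This is an illustration using Python's standard xml.etree.ElementTree module, not the Janus SOAP implementation; the helper names is_null and omit_null_elements are hypothetical. It assumes that "no children" means no child elements and no text, and it relies on the fact that ElementTree does not store namespace declarations as attributes, which happens to match the rule that a namespace declaration does not prevent an element from being omitted.

```python
import xml.etree.ElementTree as ET


def is_null(elem):
    """True if the element has no attributes, no child elements, and no text."""
    return not elem.attrib and len(elem) == 0 and not elem.text


def omit_null_elements(top):
    """Remove null descendants of `top`; `top` itself is always kept.

    The candidates are collected before anything is removed, so a parent
    whose children are all omitted is still serialized (it was not null
    in the original tree).
    """
    doomed = [(parent, child)
              for parent in top.iter()
              for child in parent
              if is_null(child)]
    for parent, child in doomed:
        parent.remove(child)
    return top


if __name__ == "__main__":
    doc = ET.fromstring(
        '<top><middle><empty/><p:empty2 xmlns:p="uri:stuff"/></middle></top>')
    pruned = omit_null_elements(doc)
    # short_empty_elements=False writes <middle></middle>, not <middle />
    print(ET.tostring(pruned, encoding="unicode", short_empty_elements=False))
```

Applied to the sample document from the OmitNullElement description, the sketch keeps top and middle and drops both empty children; applied to the empty element alone, it removes nothing, because the top of the subtree is never omitted.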

Printing untranslatable Unicode characters

As of Sirius Mods version 7.6, XmlDoc content is stored in Unicode. To serialize to EBCDIC, the Print method uses the Unicode tables to convert the XmlDoc content.

One feature of the conversion from Unicode is that the Print method displays non-translatable Unicode characters stored in Attribute or Element values as character references. For example:

%doc:AddElement('top', '&#x2122;':U)
%doc:Print

The result of this fragment is:

<top>&#x2122;</top>

However, with default serialization options, when an untranslatable Unicode character occurs in a context other than an Element or Attribute value (that is, in a name, comment, or PI), character encoding is not used. For example, because the character is used as an element name, the following statements result in a request cancellation:

%doc:AddElement('&#x2122;':U)
%doc:Print

The Print method fails, attempting to translate the element name, the U+2122 character, to EBCDIC. This request cancellation can be prevented by using the CharacterEncodeAll option:

%doc:AddElement('&#x2122;':U)
%doc:Print(, 'CharacterEncodeAll')

The result of the above fragment is:

<&#x2122;/>

Note: The result of a Print with CharacterEncodeAll can be misleading. Request cancellation is avoided, but it produces multiple EBCDIC characters where only a single Unicode character is stored.

The XmlDoc, %doc, above is not a legal XML document, because the ampersand (&) is not a legal name character. Similarly, for an untranslatable Unicode character added to a document with AddComment or AddPI: printing with CharacterEncodeAll produces a stream of characters that informs about a single character reference but, if deserialized, would result in multiple stored characters. The standard XML syntax does not recognize character references as such in names, Comments, and PIs.

Canonicalization

Canonicalization refers to a particular serialization of an XML document that is unique, yet still a logically equivalent representation of the document. Exclusive canonicalization is canonicalization augmented by rules for preserving or excluding the namespace context (declaration) of nodes when only a portion of an XML document is serialized.

Therefore, if a portion (subtree) of an XML document is exclusively canonicalized, it is serialized uniquely and is "substantially independent of its XML context" (that is, contains all essential and no extraneous information from its ancestor nodes). This independence makes the subtree suitable for working with digital signatures.

Some of the many requirements for canonicalization are met automatically by the Serial method when no options are specified. For example, UTF-8 encoding and exclusion of the XML declaration, if any, are provided by default by Serial. Specifying ExclCanonical, which is new as of Sirius Mods version 7.0, adds the following features to the no-option default:

  • Sorting of namespace declarations (based on the prefix being declared) and of attributes (based on the namespace URI followed by the local name). The sort order is from lowest to highest, and it uses the Unicode code ordering (for example, numbers are lower than letters).
  • For empty elements, serialization with both a start tag and an end tag, instead of using a single "empty element tag."
  • The suppression of any Comment nodes that may be present in the subtree. Comment nodes are suppressed unless the WithComments option is specified along with ExclCanonical. For an example, see "PIs and Comments".
  • Special namespace declaration handling: A namespace declaration is produced only if it is utilized by an element or attribute in the subtree. The declaration is produced in the start-tag of an element that uses it (or has an attribute using it), unless the parent of the element is in the subtree and the declaration is in scope at the parent. For examples, see "Namespace serialization" and "Namespace importing".
  • Attribute values are always serialized within double-quotation-mark (") delimiters, and a double-quotation mark character in an attribute value is serialized as &quot;. With or without the ExclCanonical option, these special characters in attribute values are serialized as entity and hexadecimal character references:
    • The ampersand (&) is serialized as &amp;
    • The less-than symbol (<) is serialized as &lt;
    • The carriage return (CR) character is serialized as &#xD;
    • The linefeed (LF) character is serialized as &#xA;
    • The tab character is serialized as &#x9;

    For examples, see "Character references".

  • Within Text nodes, the following characters are serialized as entity and hexadecimal character references:

    If you specify Serial with no options:

    • The less-than symbol (<) is serialized as &lt;
    • The ampersand (&) is serialized as &amp;
    • The carriage return (CR) character is serialized as &#xD;

    If you specify the ExclCanonical option, the following is also true:

    • The greater-than symbol (>) is serialized as &gt;

    For examples, see "Character references".

  • If serializing the Root of an XmlDoc, a linefeed character is inserted between the children of the Root. This character is represented exactly by X'25' if the EBCDIC option of Serial is used; otherwise it is represented by X'0A'.

    Note: No linefeed is inserted if the XmlDoc has one PI or Comment node and does not have an Element node. In this case (which is allowed by Janus SOAP), the XML document is not well-formed and therefore the canonicalization specifications ignore it.

  • If the subtree to be serialized is a single node that is either of these:
    • A PI child of the Root
    • A Comment child of the Root, when the WithComments option is specified

    Then a linefeed character is added after the PI or Comment if there is a following Element sibling, or is added before the PI or Comment if there is a preceding Element sibling.

    Note: No linefeed is inserted if the XmlDoc does not have an Element node. In this case (which is allowed by Janus SOAP), the XML document is not well-formed and therefore the canonicalization specifications ignore it.
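
The character-reference rules in the list above can be summarized in a short Python sketch. This is a model of the documented rules only, with hypothetical function names; escape_attribute shows the ExclCanonical form, in which the value is always delimited by double-quotation marks.

```python
def escape_text(value, excl_canonical=False):
    """Escape the content of a Text node.

    With no options: &, < and CR are replaced.
    With ExclCanonical: > is replaced as well.
    """
    out = (value.replace("&", "&amp;")   # must come first
                .replace("<", "&lt;")
                .replace("\r", "&#xD;"))
    if excl_canonical:
        out = out.replace(">", "&gt;")
    return out


def escape_attribute(value):
    """Escape an attribute value and wrap it in double-quotation marks,
    as done when ExclCanonical is specified."""
    out = (value.replace("&", "&amp;")   # must come first
                .replace("<", "&lt;")
                .replace("\r", "&#xD;")
                .replace("\n", "&#xA;")
                .replace("\t", "&#x9;")
                .replace('"', "&quot;"))
    return '"' + out + '"'


if __name__ == "__main__":
    print(escape_text('val>"0" val<"10"'))
    print(escape_text('val>"0" val<"10"', excl_canonical=True))
    print(escape_attribute('val>"0"'))
```

The ampersand is replaced first so that the ampersands introduced by the later replacements are not escaped a second time. The sample values are taken from the "Character references" example.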

Qualifications/exceptions

  • The canonicalization specifications, especially exclusive canonicalization, include references to the serialization of a subset of a document. The ExclCanonical option is based not on a subset but on a subtree.
  • Although the ExclCanonical and SortCanonical options use the "Unicode" sort sequence, this is currently limited to Unicode values less than 256 (as of version 7.7 of Janus SOAP), so it is accomplished with an 8-bit EBCDIC to 8-bit Unicode table, which is (for all intents and purposes) merely an EBCDIC-to-ASCII translation.
  • The specifications support an argument to canonicalization that is a list of namespace declarations that are to be "forced" into the serialization. The ExclCanonical option does not provide this support.
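
The sort order used by ExclCanonical and SortCanonical (namespace declarations by prefix, attributes by namespace URI followed by local name, both in Unicode code order) can be sketched in Python, whose string comparison is already by code point. The data layout and function names below are assumptions made for the illustration, not part of the XmlDoc API.

```python
def sort_namespace_decls(decls):
    """decls maps prefix -> namespace URI ('' is the default namespace).
    Returns (prefix, uri) pairs ordered by prefix."""
    return sorted(decls.items(), key=lambda item: item[0])


def sort_attributes(attrs):
    """attrs is a list of (namespace_uri, local_name, value) tuples.
    Returns them ordered by namespace URI, then local name."""
    return sorted(attrs, key=lambda attr: (attr[0], attr[1]))


if __name__ == "__main__":
    print(sort_namespace_decls({"p3": "urn:p3", "p2": "urn:p2", "p1": "urn:p1"}))
    print(sort_attributes([
        ("urn:b", "x", "1"),
        ("", "z", "2"),       # attribute with no namespace
        ("urn:a", "y", "3"),
    ]))
```

Because digits have lower code points than letters, a name such as 1a would sort before a1, which matches the "numbers are lower than letters" remark in the option descriptions.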

Examples

The following examples show various aspects of the ExclCanonical option. The examples use the EBCDIC option to display the result. If using ExclCanonical for digital signature processing, you probably should omit the EBCDIC option and use the default encoding, UTF-8.

Namespace serialization

Under exclusive canonicalization, a namespace is not serialized if it is not necessary. In this example, the subtree to be serialized is the a element, which is selected by the path top/a in the Serial call in the request code that follows:

Begin
%doc is Object XmlDoc
%doc = New
%l is longstring
%sl is object stringlist
%sl = New
text to %sl
<top>
  <a xmlns:p3="urn:p3" xmlns:p2="urn:p2" xmlns:p1="urn:p1">
    <p1:b/>
    <p2:b/>
  </a>
</top>
end text
Call %doc:LoadXml(%sl)
Print 'Exclcan via ParseLines:'
%sl = New
%l=%doc:Serial('top/a', 'EBCDIC exclcanonical indent 2 lf')
%sl:Parselines(%l)
%sl:Print
End

The exclusive canonical serialization (displayed, after being parsed from string to Stringlist, with line breaks and indent for the sake of clarity) omits the declaration for p3, because it is not utilized in the serialized subtree:

<a>
  <p1:b xmlns:p1="urn:p1"></p1:b>
  <p2:b xmlns:p2="urn:p2"></p2:b>
</a>

An element utilizes an in-scope namespace declaration in either of these cases:

  • The element is prefixed and the declaration is of that prefix.
  • The element is unprefixed and it is a default namespace declaration.

An attribute utilizes an in-scope namespace declaration if the attribute is prefixed and the declaration is of that prefix.

In the preceding example, there was no alternative to removing the non-utilized declaration for p3, but if it were utilized by a descendant element "lower" in the document tree, it would be moved to that element.

Another application of the utilization rule is shown in the next example.

Namespace importing

Under exclusive canonicalization, namespaces are imported to where they are needed.

Using the same type of request as in the preceding example, the w element is the subtree to serialize (green font):

<a xmlns:p3="urn:p3" xmlns:p2="urn:p2" xmlns:p1="urn:p1">
  <w>
    <p1:b/>
    <p2:b/>
  </w>
</a>

Exclusive canonical serialization (display form), which gets required namespace declarations from an ancestor of the serialized subtree:

<w>
  <p1:b xmlns:p1="urn:p1"></p1:b>
  <p2:b xmlns:p2="urn:p2"></p2:b>
</w>
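
The utilization rule behind the two namespace examples above can be modeled in a few lines of Python. This is a simplified sketch, not the Janus SOAP algorithm: the Elem class and the function names are hypothetical, the in-scope declarations are passed as one fixed map for the whole subtree (no redeclaration inside the subtree is modeled), and attributes are ordered by name rather than by namespace URI and local name.

```python
from dataclasses import dataclass, field


@dataclass
class Elem:
    name: str                                   # qualified name, e.g. "p1:b"
    attrs: dict = field(default_factory=dict)   # qualified name -> value
    children: list = field(default_factory=list)


def prefix_of(qname):
    """Prefix of a qualified name; '' when the name is unprefixed."""
    return qname.split(":", 1)[0] if ":" in qname else ""


def utilized(elem):
    """Prefixes utilized by the element itself: its own prefix ('' means
    the default namespace) plus the prefixes of its prefixed attributes."""
    used = {prefix_of(elem.name)}
    used |= {prefix_of(a) for a in elem.attrs if ":" in a}
    return used


def declaration(prefix, uri):
    name = "xmlns:" + prefix if prefix else "xmlns"
    return f' {name}="{uri}"'


def serialize(elem, in_scope, rendered=frozenset()):
    """Emit a declaration only where it is utilized and only if no
    ancestor inside the subtree has already emitted it."""
    needed = sorted(p for p in utilized(elem)
                    if p in in_scope and p not in rendered)
    decls = "".join(declaration(p, in_scope[p]) for p in needed)
    attrs = "".join(f' {k}="{v}"' for k, v in sorted(elem.attrs.items()))
    inner = "".join(serialize(c, in_scope, rendered | set(needed))
                    for c in elem.children)
    return f"<{elem.name}{decls}{attrs}>{inner}</{elem.name}>"


if __name__ == "__main__":
    scope = {"p1": "urn:p1", "p2": "urn:p2", "p3": "urn:p3"}
    w = Elem("w", children=[Elem("p1:b"), Elem("p2:b")])
    print(serialize(w, scope))
```

With the declarations from the "Namespace importing" example in scope, the sketch emits the p1 and p2 declarations on the elements that use them and never emits p3, which no element or attribute in the subtree utilizes.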

PIs and Comments

Using the same type of request as in the Namespace serialization example, this is the subtree to be serialized (display form):

<a>
   <!-- Comment 1 -->
   <w>
      <?pi-without-data?>
   </w>
</a>

Exclusive canonical serialization (display form), which omits the Comment node:

<a>
   <w>
      <?pi-without-data?>
   </w>
</a>

Note: To include the Comment node, specify also the WithComments option of Serial.

Character references

Using the same type of request as in the "Namespace serialization" example, this is the subtree to serialize (display form):

<doc>
   <comp>val>"0" val&lt;"10"</comp>
   <comp expr='val>"0"'></comp>
   <norm attr=' &apos; &#xD;&#xA;&#x9; &apos; '/>
   <white>&#x9;&#xD;&#xA;</white>
</doc>

This is the result from the Serial method with no options specified (display form, and the <white> element has a line that wraps to emphasize the non-visible linefeed character it contains):

<doc>
   <comp>val>"0" val&lt;"10"</comp>
   <comp expr='val>"0"'></comp>
   <norm attr=" ' &#xD;&#xA;&#x9; ' "/>
   <white> &#xD;
</white>
</doc>

The exclusive canonical serialization follows (display form, wrapped <white> element line has no indent).

<doc>
   <comp>val&gt;"0" val&lt;"10"</comp>
   <comp expr="val>&quot;0&quot;"></comp>
   <norm attr=" ' &#xD;&#xA;&#x9; ' "></norm>
   <white> &#xD;
</white>
</doc>

The differences from no-option Serial (green font) include:

  • The greater-than symbol (>) within a text node is serialized as &gt;.
  • Attribute values are enclosed in double-quotation marks (").
  • A double-quotation mark in an attribute value is serialized as &quot;.
  • An empty element is serialized with two tags (a start tag followed by an end tag), not with a single empty-element tag.