XmlDoc API serialization options: Difference between revisions
(30 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
Multiple methods in the [[XmlDoc API]] serialize (produce the text-string representation of) the contents of an XmlDoc or XmlDoc subtree. These methods include [[Serial (XmlDoc/XmlNode function)|Serial]], [[WebSend (XmlDoc subroutine)|WebSend]], [[Xml (XmlDoc function)|Xml]], [[Print (XmlDoc/XmlNode subroutine)|Print]], [[Audit (XmlDoc/XmlNode subroutine)|Audit]], and [[Trace (XmlDoc/XmlNode subroutine)|Trace]]. The [[AddXml]] method of the [[HttpRequest]] class is | Multiple methods in the [[XmlDoc API]] serialize (produce the text-string representation of) the contents of an <var>XmlDoc</var> or <var>XmlDoc</var> subtree. These methods include <var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var>, <var>[[WebSend (XmlDoc subroutine)|WebSend]]</var>, <var>[[Xml (XmlDoc function)|Xml]]</var>, <var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var>, <var>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]</var>, and <var>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]</var>. The <var>[[AddXml (HttpRequest subroutine)|AddXml]]</var> method of the <var>[[HttpRequest class|HttpRequest]]</var> class is also covered below; it serializes an <var>XmlDoc</var> and is comparable to the <var>WebSend</var> method. | ||
Each of the serialization methods has an "options" parameter that is a blank-delimited string (not case-sensitive) of one or more options which control aspects of the output format. These options are summarized below | Each of the serialization methods has an "options" parameter that is a blank-delimited string (not case-sensitive) of one or more options which control aspects of the output format. These options are summarized in the table below. Following the table are several topics that concern <var>XmlDoc</var> serialization. | ||
==Option descriptions== | ==Option descriptions== | ||
Line 8: | Line 8: | ||
For direct access to a particular option in the table, you can use the following "index" of the options: | For direct access to a particular option in the table, you can use the following "index" of the options: | ||
<ul> | <ul> | ||
<li>[[#AllowNoXmlDecl|AllowXmlDecl]] | <li><var>[[#AllowNoXmlDecl|AllowXmlDecl]]</var> | ||
<li>[[#outformat|AttributeCompact]] | <li><var>[[#outformat|AttributeCompact]]</var> | ||
<li>[[#outformat|BothCompact]] | <li><var>[[#outformat|BothCompact]]</var> | ||
<li>[[#CharacterEncodeAll|CharacterEncodeAll]] | <li><var>[[#CharacterEncodeAll|CharacterEncodeAll]]</var> | ||
<li>[[#outformat|Compact]] | <li><var>[[#outformat|Compact]]</var> | ||
<li>[[#linend|CR]] | <li><var>[[#linend|CR]]</var> | ||
<li>[[#linend|CRLF]] | <li><var>[[#linend|CRLF]]</var> | ||
<li>[[#EBCDIC|EBCDIC]] | <li><var>[[#EBCDIC|EBCDIC]]</var> | ||
<li>[[#outformat|ElementCompact]] | <li><var>[[#outformat|ElementCompact]]</var> | ||
<li>[[#ExclCanonical|ExclCanonical]] | <li><var>[[#ExclCanonical|ExclCanonical]]</var> | ||
<li>[[#outformat|Expanded]] | <li><var>[[#outformat|Expanded]]</var> | ||
<li>[[#Indentn|Indent n]] | <li><var>[[#Indentn|Indent n]]</var> | ||
<li>[[#linend|LF]] | <li><var>[[#linend|LF]]</var> | ||
<li>[[#linend|Newline]] | <li><var>[[#linend|Newline]]</var> | ||
<li>[[#NoEmptyElt|NoEmptyElt]] | <li><var>[[#NoEmptyElt|NoEmptyElt]]</var> | ||
<li>[[#AllowNoXmlDecl|NoXmlDecl]] | <li><var>[[#AllowNoXmlDecl|NoXmlDecl]]</var> | ||
<li>[[#OmitNullElement|OmitNullElement]] | <li><var>[[#OmitNullElement|OmitNullElement]]</var> | ||
<li>[[#SortCanonical|SortCanonical]] | <li><var>[[#SortCanonical|SortCanonical]]</var> | ||
<li>[[#WithComments|WithComments]] | <li><var>[[#UTF-8|UTF-8]]</var> | ||
<li>[[#XmlDecl|XmlDecl]] | <li><var>[[#WithComments|WithComments]]</var> | ||
<li><var>[[#XmlDecl|XmlDecl]]</var> | |||
</ul> | </ul> | ||
<table class="syntaxTable"> | <table class="syntaxTable"> | ||
<tr><th>Option</th><th>Description</th> | <tr class="head"><th>Option</th><th>Description</th> | ||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="CharacterEncodeAll"></div><td>'''CharacterEncodeAll'''</td></tr> | ||
<tr><td>In methods:<br>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]<br>[[Print (XmlDoc/XmlNode subroutine)|Print]]<br>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]</td></tr> | <tr><td>In methods:<br><var>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]</var><br><var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var><br><var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var><br><var>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>Use character encoding in all contexts to display Unicode characters that do not translate to EBCDIC. If this option is not specified (as of | <td>Use character encoding in all contexts to display Unicode characters that do not translate to EBCDIC. If this option is not specified (as of <var class="product">Sirius Mods</var> 7.6), only non-translatable Unicode characters in <var>Attribute</var> or <var>Element</var> values are displayed as character references. | ||
For more information about this option, see [[# | For more information about this option, see [[#EBCDIC serialization of untranslatable Unicode characters|"EBCDIC serialization of untranslatable Unicode characters"]], below. | ||
The <var>CharacterEncodeAll</var> option is available as of | The <var>CharacterEncodeAll</var> option is available as of <var class="product">Sirius Mods</var> version 7.6. It is available for the <var>Serial</var> method starting with <var class="product">Sirius Mods</var> version 8.0.</td></tr> | ||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="EBCDIC"></div><td>'''EBCDIC'''</td></tr> | ||
<tr><td>In methods:<br>[[Serial (XmlDoc/XmlNode function)|Serial]]</td></tr> | <tr><td>In methods:<br><var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>This indicates that the serialization | <td>This indicates that the serialization is to be in EBCDIC rather than UTF-8. The <var>Serial</var> method provides UTF-8 encoding by default. | ||
Since XmlDocs are stored in Unicode (under | Since XmlDocs are stored in Unicode (under <var class="product">Sirius Mods</var> 7.6 or higher), | ||
serializing to UTF-8 involves no translation: the stored Unicode characters are merely encoded as UTF-8. Serializing to EBCDIC causes conversion of the subtree content via the [[Unicode#Support for the ASCII subset of Unicode|Unicode tables]]. Prior to | serializing to UTF-8 involves no translation: the stored Unicode characters are merely encoded as UTF-8. Serializing to EBCDIC causes conversion of the subtree content via the [[Unicode#Support for the ASCII subset of Unicode|Unicode tables]]. The [[XmlDoc_API_serialization_options#EBCDIC_serialization_of_untranslatable_Unicode_characters|serialization of untranslatable characters]] is described below. Prior to <var class="product">Sirius Mods</var> 7.6, XmlDocs are stored in EBCDIC. | ||
</td> </tr> | </td> </tr> | ||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="ExclCanonical"></div><td>'''ExclCanonical'''</td></tr> | ||
<tr><td>In methods:<br>[[Serial (XmlDoc/XmlNode function)|Serial]]</td></tr> | <tr><td>In methods:<br><var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>This indicates that the output of the serialization will be in exclusive XML canonical form, as defined in the W3C "Exclusive XML Canonicalization" specification (http://www.w3.org/tr/xml-exc-c14n), which is an extension of the "XML Canonicalization" specification (http://www.w3.org/TR/xml-c14n). These specifications constrain serializations to facilitate processing such as digital signatures. | <td>This indicates that the output of the serialization will be in exclusive XML canonical form, as defined in the W3C "Exclusive XML Canonicalization" specification (http://www.w3.org/tr/xml-exc-c14n), which is an extension of the "XML Canonicalization" specification (http://www.w3.org/TR/xml-c14n). These specifications constrain serializations to facilitate processing such as digital signatures. | ||
This option, added in | This option, added in <var class="product">Sirius Mods</var> version 7.0, is described in greater detail in [[#Canonicalization|"Canonicalization"]] below. | ||
Specifying any of the <var>Serial</var> method <var>CR</var>, <var>LF</var>, <var>CRLF</var>, or <var>Indent</var> options when you also specify <var>ExclCanonical</var> is allowed. Although the resulting output will not be completely canonical, it may be what you require for the purposes of a digital signature, for example. The formatting addressed by those options is defined in the Exclusive Canonicalization specification and covered by the <var>ExclCanonical</var> option. | Specifying any of the <var>Serial</var> method <var>CR</var>, <var>LF</var>, <var>CRLF</var>, or <var>Indent</var> options when you also specify <var>ExclCanonical</var> is allowed. Although the resulting output will not be completely canonical, it may be what you require for the purposes of a digital signature, for example. The formatting addressed by those options is defined in the Exclusive Canonicalization specification and covered by the <var>ExclCanonical</var> option. | ||
Line 63: | Line 64: | ||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="Indentn"></div><td>'''Indent <i>n</i>'''</td></tr> | ||
<tr><td>In methods:<br>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]<br>[[Print (XmlDoc/XmlNode subroutine)|Print]]<br>[[Serial (XmlDoc/XmlNode function)|Serial]]<br>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]<br>[[WebSend (XmlDoc subroutine)|WebSend]]<br>[[Xml (XmlDoc function)|Xml]]</td></tr> | <tr><td>In methods:<br><var>[[AddXml (HttpRequest subroutine)|AddXml]]</var><br><var>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]</var><br><var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var><br><var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var><br><var>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]</var><br><var>[[WebSend (XmlDoc subroutine)|WebSend]]</var><br><var>[[Xml (XmlDoc function)|Xml]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>Inserts space characters (and line-ends, as described for the next option) into the serialized string such that if the string is broken at the line-ends and displayed as a tree, the display of each lower level (child element) in the subtree is indented ''n'' spaces from the starting point of the previous level (parent element). | <td>Inserts space characters (and line-ends, as described for the next option) into the serialized string such that if the string is broken at the line-ends and displayed as a tree, the display of each lower level (child element) in the subtree is indented ''n'' spaces from the starting point of the previous level (parent element). | ||
Line 76: | Line 77: | ||
</p> | </p> | ||
<i>n</i> is a non-negative integer, and its maximum value (as of | <i>n</i> is a non-negative integer, and its maximum value (as of <var class="product">Sirius Mods</var> version 7.0) is 254. | ||
For the <var>Print</var>, <var>Audit</var>, and <var>Trace</var> methods only: if the <var>Indent</var> option is omitted, the default indent is 3 spaces. | For the <var>Print</var>, <var>Audit</var>, and <var>Trace</var> methods only: if the <var>Indent</var> option is omitted, the default indent is 3 spaces. | ||
Line 82: | Line 83: | ||
For the <var>Serial</var> method only: One of the line-end options, below, must also be specified. | For the <var>Serial</var> method only: One of the line-end options, below, must also be specified. | ||
<br> | <br> | ||
For <var>WebSend</var> only: <var>Indent</var> may be used with one of the line-end options, below, including <var>Newline</var>. If <var>Indent</var> is specified and no line-end options are also specified, <var>Newline</var> is implied. | For <var>AddXml</var> and <var>WebSend</var> only: <var>Indent</var> may be used with one of the line-end options, below, including <var>Newline</var>. If <var>Indent</var> is specified and no line-end options are also specified, <var>Newline</var> is implied. | ||
</td></tr> | </td></tr> | ||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="linend"></div><td>'''CR<br>LF<br>CRLF<br>Newline'''</td></tr> | ||
<tr><td>In methods:<br>[[Serial (XmlDoc/XmlNode function)|Serial]]<br>[[WebSend (XmlDoc subroutine)|WebSend]]<br> | <tr><td>In methods:<br><var>[[AddXml (HttpRequest subroutine)|AddXml]]</var><br><var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var><br><var>[[WebSend (XmlDoc subroutine)|WebSend]]</var><br> | ||
[[Xml (XmlDoc function)|Xml]]</td></tr> | <var>[[Xml (XmlDoc function)|Xml]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>Line-end options for the method output: | <td>Line-end options for the method output: | ||
<table> | <table> | ||
<tr><th>CR</th> | <tr><th><var>CR</var></th> | ||
<td>Insert a carriage-return character. </td></tr> | <td>Insert a carriage-return character. </td></tr> | ||
<tr><th>LF</th> | <tr><th><var>LF</var></th> | ||
<td>Insert a linefeed character. </td></tr> | <td>Insert a linefeed character. </td></tr> | ||
<tr><th>CRLF</th> | <tr><th><var>CRLF</var></th> | ||
<td>Insert a carriage-return character followed by a linefeed character. </td></tr> | <td>Insert a carriage-return character followed by a linefeed character. </td></tr> | ||
<tr><th>Newline</th> | <tr><th><var>Newline</var></th> | ||
<td>Insert the line-end sequence defined for this | <td>Insert the line-end sequence defined for this <var>HttpRequest</var> object, or for this <var class="product">Janus Web Server</var> connection (by the <var>[[JANUS DEFINE]]</var> command or <var>[[$Web_Set]]</var>), as the [[LineEnd (HttpRequest property)|line-end]] sequence in the above cases. Available for <var>AddXml</var> and <var>WebSend</var> only. </td></tr> | ||
</table> | </table> | ||
You specify one of the line-end options above to provide line breaks in the output after any of the following is serialized: <ul> <li>An element start-tag, if it has any non-text node children <li>An element end tag <li>An empty element tag <li>A processing instruction (PI) <li>A comment <li>A text node, if it has any siblings </ul> | You specify one of the line-end options above to provide line breaks in the output after any of the following is serialized: <ul> <li>An element start-tag, if it has any non-text node children <li>An element end tag <li>An empty element tag <li>A processing instruction (PI) <li>A comment <li>A text node, if it has any siblings </ul> | ||
Using one of these line-end options produces output that is similar to the <var>[[#outformat|BothCompact]]</var> option.</td></tr> | |||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="NoEmptyElt"></div><td>'''NoEmptyElt'''</td></tr> | ||
<tr><td>In methods:<br>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]<br>[[Print (XmlDoc/XmlNode subroutine)|Print]]<br>[[Serial (XmlDoc/XmlNode function)|Serial]]<br>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]<br>[[WebSend (XmlDoc subroutine)|WebSend]]<br>[[Xml (XmlDoc function)|Xml]]</td></tr> | <tr><td>In methods:<br><var>[[AddXml (HttpRequest subroutine)|AddXml]]</var><br><var>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]</var><br><var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var><br><var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var><br><var>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]</var><br><var>[[WebSend (XmlDoc subroutine)|WebSend]]</var><br><var>[[Xml (XmlDoc function)|Xml]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>Deprecated as of | <td>Deprecated as of <var class="product">Sirius Mods</var> version 7.0, this option ensures that all empty elements are serialized with a start tag followed by an end tag. For example: <pre> <middleName></middleName> </pre> | ||
If <var>NoEmptyElt</var> is not specified, the default is to serialize an empty element with an empty element tag; using the same example as above, this would be: <pre> <middleName/> </pre> | If <var>NoEmptyElt</var> is not specified, the default is to serialize an empty element with an empty element tag; using the same example as above, this would be: <pre> <middleName/> </pre> | ||
The <var>ExclCanonical</var> option provides the same empty element serialization as <var>NoEmptyElt</var>. Also, the [[NoEmptyElement (XmlNode property)|NoEmptyElement]] <var>XmlNode</var> property specifies whether to serialize childless nodes using a separate start tag and end tag.</td></tr> | The <var>ExclCanonical</var> option provides the same empty element serialization as <var>NoEmptyElt</var>. Also, the <var>[[NoEmptyElement (XmlNode property)|NoEmptyElement]]</var> <var>XmlNode</var> property specifies whether to serialize childless nodes using a separate start tag and end tag.</td></tr> | ||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="OmitNullElement"></div><td>'''OmitNullElement'''</td></tr> | ||
<tr><td>In methods:<br>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]<br>[[Print (XmlDoc/XmlNode subroutine)|Print]]<br>[[Serial (XmlDoc/XmlNode function)|Serial]]<br>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]<br>[[WebSend (XmlDoc subroutine)|WebSend]]<br> | <tr><td>In methods:<br><var>[[AddXml (HttpRequest subroutine)|AddXml]]</var><br><var>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]</var><br><var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var><br><var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var><br><var>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]</var><br><var>[[WebSend (XmlDoc subroutine)|WebSend]]</var><br> | ||
[[Xml (XmlDoc function)|Xml]]</td></tr> | <var>[[Xml (XmlDoc function)|Xml]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>An <var>Element</var> node that has no children and no <var>Attributes</var> will not be serialized, unless it is the top level <var>Element</var> in the subtree being serialized. The serialization of a child-less and <var>Attribute</var>-less <var>Element</var> is omitted, even if the serialization of the <var>Element</var> would contain <var>Namespace</var> declarations in its start tag. | <td>An <var>Element</var> node that has no children and no <var>Attributes</var> will not be serialized, unless it is the top level <var>Element</var> in the subtree being serialized. The serialization of a child-less and <var>Attribute</var>-less <var>Element</var> is omitted, even if the serialization of the <var>Element</var> would contain <var>Namespace</var> declarations in its start tag. | ||
Line 128: | Line 129: | ||
</top> | </top> | ||
</p> | </p> | ||
Here is the display of the <var>XmlDoc</var> with the OmitNullElement option specified: | Here is the display of the <var>XmlDoc</var> with the <var>OmitNullElement</var> option specified: | ||
<p class="code"> <top> | <p class="code"> <top> | ||
<middle> | <middle> | ||
Line 135: | Line 136: | ||
</p> | </p> | ||
But if you attempt to display only the <code>empty</code> subtree of the <var>XmlDoc</var> using <var>OmitNullElement</var>, the <code>empty</code> node is not suppressed, and the result is: | But if you attempt to display only the <code>empty</code> subtree of the <var>XmlDoc</var> using <var>OmitNullElement</var>, the <code>empty</code> node is not suppressed, and the result is: | ||
<p class=" | <p class="output"> <empty/> | ||
</p> | </p> | ||
The <var>OmitNullElement</var> option is available as of | The <var>OmitNullElement</var> option is available as of <var class="product">Sirius Mods</var> version 7.3.</td></tr> | ||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="outformat"></div><td>'''Compact<br>Expanded<br>AttributeCompact<br>ElementCompact<br>BothCompact'''</td></tr> | ||
<tr><td>In methods:<br>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]<br>[[Print (XmlDoc/XmlNode subroutine)|Print]]<br>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]</td></tr> | <tr><td>In methods:<br><var>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]</var><br><var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var><br><var>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>''One'' of the following mutually exclusive '''output formats''': | <td>''One'' of the following mutually exclusive '''output formats''': | ||
<table> | <table> | ||
<tr><th>Compact</th> | <tr><th><var>Compact</var></th> | ||
<td>This is the default. An element's entire start tag, including attributes and namespace declarations, is printed on a single line. If the element has no children or has a single Text child, and has '''no''' attributes or namespace declarations, then the Text child is serialized on the same line as the start and end tags. For example: | <td>This is the default. An element's entire start tag, including attributes and namespace declarations, is printed on a single line. If the element has no children or has a single <var>Text</var> child, and has '''no''' attributes or namespace declarations, then the <var>Text</var> child is serialized on the same line as the start and end tags. For example: | ||
<p class="code"> <top> | <p class="code"> <top> | ||
<in1 a="xyz" b="foo"> | <in1 a="xyz" b="foo"> | ||
Line 154: | Line 155: | ||
</top> | </top> | ||
</p></td></tr> | </p></td></tr> | ||
<tr><th>Expanded</th> | <tr><th><var>Expanded</var></th> | ||
<td>A new line is started for each attribute, namespace declaration, and child. For example: | <td>A new line is started for each attribute, namespace declaration, and child. For example: | ||
<p class="code"> <top> | <p class="code"> <top> | ||
Line 168: | Line 169: | ||
</top> | </top> | ||
</p></td></tr> | </p></td></tr> | ||
<tr><th>AttributeCompact</th> | <tr><th><var>AttributeCompact</var></th> | ||
<td>Attributes and namespace declarations are printed on the same line as the start tag. For example: | <td>Attributes and namespace declarations are printed on the same line as the start tag. For example: | ||
<p class="code"> <top> | <p class="code"> <top> | ||
Line 180: | Line 181: | ||
</p></td></tr> | </p></td></tr> | ||
<tr><th>ElementCompact</th> | <tr><th><var>ElementCompact</var></th> | ||
<td>An entire element is printed on one line if it has no attributes or namespace declarations and has no children other than possibly a Text child. For example: | <td>An entire element is printed on one line if it has no attributes or namespace declarations and has no children other than possibly a Text child. For example: | ||
<p class="code"> <top> | <p class="code"> <top> | ||
Line 193: | Line 194: | ||
</p></td></tr> | </p></td></tr> | ||
<tr><th>BothCompact</th> | <tr><th><var>BothCompact</var></th> | ||
<td>The most compacted format, this combines the effect of AttributeCompact and ElementCompact. It displays on one line an element that has no children or that has a single Text child. | <td>The most compacted format, this combines the effect of <var>AttributeCompact</var> and <var>ElementCompact</var>. It displays on one line an element that has no children or that has a single <var>Text</var> child. | ||
<p class="code"> <top> | <p class="code"> <top> | ||
<in1 a="xyz" b="foo">content1</in1> | <in1 a="xyz" b="foo">content1</in1> | ||
Line 202: | Line 203: | ||
</table> | </table> | ||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="SortCanonical"></div><td>'''SortCanonical'''</td></tr> | ||
<tr><td>In methods:<br>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]<br>[[Print (XmlDoc/XmlNode subroutine)|Print]]<br>[[Serial (XmlDoc/XmlNode function)|Serial]]<br>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]<br>[[WebSend (XmlDoc subroutine)|WebSend]]<br>[[Xml (XmlDoc function)|Xml]]</td></tr> | <tr><td>In methods:<br><var>[[AddXml (HttpRequest subroutine)|AddXml]]</var><br><var>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]</var><br><var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var><br><var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var><br><var>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]</var><br><var>[[WebSend (XmlDoc subroutine)|WebSend]]</var><br><var>[[Xml (XmlDoc function)|Xml]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>Deprecated as of | <td>Deprecated as of <var class="product">Sirius Mods</var> version 7.0, <var>SortCanonical</var> serializes namespace declarations (based on the prefix being declared) and attributes (based on the namespace URI followed by the local name) in sorted order. This can be useful, for instance, when using <var>Serial</var> to serialize a portion of an XML document for a signature. | ||
The sort order for namespace declarations and attributes is from lowest to highest, and it uses the <var>Unicode</var> code ordering (for example, numbers are lower than letters). | The sort order for namespace declarations and attributes is from lowest to highest, and it uses the <var>Unicode</var> code ordering (for example, numbers are lower than letters). | ||
Added in | Added in <var class="product">Sirius Mods</var> version 6.9 as a step towards support for canonicalization, this option is superseded by the <var>ExclCanonical</var> option.</td></tr> | ||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="UTF-8"></div><td>'''UTF-8'''</td></tr> | ||
<tr><td>In methods:<br>[[Serial (XmlDoc/XmlNode function)|Serial]]</td></tr> | <tr><td>In methods:<br><var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>This indicates that all Comment nodes in the specified subtree are to be included in the serialized output. | <td>This indicates that the serialization should be in UTF-8. This is the default. | ||
<p>'''Note:''' This option, added in | </td> </tr> | ||
<tr><td> | |||
<table class="noBorder"><tr><div id="WithComments"></div><td>'''WithComments'''</td></tr> | |||
<tr><td>In methods:<br><var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var></td></tr> | |||
</table></td> | |||
<td>This indicates that all <var>Comment</var> nodes in the specified subtree are to be included in the serialized output. | |||
<p>'''Note:''' This option, added in <var class="product">Sirius Mods</var> version 7.0, is only a supplement to the <var>ExclCanonical</var> option: specifying <var>WithComments</var> without specifying <var>ExclCanonical</var> has no effect. Specifying <var>ExclCanonical</var> without specifying <var>WithComments</var> causes all <var>Comment</var> nodes to be suppressed from the result.</p></td></tr> | |||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="XmlDecl"></div><td>'''XmlDecl'''</td></tr> | ||
<tr><td>In methods:<br>[[Serial (XmlDoc/XmlNode function)|Serial]]</td></tr> | <tr><td>In methods:<br><var>[[Serial (XmlDoc/XmlNode function)|Serial]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>This indicates that the serialized <var>XmlDoc</var> will contain the "XML Declaration" (<code><?xml version=...?></code>), if the value of the [[Version (XmlDoc property)|Version]] property is a non-null string, and if the <var>XmlDoc</var> is not empty. | <td>This indicates that the serialized <var>XmlDoc</var> will contain the "XML Declaration" (<code><?xml version=...?></code>), if the value of the <var>[[Version (XmlDoc property)|Version]]</var> property is a non-null string, and if the <var>XmlDoc</var> is not empty. | ||
<var>XmlDecl</var> may only be specified if the top of the subtree being serialized is the <var>Root</var> node.</td></tr> | <var>XmlDecl</var> may only be specified if the top of the subtree being serialized is the <var>Root</var> node.</td></tr> | ||
<tr><td> | <tr><td> | ||
<table class=" | <table class="noBorder"><tr><div id="AllowNoXmlDecl"></div><td>'''AllowXmlDecl<br>NoXmlDecl'''</td></tr> | ||
<tr><td>In methods:<br>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]<br>[[Print (XmlDoc/XmlNode subroutine)|Print]]<br>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]<br>[[WebSend (XmlDoc subroutine)|WebSend]]<br>[[Xml (XmlDoc function)|Xml]]</td></tr> | <tr><td>In methods:<br><var>[[AddXml (HttpRequest subroutine)|AddXml]]</var><br><var>[[Audit (XmlDoc/XmlNode subroutine)|Audit]]</var><br><var>[[Print (XmlDoc/XmlNode subroutine)|Print]]</var><br><var>[[Trace (XmlDoc/XmlNode subroutine)|Trace]]</var><br><var>[[WebSend (XmlDoc subroutine)|WebSend]]</var><br><var>[[Xml (XmlDoc function)|Xml]]</var></td></tr> | ||
</table></td> | </table></td> | ||
<td>These indicate whether or not the serialized <var>XmlDoc</var> will contain the "XML Declaration" (<code><?xml version=...?></code>). <var>AllowXmlDecl</var> (the default) may only be specified if the value of the [[Version (XmlDoc property)|Version]] property is a non-null string, and if the top of the subtree being serialized is the <var>Root</var> node. <var>AllowXmlDecl</var> and <var>NoXmlDecl</var> may not both be specified.</td></tr> | <td>These indicate whether or not the serialized <var>XmlDoc</var> will contain the "XML Declaration" (<code><?xml version=...?></code>). <var>AllowXmlDecl</var> (the default) may only be specified if the value of the <var>[[Version (XmlDoc property)|Version]]</var> property is a non-null string, and if the top of the subtree being serialized is the <var>Root</var> node. <var>AllowXmlDecl</var> and <var>NoXmlDecl</var> may not both be specified.</td></tr> | ||
</table> | </table> | ||
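The ordering described for <var>SortCanonical</var> in the table above can be illustrated with a short sketch. This is plain Python, not the XmlDoc API: the function names and the tuple layouts are hypothetical, and treating an attribute with no namespace as having an empty namespace URI (so that it sorts first) is an assumption borrowed from the W3C canonicalization rules rather than something this page states.

```python
# Illustrative sketch only: plain Python, NOT the XmlDoc API.
# Python compares strings by Unicode code point, which matches the
# "lowest to highest, Unicode code ordering" rule described above.

def sort_namespace_declarations(declarations):
    """Sort (prefix, uri) pairs by the prefix being declared.

    The default namespace is represented by the empty prefix ''
    (a representation chosen for this sketch).
    """
    return sorted(declarations, key=lambda decl: decl[0])


def sort_attributes(attributes):
    """Sort (namespace_uri, local_name, value) triples by namespace URI,
    then by local name.

    Assumption: an attribute with no namespace uses '' as its URI.
    """
    return sorted(attributes, key=lambda attr: (attr[0], attr[1]))


declarations = [
    ("xs", "http://www.w3.org/2001/XMLSchema"),
    ("", "http://example.com/default"),
    ("ab", "http://example.com/ab"),
    ("a1", "http://example.com/a1"),
]
attributes = [
    ("http://example.com/ns", "id", "7"),
    ("", "b", "foo"),
    ("", "a2", "x"),
    ("", "a", "xyz"),
    ("", "aB", "y"),
]

sorted_declarations = sort_namespace_declarations(declarations)
sorted_attributes = sort_attributes(attributes)
```

In this sketch the digit in <code>a1</code> sorts ahead of the letter in <code>ab</code>, and the uppercase <code>aB</code> sorts ahead of the lowercase <code>b</code>, because the comparison uses code points and is not case-insensitive.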
== | ==EBCDIC serialization of untranslatable Unicode characters== | ||
As of | As of <var class="product">Sirius Mods</var> version 7.6, <var>XmlDoc</var> content is stored in Unicode. | ||
The methods that support EBCDIC serialization (<var>Print</var>, <var> | The methods that support EBCDIC serialization (<var>Serial</var> with the <var>EBCDIC</var> option, <var>Print</var>, <var>Audit</var>, <var>Trace</var>) use the [[Unicode#Support for the ASCII subset of Unicode|Unicode tables]] to convert the <var>XmlDoc</var> content. | ||
One feature of the conversion from Unicode is that | One feature of the conversion from Unicode is that | ||
Line 248: | Line 256: | ||
</p> | </p> | ||
However | <p class="note">'''Note:''' Prior to version 8.0 of the <var class="product">Sirius Mods</var>, the <var>Serial</var> method did not convert non-translatable Unicode characters to character references; the result was, instead, a request cancellation. </p> | ||
Unicode character occurs in a context other than <var>Element</var> or <var>Attribute</var> value | |||
(that is, a name, comment, or PI), character encoding is '''not''' used. | However, when an untranslatable Unicode character occurs in a context other than <var>Element</var> or <var>Attribute</var> value | ||
Because it is an | (that is, a name, comment, or PI), and the default serialization options are in effect, character encoding is '''not''' used. | ||
Because it is an <var>Element</var> name, for example, the following statements result in a | |||
request cancellation: | request cancellation: | ||
<p class="code"> %doc:AddElement('&#x2122;':U) | <p class="code"> %doc:AddElement('&#x2122;':U) | ||
Line 268: | Line 277: | ||
<p class="output"> <&#x2122;/> | <p class="output"> <&#x2122;/> | ||
</p> | </p> | ||
'''Note:''' | <blockquote class="note"> | ||
The result | <p>'''Note:''' | ||
Request cancellation is avoided, but it | The result with <var>CharacterEncodeAll</var> can be misleading. | ||
Request cancellation is avoided, but the option can produce a serialization that is <b>not</b> equivalent to the portion of the <var>XmlDoc</var> that was serialized. </p> | |||
The serialized result above is not a legal XML document, | The serialized result above is not a legal XML document, | ||
because the ampersand (< | because the ampersand (<tt>&</tt>) is not a legal name character. | ||
Similarly, for | Similarly, for | ||
an untranslatable Unicode character added to a document | an untranslatable Unicode character added to a document | ||
with [[AddComment (XmlDoc/XmlNode function)|AddComment]] or [[AddPI (XmlDoc/XmlNode function)|AddPI]], | with <var>[[AddComment (XmlDoc/XmlNode function)|AddComment]]</var> or <var>[[AddPI (XmlDoc/XmlNode function)|AddPI]]</var>, EBCDIC serialization with <var>CharacterEncodeAll</var> | ||
produces a stream of characters that | produces a stream of characters that can be displayed, but, if those characters are deserialized, the result is not the same <var>XmlDoc</var> content. | ||
The XML standard does not provide for character references in names, Comments, and PIs. | |||
The standard | |||
names, Comments, and PIs. | For example: | ||
<p class="code">%d:AddComment('&#x2122;') | |||
%d:AddComment('&#x2122;':U) | |||
%d:Print(, 'CharacterEncodeAll') | |||
</p> | |||
The above results in: | |||
<p class="output"><nowiki><!--&#x2122;--> | |||
<!--&#x2122;--> | |||
</nowiki></p> | |||
This may lead you to believe that the two Comment nodes in the <var>XmlDoc</var> are identical, but the first one contains the 8 characters <code>&#x2122;</code>, whereas the second comment contains a single Unicode "trademark" character (<b>™</b>). | |||
</blockquote> | |||
==Serialization and the "xml:space" attribute== | |||
The effect of the <tt>xml:space</tt> attribute on the serialization of an <var>Element</var> that has | |||
the <code>xml:space="preserve"</code> or <code>xml:space="default"</code> attribute depends | |||
on the serialization method: | |||
<ul> | |||
<li><var>Print</var>, <var>Audit</var>, and <var>Trace</var>:<br> | |||
The <code>xml:space="preserve"</code> and <code>xml:space="default"</code> attributes do '''not''' affect | |||
the serialized output. | |||
<li><var>Serial</var>, <var>WebSend</var>, and <var>Xml</var>:<br> | |||
If one of the [[#linend|line-end]] options or <var>[[#indentn|Indent]]</var> | |||
is specified, and an element to be serialized has the | |||
<code>xml:space="preserve"</code> attribute, then | |||
within the serialization of that element and its descendants, no line-end nor indent characters are inserted. | |||
The <code>xml:space="default"</code> attribute does not influence serialization, regardless of method options, | |||
nor does it cause resumption of the insertion of readability line-ends or indents if they were | |||
suspended by a containing <code>xml:space="preserve"</code>. | |||
</ul> | |||
==Displaying whitespace characters in serializations== | |||
The serialization methods use the hexadecimal | |||
character references specified in the XML Canonicalization specification | |||
(http://www.w3.org/TR/xml-c14n) to display the following whitespace characters: | |||
<ul> | |||
<li>For <var>Attribute</var> nodes: tab, carriage return, and linefeed | |||
<li>For <var>Text</var> nodes: carriage return | |||
</ul> | |||
Since the character references are not subject to the standard XML [[XML processing in Janus SOAP#Normalizing whitespace characters|whitespace normalization]], | |||
a serialized document (or subtree) that is then deserialized will retain this whitespace. | |||
These character references are used: | |||
<table> | |||
<tr><td>'''tab'''<td>&#x9;</tr> | |||
<tr><td>'''carriage return'''<td>&#xD;</tr> | |||
<tr><td>'''linefeed'''<td>&#xA;</tr> | |||
</table> | |||
<br> | |||
The EBCDIC and corresponding ASCII encodings of the characters is: | |||
<table> | |||
<tr class="head"><th><th>EBCDIC<th>ASCII</tr> | |||
<tr><td>'''tab'''<td>X'05'<td>X'09'</tr> | |||
<tr><td>'''carriage return'''<td>X'0D'<td>X'0D'</tr> | |||
<tr><td>'''linefeed'''<td>X'25'<td>X'0A'</tr> | |||
</table> | |||
==Canonicalization== | ==Canonicalization== | ||
Line 303: | Line 371: | ||
For example, UTF-8 encoding and exclusion of the XML declaration, if any, | For example, UTF-8 encoding and exclusion of the XML declaration, if any, | ||
are provided by default by <var>Serial</var>. | are provided by default by <var>Serial</var>. | ||
Specifying <var>ExclCanonical</var>, which is new as of | Specifying <var>ExclCanonical</var>, which is new as of <var class="product">Sirius Mods</var> version 7.0, | ||
adds the following features to the no-option default: | adds the following features to the no-option default: | ||
<ul> | <ul> | ||
Line 311: | Line 379: | ||
The sort order is from lowest | The sort order is from lowest | ||
to highest, and it uses the Unicode code ordering (for example, numbers | to highest, and it uses the Unicode code ordering (for example, numbers | ||
are lower than letters). | are lower than letters). </li> | ||
<li>For empty elements, serialization with both a start tag and an end tag, | <li>For empty elements, serialization with both a start tag and an end tag, | ||
instead of using a single "empty element tag." | instead of using a single "empty element tag." </li> | ||
<li>The suppression of any <var>Comment</var> nodes that may be present in the subtree. | <li>The suppression of any <var>Comment</var> nodes that may be present in the subtree. | ||
<var>Comment</var> nodes are suppressed unless the <var>WithComments</var> option is | <var>Comment</var> nodes are suppressed unless the <var>WithComments</var> option is | ||
specified along with <var>ExclCanonical</var>. | specified along with <var>ExclCanonical</var>. | ||
<p> | |||
For an example, see [[#PIs and Comments|PIs and Comments]]. </p></li> | |||
<li>Special namespace declaration handling: A namespace declaration is produced | <li>Special namespace declaration handling: A namespace declaration is produced | ||
only if it is utilized by an element or attribute in the subtree. | only if it is utilized by an element or attribute in the subtree. | ||
Line 325: | Line 396: | ||
it), unless the parent of the element is in the subtree and the | it), unless the parent of the element is in the subtree and the | ||
declaration is in scope at the parent. | declaration is in scope at the parent. | ||
<p> | |||
For examples, see [[#Namespace serialization|Namespace serialization]] and [[#Namespace importing|Namespace importing]]. </p></li> | |||
<li>''Attribute values'' are always serialized within | <li>''Attribute values'' are always serialized within | ||
double-quotation-mark (<code>"</code>) delimiters, | double-quotation-mark (<code>"</code>) delimiters, | ||
Line 336: | Line 408: | ||
as entity and hexadecimal character references: | as entity and hexadecimal character references: | ||
<ul> | <ul> | ||
<li>The ampersand (&) is serialized as <code>& | <li>The ampersand (&) is serialized as <code>&</code> </li> | ||
<li>The less-than symbol (<) is serialized as <code>&lt;</code> | <li>The less-than symbol (<) is serialized as <code>&lt;</code> </li> | ||
<li>The carriage return (CR) character is serialized as <code>&#xD;</code> | <li>The carriage return (CR) character is serialized as <code>&#xD;</code> </li> | ||
<li>The linefeed (LF) character is serialized as <code>&#xA;</code> | <li>The linefeed (LF) character is serialized as <code>&#xA;</code> </li> | ||
<li>The tab character is serialized as <code>&#x9;</code> | <li>The tab character is serialized as <code>&#x9;</code> </li> | ||
</ul> | </ul> | ||
For examples, see [[#Character references| | For examples, see [[#Character references|Character references]]. </li> | ||
<li>Within ''Text nodes'', the following characters are | <li>Within ''Text nodes'', the following characters are | ||
Line 350: | Line 422: | ||
If you specify <code>Serial</code> with no options:</p> | If you specify <code>Serial</code> with no options:</p> | ||
<ul> | <ul> | ||
<li>The less-than symbol (<tt><</tt>) is serialized as <code>&lt;</code> | <li>The less-than symbol (<tt><</tt>) is serialized as <code>&lt;</code> </li> | ||
<li>The ampersand (<tt>&</tt>) is serialized as <code>& | <li>The ampersand (<tt>&</tt>) is serialized as <code>&</code> </li> | ||
<li>The carriage return (CR) character is serialized as <code>&#xD;</code> | <li>The carriage return (CR) character is serialized as <code>&#xD;</code> </li> | ||
</ul> | </ul> | ||
If you specify the <var>ExclCanonical</var> option, the following is ''also'' true: | If you specify the <var>ExclCanonical</var> option, the following is ''also'' true: | ||
<ul> | <ul> | ||
<li>The greater-than symbol (<tt>></tt>) is serialized as <code>&gt;</code> | <li>The greater-than symbol (<tt>></tt>) is serialized as <code>&gt;</code> </li> | ||
</ul> | </ul> | ||
<p> | |||
For examples, see [[#Character references|Character references]]. </p></li> | |||
<li>If serializing the <var>Root</var> of an <var>XmlDoc</var>, a linefeed character | <li>If serializing the <var>Root</var> of an <var>XmlDoc</var>, a linefeed character | ||
is inserted ''between'' the children of the <var>Root</var>. | is inserted ''between'' the children of the <var>Root</var>. | ||
Line 366: | Line 439: | ||
by <code>X'25'</code> if the <var>EBCDIC</var> option of <var>Serial</var> is used; otherwise | by <code>X'25'</code> if the <var>EBCDIC</var> option of <var>Serial</var> is used; otherwise | ||
it is represented by <code>X'0A'</code>. | it is represented by <code>X'0A'</code>. | ||
<p> | <p class="note">'''Note:''' | ||
'''Note:''' | |||
No linefeed is inserted if the <var>XmlDoc</var> has one <var>PI</var> or <var>Comment</var> node and does not have an <var>Element</var> node. | No linefeed is inserted if the <var>XmlDoc</var> has one <var>PI</var> or <var>Comment</var> node and does not have an <var>Element</var> node. | ||
In this case (which is allowed by [[Janus SOAP]]), the XML document is not well-formed | In this case (which is allowed by <var class="product">[[Janus SOAP]]</var>), the XML document is not well-formed | ||
and therefore the canonicalization specifications ignore it.</p> | and therefore the canonicalization specifications ignore it.</p></li> | ||
<li>If the subtree to be serialized is a single node that is either of these: | <li>If the subtree to be serialized is a single node that is either of these: | ||
<ul> | <ul> | ||
<li>A <var>PI</var> child of the <var>Root</var> | <li>A <var>PI</var> child of the <var>Root</var> </li> | ||
<li>A single node that is a <var>Comment</var> child of the <var>Root</var> and | <li>A single node that is a <var>Comment</var> child of the <var>Root</var> and | ||
the <var>WithComments</var> option is specified | the <var>WithComments</var> option is specified </li> | ||
</ul> | </ul> | ||
Line 381: | Line 455: | ||
there is a following <var>Element</var> sibling, or is added before the <var>PI</var> or <var>Comment</var> if there is a preceding <var>Element</var> sibling. | there is a following <var>Element</var> sibling, or is added before the <var>PI</var> or <var>Comment</var> if there is a preceding <var>Element</var> sibling. | ||
'''Note:''' | <p class="note">'''Note:''' | ||
No linefeed is inserted if the <var>XmlDoc</var> does not have an <var>Element</var> node. | No linefeed is inserted if the <var>XmlDoc</var> does not have an <var>Element</var> node. | ||
In this case (which is allowed by | In this case (which is allowed by <var class="product">J</var>anus SOAP]]), the XML document is not well-formed | ||
and therefore the canonicalization specifications ignore it. | and therefore the canonicalization specifications ignore it. </p></li> | ||
</ul> | </ul> | ||
Line 392: | Line 466: | ||
exclusive canonicalization, include references to the | exclusive canonicalization, include references to the | ||
serialization of a ''subset'' of a document. | serialization of a ''subset'' of a document. | ||
The <var>ExclCanonical</var> option is based not on a subset but on a ''subtree''. | The <var>ExclCanonical</var> option is based not on a subset but on a ''subtree''. </li> | ||
<li>Although the <var>ExclCanonical</var> and <var>SortCanonical</var> options use | <li>Although the <var>ExclCanonical</var> and <var>SortCanonical</var> options use | ||
the "Unicode" sort sequence, | the "Unicode" sort sequence, | ||
this is currently limited to Unicode values less than 256 (as | this is currently limited to Unicode values less than 256 (as | ||
of version 7.7 of | of version 7.7 of <var class="product">J</var>anus SOAP]]), | ||
so it is accomplished with an 8-byte EBCDIC to 8-byte | so it is accomplished with an 8-byte EBCDIC to 8-byte | ||
Unicode table, which is (for all intents and purposes) merely an EBCDIC-to-ASCII | Unicode table, which is (for all intents and purposes) merely an EBCDIC-to-ASCII | ||
translation. | translation. </li> | ||
<li>The specifications support an argument to canonicalization that | <li>The specifications support an argument to canonicalization that | ||
is a list of namespace declarations that are to be "forced" into the | is a list of namespace declarations that are to be "forced" into the | ||
serialization. | serialization. | ||
The <var>ExclCanonical</var> option does not provide this support. | The <var>ExclCanonical</var> option does not provide this support. </li> | ||
</ul> | </ul> | ||
Line 492: | Line 568: | ||
</w> | </w> | ||
</p> | </p> | ||
=====PIs and Comments===== | =====PIs and Comments===== | ||
Using the same type of request as in the [[#Namespace serialization|Namespace serialization]] example, this is the subtree to be serialized (display form): | Using the same type of request as in the [[#Namespace serialization|Namespace serialization]] example, this is the subtree to be serialized (display form): | ||
<p class="code"><a> | <p class="code"><a> | ||
<!-- Comment 1 --> | |||
<w> | <w> | ||
<?pi-without-data?> | <?pi-without-data?> | ||
Line 504: | Line 580: | ||
Exclusive canonical serialization (display form), | Exclusive canonical serialization (display form), | ||
which omits the Comment node: | which omits the <var>Comment</var> node: | ||
<p class="code"><a> | <p class="code"><a> | ||
<w> | <w> | ||
Line 511: | Line 587: | ||
</a> | </a> | ||
</p> | </p> | ||
'''Note:''' | <p class="note">'''Note:''' | ||
To include the Comment node, specify also the <var>WithComments</var> | To include the <var>Comment</var> node, specify also the <var>WithComments</var> | ||
option of <var>Serial</var>. | option of <var>Serial</var>. </p> | ||
=====Character references===== | =====Character references===== | ||
Using the same type of request as in the [[#Namespace serialization|Namespace serialization]] example, | |||
Using the same type of request as in the [[#Namespace serialization| | |||
this is the subtree to serialize (display form): | this is the subtree to serialize (display form): | ||
<p class="code"><doc> | <p class="code"><doc> | ||
Line 550: | Line 625: | ||
</p> | </p> | ||
The differences from no-option <var>Serial</var> | The differences (<span class="boldGreen">green</span> font) from no-option <var>Serial</var> include: | ||
<ul> | <ul> | ||
<li>The greater-than symbol (>) within a text node is serialized | <li>The greater-than symbol (<tt>></tt>) within a text node is serialized | ||
as <code>&gt;</code>. | as <code>&gt;</code>. | ||
<li>Attribute values are enclosed in double-quotation marks (<code>"</code>). | <li>Attribute values are enclosed in double-quotation marks (<code>"</code>). | ||
Line 560: | Line 635: | ||
followed by an end tag), not with a single empty-element tag. | followed by an end tag), not with a single empty-element tag. | ||
</ul> | </ul> | ||
==See also== | |||
<ul> | |||
<li>For additional discussion about serialization, see [[XmlDoc API#Transport: receiving and sending XML|Transport: receiving and sending XML]].</li> | |||
</li> | |||
[[Category: Janus SOAP]] |
Latest revision as of 20:38, 13 March 2014
Multiple methods in the XmlDoc API serialize (produce the text-string representation of) the contents of an XmlDoc or XmlDoc subtree. These methods include Serial, WebSend, Xml, Print, Audit, and Trace. The AddXml method of the HttpRequest class is also covered below; it also serializes an XmlDoc and is comparable to the WebSend method.
Each of the serialization methods has an "options" parameter that is a blank-delimited string (not case-sensitive) of one or more options which control aspects of the output format. These options are summarized in the table below. Following the table are several topics that concern XmlDoc serialization.
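As an illustration of what "blank-delimited" and "not case-sensitive" imply for an options string, here is a minimal Python sketch. It is not part of the XmlDoc API: the function name `parse_options` and the returned dictionary shape are assumptions made for this example, and only the Indent value and the AllowXmlDecl/NoXmlDecl conflict described on this page are checked.

```python
def parse_options(options: str) -> dict:
    """Split a blank-delimited, case-insensitive options string (illustrative only)."""
    tokens = options.split()          # any run of blanks separates options
    parsed = {}
    i = 0
    while i < len(tokens):
        name = tokens[i].lower()      # option names are not case-sensitive
        if name == "indent":
            # "Indent n": the following token is the non-negative indent width.
            if i + 1 >= len(tokens) or not tokens[i + 1].isdigit():
                raise ValueError("Indent requires a non-negative integer")
            parsed[name] = int(tokens[i + 1])
            i += 2
        else:
            parsed[name] = True
            i += 1
    # The page states that AllowXmlDecl and NoXmlDecl may not both be specified.
    if "allowxmldecl" in parsed and "noxmldecl" in parsed:
        raise ValueError("AllowXmlDecl and NoXmlDecl are mutually exclusive")
    return parsed
```

For instance, the options string used in the Namespace serialization example, `'EBCDIC exclcanonical indent 2 lf'`, would be split into four options, with `indent` carrying the value 2.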
Option descriptions
The options available in the options parameter of an XmlDoc API serialization method vary with the individual method. The table below includes all the options from all the methods. The individual option descriptions specify the methods for which the option is available, and the individual method pages specify the options available for that method.
For direct access to a particular option in the table, you can use the following "index" of the options:
- AllowXmlDecl
- AttributeCompact
- BothCompact
- CharacterEncodeAll
- Compact
- CR
- CRLF
- EBCDIC
- ElementCompact
- ExclCanonical
- Expanded
- Indent n
- LF
- Newline
- NoEmptyElt
- NoXmlDecl
- OmitNullElement
- SortCanonical
- UTF-8
- WithComments
- XmlDecl
Option | Description |
---|---|
CharacterEncodeAll |
Use character encoding in all contexts to display Unicode characters that do not translate to EBCDIC. If this option is not specified (as of Sirius Mods 7.6), only non-translatable Unicode characters in Attribute or Element values are displayed as character references.
For more information about this option, see "EBCDIC serialization of untranslatable Unicode characters", below. The CharacterEncodeAll option is available as of Sirius Mods version 7.6. It is available for the Serial method starting with Sirius Mods version 8.0. |
EBCDIC |
This indicates that the serialization is to be in EBCDIC rather than UTF-8. The Serial method provides UTF-8 encoding by default.
Since XmlDocs are stored in Unicode (under Sirius Mods 7.6 or higher), serializing to UTF-8 involves no translation: the stored Unicode characters are merely encoded as UTF-8. Serializing to EBCDIC causes conversion of the subtree content via the Unicode tables. The serialization of untranslatable characters is described below. Prior to Sirius Mods 7.6, XmlDocs are stored in EBCDIC. |
ExclCanonical |
This indicates that the output of the serialization will be in exclusive XML canonical form, as defined in the W3C "Exclusive XML Canonicalization" specification (http://www.w3.org/tr/xml-exc-c14n), which is an extension of the "XML Canonicalization" specification (http://www.w3.org/TR/xml-c14n). These specifications constrain serializations to facilitate processing such as digital signatures.
This option, added in Sirius Mods version 7.0, is described in greater detail in "Canonicalization" below. Specifying any of the Serial method CR, LF, CRLF, or Indent options when you also specify ExclCanonical is allowed. Although the resulting output will not be completely canonical, it may be what you require for the purposes of a digital signature, for example. The formatting addressed by those options is defined in the Exclusive Canonicalization specification and covered by the ExclCanonical option. Similarly, the effect of the XmlDecl option contradicts the Exclusive Canonicalization specification. If you do specify the XmlDecl and ExclCanonical options together, however, the serialized XML Declaration is followed by a linefeed character. |
Indent n |
Inserts space characters (and line-ends, as described for the next option) into the serialized string such that if the string is broken at the line-ends and displayed as a tree, the display of each lower level (child element) in the subtree is indented n spaces from the starting point of the previous level (parent element).
For example, serialized output produced with the Indent option and broken at its line-ends displays as an indented tree such as: <top> <leaf1 xx="yy">value</leaf1> <sub> <leaf2>value</leaf2> </sub> </top>. n is a non-negative integer, and its maximum value (as of Sirius Mods version 7.0) is 254. For the Print, Audit, and Trace methods only: if the Indent option is omitted, the default indent is 3 spaces. |
CR, CRLF, LF, Newline |
Line-end options for the method output: |
NoEmptyElt |
Deprecated as of Sirius Mods version 7.0, this option ensures that all empty elements are serialized with a start tag followed by an end tag. For example: <middleName></middleName> If NoEmptyElt is not specified, the default is to serialize an empty element with an empty element tag; using the same example as above, this would be: <middleName/> The ExclCanonical option provides the same empty element serialization as NoEmptyElt. Also, the NoEmptyElement XmlNode property specifies whether to serialize childless nodes using a separate start tag and end tag. |
OmitNullElement |
An Element node that has no children and no Attributes will not be serialized, unless it is the top level Element in the subtree being serialized. The serialization of a child-less and Attribute-less Element is omitted, even if the serialization of the Element would contain Namespace declarations in its start tag.
If an Element node has no Attributes, but has (only) Element children (one or more), and all of its children are Attribute-less and child-less, then that parent Element is serialized, even though its content in the serialization is empty. That parent is serialized with a start tag and an end tag (and an inserted line separator, if called for by the serializing method's parameter options). For example, if the Serial method display of a particular XmlDoc in tree format is the following when OmitNullElement is not specified: <top> <middle> <empty/> <p:empty2 xmlns:p="uri:stuff"/> </middle> </top> Here is the display of the XmlDoc with the OmitNullElement option specified: <top> <middle> </middle> </top> But if you attempt to display only the </empty> The OmitNullElement option is available as of Sirius Mods version 7.3. |
AttributeCompact, BothCompact, Compact, ElementCompact, Expanded |
One of the following mutually exclusive output formats: |
SortCanonical |
Deprecated as of Sirius Mods version 7.0, SortCanonical serializes namespace declarations (based on the prefix being declared) and attributes (based on the namespace URI followed by the local name) in sorted order. This can be useful, for instance, when using Serial to serialize a portion of an XML document for a signature.
The sort order for namespace declarations and attributes is from lowest to highest, and it uses the Unicode code ordering (for example, numbers are lower than letters). Added in Sirius Mods version 6.9 as a step towards support for canonicalization, this option is superseded by the ExclCanonical option. |
UTF-8 |
This indicates that the serialization should be in UTF-8. This is the default. |
WithComments |
This indicates that all Comment nodes in the specified subtree are to be included in the serialized output.
Note: This option, added in Sirius Mods version 7.0, is only a supplement to the ExclCanonical option: specifying WithComments without specifying ExclCanonical has no effect. Specifying ExclCanonical without specifying WithComments causes all Comment nodes to be suppressed from the result. |
XmlDecl |
This indicates that the serialized XmlDoc will contain the "XML Declaration" (<?xml version=...?>), if the value of the Version property is a non-null string, and if the XmlDoc is not empty.
XmlDecl may only be specified if the top of the subtree being serialized is the Root node. |
AllowXmlDecl, NoXmlDecl |
These indicate whether or not the serialized XmlDoc will contain the "XML Declaration" (<?xml version=...?>). AllowXmlDecl (the default) may only be specified if the value of the Version property is a non-null string, and if the top of the subtree being serialized is the Root node. AllowXmlDecl and NoXmlDecl may not both be specified. |
EBCDIC serialization of untranslatable Unicode characters
As of Sirius Mods version 7.6, XmlDoc content is stored in Unicode. The methods that support EBCDIC serialization (Serial with EBCDIC option, Print, Audit, Trace) use the Unicode tables to convert the XmlDoc content.
One feature of the conversion from Unicode is that the serializing method displays non-translatable Unicode characters stored in Attribute or Element values as character references. For example:
%doc:AddElement('top', '&#x2122;':U)
%doc:Print
The result of this fragment is:
<top>&#x2122;</top>
Note: prior to version 8.0 of the Sirius Mods, the Serial method did not convert non-translatable Unicode characters to character references; the result was, instead, a request cancellation.
However, when an untranslatable Unicode character occurs in a context other than Element or Attribute value (that is, a name, comment, or PI), and the default serialization options are in effect, character encoding is not used. Because it is an Element name, for example, the following statements result in a request cancellation:
%doc:AddElement('&#x2122;':U)
%doc:Print
The Print method fails, attempting to translate the element name, the U+2122 character, to EBCDIC. This request cancellation can be prevented by using the CharacterEncodeAll option:
%doc:AddElement('&#x2122;':U)
%doc:Print(, 'CharacterEncodeAll')
The result of the above fragment is:
<&#x2122;/>
Note: The result with CharacterEncodeAll can be misleading. Request cancellation is avoided, but it can produce a serialization that is not equivalent to the portion of the XmlDoc that was serialized.
The serialized result above is not a legal XML document, because the ampersand (&) is not a legal name character. Similarly, for an untranslatable Unicode character added to a document with AddComment or AddPI, EBCDIC serialization with CharacterEncodeAll produces a stream of characters that can be displayed, but, if those characters are deserialized, the result is not the same XmlDoc content. The XML standard does not provide for character references in names, Comments, and PIs.
For example:
%d:AddComment('&#x2122;')
%d:AddComment('&#x2122;':U)
%d:Print(, 'CharacterEncodeAll')
The above results in:
<!--&#x2122;-->
<!--&#x2122;-->
This may lead you to believe that the two Comment nodes in the XmlDoc are identical, but the first one contains the 8 characters &#x2122;, whereas the second comment contains a single Unicode "trademark" character (™).
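The ambiguity described above can be reproduced outside the XmlDoc API. The following Python sketch is only an illustration: the function name `to_ebcdic_display` is hypothetical, and Python's `cp037` codec is assumed as a stand-in for the EBCDIC code page and Unicode tables actually in use. It replaces every character that has no EBCDIC translation with a hexadecimal character reference, as CharacterEncodeAll does in all contexts.

```python
def to_ebcdic_display(text: str, codec: str = "cp037") -> str:
    """Return text with characters not translatable to EBCDIC shown as &#x...; references."""
    out = []
    for ch in text:
        try:
            ch.encode(codec)              # translatable: keep the character as is
            out.append(ch)
        except UnicodeEncodeError:        # untranslatable: show a character reference
            out.append(f"&#x{ord(ch):X};")
    return "".join(out)


# Two different comment contents: eight literal characters versus one U+2122 character.
literal_reference = "&#x2122;"
trademark_character = "\u2122"
print(to_ebcdic_display(literal_reference))    # &#x2122;
print(to_ebcdic_display(trademark_character))  # &#x2122;
```

Both calls yield the same displayed string even though the inputs differ in length (8 characters versus 1), which is why such a display cannot be deserialized back to the original content.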
Serialization and the "xml:space" attribute
The effect of the xml:space attribute on the serialization of an Element that has the xml:space="preserve" or xml:space="default" attribute depends on the serialization method:
- Print, Audit, and Trace:
The xml:space="preserve" and xml:space="default" attributes do not affect the serialized output.
- Serial, WebSend, and Xml:
If one of the line-end options or Indent is specified, and an element to be serialized has the xml:space="preserve" attribute, then within the serialization of that element and its descendants, neither line-end nor indent characters are inserted. The xml:space="default" attribute does not influence serialization, regardless of method options, nor does it cause resumption of the insertion of readability line-ends or indents if they were suspended by a containing xml:space="preserve".
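The Serial, WebSend, and Xml behavior can be pictured with a small Python sketch built on the standard `xml.etree.ElementTree` module. This is not the XmlDoc implementation: the function `serialize` is hypothetical, it ignores namespaces other than the built-in `xml` prefix, and it assumes that the start tag of the element carrying xml:space="preserve" is itself still indented. It inserts line-ends and indents for readability, but suspends them inside an xml:space="preserve" element and never resumes them for xml:space="default".

```python
import xml.etree.ElementTree as ET

# ElementTree reports the xml:space attribute under the XML namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def _attrs(elem):
    """Render attributes, restoring the xml:space qualified name."""
    return "".join(
        ' {}="{}"'.format("xml:space" if key == XML_SPACE else key, value)
        for key, value in elem.attrib.items()
    )


def serialize(elem, indent=2, level=0, suspended=False):
    """Indenting serializer that inserts nothing inside xml:space="preserve"."""
    # Once suspended by a containing "preserve", insertion never resumes,
    # not even for a descendant with xml:space="default".
    inside = suspended or elem.get(XML_SPACE) == "preserve"
    newline = "" if inside else "\n"
    child_pad = "" if inside else " " * (indent * (level + 1))
    close_pad = "" if inside else " " * (indent * level)

    parts = [f"<{elem.tag}{_attrs(elem)}>", elem.text or ""]
    children = list(elem)
    for child in children:
        parts.append(newline + child_pad + serialize(child, indent, level + 1, inside))
        parts.append(child.tail or "")
    if children:
        parts.append(newline + close_pad)
    parts.append(f"</{elem.tag}>")
    return "".join(parts)


doc = ET.fromstring(
    '<top><a xml:space="preserve"><b xml:space="default"><c/></b></a><d><e/></d></top>'
)
print(serialize(doc))
```

In this sketch the `a` element and all of its descendants come out on a single line, while the sibling `d` element is broken across lines and indented as usual.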
Displaying whitespace characters in serializations
The serialization methods use the hexadecimal character references specified in the XML Canonicalization specification (http://www.w3.org/TR/xml-c14n) to display the following whitespace characters:
- For Attribute nodes: tab, carriage return, and linefeed
- For Text nodes: carriage return
Since the character references are not subject to the standard XML whitespace normalization, a serialized document (or subtree) that is then deserialized will retain this whitespace.
These character references are used:
tab | &#x9; |
carriage return | &#xD; |
linefeed | &#xA; |
The EBCDIC and corresponding ASCII encodings of these characters are:
Character | EBCDIC | ASCII |
---|---|---|
tab | X'05' | X'09' |
carriage return | X'0D' | X'0D' |
linefeed | X'25' | X'0A' |
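The two tables above can be checked with a short Python sketch. It is an illustration under stated assumptions, not the serializer itself: the helper names `show_whitespace` and `ebcdic_hex` are hypothetical, and Python's `cp037` codec is assumed to stand in for the EBCDIC code page.

```python
# Whitespace shown as character references, per node type (from the lists above).
ATTRIBUTE_WHITESPACE = {"\t": "&#x9;", "\r": "&#xD;", "\n": "&#xA;"}
TEXT_WHITESPACE = {"\r": "&#xD;"}


def show_whitespace(value: str, in_attribute: bool) -> str:
    """Replace the whitespace characters that are displayed as character references."""
    table = ATTRIBUTE_WHITESPACE if in_attribute else TEXT_WHITESPACE
    return "".join(table.get(ch, ch) for ch in value)


def ebcdic_hex(ch: str, codec: str = "cp037") -> str:
    """Hexadecimal EBCDIC encoding of a single character (cp037 assumed)."""
    return ch.encode(codec).hex().upper()


print(show_whitespace("a\tb\r\n", in_attribute=True))   # a&#x9;b&#xD;&#xA;
print(ebcdic_hex("\t"), ebcdic_hex("\r"), ebcdic_hex("\n"))  # 05 0D 25
```

In a Text node only the carriage return is replaced, so a tab or linefeed in text content passes through unchanged.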
Canonicalization
Canonicalization refers to a particular serialization of an XML document that is unique, yet still a logically equivalent representation of the document. Exclusive canonicalization is canonicalization augmented by rules for preserving or excluding the namespace context (declaration) of nodes when only a portion of an XML document is serialized.
Therefore, if a portion (subtree) of an XML document is exclusively canonicalized, it is serialized uniquely and is "substantially independent of its XML context" (that is, contains all essential and no extraneous information from its ancestor nodes). This independence makes the subtree suitable for working with digital signatures.
Some of the many requirements for canonicalization are provided automatically by calling the Serial method with no options. For example, UTF-8 encoding and exclusion of the XML declaration, if any, are provided by default by Serial. Specifying ExclCanonical, which is new as of Sirius Mods version 7.0, adds the following features to the no-option default:
- Sorting of namespace declarations (based on the prefix being declared) and of attributes (based on the namespace URI followed by the local name). The sort order is from lowest to highest, and it uses the Unicode code ordering (for example, numbers are lower than letters).
- For empty elements, serialization with both a start tag and an end tag, instead of using a single "empty element tag."
- The suppression of any Comment nodes that may be present in the subtree.
Comment nodes are suppressed unless the WithComments option is
specified along with ExclCanonical.
For an example, see PIs and Comments.
- Special namespace declaration handling: A namespace declaration is produced
only if it is utilized by an element or attribute in the subtree.
The declaration is produced in the
start-tag of an element that uses it (or has an attribute using
it), unless the parent of the element is in the subtree and the
declaration is in scope at the parent.
For examples, see Namespace serialization and Namespace importing.
- Attribute values are always serialized within double-quotation-mark (") delimiters, and a double-quotation mark character in an attribute value is serialized as &quot;. With or without the ExclCanonical option, these special characters in attribute values are serialized as entity and hexadecimal character references:
  - The ampersand (&) is serialized as &amp;
  - The less-than symbol (<) is serialized as &lt;
  - The carriage return (CR) character is serialized as &#xD;
  - The linefeed (LF) character is serialized as &#xA;
  - The tab character is serialized as &#x9;
  For examples, see Character references.
- Within Text nodes, the following characters are serialized as entity and hexadecimal character references:
  If you specify Serial with no options:
  - The less-than symbol (<) is serialized as &lt;
  - The ampersand (&) is serialized as &amp;
  - The carriage return (CR) character is serialized as &#xD;
  If you specify the ExclCanonical option, the following is also true:
  - The greater-than symbol (>) is serialized as &gt;
  For examples, see Character references.
- If serializing the Root of an XmlDoc, a linefeed character is inserted between the children of the Root. This character is represented exactly by X'25' if the EBCDIC option of Serial is used; otherwise it is represented by X'0A'.
Note: No linefeed is inserted if the XmlDoc has one PI or Comment node and does not have an Element node. In this case (which is allowed by Janus SOAP), the XML document is not well-formed and therefore the canonicalization specifications ignore it.
- If the subtree to be serialized is a single node that is either of these:
- A PI child of the Root
- A single node that is a Comment child of the Root and the WithComments option is specified
Then a linefeed character is added after the PI or Comment if there is a following Element sibling, or is added before the PI or Comment if there is a preceding Element sibling.
Note: No linefeed is inserted if the XmlDoc does not have an Element node. In this case (which is allowed by Janus SOAP), the XML document is not well-formed and therefore the canonicalization specifications ignore it.
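Several of the ExclCanonical features listed above (sorting, start tag plus end tag for empty elements, and character escaping) can be combined in one small Python sketch. It is an illustration only, not the Serial implementation: the function names and the tuple layout for attributes are assumptions, and placing namespace declarations before attributes follows the W3C canonicalization specifications, not a statement on this page.

```python
def escape_attribute(value: str) -> str:
    """Escape an attribute value delimited by double-quotation marks."""
    return (value.replace("&", "&amp;")      # ampersand first, so later references survive
                 .replace("<", "&lt;")
                 .replace('"', "&quot;")
                 .replace("\t", "&#x9;")
                 .replace("\n", "&#xA;")
                 .replace("\r", "&#xD;"))


def escape_text(value: str, excl_canonical: bool = True) -> str:
    """Escape Text node content; '>' is escaped only under ExclCanonical."""
    out = value.replace("&", "&amp;").replace("<", "&lt;").replace("\r", "&#xD;")
    if excl_canonical:
        out = out.replace(">", "&gt;")
    return out


def canonical_element(name, namespaces, attributes, text=""):
    """Build one element in a canonical-style form.

    namespaces: dict mapping prefix -> URI ('' is the default namespace).
    attributes: list of (namespace_uri, local_name, qualified_name, value) tuples.
    """
    parts = [f"<{name}"]
    # Namespace declarations sorted by the prefix being declared.
    for prefix in sorted(namespaces):
        decl = f"xmlns:{prefix}" if prefix else "xmlns"
        parts.append(f' {decl}="{escape_attribute(namespaces[prefix])}"')
    # Attributes sorted by namespace URI, then local name.
    for _uri, _local, qname, value in sorted(attributes, key=lambda a: (a[0], a[1])):
        parts.append(f' {qname}="{escape_attribute(value)}"')
    # Always a start tag and an end tag, never an empty-element tag.
    parts.append(f">{escape_text(text)}</{name}>")
    return "".join(parts)


print(canonical_element(
    "e",
    {"b": "urn:b", "a": "urn:a"},
    [("urn:b", "x", "b:x", '1"2'), ("", "z", "z", "a<b"), ("urn:a", "y", "a:y", "t\tu")],
))
```

Python's default string ordering compares Unicode code points, which matches the "numbers are lower than letters" ordering described above.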
Qualifications/exceptions
- The canonicalization specifications, especially exclusive canonicalization, include references to the serialization of a subset of a document. The ExclCanonical option is based not on a subset but on a subtree.
- Although the ExclCanonical and SortCanonical options use the "Unicode" sort sequence, this is currently limited to Unicode values less than 256 (as of version 7.7 of Janus SOAP), so it is accomplished with an 8-bit EBCDIC to 8-bit Unicode table, which is (for all intents and purposes) merely an EBCDIC-to-ASCII translation.
- The specifications support an argument to canonicalization that is a list of namespace declarations that are to be "forced" into the serialization. The ExclCanonical option does not provide this support.
Examples
The following examples show various aspects of the ExclCanonical option. The examples use the EBCDIC option to display the result. If using ExclCanonical for digital signature processing, you probably should omit the EBCDIC option and use the default encoding, UTF-8.
Namespace serialization
Under exclusive canonicalization, a namespace is not serialized if it is not necessary. In this example, the subtree to be serialized is displayed in green font in the request code that follows:
    Begin
    %doc is Object XmlDoc
    %doc = New
    %l is longstring
    %sl is object stringlist
    %sl = New
    text to %sl
    <top>
      <a xmlns:p3="urn:p3" xmlns:p2="urn:p2" xmlns:p1="urn:p1">
        <p1:b/>
        <p2:b/>
      </a>
    </top>
    end text
    Call %doc:LoadXml(%sl)
    Print 'Exclcan via ParseLines:'
    %sl = New
    %l=%doc:Serial('top/a', 'EBCDIC exclcanonical indent 2 lf')
    %sl:Parselines(%l)
    %sl:Print
    End
The exclusive canonical serialization (displayed, after being parsed from string to Stringlist, with line breaks and indent for the sake of clarity) omits the declaration for p3, because it is not utilized in the serialized subtree:
    <a>
      <p1:b xmlns:p1="urn:p1"></p1:b>
      <p2:b xmlns:p2="urn:p2"></p2:b>
    </a>
An element utilizes an in-scope namespace declaration in either of these cases:
- The element is prefixed and the declaration is of that prefix.
- The element is unprefixed and it is a default namespace declaration.
An attribute utilizes an in-scope namespace declaration if the attribute is prefixed and the declaration is of that prefix.
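The utilization rule just stated can be expressed as a small predicate. This Python sketch is illustrative only and is not how Janus SOAP implements the rule. The function name utilizes and its parameters are hypothetical, and an empty string is used (by assumption) to denote a default namespace declaration.

```python
def utilizes(node_name, declared_prefix, is_attribute=False):
    """Return True if a node utilizes an in-scope namespace declaration.

    node_name is a qualified name such as "p1:b" or "a".
    declared_prefix is the prefix being declared; "" means the default
    namespace declaration (an assumption made for this sketch).
    """
    prefix, colon, _local = node_name.partition(":")
    node_prefix = prefix if colon else ""
    if is_attribute:
        # An unprefixed attribute never utilizes a declaration.
        return node_prefix != "" and node_prefix == declared_prefix
    # Prefixed element: the declaration must be of that prefix.
    # Unprefixed element: the declaration must be the default one.
    return node_prefix == declared_prefix


# In the preceding example, p1 is utilized by p1:b but p3 is utilized by nothing.
for name in ("a", "p1:b", "p2:b"):
    print(name, utilizes(name, "p1"), utilizes(name, "p3"))
```

A declaration that no element or attribute in the serialized subtree utilizes, like p3 here, is the kind that exclusive canonicalization omits.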
In the preceding example, there was no alternative to removing the non-utilized declaration for p3, but if it were utilized by a descendant element "lower" in the document tree, it would be moved to that element.
Another application of the utilization rule is shown in the next example.
Namespace importing
Under exclusive canonicalization, namespaces are imported to where they are needed.
Using the same type of request as in the preceding example, the w element is the subtree to serialize (green font):
    <a xmlns:p3="urn:p3" xmlns:p2="urn:p2" xmlns:p1="urn:p1">
      <w>
        <p1:b/>
        <p2:b/>
      </w>
    </a>
Exclusive canonical serialization (display form), which gets required namespace declarations from an ancestor of the serialized subtree:
    <w>
      <p1:b xmlns:p1="urn:p1"></p1:b>
      <p2:b xmlns:p2="urn:p2"></p2:b>
    </w>
PIs and Comments
Using the same type of request as in the Namespace serialization example, this is the subtree to be serialized (display form):
    <a>
      <!-- Comment 1 -->
      <w>
        <?pi-without-data?>
      </w>
    </a>
Exclusive canonical serialization (display form), which omits the Comment node:
    <a>
      <w>
        <?pi-without-data?>
      </w>
    </a>
Note: To include the Comment node, also specify the WithComments option of Serial.
Character references
Using the same type of request as in the Namespace serialization example, this is the subtree to serialize (display form):
<doc>
  <comp>val>"0" val&lt;"10"</comp>
  <comp expr='val>"0"'></comp>
  <norm attr=' &apos; &#xD;&#xA;&#x9; &apos; '/>
  <white>	
</white>
</doc>
This is the result from the Serial method with no options specified (display form, and the <white> element has a line that wraps to emphasize the non-visible linefeed character it contains):
<doc>
  <comp>val>"0" val&lt;"10"</comp>
  <comp expr='val>"0"'></comp>
  <norm attr=" ' &#xD;&#xA;&#x9; ' "/>
  <white> 
 </white>
</doc>
The exclusive canonical serialization follows (display form, wrapped <white> element line has no indent).
<doc>
  <comp>val&gt;"0" val&lt;"10"</comp>
  <comp expr="val>&quot;0&quot;"></comp>
  <norm attr=" ' &#xD;&#xA;&#x9; ' "></norm>
  <white> 
 </white>
</doc>
The differences (green font) from no-option Serial include:
- The greater-than symbol (>) within a text node is serialized as &gt;.
- Attribute values are enclosed in double-quotation marks (").
- A double-quotation mark in an attribute value is serialized as &quot;.
- An empty element is serialized with two tags (a start tag followed by an end tag), not with a single empty-element tag.
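The character handling in this example can be approximated in a few lines of Python. This is a rough sketch of the rules described on this page, not the Janus SOAP implementation. The names escape_attr_value, escape_text, and canonical_element are hypothetical, and the sketch ignores namespaces, attribute ordering, and child elements.

```python
def escape_attr_value(value):
    """Escape an attribute value that will be delimited by double quotes."""
    # The ampersand is replaced first so the references added afterward
    # are not escaped a second time.
    for raw, ref in (("&", "&amp;"), ("<", "&lt;"), ('"', "&quot;"),
                     ("\r", "&#xD;"), ("\n", "&#xA;"), ("\t", "&#x9;")):
        value = value.replace(raw, ref)
    return value


def escape_text(value, excl_canonical=False):
    """Escape Text node content; '>' is escaped only for ExclCanonical."""
    value = value.replace("&", "&amp;").replace("<", "&lt;")
    value = value.replace("\r", "&#xD;")
    if excl_canonical:
        value = value.replace(">", "&gt;")
    return value


def canonical_element(name, attrs, text=""):
    """Serialize one element with a start tag and an end tag, even if empty."""
    attr_text = "".join(
        f' {key}="{escape_attr_value(val)}"' for key, val in attrs.items()
    )
    return f"<{name}{attr_text}>{escape_text(text, True)}</{name}>"


print(canonical_element("comp", {}, 'val>"0" val<"10"'))
print(canonical_element("comp", {"expr": 'val>"0"'}))
print(canonical_element("norm", {"attr": " ' \r\n\t ' "}))
```

Note that the greater-than symbol is left as is inside the attribute value but becomes a reference inside the Text node, which matches the differences listed above.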
See also
- For additional discussion about serialization, see Transport: receiving and sending XML.