<tr><th>options</th>
<td>A blank-delimited string that can contain one or more of the following options, which are identified below and described in greater detail in [[XmlDoc API serialization options|"XmlDoc API serialization options"]]:
<ul>
<li><b>EBCDIC</b><br>
Produces serialized output in EBCDIC text rather than the default encoding, UTF-8.
<p>
Under ''Sirius Mods'' version 7.6 and later, selecting <var>EBCDIC</var> causes the subtree content, which is stored in <var>Unicode</var>, to be converted via the <var>Unicode</var> tables. For more information about the <var>Unicode</var> tables, see [[??]] refid=u80..</p>
<p>
<var>Serial</var>izing to UTF-8 involves no translation: the stored <var>Unicode</var> characters are merely encoded as UTF-8.</p>
<li><b>ExclCanonical</b><br>
Produces serialized output in exclusive XML canonical form, as defined in the W3C "Exclusive XML Canonicalization" specification (http://www.w3.org/tr/xml-exc-c14n), which is an extension of the "XML Canonicalization" specification (http://www.w3.org/TR/xml-c14n). These specifications constrain serializations to facilitate processing such as digital signatures.
<p>
This option, added in ''Sirius Mods'' version 7.0, is described in greater detail in the "Canonicalization" item of the "Usage Notes" section, below.</p>
<p>
You may specify any of the <var>Serial</var> method <var>CR</var>, <var>LF</var>, <var>CRLF</var>, or <var>Indent</var> options along with <var>ExclCanonical</var>. The Exclusive Canonicalization specification already defines the formatting those options control, and <var>ExclCanonical</var> applies it. The resulting output is therefore not completely canonical, but it may still be what you require, for example, for a digital signature.</p>
<p>
Similarly, the effect of the <var>XmlDecl</var> option contradicts the Exclusive Canonicalization specification. If you do specify the <var>XmlDecl</var> and <var>ExclCanonical</var> options together, the serialized XML declaration is followed by a linefeed character.</p>
<li><b>Indent</b> <i><b>n</b></i><br>
Inserts space characters (and line-ends, as described for the next option) into the serialized string. The effect is that if the string is broken at the line-ends and displayed as a tree, each lower level in the subtree is indented ''n'' spaces from the previous level's starting point.
<p>
If serialized output with an <var>Indent</var> value of 2 is displayed as a tree, the spacing is as follows:</p>
<pre><top>
  <leaf1 xx="yy">value</leaf1>
  <sub>
    <leaf2>value</leaf2>
  </sub>
</top></pre>
<p>
<i>n</i> is a non-negative integer, and its maximum value (as of ''Sirius Mods'' version 7.0) is 254. One of the line-end options, below, must also be specified.</p>
<li><b>CR</b> (carriage-return), <b>LF</b> (linefeed), or <b>CRLF</b> (carriage-return followed by a linefeed)<br>
Inserts the selected line-end sequence into the serialized output after any of the following is serialized:
<ul>
<li>An element start-tag, if the element has any non-text node children
<li>An element end-tag
<li>An empty-element tag
<li>A processing instruction (PI)
<li>A comment
<li>A text node, if it has any siblings
</ul>
<p>
'''Note:''' If an <code>AddTrailingDelimiter=false</code> argument is also specified, no line-end character is added at the end of the serialized subtree.</p>
<li><b>NoEmptyElt</b><br>
Deprecated as of ''Sirius Mods'' version 7.0, this option ensures that all empty elements are serialized with a start tag followed by an end tag. For example:
<p class="code"><middleName></middleName></p>
If <var>NoEmptyElt</var> is not specified, the default is to serialize an empty element with an empty-element tag (as in <code><middleName/></code>).
<p>
The <var>ExclCanonical</var> option provides the same empty-element serialization as <var>NoEmptyElt</var>.</p>
<li><b>OmitNullElement</b><br>
An <var>Element</var> node that has no children and no <var>Attributes</var> is not serialized, unless it is the top-level <var>Element</var> in the subtree being serialized. Such an <var>Element</var> is omitted even if its serialization would contain namespace declarations in its start tag.
<p>
Consider an <var>Element</var> node that has no <var>Attributes</var> and whose children (one or more) are all child-less, <var>Attribute</var>-less <var>Elements</var>. That parent <var>Element</var> is still serialized, even though its content in the serialization is empty. It is serialized with a start tag and an end tag (and an inserted line separator, if called for by the serializing method's options).</p>
<p>
For example, suppose that when <var>OmitNullElement</var> is ''not'' specified, the <var>Serial</var> method displays a particular <var>XmlDoc</var> in tree format as follows:</p>
<pre><top>
  <middle>
    <empty/>
    <p:empty2 xmlns:p="uri:stuff"/>
  </middle>
</top></pre>
<p>
With the <var>OmitNullElement</var> option specified, the display of that <var>XmlDoc</var> is:</p>
<pre><top>
  <middle>
  </middle>
</top></pre>
<p>
If you serialize only the <tt>empty</tt> subtree of that <var>XmlDoc</var> using <var>OmitNullElement</var>, however, the <tt>empty</tt> node is the top-level <var>Element</var> of the subtree. It is therefore not suppressed, and the result is:</p>
<pre><empty/></pre>
<p>
The <var>OmitNullElement</var> option is available as of ''Sirius Mods'' version 7.3.</p>
<li><b>SortCanonical</b><br>
This deprecated option serializes namespace declarations in order of the prefix being declared, and attributes in order of the namespace URI followed by the local name. This can be useful, for instance, when using <var>Serial</var> to serialize a portion of an XML document for a signature.
<p>
The sort order is from lowest to highest, and it uses the <var>Unicode</var> code ordering (for example, numbers are lower than letters).</p>
<p>
Added in ''Sirius Mods'' version 6.9 and deprecated as of version 7.0, <var>SortCanonical</var> is superseded by the <var>ExclCanonical</var> option.</p>
<li><b>WithComments</b><br>
Includes in the serialized output all <var>Comment</var> nodes in the specified subtree.
<p>'''Note:''' This option, added in ''Sirius Mods'' version 7.0, is only a supplement to the <var>ExclCanonical</var> option: specifying <var>WithComments</var> without specifying <var>ExclCanonical</var> has no effect. Specifying <var>ExclCanonical</var> without specifying <var>WithComments</var> causes all <var>Comment</var> nodes to be suppressed from the result.</p>
<li><b>XmlDecl</b><br>
Includes the "XML Declaration" (<code><?xml version=...?></code>) in the serialization, provided that the value of the [[Version (XmlDoc property)|Version]] property is a non-null string and the <var>XmlDoc</var> is not empty.
<p>
<var>XmlDecl</var> may only be specified if the top of the subtree being serialized is the <var>Root</var> node.</p>
<p>
The <var>XmlDecl</var> option is new in ''Sirius Mods'' version 6.7.</p>
</ul>
</td></tr>
<tr><th>AddTrailingDelimiter</th>
as <tt>&quot;</tt>.
Prior to version 7.6, this convention was not strictly observed.
<li>Canonicalization:
<p>
Canonicalization refers to a particular serialization of an XML document that is unique, yet still a logically equivalent representation of the document. Exclusive canonicalization is canonicalization augmented by rules for preserving or excluding the namespace context (declarations) of nodes when ''only a portion of an XML document'' is serialized.</p>
<p>
Therefore, if a portion (subtree) of an XML document is exclusively canonicalized, it is serialized uniquely and is "substantially independent of its XML context." That is, it contains all essential and no extraneous information from its ancestor nodes. This independence makes the subtree suitable for working with digital signatures.</p>
<p>
Some of the many requirements for canonicalization are met automatically when you specify the <var>Serial</var> method with no options. For example, UTF-8 encoding and exclusion of the XML declaration, if any, are provided by default. Specifying <var>ExclCanonical</var>, which is new as of ''Sirius Mods'' version 7.0, adds the following features to the no-option default:</p>
<ul>
<li>Sorting of namespace declarations (based on the prefix being declared) and of attributes (based on the namespace URI followed by the local name). The sort order is from lowest to highest, and it uses the <var>Unicode</var> code ordering (for example, numbers are lower than letters).
<li>For empty elements, serialization with both a start tag and an end tag, instead of a single "empty element tag."
<li>The suppression of any Comment nodes that may be present in the subtree. Comment nodes are suppressed unless the <var>WithComments</var> option is specified along with <var>ExclCanonical</var>.
<p>
For an example, see item [[??]] refid=namspx5..</p>
<li>Special namespace declaration handling: a namespace declaration is produced only if it is utilized by an element or attribute in the subtree. The declaration is produced in the start-tag of an element that uses it (or has an attribute using it), unless the parent of the element is in the subtree and the declaration is in scope at the parent.
<p>
For examples, see items [[??]] refid=namspx1. and [[??]] refid=namspx2..</p>
<li>''Attribute values'' are always serialized within double-quotation-mark (<tt>"</tt>) delimiters, and a double-quotation mark character in an attribute value is serialized as <tt>&quot;</tt>.
<p>
With or without the <var>ExclCanonical</var> option, these special characters in attribute values are serialized as entity and hexadecimal character references:</p>
<ul>
<li>The ampersand (&) is serialized as <tt>&amp;</tt>
<li>The less-than symbol (<) is serialized as <tt>&lt;</tt>
<li>The carriage return (CR) character is serialized as <tt>&#xD;</tt>
<li>The linefeed (LF) character is serialized as <tt>&#xA;</tt>
<li>The tab character is serialized as <tt>&#x9;</tt>
</ul>
<p>
For examples, see item [[??]] refid=namspx6..</p>
<li>Within ''Text nodes'', the following characters are serialized as entity and hexadecimal character references.
<p>
If you specify <var>Serial</var> with no options:</p>
<ul>
<li>The less-than symbol (<) is serialized as <tt>&lt;</tt>
<li>The ampersand (&) is serialized as <tt>&amp;</tt>
<li>The carriage return (CR) character is serialized as <tt>&#xD;</tt>
</ul>
<p>
If you specify the <var>ExclCanonical</var> option, the following is ''also'' true:</p>
<ul>
<li>The greater-than symbol (>) is serialized as <tt>&gt;</tt>
</ul>
<p>
For examples, see item [[??]] refid=namspx6..</p>
<li>If serializing the Root of an <var>XmlDoc</var>, a linefeed character is inserted ''between'' the children of the Root. This character is represented exactly by <tt>X'25'</tt> if the <var>EBCDIC</var> option of <var>Serial</var> is used; otherwise it is represented by <tt>X'0A'</tt>.
<p>
'''Note:''' No linefeed is inserted if the <var>XmlDoc</var> has one PI or Comment node and does not have an Element node. In this case (which is allowed by [[Janus SOAP]]), the XML document is not well-formed, and therefore the canonicalization specifications ignore it.</p>
<li>If the subtree to be serialized is a single node that is either of these:
<ul>
<li>A PI child of the Root
<li>A Comment child of the Root, when the <var>WithComments</var> option is specified
</ul>
then a linefeed character is added after the PI or Comment if there is a following Element sibling, or before the PI or Comment if there is a preceding Element sibling.
<p>
'''Note:''' No linefeed is inserted if the <var>XmlDoc</var> does not have an Element node. In this case (which is allowed by [[Janus SOAP]]), the XML document is not well-formed, and therefore the canonicalization specifications ignore it.</p>
</ul>
<p>
Qualifications/exceptions:</p>
<ul>
<li>The canonicalization specifications, especially exclusive canonicalization, refer to the serialization of a ''subset'' of a document. The <var>ExclCanonical</var> option is based not on a subset but on a ''subtree''.
<li>The <var>ExclCanonical</var> and <var>SortCanonical</var> options use the "<var>Unicode</var>" sort sequence, but this is currently limited to <var>Unicode</var> values less than 256 (as of version &NUNCVSN. of [[Janus SOAP]]). The sort is therefore accomplished with an 8-bit EBCDIC to 8-bit <var>Unicode</var> table, which is (for all intents and purposes) merely an EBCDIC-to-ASCII translation.
<li>The specifications support an argument to canonicalization that is a list of namespace declarations that are to be "forced" into the serialization. The <var>ExclCanonical</var> option does not provide this support.
</ul>
<p>
A series of examples of the effects of the <var>ExclCanonical</var> option begins with item [[??]] refid=namspx1..</p>
</ul>
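<p>
The escaping and ordering rules listed in the "Canonicalization" note can be summarized in a few lines of code. The following Python sketch only illustrates those documented rules and is not the Janus SOAP implementation. The function names are hypothetical, and attributes are assumed to be (namespace URI, local name, value) triples with an empty URI for unprefixed attributes. Python's <tt>cp037</tt> codec is assumed as a representative EBCDIC code page for the Root linefeed rule.</p>

```python
"""Illustrative model of the Serial ExclCanonical rules (not Janus SOAP code)."""

# Attribute values under ExclCanonical are always double-quoted, and '"' becomes
# &quot;. The '&', '<', CR, LF, and tab escapes apply with or without ExclCanonical.
ATTR_REFS = {"&": "&amp;", "<": "&lt;", '"': "&quot;",
             "\r": "&#xD;", "\n": "&#xA;", "\t": "&#x9;"}


def escape_attr(value: str) -> str:
    """Return an attribute value delimited and escaped as ExclCanonical would."""
    return '"' + "".join(ATTR_REFS.get(ch, ch) for ch in value) + '"'


def escape_text(text: str, excl_canonical: bool) -> str:
    """Escape Text node content; '>' is escaped only under ExclCanonical."""
    refs = {"&": "&amp;", "<": "&lt;", "\r": "&#xD;"}
    if excl_canonical:
        refs[">"] = "&gt;"
    return "".join(refs.get(ch, ch) for ch in text)


def sort_ns_prefixes(prefixes):
    """Order namespace declarations by declared prefix ('' = default namespace).

    Python compares strings by code point, so digits sort before letters.
    """
    return sorted(prefixes)


def sort_attributes(attrs):
    """Order (namespace_uri, local_name, value) triples by URI, then local name."""
    return sorted(attrs, key=lambda a: (a[0], a[1]))


def root_linefeed(ebcdic: bool) -> bytes:
    """Linefeed inserted between Root children: X'25' in EBCDIC, X'0A' otherwise."""
    # cp037 is an assumed, representative EBCDIC code page.
    return "\n".encode("cp037" if ebcdic else "utf-8")


if __name__ == "__main__":
    print(escape_text('val>"0" val<"10"', excl_canonical=True))
    print(escape_attr('val>"0"'))
    print(root_linefeed(ebcdic=True).hex())
```

<p>
Run directly, the module prints <tt>val&gt;"0" val&lt;"10"</tt>, <tt>"val>&quot;0&quot;"</tt>, and <tt>25</tt>. These correspond to the text-node and attribute escaping in the character-reference example and to the <tt>X'25'</tt> linefeed rule.</p>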
<nowiki><top><a><b>05</b></a><c><d att="val"/></c></top></nowiki>
</p>
<li>This and the remaining examples show various aspects of the <var>ExclCanonical</var> option. The examples use the <var>EBCDIC</var> option to display the result. If you use <var>ExclCanonical</var> for digital signature processing, you should probably omit the <var>EBCDIC</var> option and use the default encoding, UTF-8.
<p>
Under exclusive canonicalization, a namespace declaration is not serialized if it is not necessary. In this example, the subtree to be serialized is the <tt>a</tt> element, which the following request selects with the <tt>top/a</tt> argument to <var>Serial</var>:</p>
<p class="code">Begin
   %doc is Object XmlDoc
   %doc = New
   %l is longstring
   %sl is object stringlist
   %sl = New
   text to %sl
   <top>
   <a xmlns:p3="urn:p3" xmlns:p2="urn:p2" xmlns:p1="urn:p1">
   <p1:b/>
   <p2:b/>
   </a>
   </top>
   end text
   Call %doc:LoadXml(%sl)
   Print 'Exclcan via ParseLines:'
   %sl = New
   %l = %doc:Serial('top/a', 'EBCDIC exclcanonical indent 2 lf')
   %sl:Parselines(%l)
   %sl:Print
End
</p>
<p>
The exclusive canonical serialization omits the declaration for <tt>p3</tt>, because it is not utilized in the serialized subtree. It is displayed here after being parsed from a string into a <var>Stringlist</var>, with line breaks and indentation for clarity:</p>
<p class="code"><a>
  <p1:b xmlns:p1="urn:p1"></p1:b>
  <p2:b xmlns:p2="urn:p2"></p2:b>
</a>
</p>
<p>
An element '''utilizes''' an in-scope namespace declaration in either of these cases:</p>
<ul>
<li>The element is prefixed and the declaration is of that prefix.
<li>The element is unprefixed and the declaration is a default namespace declaration.
</ul>
<p>
An attribute '''utilizes''' an in-scope namespace declaration if the attribute is prefixed and the declaration is of that prefix.</p>
<p>
In the preceding example, there was no alternative to removing the non-utilized declaration for <tt>p3</tt>. If it were utilized by a descendant element "lower" in the document tree, however, it would be moved to that element. Another application of the utilization rule is shown in the next example.</p>
<li>Under exclusive canonicalization, namespace declarations are imported to where they are needed.
<p>
Using the same type of request as in the first <var>ExclCanonical</var> example, above, the <tt>w</tt> element is the subtree to serialize (display form):</p>
<p class="code"><a xmlns:p3="urn:p3" xmlns:p2="urn:p2" xmlns:p1="urn:p1">
  <w>
    <p1:b/>
    <p2:b/>
  </w>
</a>
</p>
<p>
Exclusive canonical serialization (display form), which gets the required namespace declarations from an ancestor of the serialized subtree:</p>
<p class="code"><w>
  <p1:b xmlns:p1="urn:p1"></p1:b>
  <p2:b xmlns:p2="urn:p2"></p2:b>
</w>
</p>
<li>PIs and Comments:
<p>
Using the same type of request as in the first <var>ExclCanonical</var> example, above, this is the subtree to be serialized (display form):</p>
<p class="code"><a>
  <!-- Comment 1 -->
  <w>
    <?pi-without-data?>
  </w>
</a>
</p>
<p>
Exclusive canonical serialization (display form), which omits the Comment node:</p>
<p class="code"><a>
  <w>
    <?pi-without-data?>
  </w>
</a>
</p>
<p>
'''Note:''' To include the Comment node, also specify the <var>WithComments</var> option of <var>Serial</var>.</p>
<li>Character references:
<p>
Using the same type of request as in the first <var>ExclCanonical</var> example, above, this is the subtree to serialize (display form):</p>
<p class="code"><doc>
  <comp>val>"0" val&lt;"10"</comp>
  <comp expr='val>"0"'></comp>
  <norm attr=' &apos; &#xD;&#xA;&#x9; &apos; '/>
  <white>&#x9;&#xD;&#xA;</white>
</doc>
</p>
<p>
This is the result from the <var>Serial</var> method ''with no options'' specified. It is shown in display form, and the line for the <tt><white></tt> element wraps at the linefeed character it contains, to make that otherwise invisible character apparent:</p>
<p class="code"><doc>
  <comp>val>"0" val&lt;"10"</comp>
  <comp expr='val>"0"'></comp>
  <norm attr=" ' &#xD;&#xA;&#x9; ' "/>
  <white> &#xD;
</white>
</doc>
</p>
<p>
The exclusive canonical serialization follows. It is shown in display form, the wrapped line of the <tt><white></tt> element has no indent, and the differences from the no-option output are shown in bold:</p>
<p class="code"><doc>
  <comp>val<b>&gt;</b>"0" val&lt;"10"</comp>
  <comp expr=<b>"</b>val><b>&quot;</b>0<b>&quot;</b>"></comp>
  <norm attr=" ' &#xD;&#xA;&#x9; ' "><b></norm></b>
  <white> &#xD;
</white>
</doc>
</p>
<p>
The differences from no-option <var>Serial</var> include:</p>
<ul>
<li>The greater-than symbol (>) within a text node is serialized as <tt>&gt;</tt>.
<li>Attribute values are enclosed in double-quotation marks (<tt>"</tt>).
<li>A double-quotation mark in an attribute value is serialized as <tt>&quot;</tt>.
<li>An empty element is serialized with two tags (a start tag followed by an end tag), not with a single empty-element tag.
</ul>
</ol>
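<p>
The namespace utilization rule shown in the first two <var>ExclCanonical</var> examples can be modeled as a recursive walk over the serialized subtree. This Python sketch is illustrative only; the <tt>Element</tt> class and <tt>render_decls</tt> function are hypothetical and not part of Janus SOAP. It assumes that a declaration counts as "in scope at the parent" only if an ancestor inside the serialized subtree has already emitted it. It also ignores details of the W3C specification such as the handling of <tt>xmlns=""</tt>.</p>

```python
"""Illustrative model of ExclCanonical namespace-declaration placement."""
from dataclasses import dataclass, field


@dataclass
class Element:
    qname: str                                     # e.g. "p1:b" or "w"
    ns_decls: dict = field(default_factory=dict)   # prefix -> URI declared here; "" = default
    attrs: list = field(default_factory=list)      # attribute qnames, e.g. "p1:att"
    children: list = field(default_factory=list)   # child Elements


def prefix_of(qname):
    """Return the prefix of a qualified name, or None if it is unprefixed."""
    return qname.split(":", 1)[0] if ":" in qname else None


def render_decls(elem, in_scope, rendered, out):
    """Append (qname, emitted declarations) for elem and its descendants to out.

    in_scope: prefix -> URI bindings in scope in the source document
    rendered: prefix -> URI bindings already emitted by serialized ancestors
    """
    scope = {**in_scope, **elem.ns_decls}
    # An element utilizes its own prefix, or the default namespace if unprefixed.
    p = prefix_of(elem.qname)
    used = {p if p is not None else ""}
    # An attribute utilizes a declaration only if the attribute is prefixed.
    used.update(ap for ap in map(prefix_of, elem.attrs) if ap is not None)
    emitted = {}
    for prefix in sorted(used):                    # declarations are sorted by prefix
        uri = scope.get(prefix)
        if uri is not None and rendered.get(prefix) != uri:
            emitted[prefix] = uri
    out.append((elem.qname, emitted))
    for child in elem.children:
        render_decls(child, scope, {**rendered, **emitted}, out)
    return out


if __name__ == "__main__":
    decls = {"p3": "urn:p3", "p2": "urn:p2", "p1": "urn:p1"}
    # First example: serialize the "a" element, which itself carries the declarations.
    a = Element("a", dict(decls), [], [Element("p1:b"), Element("p2:b")])
    for qname, emitted in render_decls(a, {}, {}, []):
        print(qname, emitted)
```

<p>
Run directly, the module reports no declarations for <tt>a</tt> and one declaration on each of the two <tt>b</tt> elements. This mirrors the first example's serialization: <tt>p3</tt> is never emitted because nothing in the subtree utilizes it.</p>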