XML processing in Janus SOAP: Difference between revisions

m (minor cleanup)
 
(37 intermediate revisions by 3 users not shown)
Line 5: Line 5:
Comments with "&NSPRVSN" are places that could be revisited when "SOAP rule" support available
Comments with "&NSPRVSN" are places that could be revisited when "SOAP rule" support available
-->  
-->  
[[Janus SOAP]] provides User Language programmers with a substantial set of facilities for processing eXtensible Markup
<var class="product">[[Janus SOAP]]</var> provides <var class="product">[[SOUL]]</var> programmers with a substantial set of facilities for processing eXtensible Markup
Language (XML) documents.
Language (XML) documents.
Among other benefits,
Among other benefits, this enables rich and automated Web services based on a shared and open Web infrastructure.
this enables rich and automated Web services based on a shared and open Web infrastructure.
The design of this XML support is based on various standards, such as XML and [[XPath]].
The design of this XML support is based on various standards, such as XML and XPath.
Many sections in this article refer to these and other standards,
Many sections in this article refer to these and other standards,
for example, [[#Simple Object Access Protocol (SOAP)|Simple Object Access Protocol (SOAP)]].
for example, [[#Simple Object Access Protocol (SOAP)|Simple Object Access Protocol (SOAP)]].
However, it is important to recognize:
However, it is important to recognize:
<ul>
<blockquote><var class="product">Janus SOAP</var> enables you to process <i><b>any XML document</b></i>, whether or not you are using SOAP messages and envelopes.
<li>''Janus SOAP'' enables you to process <i><b>any XML document</b></i>, whether or not you are using
</blockquote>
SOAP messages and envelopes.
</ul>
   
   
XML support is provided in two disjoint sets of classes in ''Janus SOAP'':
XML support is provided in two disjoint sets of classes in <var class="product">Janus SOAP</var>:
<dl>
<dl>
<dt>[[XmlDoc API]]
<dt>[[XmlDoc API]]
<dd>The methods in these classes allow you to convert a character stream XML document into an
<dd>The methods in these classes allow you to convert a character stream XML document into an
internal format (an [[XmlDoc class|XmlDoc object]]) or to programmatically create an XmlDoc, to access and modify an
internal format (an <var>[[XmlDoc class|XmlDoc]]</var> object) or to programmatically create an <var>XmlDoc</var>, to access and modify an
XmlDoc, and to convert an XmlDoc into a character stream XML document.
<var>XmlDoc</var>, and to convert an <var>XmlDoc</var> into a character stream XML document.
 
<dt>[[XmlParser API]]
<dt>[[XmlParser API]]
<dd>This set of classes provides for event-based extraction of information from an XML document in
<dd>This set of classes provides for event-based extraction of information from an XML document in
Line 29: Line 27:
This can be beneficial when only a relatively small part of the XML document is to be processed.
This can be beneficial when only a relatively small part of the XML document is to be processed.
</dl>
</dl>
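The two processing styles described above can be sketched outside of <var class="product">Janus SOAP</var>. The following is a minimal illustration in Python, using only its standard library; it is an analogy, not the XmlDoc or XmlParser API, and the sample document and variable names are invented for the example. The first half deserializes a character stream into an in-memory tree, modifies it, and serializes it again; the second half reacts to parser events and keeps only the values of interest.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
text = "<order><item qty='2'>pen</item><item qty='5'>ink</item></order>"

# Tree-style processing: deserialize, modify in memory, serialize again.
root = ET.fromstring(text)
root[0].set("qty", "3")                      # change an attribute of the first item
ET.SubElement(root, "note").text = "rush"    # add a new child element
serialized = ET.tostring(root, encoding="unicode")

# Event-style processing: handle each element as the parser finishes it,
# extracting a small part of the document without building your own tree.
quantities = []
stream = io.BytesIO(text.encode("utf-8"))
for event, elem in ET.iterparse(stream, events=("end",)):
    if elem.tag == "item":
        quantities.append(int(elem.get("qty")))

print(serialized)
print(quantities)
```

The event-style loop is usually the better fit when only a small part of a large document is needed, which is the same trade-off described for the XmlParser API.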
==Standards relevant to Janus SOAP XML facilities==
==Standards relevant to Janus SOAP XML facilities==
===eXtensible Markup Language (XML)===
===eXtensible Markup Language (XML)===
XML is a standard (endorsed by the World Wide Web Consortium, or W3C) which can
XML is a standard (endorsed by the World Wide Web Consortium, or W3C) which can be used for structuring almost any kind of data.
be used for structuring almost any kind of data.
Although the word "markup" reveals that the roots of XML are from
Although the word &ldquo;markup&rdquo; reveals that the roots of XML are from
document processing, and indeed the outermost entity in XML is called a
document processing, and indeed the outermost entity in XML is called a
&ldquo;document,&rdquo; XML is ideally suited to structuring almost any kind of
"document," XML is ideally suited to structuring almost any kind of
data that is exchanged between or within applications,
data that is exchanged between or within applications,
particularly (although by no means exclusively) if they are communicating on a network.
particularly (although by no means exclusively) if they are communicating on a network.
   
   
The syntax of XML provides for hierarchical structuring of data (again, the outer
The syntax of XML provides for hierarchical structuring of data (again, the outer
entity is called a document)
entity is called a document) into the principal type called an '''element'''.
into the principal type called an '''element'''.
Elements and the other components of an XML document are described in [[#XML|XML]].
Elements and the other components of an XML document are described in [[#XML|XML]].
   
   
Line 56: Line 54:
An XML document can be considered an abstract object: when XML
An XML document can be considered an abstract object: when XML
is used for interchange between applications,
is used for interchange between applications,
it is usually &ldquo;serialized&rdquo;, or transmitted, completely
it is usually "serialized", or transmitted, completely
in character form.
in character form.
The advantage of this is that it is human-readable and can be
The advantage of this is that it is human-readable and can be
Line 63: Line 61:
Additionally, standard network protocols can be used to exchange documents
Additionally, standard network protocols can be used to exchange documents
between a wide variety of applications on a wide variety of platforms.
between a wide variety of applications on a wide variety of platforms.
As the
As the World Wide Web has demonstrated, using characters as the basis for
World Wide Web has demonstrated, using characters as the basis for
information interchange is extremely powerful and flexible.
information interchange is extremely powerful and flexible.
   
   
Beyond these core properties which make XML very attractive for structuring
Beyond these core properties which make XML very attractive for structuring
data, it has become the basis for a large family of standards.
data, it has become the basis for a large family of standards.
Often these standards are referred to as the XML &ldquo;family,&rdquo; in part
Often these standards are referred to as the XML "family," in part
because they are managed by the XML Working Group of the W3C.
because they are managed by the XML Working Group of the W3C.
Some of these important standards are
Some of these important standards are
XML Schema, XML Stylesheet Transformations, XML Query, and Web Services
XML Schema, XML Stylesheet Transformations, XML Query, and Web Services
Description Language (WSDL).
Description Language (WSDL).
See http://www.w3c.org
See http://www.w3c.org for more information about these and other standards related to XML.
for more information about these and other standards related to XML.
   
   
Quoting from <i><b>XML in a Nutshell (2nd ed</b></i>) (see [[#References|References]]),
Quoting from <i>XML in a Nutshell (2nd ed)</i> (see [[#References|References]]):
<ul>
 
<li>XML offers the tantalizing possibility of truly cross-platform, long term
<blockquote>XML offers the tantalizing possibility of truly cross-platform, long term
data formats. ...
data formats. ... XML delivers portable data.
XML delivers portable data.
In many ways, XML is the most portable ... format designed since the ASCII text file.
In many ways, XML is the most portable ... format designed since the ASCII text file.
</ul>
</blockquote>
   
   
You can use XML strictly as an internal datastructure in your application,
You can use XML strictly as an internal data structure in your application,
or in ''Model 204'' files, or with operating system files, or with other programs using
or in <var class="product">Model 204</var> files, or with operating system files, or with other programs using some communication mechanism.
some communication mechanism.
The simple, character-based format of XML enhances such communication.
The simple, character-based format of XML enhances such communication.
You can communicate with the Web (HTTP), either as a server application
You can communicate with the Web (HTTP), either as a server application (for example,
(for example,
using <var class="product">[[Janus Web Server]]</var>) or making client XML requests (for example, using <var class="product">[[Janus Sockets]]</var> [[HTTP Helper]]).
using [[Janus Web Server]]) or making client XML requests (for example, using [[Janus Sockets]] HTTP Helper).
You can use native <var class="product">Model 204</var> IODEV communication facilities, or <var class="product">Model 204</var> MQ Series, or
You can use native ''Model 204'' IODEV communication facilities, or ''Model 204'' MQ Series, or
any facility that can send and receive streams of characters.
any facility that can send and receive streams of characters.
===Simple Object Access Protocol (SOAP)===
===Simple Object Access Protocol (SOAP)===
The Simple Object Access Protocol (SOAP) is a lightweight protocol that supports
The Simple Object Access Protocol (SOAP) is a lightweight protocol that supports the exchange of structured
the exchange of structured
information between Web-based applications.
information between Web-based applications.
SOAP employs XML to serialize the objects passed between applications.
SOAP employs XML to serialize the objects passed between applications.
SOAP can be used in combination with a variety of existing firewall-friendly
SOAP can be used in combination with a variety of existing firewall-friendly
Internet protocols and formats including HTTP, SMTP, and MIME.
Internet protocols and formats including HTTP, SMTP, and MIME.
SOAP supports a wide range of application paradigms, from messaging systems to
SOAP supports a wide range of application paradigms, from messaging systems to Remote Procedure Call (RPC).
Remote Procedure Call (RPC).
   
   
SOAP is an excellent standard for information exchange between applications,
SOAP is an excellent standard for information exchange between applications,
so good that it is the reason for the name ''Janus SOAP''.
so good that it is the reason for the name <var class="product">Janus SOAP</var>.
It is important to recognize the following, however:
It is important to recognize the following, however:
<ul>
<var class="product">Janus SOAP</var> enables you to process <i><b>any XML document</b></i>, whether
<li>''Janus SOAP'' enables you to process <i><b>any XML document</b></i>''', whether
or not you are using SOAP messages and envelopes.
or not you are using SOAP messages and envelopes'''.
</ul>
   
   
<!-- &NSPRVSN -->
<!-- &NSPRVSN -->
In fact, with the current version,
In fact, with the current version, although you can readily process formal SOAP
although you can readily process formal SOAP
messages, there are no features specially oriented toward that: all features are
messages, there are no features specially oriented toward that: all features are
generalized for handling any kind of XML document.
generalized for handling any kind of XML document.
Later versions will add more functionality to incorporate the standard processing
Later versions will add more functionality to incorporate the standard processing
of SOAP messages, so your
of SOAP messages, so your application will only need to deal with the application-specific
application will only need to deal with the application-specific
parts of the messages.
parts of the messages.
====Example SOAP request====
====Example SOAP request====
This example SOAP message is a request to a SOAP server:
This example SOAP message is a request to a SOAP server:
<pre>
<p class="code">POST /StockQuote HTTP/1.1
    POST /StockQuote HTTP/1.1
Host: www.stockquoteserver.com
    Host: www.stockquoteserver.com
Content-Type: text/xml; charset="utf-8"
    Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn
    Content-Length: nnnn
SOAPAction: "Some-URI"
    SOAPAction: "Some-URI"
   
   
    <SoapEnv:Envelope
<nowiki><SoapEnv:Envelope
      xmlns:SoapEnv="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:SoapEnv="http://schemas.xmlsoap.org/soap/envelope/"
      SoapEnv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  SoapEnv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <SoapEnv:Body>
  <SoapEnv:Body>
          <m:GetLastTradePrice
      <m:GetLastTradePrice
            xmlns:m="http://sirius-software.com/samp/JSOAP/1">
        xmlns:m="http://sirius-software.com/samp/JSOAP/1">
            <symbol>EMC</symbol>
        <symbol>EMC</symbol>
          </m:GetLastTradePrice>
      </m:GetLastTradePrice>
      </SoapEnv:Body>
  </SoapEnv:Body>
    </SoapEnv:Envelope>
</SoapEnv:Envelope></nowiki>
</pre>
</p>
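As a rough illustration of what a receiver does with the XML part of this request, the sketch below parses the envelope and extracts the application-specific payload. It is written in Python with the standard library, not with <var class="product">Janus SOAP</var> methods, and the prefix names in the <code>ns</code> mapping are arbitrary choices made for the example; only the namespace URIs come from the message above.

```python
import xml.etree.ElementTree as ET

# The XML body of the request shown above (HTTP headers omitted).
soap_request = """<SoapEnv:Envelope
  xmlns:SoapEnv="http://schemas.xmlsoap.org/soap/envelope/"
  SoapEnv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SoapEnv:Body>
      <m:GetLastTradePrice
        xmlns:m="http://sirius-software.com/samp/JSOAP/1">
        <symbol>EMC</symbol>
      </m:GetLastTradePrice>
  </SoapEnv:Body>
</SoapEnv:Envelope>"""

# Prefixes here are local to this script; matching is done on the URIs.
ns = {
    "env": "http://schemas.xmlsoap.org/soap/envelope/",
    "m": "http://sirius-software.com/samp/JSOAP/1",
}

envelope = ET.fromstring(soap_request)
request = envelope.find("env:Body/m:GetLastTradePrice", ns)
symbol = request.findtext("symbol")   # <symbol> is in no namespace

print(symbol)
```

Note that the prefix used in the document (<code>SoapEnv</code>) and the prefix used by the reader (<code>env</code>) need not agree; a namespace is identified by its URI.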
 
====Example SOAP response====
====Example SOAP response====
This example SOAP message could be a response to
This example SOAP message could be a response to
the above message:
the above message:
<pre>
<p class="code">HTTP/1.1 200 OK
    HTTP/1.1 200 OK
Content-Type: text/xml; charset="utf-8"
    Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn
    Content-Length: nnnn
   
   
    <SOAP-ENV:Envelope
<nowiki><SOAP-ENV:Envelope
      xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"></nowiki>
      <SOAP-ENV:Body>
  <SOAP-ENV:Body>
          <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <m:GetLastTradePriceResponse xmlns:m="Some-URI">
              <Price>34.5</Price>
          <Price>34.5</Price>
          </m:GetLastTradePriceResponse>
      </m:GetLastTradePriceResponse>
      </SOAP-ENV:Body>
  </SOAP-ENV:Body>
    </SOAP-ENV:Envelope>
</SOAP-ENV:Envelope>
</pre>
</p>
 
===XML Path Language (XPath) in the XmlDoc API===
===XML Path Language (XPath) in the XmlDoc API===
XPath is a language designed specifically to select nodes from an XML document.
XPath is a language designed specifically to select nodes from an XML document.
It is very powerful, yet it is based on familiar syntax that mimics an
It is very powerful, yet it is based on familiar syntax that mimics an XML document's hierarchy.
XML document's hierarchy.
XPath is the general mechanism used in the XmlDoc API for selecting one or more nodes on which to operate.
XPath is the general mechanism used in the XmlDoc API for selecting one or more nodes
It is a key component of XSLT, XPointer, and XLink, and it has a common foundation with XML Query.
on which to operate.
It is a key component of XSLT, XPointer, and
XLink, and it has a common foundation with XML Query.
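To give a feel for how path expressions mirror the document hierarchy, here is a small sketch in Python. It relies on the limited XPath subset implemented by the standard <code>xml.etree.ElementTree</code> module, which is an assumption of this example and is not the XPath support of the XmlDoc API; the sample data is adapted from the purchase order used later in this article.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<purchase_order>"
    "<pitm><partID>1234</partID><price per='12' amt='1.280'/><qty>36</qty></pitm>"
    "<pitm><price amt='.29'/><partID>5678</partID><qty>2</qty></pitm>"
    "</purchase_order>"
)

# Child steps separated by "/" walk down the hierarchy.
part_ids = [e.text for e in doc.findall("pitm/partID")]

# A predicate in brackets filters nodes, here by presence of an attribute.
with_per = doc.findall(".//price[@per]")

# A numeric predicate selects by position among siblings (1-based).
second_qty = doc.findtext("pitm[2]/qty")

print(part_ids, [p.get("amt") for p in with_per], second_qty)
```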
   
   
An introduction to the use of XPath is provided in
An introduction to the use of XPath is provided in
Line 173: Line 159:


==XML==
==XML==
As explained above, XML provides the basis for a large
As explained above, XML provides the basis for a large number of varied standards.
number of varied standards.
This section introduces the <i><b>W3C XML Recommendation</b></i>, that is, the XML standard.
This section introduces the <i><b>W3C XML Recommendation</b></i>, that is, the XML standard.
It gives you basic information about XML, explaining some of the concepts using the
It gives you basic information about XML, explaining some of the concepts using the XmlDoc API (that is,
XmlDoc API (that is,
the methods of the <var>XmlDoc</var>, <var>XmlNodelist</var>, and <var>XmlNode</var> classes).
the methods of the XmlDoc, XmlNodelist, and XmlNode classes).
This approach gives you concrete examples which you can try in <var class="product">SOUL</var>,
This approach gives you concrete examples which you can try in User Language,
and which may make the abstract concepts easier to understand.
and which may make the abstract concepts easier to understand.
   
   
The syntax of XML provides for hierarchical structuring of data (the outer
The syntax of XML provides for:
object is called a document) into '''elements'''.
<ul>
<li>Hierarchical structuring of data (the outer object is called a document) into '''elements'''.
<p>
An element has a name, which need not be unique within the document.
An element has a name, which need not be unique within the document.
An element can have any number of '''attributes''', each of which
An element can have any number of '''attributes''', each of which
has a name (which must be unique within that element &mdash; but not within the
has a name (which must be unique within that element &mdash; but not within the document) and a value.
document) and a value.
Within an element can be a series of values and ("sub-") elements, which provides XML with its hierarchical nature.</p>
Within an element can be a series of values and (&ldquo;sub-&rdquo;) elements,
 
which provides XML with its hierarchical nature.
<li>Assigning unique identifiers to elements;
<ul>
<li>There is also a provision for assigning unique identifiers to elements;
this provides even more structuring possibilities than simple hierarchy.
this provides even more structuring possibilities than simple hierarchy.
<p>
These identifiers are implemented with the element type definition
These identifiers are implemented with the element type definition
features provided with either Document Type Declarations or with XML Schema.
features provided with either Document Type Declarations or with XML Schema.
Element type definitions are omitted from our XML documentation; they
Element type definitions are omitted from our XML documentation; they
are not supported in the current version.
are not supported in the current version. </p>
<!-- &NSCHVSN -->
<!-- &NSCHVSN -->
</ul>
</ul>
   
   
An XML document has exactly one outer, or &ldquo;top-level&rdquo; element,
An XML document has exactly one outer, or "top-level," element, and this element
which contains, as descendants,
contains, as descendants, any other elements that may be in the document.
any other elements that may be in the document.
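The structural rules in this section (element names may repeat, attribute names must be unique within one element, and there is exactly one top-level element) are enforced by any conforming XML parser. The sketch below checks them with Python's standard parser; the helper name <code>is_well_formed</code> is invented for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Element names need not be unique within the document.
repeated_elements = is_well_formed("<a><b/><b/></a>")

# Only one top-level element is allowed.
two_top_level = is_well_formed("<a/><b/>")

# An attribute name may not be repeated within one element ...
duplicate_attribute = is_well_formed("<a x='1' x='2'/>")

# ... but the same attribute name may appear on different elements.
same_name_elsewhere = is_well_formed("<a x='1'><b x='2'/></a>")

print(repeated_elements, two_top_level, duplicate_attribute, same_name_elsewhere)
```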
   
   
In addition to the data contained in elements and attributes, any
In addition to the data contained in elements and attributes, any
number of '''comments''' may appear wherever an element may appear.
number of '''comments''' may appear wherever an element may appear.
There
There is also a component called a processing instruction, or '''PI''',
is also a component called a processing instruction, or '''PI''',
which is effectively a comment that has a name.
which is effectively a comment that has a name.
   
   
All names (element names, attribute names, entity references,
All names (element names, attribute names, entity references, and PI targets) are case-sensitive; for example, a less-than symbol
and PI targets) are case-sensitive; for example, a less-than symbol
(<tt><</tt>) can be included in an attribute value if you use the characters
(<) can be included in an attribute value if you use the characters
<code>&amp;lt;</code> &mdash; but not if you use <code>&amp;LT;</code> or <code>&amp;Lt;</code>.
&ldquo;<tt>&amp;lt;</tt>&rdquo; &mdash; but not if you use &ldquo;<tt>&amp;LT;</tt>&rdquo;
or &ldquo;<tt>&amp;Lt;</tt>&rdquo;.
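The case-sensitivity rule can be checked with any conforming parser. This short Python sketch (standard library only; the element name <code>cmp</code> is made up for the example) shows that the lowercase entity reference is accepted, while an uppercase variant and an end tag that differs only in case are both rejected.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The predefined entity reference must be written in lowercase.
value = ET.fromstring('<cmp op="&lt;"/>').get("op")
upper_entity_ok = parses('<cmp op="&LT;"/>')

# Element names are case-sensitive too: the end tag must match exactly.
mismatched_tag_ok = parses("<cmp></Cmp>")

print(repr(value), upper_entity_ok, mismatched_tag_ok)
```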
   
   
The rest of this section explains the syntax of XML and various rules
The rest of this section explains the syntax of XML and various rules
for XML documents, according to the <i><b>W3C XML Recommendation</b></i> (as mentioned in [[#References|References]],
for XML documents, according to the <i>W3C XML Recommendation</i> (as mentioned in [[#References|References]],
this includes both the XML specification per se, and the XML Namespaces
this includes both the XML specification per se, and the XML Namespaces specification).
specification).
In [[#XML syntax|XML syntax]], below, and elsewhere as appropriate, you will find
In [[#XML syntax|XML syntax]] and elsewhere as appropriate, you will find
comments about limitations imposed by the XmlDoc API on the <i>W3C XML Recommendation</i>.
comments about limitations imposed by the XmlDoc API on the <i><b>W3C XML Recommendation</b></i>.
 
===XML example===
===XML example===
The next example illustrates the major components of an XML document.
The next example illustrates the major components of an XML document.
The formatting into separate, indented lines is
The formatting into separate, indented lines is provided for readability, but it is not significant for this and for most business data exchange applications.
provided for readability, but it is not significant for this and for most
The letter labels on the left are not part of the document; they are for the explanation which follows:
business data exchange applications.
<p class="code">X: <?xml version='1.1'?>
The letter labels on the left are not part of the document; they
A: &lt;!-- Purchase order follows -->
are for the explanation which follows:
B: <purchase_order>
<pre>
C:  <memo>Dave's order was "late"</memo>
    X: <?xml version='1.1'?>
D:  <?program-version 4.1?>
    A: <!-- Purchase order follows -->
E:  <pitm>
    B: <purchase_order>
      <partID>1234</partID>
    C:  <memo>Dave's order was "late"</memo>
F:    <price per="12" amt="1.280"/>
    D:  <?program-version 4.1?>
      <qty>36</qty>
    E:  <pitm>
G:  </pitm>
          <partID>1234</partID>
H:  <pitm>
    F:    <price per="12" amt="1.280"/>
I:    <price amt=".29"></price>
          <qty>36</qty>
      <partID>5678</partID>
    G:  </pitm>
      <qty>2</qty>
    H:  <pitm>
    </pitm>
    I:    <price amt=".29"></price>
  </purchase_order>
          <partID>5678</partID>
</p>
          <qty>2</qty>
        </pitm>
      </purchase_order>
</pre>
   
   
In the following explanation of each of the labeled lines above,
In the following explanation of each of the labeled lines above, references of the form '''['''<i>cnn</i>''']''', like <code>[B22]</code>,
references of the form '''['''<i><b>cnn</b></i>''']'''
are to productions in [[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]] below.
are to productions in [[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]].
<table class="thJustBold">
<dl>
<tr><th>X:</th>
<dt>X:
<td><code><?xml version='1.1'?></code>
<dd><tt><?xml version='1.1'?></tt>
<p>
<br>
The XML Declaration (XMLDecl, [C23]) is an optional part of the prolog ([B22]), which
The XML Declaration (XMLDecl, [C23]) is an optional part of the prolog ([B22]), which
is the set of components preceding the top-level element.
is the set of components preceding the top-level element.
If XMLDecl is present it must:
If XMLDecl is present it must:</p>
<ul>
<ul>
<li>Be the first markup in the document (only whitespace may precede it).
<li>Be the first markup in the document (only whitespace may precede it).
<li>Specify at least the version (as of version 7.5 of the <var class="product">Sirius Mods</var>,
<li>Specify at least the version (as of version 7.5 of the <var class="product">Sirius Mods</var>,
&ldquo;1.0&rdquo; and &ldquo;1.1&rdquo; are the only valid versions).
"1.0" and "1.1" are the only valid versions).
</ul>
</ul>
The clauses in XMLDecl are positional, that is, they must be given in the order
The clauses in XMLDecl are positional, that is, they must be given in the order shown in the syntax.</td></tr>
shown in the syntax.
 
<dt>A:
<tr><th>A:</th>
<dd><tt>&lt;!-- Purchase order follows --></tt>
<td><code>&lt;!-- Purchase order follows --></code>
<br>
<p>
This is a comment at top-level.
This is a comment at top-level. [A1], [B22], and [D27] allow
[A1], [B22], and [D27] allow
zero or more comments and PIs before and after the top-level element.</p></td></tr>
zero or more comments and PIs before and after the top-level element.
 
<dt>B:
<tr><th>B:</th>
<dd><tt><purchase_order></tt>
<td><code><purchase_order></code>
<br>
<p>
This is the element start-tag or STag ([G40]) of the top-level element ([A1]).
This is the element start-tag or STag ([G40]) of the top-level element ([A1]).</p></td></tr>
<dt>C:
 
<dd><tt><memo>Dave's order was "late"</memo></tt>
<tr><th>C:</th>
<br>
<td><code><memo>Dave's order was "late"</memo></code>
With &ldquo;leaf&rdquo; elements (known in XML Schema as elements with simple content),
<p>
With "leaf" elements (known in XML Schema as elements with simple content),
that is, if the only thing between the STag and
that is, if the only thing between the STag and
ETag is CharData ([P14]), you can usually implement the information either as an
ETag is CharData ([P14]), you can usually implement the information either as an
element (text) or as an attribute of the parent element.
element (text) or as an attribute of the parent element.
This text example highlights one small distinction, namely that
This text example highlights one small distinction, namely that
AttValue ([M10]) has less flexibility:
AttValue ([M10]) has less flexibility:</p>
<ul>
<ul>
<li>If the value includes both apostrophes and quotation marks, either the
<li>If the value includes both apostrophes and quotation marks, either the
apostrophes or the quotes must be escaped.
apostrophes or the quotes must be escaped.
<li>CharData not only allows
<li>CharData not only allows
quotes and apostrophes, but it also allows CDSect [Q18].
quotes and apostrophes, but it also allows CDSect [Q18].
</ul>
</ul></td></tr>
<dt>D:
 
<dd><tt><?program-version 4.1?></tt>
<tr><th>D:</th>
<br>
<td><code><?program-version 4.1?></code>
<p>
This is a PI [V16].
This is a PI [V16].
Presumably the name (actually, the target) &ldquo;program-version&rdquo;
Presumably the name (actually, the target) "program-version" is used by the application reading this document.</p></td></tr>
is used by the application reading this document.
 
<dt>E:
<tr><th>E:</th>
<dd><tt><pitm></tt>
<td><code><pitm></code>
<br>
<p>
This is the STag of an element which is contained
This is the STag of an element which is contained
within another element and which contains child elements;
within another element and which contains child elements;
this allows you to group elements together.
this allows you to group elements together.</p></td></tr>
<dt>F:
 
<dd><tt><price per="12" amt="1.280"/></tt>
<tr><th>F:</th>
<br>
<td><code><price per="12" amt="1.280"/></code>
<p>
This is an example of the EmptyElemTag ([I44]), which can be useful
This is an example of the EmptyElemTag ([I44]), which can be useful
if an element contains no data (just the name can be meaningful to
if an element contains no data (just the name can be meaningful to
the application), or if it only contains data using attributes.
the application), or if it only contains data using attributes.</p></td></tr>
<dt>G:
 
<dd><tt></pitm></tt>
<tr><th>G:</th>
<br>
<td><code></pitm></code>
<p>
This is the ETag [H42] of an element.
This is the ETag [H42] of an element.
The name must exactly match the STag for the element (again, XML is case-sensitive).
The name must exactly match the STag for the element (again, XML is case-sensitive).</p></td></tr>
<dt>H:
 
<dd><tt><pitm></tt>
<tr><th>H:</th>
<br>
<td><code><pitm></code>
Here is another STag of an element;
<p>
it is the &ldquo;sibling&rdquo; of another with the same name.
Here is another STag of an element; it is the "sibling" of another with the same name.
The ability to have sub-elements and the ability to repeat elements with the
The ability to have sub-elements and the ability to repeat elements with the
same name in a given parent element are the important data modeling
same name in a given parent element are the important data modeling
distinctions between elements and attributes.
distinctions between elements and attributes.</p></td></tr>
<dt>I:
 
<dd><tt><price amt=".29"></price></tt>
<tr><th>I:</th>
<br>
<td><code><price amt=".29"></price></code>
<p>
Note that not all instances of a given element type (the price item
Note that not all instances of a given element type (the price item
is an element type) must have the same attributes, nor must they have
is an element type) must have the same attributes, nor must they have
the same sub-structure.
the same sub-structure. Also, these are optional:</p>
Also, these are optional:
<ul>
<ul>
<li>Whether an element has content.
<li>Whether an element has content.
<li>Whether to use an STag immediately followed by an ETag (as is done here)
<li>Whether to use an STag immediately followed by an ETag (as is done here)
or to use the EmptyElemTag (as is done above in item F).
or to use the EmptyElemTag (as is done above in item F).
</ul>
</ul></td></tr>
</dl>
</table>
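Several of the points made for the labeled lines can be observed by loading the example into a parser. The sketch below uses Python's standard library rather than the XmlDoc API, and it omits the XML declaration and the top-level comment from the example for simplicity (both choices are assumptions of this illustration). It shows a leaf element holding both apostrophes and quotes (C), repeated sibling elements (E and H), instances of one element type with different attributes (F and I), and the equivalence of an STag immediately followed by an ETag with the EmptyElemTag.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring("""<purchase_order>
  <memo>Dave's order was "late"</memo>
  <?program-version 4.1?>
  <pitm>
    <partID>1234</partID>
    <price per="12" amt="1.280"/>
    <qty>36</qty>
  </pitm>
  <pitm>
    <price amt=".29"></price>
    <partID>5678</partID>
    <qty>2</qty>
  </pitm>
</purchase_order>""")

# C: element content may mix apostrophes and quotes without escaping.
memo = order.findtext("memo")

# E, H: two sibling elements share the name "pitm".
items = order.findall("pitm")

# F, I: the two price elements carry different sets of attributes.
price_attrs = [dict(item.find("price").attrib) for item in items]

# I: <price ...></price> and <price .../> describe the same empty element.
long_form = ET.tostring(ET.fromstring('<price amt=".29"></price>'))
short_form = ET.tostring(ET.fromstring('<price amt=".29"/>'))
same_element = long_form == short_form

print(memo, len(items), price_attrs, same_element)
```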
 
===XML syntax===
===XML syntax===
This section contains a version of the XML syntax.
This section contains a version of the XML syntax.
It is taken from the <i><b>W3C XML Recommendation</b></i>, which is the authoritative reference:
It is taken from the <i>W3C XML Recommendation</i>, which is the authoritative reference:
<pre>
<p class="code"><nowiki>http://www.w3.org/TR/REC-xml</nowiki>
    http://www.w3.org/TR/REC-xml
</p>
</pre>
The syntax below has been changed from the standard in these ways:
The syntax below has been changed from the standard in these ways:
<ul>
<ul>
<li>The only structure in the XML syntax not supported
<li>The only structure in the XML syntax not supported
in the current version is the
in the current version is the
<!-- &NDTDVSN -->
<!-- &NDTDVSN -->
Document Type Declaration, or DTD, (&ldquo;<!DOCTYPE...>&rdquo;).
Document Type Declaration, or DTD, ("<!DOCTYPE...>").
A DTD can be tolerated if you use the DTD_IGNORE option
A DTD can be tolerated if you use the DTD_IGNORE option
of the deserialization functions ([[LoadXml (XmlDoc/XmlNode function)|LoadXml]],
of the deserialization functions (<var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var>,
[[WebReceive (XmlDoc function)|WebReceive]], and [[ParseXml (HttpResponse function)|ParseXml]]),
<var>[[WebReceive (XmlDoc function)|WebReceive]]</var>, and <var>[[ParseXml (HttpResponse function)|ParseXml]]</var>),
but the information contained in the
but the information contained in the DTD is neither used nor made available to the <var class="product">SOUL</var> program.
DTD is neither used nor made available to the User Language program.
   
   
Reflecting the absence of support for DTD,
Reflecting the absence of support for DTD, the productions in the syntax that follows are altered to remove those
the productions in the syntax that follows are altered to remove those
parts of an XML document introduced in the DTD.
parts of an XML document introduced in the DTD.
'''Note:'''
<p class="note">
Much of the functionality of document type declarations may be better
'''Note:''' Much of the functionality of document type declarations may be better
provided using XML Schema, which is planned for a future version.
provided using XML Schema, which is planned for a future version.</p>
<li>The Char, Name, NameStartChar, and NameChar productions are taken from
 
the XML 1.1 recommendation (http://www.w3.org/TR/xml11/).
<li>The Char, Name, NameStartChar, and NameChar productions are taken from the [http://www.w3.org/TR/xml11/ XML 1.1 recommendation].
As explained in [[#Char and Reference|Char and Reference]], only characters representable in 8-bit
As explained in [[#Char and Reference|Char and Reference]], only characters representable in 8-bit
EBCDIC were handled prior to <var class="product">Sirius Mods</var> version 7.6,
EBCDIC were handled prior to <var class="product">Sirius Mods</var> version 7.6, so fewer characters were supported in the production for
so fewer characters were supported in the production for
Char ([CA2]) in earlier <var class="product">Sirius Mods</var> releases.
Char ([CA2]) in earlier <var class="product">Sirius Mods</var> releases.
<li>The maximum length of an XML name is 300 characters (prior to version 7.9, the maximum was 127, and prior to version
<li>The maximum length of an XML name is 300 characters (prior to version 7.9, the maximum was 127, and prior to version
7.7, the maximum was 100).
7.7, the maximum was 100).
<li>The productions are re-ordered
 
(to make it easier to read the grammar), and letters are added before them,
<li>The productions are re-ordered (to make it easier to read the grammar), and letters are added before them,
so when [B22] is referred to in the text, you know that this is between
so when <code>[B22]</code> is referred to in the text, you know that this is between [A<i>nn</i>] and [C<i>nn</i>] in this grammar, and this is production [22] for the same
[Ann] and [Cnn] in this grammar, and this is production [22] for the same
non-terminal (in this case, <code>prolog</code>) in the <i>W3C XML Recommendation</i>.
non-terminal (in this case, <tt>prolog</tt>) in the <i><b>W3C XML Recommendation</b></i>.
</ul>
</ul>
   
   
The conventions used are:
The conventions used are:
<dl>
<table>
<dt>'<i>yyy</i>' (apostrophes) or "<i>yyy</i>" (quotes)
<tr><th>'<i>yyy</i>' or "<i>yyy</i>"</th>
<dd>Enclose an item <i><b>yyy</b></i> that must appear exactly as shown.
<td>Enclosed item, <i><b>yyy</b></i>, must appear exactly as shown.</td></tr>
<dt>#x<i>nn</i>
 
<dd>Specifies the character (in ISO-10646) with code
<tr><th>#x<i>nn</i></th>
value <i><b>nn</b></i>.
<td>Specifies the character (in ISO-10646) with code value <i><b>nn</b></i>.
For example, <tt>#x09 #x0D #x0A #x20</tt> specify the
<p>
tab, carriage return, linefeed, and space characters, respectively.
For example, <code>#x09 #x0D #x0A #x20</code> specify the
<dt>[^<i>abc</i>]
tab, carriage return, linefeed, and space characters, respectively.</p></td></tr>
<dd>Specifies any character except
 
<i><b>a</b></i>, <i><b>b</b></i>, or <i><b>c</b></i>.
<tr><th>[^<i>abc</i>]</th>
<dt>[<i>chars</i>]
<td>Specifies any character except <i><b>a</b></i>, <i><b>b</b></i>, or <i><b>c</b></i>.</td></tr>
<dd>Specifies any character within the set
 
<i><b>chars</b></i>, where <i><b>chars</b></i> can be the concatenation of these sets:
<tr><th>[<i>chars</i>]</th>
<td>Specifies any character within the set <i><b>chars</b></i>, where <i><b>chars</b></i> can be the concatenation of these sets:
<ul>
<ul>
<li><i><b>y</b></i>, meaning the single character <i><b>y</b></i>
<li><i><b>y</b></i>, meaning the single character <i><b>y</b></i>
<li><i><b>y</b></i>'''-'''<i><b>z</b></i>, meaning characters in the range from
<li><i><b>y</b></i>'''-'''<i><b>z</b></i>, meaning characters in the range from
<i><b>y</b></i> to <i><b>z</b></i>, inclusive
<i><b>y</b></i> to <i><b>z</b></i>, inclusive
</ul>
</ul>
The resulting set of
 
<i><b>chars</b></i> is the union of the specified sets.
The resulting set of <i><b>chars</b></i> is the union of the specified sets.</td></tr>
<dt><i>set1</i> - <i>set2</i> (&ldquo;-&rdquo; not enclosed in [...])
 
<dd>The set of strings described by <i><b>set1</b></i>, with the set of strings
<tr><th><i>set1</i> - <i>set2</i> ("-" not enclosed in [...])</th>
described by <i><b>set2</b></i> removed.
<td>The set of strings described by <i><b>set1</b></i>, with the set of strings
<dt>|
described by <i><b>set2</b></i> removed.</td></tr>
<dd>Separates alternatives.
 
<dt>?
<tr><th>|</th>
<dd>Follows an optional item.
<td>Separates alternatives.</td></tr>
<dt>*
 
<dd>Follows an item that can occur any number of times (even not at all).
<tr><th>?</th>
<dt>+
<td>Follows an optional item.</td></tr>
<dd>Follows an item that can occur one or more times.
 
<dt>(<i>abc</i>) (parentheses)
<tr><th>*</th>
<dd>Group items
<td>Follows an item that can occur any number of times (even not at all).</td></tr>
<dt>[<i>rule</i>] (&ldquo;to the right&rdquo;)
 
<dd>Marks an additional syntax rule.
<tr><th>+</th>
<dt>/*<i>comment</i>*/
<td>Follows an item that can occur one or more times.</td></tr>
<dd>Marks a comment.
 
</dl>
<tr><th>(<i>abc</i>) (parentheses)</th>
<td>Groups items.</td></tr>
 
<tr><th>[<i>rule</i>] ("to the right")</th>
<td>Marks an additional syntax rule.</td></tr>
 
<tr><th>/*<i>comment</i>*/</th>
<td>Marks a comment.</td></tr>
</table>
   
   
The syntax is shown in three sections:
The syntax is shown in three sections:
Line 418: Line 409:
<li>The major components
<li>The major components
<li>The productions that describe individual characters
<li>The productions that describe individual characters
<li>The components
<li>The components of the "XML Declaration" (<code><?xml version=...?></code>)
of the &ldquo;XML Declaration&rdquo; (<tt><?xml version=...?></tt>)
</ol>
</ol>
====Syntax of document, element, Attribute, Comment, PI====
====Syntax of document, element, Attribute, Comment, PI====
<pre style="skel">
<p class="code">[A1]  document     ::= (prolog element Misc*) - (Char* RestrictedChar Char*)
[A1]  document     ::= (prolog element Misc*)
 
                        - (Char* RestrictedChar Char*)
[B22] prolog       ::= XMLDecl? Misc*
[B22] prolog       ::= XMLDecl? Misc*
 
[C23] XMLDecl     ::= '<?xml' VersionInfo
[C23] XMLDecl       ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
                      EncodingDecl? SDDecl? S? '?>'
 
[D27] Misc         ::= Comment | PI | S
[D27] Misc         ::= Comment | PI | S
[E3]  S           ::= (#x20 | #x9   /* Whitespace */
 
                      | #xD | #xA)+
[E3]  S             ::= (#x20 | #x9 /* Whitespace */ | #xD | #xA)+
[F39] element     ::= STag content ETag  [Element Type Match]
 
                      | EmptyElemTag
[F39] element       ::= STag content ETag  [Element Type Match] | EmptyElemTag
[G40] STag         ::= '<' Name (S Attribute)* S? '>'  [Unique Att]
 
[H42] ETag         ::= '</' Name S? '>'
[G40] STag         ::= '<' Name (S Attribute)* S? '>'  [Unique Att]
 
[H42] ETag         ::= '</' Name S? '>'
   
   
[I44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' [Unique Att]
[I44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' [Unique Att]
   
   
[NSC] NameStartChar ::=  ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
[NSC] NameStartChar ::=  ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] |
      [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
                            [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
      [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
                            [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
      [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
   
   
[NC]  NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 |
[NC]  NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
                        [#x0300-#x036F] | [#x203F-#x2040]
   
   
[NA]  Name          ::= NameStartChar (NameChar)*
[NA]  Name          ::= NameStartChar (NameChar)*
</pre>
</p>
   
   
Within an XML document, the maximum length of a name (for example,
Within an XML document, the maximum length of a name (for example, each of the prefix part and the local part of
each of the prefix part and the local part of
an element name) is 300 characters (prior to version 7.9, it was 127 characters, prior to version 7.7,
an element name) is 300 characters (prior to version 7.9, it was 127 characters, prior to version 7.7,
the maximum length was 100 characters).
the maximum length was 100 characters).
Element and attribute names are also subject to
Element and attribute names are also subject to
restrictions related to XML Namespaces; see [[#Name and namespace syntax|Name and namespace syntax]].)
restrictions related to XML Namespaces; see [[#Name and namespace syntax|Name and namespace syntax]].
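
The <code>Name</code> production and the length limit above can be sketched in a few lines of Python. This is only an illustration of productions <code>[NSC]</code>, <code>[NC]</code>, and <code>[NA]</code> as written in this article, not the check that <var class="product">Janus SOAP</var> itself performs; the function name <code>is_xml_name</code> and its <code>max_len</code> parameter are hypothetical.

```python
import re

# Character ranges copied from production [NSC] (NameStartChar).
NAME_START = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD"
)

# Additional characters allowed by production [NC] (NameChar).
NAME_EXTRA = "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Production [NA]: Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile("[" + NAME_START + "][" + NAME_START + NAME_EXTRA + "]*")


def is_xml_name(name, max_len=300):
    """Return True if name matches [NA] and no colon-separated part
    exceeds max_len characters (300 is the current limit quoted above)."""
    if NAME_RE.fullmatch(name) is None:
        return False
    return all(len(part) <= max_len for part in name.split(":"))
```

For example, <code>is_xml_name("soap:Envelope")</code> is true, while <code>is_xml_name("1abc")</code> is false because a digit is a <code>NameChar</code> but not a <code>NameStartChar</code>.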
   
   
<pre style="skel">
<p class="code">[L41] Attribute    ::= Name Eq AttValue
[L41] Attribute    ::= Name Eq AttValue
 
[M10] AttValue      ::= '"' ([^<&"] | Reference)* '"'
[M10] AttValue      ::= '"' ([^<&"] | Reference)* '"'
                         | "'" ([^<&'] | Reference)* "'"
                         | "'" ([^<&'] | Reference)* "'"
Line 463: Line 453:
[N25] Eq            ::= S? '=' S?
[N25] Eq            ::= S? '=' S?
   
   
[O43] content      ::= CharData? ( (element
[O43] content      ::= CharData? ( (element | Reference | CDSect | PI | Comment) CharData? )*
                        | Reference | CDSect | PI
                        | Comment) CharData? )*
   
   
[P14] CharData      ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
[P14] CharData      ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
[Q18] CDSect        ::= CDStart CData CDEnd
[Q18] CDSect        ::= CDStart CData CDEnd
[R19] CDStart      ::= '<![CDATA['
[R19] CDStart      ::= '<![CDATA['
[S20] CData        ::= (Char* - (Char* ']]>' Char*))
[S20] CData        ::= (Char* - (Char* ']]>' Char*))
[T21] CDEnd        ::= ']]>'
[T21] CDEnd        ::= ']]>'
   
   
[U15] Comment      ::= '<!--' ( (Char - '-')
[U15] Comment      ::= '&lt;!--' ( (Char - '-') | ('-' (Char - '-')) )* '-->'
                        | ('-' (Char - '-')) )* '-->'
   
   
[V16] PI            ::= '<?' PITarget (S (Char* -
[V16] PI            ::= '<?' PITarget (S (Char* - (Char* '?>' Char*) ))? '?>'
                        (Char* '?>' Char*) ))? '?>'
[W17] PITarget      ::= Name - (('X' | 'x') ('M' | 'm')
                        ('L' | 'l'))
</pre>
   
   
[W17] PITarget      ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
</p>
====Char and Reference====
====Char and Reference====
<pre style="skel">
<p class="code">[CA2]  Char          ::= [#x1-#xD7FF] | [#xE000-#xFFFD]
[CA2]  Char          ::= [#x1-#xD7FF] | [#xE000-#xFFFD]
 
[CA2A] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F]
[CA2A] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
                          | [#x7F-#x84] | [#x86-#x9F]
 
[CB67] Reference      ::= EntityRef | CharRef
[CB67] Reference      ::= EntityRef | CharRef
[CD68] EntityRef      ::= '&' Name ';'
[CD68] EntityRef      ::= '&' Name ';'
[CC66] CharRef        ::= '&amp;#' [0-9]+ ';'
 
                          | '&amp;#x' [0-9a-fA-F]+ ';'  [Legal Char]
[CC66] CharRef        ::= '&amp;#' [0-9]+ ';' | '&amp;#x' [0-9a-fA-F]+ ';'  [Legal Char]
</pre>
</p>
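
As a rough illustration of how <code>[CA2]</code> and <code>[CA2A]</code> combine in production <code>[A1]</code>, the following Python sketch tests code points against the two ranges. It covers only the character-level part of <code>[A1]</code> (not the prolog or element structure), and the function names are hypothetical.

```python
# Ranges copied from production [CA2A] (RestrictedChar).
RESTRICTED_RANGES = (
    (0x1, 0x8), (0xB, 0xC), (0xE, 0x1F), (0x7F, 0x84), (0x86, 0x9F),
)


def is_char(code_point):
    """Production [CA2]: Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD]."""
    return 0x1 <= code_point <= 0xD7FF or 0xE000 <= code_point <= 0xFFFD


def is_restricted_char(code_point):
    """Production [CA2A]: the control characters removed by [A1]."""
    return any(lo <= code_point <= hi for lo, hi in RESTRICTED_RANGES)


def is_document_text(text):
    """Character-level part of [A1]: every character is a Char,
    and none of them is a RestrictedChar."""
    return all(
        is_char(ord(ch)) and not is_restricted_char(ord(ch)) for ch in text
    )
```

Tab (<code>#x9</code>), linefeed (<code>#xA</code>), and carriage return (<code>#xD</code>) pass this test because they fall in the gaps between the restricted ranges.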
   
   
=====ISO-10646 and EBCDIC characters=====
=====ISO-10646 and EBCDIC characters=====
<ul>
<ul>
<li>Through <var class="product">Sirius Mods</var> version 7.5,
<li>Through <var class="product">Sirius Mods</var> version 7.5, XmlDocs were maintained in EBCDIC, and
XmlDocs were maintained in EBCDIC, and
production <code>[CA2]</code> above did not allow the full range of ISO-10646 characters shown in the <i>W3C XML Recommendation</i>.
production [CA2] above did not
(ISO-10646 is the standard for the universal character set, also known as Unicode.)
allow the full range of ISO-10646 characters shown in the <i><b>W3C XML Recommendation</b></i>.
(ISO-10646 is the standard for the universal character set, also known as
Unicode.)
The XmlDoc API might have rejected an XML document
The XmlDoc API might have rejected an XML document
because it contained an ISO-10646 character that could not be represented in EBCDIC.
because it contained an ISO-10646 character that could not be represented in EBCDIC.
   
   
As of <var class="product">Sirius Mods</var> version 7.6, XmlDocs are maintained in Unicode
As of <var class="product">Sirius Mods</var> version 7.6, <var>XmlDocs</var> are maintained in Unicode
as supported by the <var class="product">Sirius Mods</var>.
as supported by the <var class="product">Sirius Mods</var>.
This is why production [CA2] shows that
This is why production <code>[CA2]</code> shows that no Unicode characters greater than <code>U+FFFD</code> are allowed.
no Unicode characters greater than U+FFFD are allowed.
In addition, deserialization (with default options) of an XML document fails if the document
In addition, deserialization (with default options) of an XML document fails if the document
contains a Unicode character that is not translatable to EBCDIC.
contains a Unicode character that is not translatable to EBCDIC.
The AllowUntranslatable option of the deserialization methods lets you
The <var>AllowUntranslatable</var> option of the deserialization methods lets you circumvent this restriction.
circumvent this restriction.
The null character (#x0), normally restricted, is allowed in an XML
document if the XmlDoc's AllowNull property is set to <tt>True</tt>.
'''Note:'''
Using the standard
translation table provided with <var class="product">Sirius Mods</var> versions prior to 7.3,
many EBCDIC characters (such as X'FF'),
in addition to the &ldquo;control characters&rdquo; that were
explicitly prohibited,
were ''not'' legal XML characters
because they did not translate to any Unicode character.
   
   
The null character (<code>#x0</code>), normally restricted, is allowed in an XML
document if the <var>XmlDoc</var>'s <var>AllowNull</var> property is set to <code>True</code>.
<blockquote class="note">
<p>'''Note:''' Using the standard translation table provided with <var class="product">Sirius Mods</var> versions prior to 7.3,
many EBCDIC characters (such as <code>X'FF'</code>), in addition to the "control characters" that were
explicitly prohibited, were ''not'' legal XML characters because they did not translate to any Unicode character.</p>
<p>
In <var class="product">Sirius Mods</var> version 7.3, the standard translation table was modified significantly.
In <var class="product">Sirius Mods</var> version 7.3, the standard translation table was modified significantly.
For more information about supported characters and character translation
For more information about supported characters and character translation
issues as of version 7.3, see [[??]] refid=u80. and [[??]] refid=cxe2u..
issues as of version 7.3, see [[Unicode#Support for the ASCII subset of Unicode|Support for the ASCII subset of Unicode]] and [[Unicode#Corrected translations between ASCII/Unicode and EBCDIC|Corrected translations between ASCII/Unicode and EBCDIC]].</p> </blockquote>
<li>As stated in "[[XmlDoc API#Transport: sending and receiving XML|Transport: sending and receiving XML]]", UTF-8, UTF-16, and ISO-8859-x
 
<li>As stated in [[XmlDoc API#Transport: sending and receiving XML|Transport: sending and receiving XML]], UTF-8, UTF-16, and ISO-8859-x
encodings are accepted (note that these must be given in all-capital letters within the XML declaration).
encodings are accepted (note that these must be given in all-capital letters within the XML declaration).
<li>XPath comparisons are performed using Unicode.
<li>XPath comparisons are performed using Unicode.
As of version 7.3, it is the only type of ordered character comparison.
As of version 7.3, it is the only type of ordered character comparison.
Line 535: Line 518:
and could be controlled by the (now obsolete) [[XPathOrder (obsolete XmlDoc property)|XPathOrder]] property.
and could be controlled by the (now obsolete) [[XPathOrder (obsolete XmlDoc property)|XPathOrder]] property.
</ul>
</ul>
=====Entity references=====
=====Entity references=====
<ul>
<ul>
<li>One purpose of an EntityRef is to allow a sequence of characters that
<li>One purpose of an EntityRef is to allow a sequence of characters that
may be illegal in a particular context of an XML document.
may be illegal in a particular context of an XML document.
For example, within an element's content, the string &ldquo;]]>&rdquo; is not
For example, within an element's content, the string <code>]]></code> is not
allowed, so you may replace the greater-than symbol (>) with
allowed, so you may replace the greater-than symbol (<tt>></tt>) with either its character code in a CharRef, or with the predefined entity <code>&amp;gt;</code>:
either its character code in a CharRef, or with
<p class="code">]]&amp;gt;
the predefined entity <tt>&amp;gt;</tt>:
</p>
<pre>
<p>
    ]]&amp;gt;
A <code>Reference</code> (<code>EntityRef</code> or <code>CharRef</code>) is allowed only in an element's content (<code>[O43]</code>) or in <code>AttValue</code> (<code>[M10]</code>).</p>
</pre>
 
A Reference (EntityRef or CharRef)
is allowed only in an element's content ([O43]) or in AttValue ([M10]).
<li>There is a facility for defining your own entities in a DTD, but
<li>There is a facility for defining your own entities in a DTD, but
since DTDs are not supported in ''Janus SOAP'',
since DTDs are not supported in <var class="product">Janus SOAP</var>,
the only entity references supported are the five predefined entities:
the only entity references supported are the five predefined entities:
<dl>
 
<dt>&amp;amp;
<table class="thJustBold">
<dd>ampersand (&)
<tr><th>&amp;amp;</th>
<dt>&amp;apos;
<td>ampersand (<tt>&</tt>)</td></tr>
<dd>apostrophe (')
 
<dt>&amp;gt;
<tr><th>&amp;apos;</th>
<dd>greater than (>)
<td>apostrophe (<tt>'</tt>)</td></tr>
<dt>&amp;lt;
 
<dd>less than (<)
<tr><th>&amp;gt;</th>
<dt>&amp;quot;
<td>greater than (<tt>></tt>)</td></tr>
<dd>double quotation mark (")
 
</dl>
<tr><th>&amp;lt;</th>
'''Note:'''
<td>less than (<tt><</tt>)</td></tr>
As of <var class="product">Sirius Mods</var> version 7.6, you can use any of the XHTML entities
 
(listed at http://www.w3.org/TR/xhtml1/dtds.html#h-A2)
<tr><th>&amp;quot;</th>
<td>double quotation mark (<tt>"</tt>)</td></tr>
 
<tr><th>&amp;lsqb; <br>&amp;rsqb;</th>
<td>left and right square brackets (<tt>[</tt> <tt>]</tt>) <br>(as of Model&nbsp;204 7.6)</td></tr>
</table>
 
<blockquote class="note">
<p>'''Note:''' You can use any of the XHTML entities (listed at http://www.w3.org/TR/xhtml1/dtds.html#h-A2)
to represent Unicode characters when converting from EBCDIC to Unicode.
to represent Unicode characters when converting from EBCDIC to Unicode.
Character decoding must be in effect, however: you must be using
Character decoding must be in effect, however: you must be using
the [[U (String function)|U]] constant function
the <var>[[U (String function)|U]]</var> constant function or the <code>CharacterDecode=True</code> argument on the <var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var> function. </p>
or the <tt>CharacterDecode=True</tt>
argument on the [[EbcdicToUnicode (String function)|EbcdicToUnicode]] function.
   
   
You can load into an XmlDoc a character represented by such an entity
You can load into an <var>XmlDoc</var> a character represented by such an entity if you decode the entity reference before the character is processed by one of the XmlDoc API deserializing or direct storage methods. </blockquote>
if you decode the entity reference before the character is processed
by one of the XmlDoc API deserializing or direct storage methods.
</ul>
</ul>
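
The handling of <code>Reference</code> described above can be sketched as follows. This Python fragment is an assumption-laden illustration, not the <var>XmlDoc</var> implementation: it recognizes only the five predefined entities and the two <code>CharRef</code> forms of <code>[CC66]</code>, it does not enforce the <code>[Legal Char]</code> rule, and it does not detect a bare <code>&</code> that is not part of a reference. The function name <code>decode_references</code> is hypothetical.

```python
import re

# The five predefined entities listed above.
PREDEFINED = {"amp": "&", "apos": "'", "gt": ">", "lt": "<", "quot": '"'}

# [CC66] CharRef (hexadecimal or decimal) or [CD68] EntityRef.
REFERENCE = re.compile(r"&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([^;&\s]+));")


def decode_references(text):
    """Replace each Reference in text with the character it stands for.

    Raises ValueError for an entity name that is not predefined,
    mirroring the absence of DTD-defined entities."""
    def replace(match):
        hex_digits, dec_digits, name = match.groups()
        if hex_digits:
            return chr(int(hex_digits, 16))
        if dec_digits:
            return chr(int(dec_digits))
        if name in PREDEFINED:
            return PREDEFINED[name]
        raise ValueError("unsupported entity reference: &" + name + ";")

    # A single pass, so the output of one replacement is never re-decoded.
    return REFERENCE.sub(replace, text)
```

Because the substitution is a single pass, the text <code>&amp;amp;lt;</code> decodes to the four characters <code>&amp;lt;</code> rather than to a less-than sign.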
====Components of XMLDecl====
====Components of XMLDecl====
<pre style="skel">
<p class="code">[XA24] VersionInfo  ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[XA24] VersionInfo  ::= S 'version' Eq
                        ("'" VersionNum "'"
                        | '"' VersionNum '"')
   
   
[XB26] VersionNum  ::= ([a-zA-Z0-9_.:] | '-')+
[XB26] VersionNum  ::= ([a-zA-Z0-9_.:] | '-')+
   
   
[XC80] EncodingDecl ::= S 'encoding' Eq
[XC80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
                        ('"' EncName '"'
                        | "'" EncName "'" )
   
   
[XD81] EncName      ::= [A-Za-z] ([A-Za-z0-9._]
[XD81] EncName      ::= [A-Za-z] ([A-Za-z0-9._] | '-')*  /* Only Latin chars */
                        | '-')*  /* Only Latin chars */
   
   
[XE32] SDDecl      ::= S 'standalone' Eq (
[XE32] SDDecl      ::= S 'standalone' Eq ( ("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"') )
                        ("'" ('yes' | 'no') "'")
</p>
                        | ('"' ('yes' | 'no') '"') )
</pre>
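
To show how the <code>XMLDecl</code> productions fit together, here is a hedged Python sketch that assembles <code>[C23]</code>, <code>[XA24]</code>, <code>[XB26]</code>, <code>[XC80]</code>, <code>[XD81]</code>, and <code>[XE32]</code> into one regular expression. It is a reading aid for the grammar, not the parser used by the XmlDoc API; the helper names <code>quoted</code> and <code>parse_xml_decl</code> are hypothetical.

```python
import re

S = r"[ \t\r\n]+"                    # [E3]  S
EQ = r"[ \t\r\n]*=[ \t\r\n]*"        # [N25] Eq
VERSION_NUM = r"[a-zA-Z0-9_.:\-]+"   # [XB26]
ENC_NAME = r"[A-Za-z][A-Za-z0-9._\-]*"  # [XD81]


def quoted(body, name):
    """Match body between a pair of identical quotes (' or ")."""
    return rf"""(?P<{name}_q>['"])(?P<{name}>{body})(?P={name}_q)"""


XML_DECL = re.compile(
    r"<\?xml"
    + S + "version" + EQ + quoted(VERSION_NUM, "version")              # [XA24]
    + "(?:" + S + "encoding" + EQ + quoted(ENC_NAME, "encoding") + ")?"  # [XC80]
    + "(?:" + S + "standalone" + EQ + quoted("yes|no", "standalone") + ")?"  # [XE32]
    + r"[ \t\r\n]*\?>"
)


def parse_xml_decl(text):
    """Return the version, encoding, and standalone values of an XML
    declaration at the start of text, or None if [C23] does not match."""
    match = XML_DECL.match(text)
    if match is None:
        return None
    return {key: match.group(key) for key in ("version", "encoding", "standalone")}
```

Note that <code>VersionInfo</code> is mandatory and must come first, so a declaration that carries only an encoding does not match.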


===Names and namespaces===
===Names and namespaces===
XML documents are allowed to contain
XML documents are allowed to contain elements and attributes that are defined by one organization, as well as
elements and attributes that are defined by one organization, as well as
other elements and attributes that are defined by another organization.
other elements and attributes that are defined by another organization.
In order to achieve this organizational &ldquo;merging,&rdquo;
In order to achieve this organizational "merging," the <i>XML Namespaces Recommendation</i> (http://www.w3.org/TR/REC-xml-names)
the <i><b>XML Namespaces Recommendation</b></i>
(http://www.w3.org/TR/REC-xml-names)
provides for a way to qualify these merged names so that they will not conflict.
provides for a way to qualify these merged names so that they will not conflict.
   
   
Also, the Namespaces Recommendation provides a way for an application
Also, the Namespaces Recommendation provides a way for an application
to examine, in effect, the &ldquo;defining organization&rdquo; of a name
to examine, in effect, the "defining organization" of a name
in an XML document, so that various properties can be inferred, and
in an XML document, so that various properties can be inferred, and
names from the same &ldquo;organization&rdquo; can be grouped together.
names from the same "organization" can be grouped together.
   
   
Conceptually, the Namespaces Recommendation qualifies a name with a
Conceptually, the Namespaces Recommendation qualifies a name with a Uniform Resource Identifier ('''URI''').
Uniform Resource Identifier ('''URI''').
There are various rules for various types of URIs; one familiar type
There are various rules for various types of URIs; one familiar type
is the same as URLs on the World Wide Web, such as
is the same as URLs on the World Wide Web, such as:
<pre>
<p class="code"><nowiki>http://www.w3.org/2001/XMLSchema</nowiki>
    http://www.w3.org/2001/XMLSchema
</p>
</pre>
The important aspect of a URI, as far as the names in an XML document
The important aspect of a URI, as far as the names in an XML document
are concerned, is simply that it is a unique string for the names
are concerned, is simply that it is a unique string for the names that are associated with it.
that are associated with it.
   
   
The characters that are valid in a URI (shown in [[#Uniform Resource Identifier syntax|Uniform Resource Identifier syntax]])
The characters that are valid in a URI (shown in [[#Uniform Resource Identifier syntax|Uniform Resource Identifier syntax]]) exceed the set of characters that are valid in an XML name.
exceed the set of characters that are valid in an XML name.
Therefore, the technique employed for XML Namespace qualification is to use a special kind of attribute &mdash; one that begins with "xmlns" &mdash; to associate a name '''prefix''' with a URI.
Therefore, the technique employed for XML Namespace qualification is to
use a special kind of attribute &mdash; one that begins
with &ldquo;xmlns&rdquo; &mdash; to associate a name '''prefix''' with a URI.
Then attaching a prefix to a name effectively attaches the URI to a name.
Then attaching a prefix to a name effectively attaches the URI to a name.
   
   
The syntax for making this association, the namespace declaration, is explained in the next section.
The syntax for making this association, the namespace declaration, is explained in the next section.
====Name and namespace syntax====
====Name and namespace syntax====
The <i><b>W3C XML Recommendation</b></i> syntax rule for names is shown in
The <i>W3C XML Recommendation</i> syntax rule for names is shown in
"[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]]" (and
[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]] (and
repeated below) as the Name ([NA]), NameStartChar ([NSC]),
repeated below) as the <code>Name</code> (<code>[NA]</code>), <code>NameStartChar</code> (<code>[NSC]</code>), and <code>NameChar</code> (<code>[NC]</code>) productions.
and NameChar ([NC]) productions.
The XML Namespaces Recommendation provides additional rules for Element and Attribute names (but not for PI targets).
The XML Namespaces Recommendation provides additional
From the Namespaces Recommendation, element and attribute names are both instances of <code>QName</code>:
rules for Element and Attribute names (but not for PI targets).
From the Namespaces Recommendation, element and attribute names
are both instances of <tt>QName</tt>:
   
   
<pre style="skel">
<p class="code">[NSC] NameStartChar  ::=  ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] |  
[NSC] NameStartChar  ::=  ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
                              [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |  
      [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
                              [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
      [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
      [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
   
   
[NC]  NameChar  ::= NameStartChar | "-" | "." | [0-9] | #xB7 |
[NC]  NameChar  ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
                    [#x0300-#x036F] | [#x203F-#x2040]
   
   
[NA]  Name      ::= NameStartChar (NameChar)*
[NA]  Name      ::= NameStartChar (NameChar)*
   
   
[NB5] NCName    ::= (NameStartChar - ':') (NameChar - ':')*
[NB5] NCName    ::= (NameStartChar - ':') (NameChar - ':')*
[NC6] QName      ::= (Prefix ':')? LocalPart
[NC6] QName      ::= (Prefix ':')? LocalPart
[ND7] Prefix    ::= NCName
[ND7] Prefix    ::= NCName
[NE8] LocalPart  ::= NCName
[NE8] LocalPart  ::= NCName
</pre>
</p>
   
   
Although the <i><b>W3C XML Recommendation</b></i> does not require that attribute and element names
Although the <i>W3C XML Recommendation</i> does not require that attribute and element names
follow the XML Namespaces Recommendation, the operation of XPath requires
follow the XML Namespaces Recommendation, the operation of XPath requires it.
it.
Therefore, since XPath is so important for the XmlDoc API, its default operating
Therefore, since XPath is so important for the XmlDoc API, its default operating
mode is to require Namespaces conformance in the XML document.
mode is to require Namespaces conformance in the XML document.
See the [[Namespace (XmlDoc property)|Namespace]] property.
See the <var>[[Namespace (XmlDoc property)|Namespace]]</var> property.
   
   
The restrictions and changes to the XML Recommendation are as follows:
The restrictions and changes to the XML Recommendation are as follows:
<ul>
<ul>
<li>The <tt>NameStartChar</tt> and <tt>NameChar</tt> productions are taken
<li>The <code>NameStartChar</code> and <code>NameChar</code> productions are taken
from the XML 1.1 recommendation (http://www.w3.org/TR/xml11/).
from the XML 1.1 recommendation (http://www.w3.org/TR/xml11/).
Starting with version 7.6 of the <var class="product">Sirius Mods</var>, XmlDocs are maintained in Unicode, as
Starting with version 7.6 of the <var class="product">Sirius Mods</var>, XmlDocs are maintained in Unicode, as
supported by the <var class="product">Sirius Mods</var>.
supported by the <var class="product">Sirius Mods</var>.
That support excludes characters encoded in more than two bytes, so production
That support excludes characters encoded in more than two bytes, so production
[NSC], above, shows no Unicode characters greater than U+FFFD.
<code>[NSC]</code>, above, shows no Unicode characters greater than <code>U+FFFD</code>.
   
   
By default, deserialization of an XML document fails if the document
By default, deserialization of an XML document fails if the document
contains a Unicode character that is not translatable to EBCDIC.
contains a Unicode character that is not translatable to EBCDIC.
The AllowUntranslatable argument of the deserialization methods lets you
The <var>AllowUntranslatable</var> argument of the deserialization methods lets you circumvent this restriction.
circumvent this restriction.
 
<li>A name can have at most one colon (:), which separates
<li>A name can have at most one colon (<tt>:</tt>), which separates
the name into a non-null '''prefix''' and a non-null '''local name'''.
the name into a non-null '''prefix''' and a non-null '''local name'''.
<li>A name without a prefix is simply a local name.
<li>A name without a prefix is simply a local name.
<li>The prefix, if any, must be associated with a '''namespace
 
URI''' using an attribute of the form:
<li>The prefix, if any, must be associated with a '''namespace URI''' using an attribute of the form:
<blockquote style="xmp">
<p class="code">xmlns:<i><b>prefix</b></i>="<i><b>URI</b></i>"
xmlns:<i><b>prefix</b></i>="<i><b>URI</b></i>"
</p>
</blockquote>
<!--  xmlns:prefix="URI" -->
<!--  xmlns:prefix="URI" -->
<!--?? xmlns:<i><b>prefix</b></i>="<i><b>URI</b></i>"-->
<!--?? xmlns:<i><b>prefix</b></i>="<i><b>URI</b></i>"-->
For example, all elements (and attributes of those elements) within
For example, all elements (and attributes of those elements) within
the content of the <tt>definitions</tt> element below can use the prefix
the content of the <code>definitions</code> element below can use the prefix
&ldquo;xsd&rdquo; to qualify their names to belong to the
"xsd" to qualify their names to belong to the <nowiki>"http://www.w3.org/2001/XMLSchema"</nowiki> namespace:
<nowiki>"http://www.w3.org/2001/XMLSchema"</nowiki> namespace:
<p class="code"><nowiki><definitions xmlns:xsd="http://www.w3.org/2001/XMLSchema"></nowiki>
<pre style="xmp">
  ... content of definitions element ...
  <definitions xmlns:xsd="http://www.w3.org/2001/XMLSchema">
</definitions>
    ... content of definitions element ...
</p>
  </definitions>
 
</pre>
<li>The prefix <code>xml</code> is bound to the namespace URI
<li>The prefix <tt>xml</tt> is bound to the namespace URI
<code><nowiki>http://www.w3.org/XML/1998/namespace</nowiki></code>.
<tt><nowiki>http://www.w3.org/XML/1998/namespace</nowiki></tt>.
Neither can be used without the other.
Neither can be used without the other.
<br>
 
<li>An element can also have a '''default namespace''' attribute,
<li>An element can also have a '''default namespace''' attribute, which "declares" its namespace, of the form:
which &ldquo;declares&rdquo; its namespace, of the form:
<p class="code">xmlns="URI"
<pre style="xmp">
</p>
    xmlns="URI"
</pre>
<!--??  xmlns="<i><b>URI</b></i>"-->
<!--??  xmlns="<i><b>URI</b></i>"-->
<br>
 
<li>Another form of default namespace declaration allows
<li>Another form of default namespace declaration allows
an element to disable any default namespace with:
an element to disable any default namespace with:
<pre>
<p class="code">xmlns=""
    xmlns=""
</p>
</pre>
 
<li>A namespace declaration is syntactically the same as an Attribute.
<li>A namespace declaration is syntactically the same as an Attribute.
<li>The scope of a non-default namespace declaration is the element containing it, its
<li>The scope of a non-default namespace declaration is the element containing it, its
attributes, and all descendant elements and their attributes, until another declaration
attributes, and all descendant elements and their attributes, until another declaration
of the prefix.
of the prefix.
<li>The scope of a default namespace declaration is the element containing it (but not
<li>The scope of a default namespace declaration is the element containing it (but not
the attributes of that element) and its descendant elements (but not their attributes),
the attributes of that element) and its descendant elements (but not their attributes),
Line 722: Line 688:
<li>The namespace URI associated with a name is
<li>The namespace URI associated with a name is
<ol>
<ol>
<li>the in-scope URI associated with the prefix of the name, if the name has
<li>the in-scope URI associated with the prefix of the name, if the name has a prefix
a prefix
 
<li>for element names,
<li>for element names, the in-scope default namespace URI, if the name does not have a prefix
the in-scope default namespace URI, if the name does not have a prefix
and there is a default namespace URI in scope
and there is a default namespace URI in scope
<li>no namespace URI, otherwise
<li>no namespace URI, otherwise
</ol>
</ol>
<li>Two names are identical if they have the same local name and either
<li>Two names are identical if they have the same local name and either
they both do not have a namespace URI or they both have the same namespace
they both do not have a namespace URI or they both have the same namespace URI.
URI.
</ul>
</ul>
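
The scoping rules above can be observed with any namespace-aware parser. The following is a minimal sketch in Python, using the standard library <code>xml.etree.ElementTree</code> module rather than the XmlDoc API; the document, the prefix <code>p</code>, and the <code>urn:example:</code> URIs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: names and URIs are illustrative only.
DOC = (
    '<a xmlns="urn:example:default" xmlns:p="urn:example:p" id="1">'
    '<b p:flag="y" plain="n"/>'
    '<c xmlns=""><d/></c>'
    '</a>'
)

root = ET.fromstring(DOC)
b = root.find('{urn:example:default}b')
c = root.find('c')

# ElementTree writes a namespaced name as "{URI}local".
print(root.tag)          # {urn:example:default}a  (default namespace applies)
print(sorted(root.attrib))  # ['id']  (unprefixed attribute: no namespace URI)
print(sorted(b.attrib))  # ['plain', '{urn:example:p}flag']
print(c.tag, c[0].tag)   # c d  (xmlns="" disables the default namespace)
```

Note that the unprefixed attributes <code>id</code> and <code>plain</code> have no namespace URI even though a default namespace is in scope, while the prefixed attribute <code>p:flag</code> takes the URI bound to its prefix.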


====Uniform Resource Identifier syntax====
====Uniform Resource Identifier syntax====
The form of a valid string used as a URI is specified in IETF RFC2396
The form of a valid string used as a URI is specified in IETF RFC2396 (see http://www.faqs.org/rfcs/rfc2396.html).
(see http://www.faqs.org/rfcs/rfc2396.html) .
The rules are as follows:
The rules are as follows:
<ul>
<ul>
<li>Namespace URIs must be '''absolute''':
<li>Namespace URIs must be '''absolute''':
they must start with a non-null prefix (called a
they must start with a non-null prefix (called a
&ldquo;scheme&rdquo;), followed by a colon (:) and a non-null suffix.
"scheme"), followed by a colon (<tt>:</tt>) and a non-null suffix.
<li>The scheme must start
 
with a letter, which may be followed by any combination of letters, digits, and
<li>The scheme must start with a letter, which may be followed by any combination of letters, digits, and
the plus (+), hyphen (-), and period (.) characters.
the plus (<tt>+</tt>), hyphen (<tt>-</tt>), and period (<tt>.</tt>) characters.
<br>
 
<li>The suffix can contain any of
<li>The suffix can contain any of the following characters, in addition to letters and digits:
the following characters, in addition to letters and digits:
<p class="code">; (semicolon)                - (hyphen)
<pre>
/ (slash)                    _ (underscore)
    ; (semicolon)                - (hyphen)
? (question mark)            . (period)
    / (slash)                    _ (underscore)
&#58; (colon)                    ! (exclamation point)
    ? (question mark)            . (period)
@ (at sign)                  ~ (tilde)
    : (colon)                    ! (exclamation point)
& (ampersand)                * (asterisk)
    @ (at sign)                  ~ (tilde)
= (equal sign)              ' (apostrophe)
    & (ampersand)                * (asterisk)
+ (plus sign)                ( (open parenthesis)
    = (equal sign)              ' (apostrophe)
$ (dollar sign)              ) (close parenthesis)
    + (plus sign)                ( (open parenthesis)
, (comma)
    $ (dollar sign)              ) (close parenthesis)
</p>
    , (comma)
</pre>
   
   
The suffix can also contain:
The suffix can also contain:
<ul>
<ul>
<li>At most one number sign (#).
<li>At most one number sign (<tt>#</tt>).
<li>As of <var class="product">Sirius Mods</var> 7.2, a percent (%) character followed by two hex digits
 
<li>A percent (<tt>%</tt>) character followed by two hex digits
to escape some other character.
to escape some other character.
   
   
Line 770: Line 734:
<ul>
<ul>
<li>The hex digits A-F may be uppercase or lowercase.
<li>The hex digits A-F may be uppercase or lowercase.
<li>The hexadecimal values are not replaced when URI processing is performed.
<li>The hexadecimal values are not replaced when URI processing is performed.
<p>
For example, even though the ASCII code for the number &ldquo;4&rdquo; is
For example, even though the ASCII code for the number "4" is
hexadecimal 34, the following two URIs are different and distinct:
hexadecimal 34, the following two URIs are different and distinct:</p>
<pre>
<p class="code"><nowiki>http://my.URI.number4
    http://my.URI.number4
http://my.URI.number%34</nowiki>
    http://my.URI.number%34
</p>
</pre>
Thus, for instance, the following fragment:
Thus, for instance, the following fragment:
<pre>
<p class="code">%n = %d:AddElement('x', , <nowiki>'http://my.URI.number4')
    %n = %d:AddElement('x', , 'http://my.URI.number4')
    %n:AddElement('x', , 'http://my.URI.number%34')
        %n:AddElement('x', , 'http://my.URI.number%34')
%d:Print
    %d:Print
%d:SelectionPrefix('f') = 'http://my.URI.number4'</nowiki>
    %d:SelectionPrefix('f') = 'http://my.URI.number4'
Print %d:SelectCount('//f:x') And 'matching node(s)'
    Print %d:SelectCount('//f:x') And 'matching node(s)'
</p>
</pre>
will have the following result:
will have the following result:
<pre>
<p class="code"><nowiki><x xmlns="http://my.URI.number4">
    <x xmlns="http://my.URI.number4">
  <x xmlns="http://my.URI.number%34"/></nowiki>
      <x xmlns="http://my.URI.number%34"/>
</x>
    </x>
1 matching node(s)
    1 matching node(s)
</p>
</pre>
</ul>
</ul>
</ul>
</ul>
</ul>
</ul>
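
The URI rules listed above can be sketched as a small checker. This is an illustration in Python, not the check that <var class="product">Janus SOAP</var> itself performs; the function name <code>looks_like_namespace_uri</code> is hypothetical. The second half of the sketch repeats the <code>%34</code> example with the standard library parser, which also compares namespace URIs as literal strings.

```python
import re
import xml.etree.ElementTree as ET

# Characters allowed in the suffix, in addition to '#' and %HH escapes.
_SUFFIX_CHARS = r"A-Za-z0-9;/?:@&=+$,\-_.!~*'()"
_URI_RE = re.compile(
    r"[A-Za-z][A-Za-z0-9+\-.]*"                          # scheme
    r":"                                                 # separator
    r"(?:[" + _SUFFIX_CHARS + r"#]|%[0-9A-Fa-f]{2})+"    # non-null suffix
)

def looks_like_namespace_uri(uri: str) -> bool:
    """Apply the rules listed above (hypothetical helper, a sketch only)."""
    return _URI_RE.fullmatch(uri) is not None and uri.count('#') <= 1

print(looks_like_namespace_uri('http://my.URI.number%34'))  # True
print(looks_like_namespace_uri('my.URI'))                   # False: no scheme
print(looks_like_namespace_uri('x:a#b#c'))                  # False: two '#'

# Percent escapes are not decoded, so these two URIs name different namespaces.
DOC = (
    '<x xmlns="http://my.URI.number4">'
    '<x xmlns="http://my.URI.number%34"/>'
    '</x>'
)
root = ET.fromstring(DOC)
matches = [e for e in root.iter() if e.tag == '{http://my.URI.number4}x']
print(len(matches), 'matching node(s)')                     # 1 matching node(s)
```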
===Well-formed documents and validation===
===Well-formed documents and validation===
Before an XML document can be processed, its structure must match the
Before an XML document can be processed, its structure must match the rules expressed in the productions in
rules expressed in the productions in
[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]], along with
"[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]]", along with
the extra rules alluded to in square brackets (for example, <code>[Unique Att]</code>,
the extra rules alluded to in square brackets (for example, <tt>[Unique Att]</tt>,
indicating that a single attribute name may not be given twice in the list of attributes for an element).
indicating that a single attribute name may not be given twice in the list
When the syntax is correct, including these rules, the document is called '''well-formed'''.
of attributes for an element).
When the syntax is correct, including these rules, the document is
called '''well-formed'''.
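
As an illustration of well-formedness checking (including the <code>[Unique Att]</code> rule), here is a minimal sketch in Python using the standard library parser, not the XmlDoc API. The helper name <code>is_well_formed</code> is hypothetical. Like the XmlDoc API, this parser checks only well-formedness, not validity.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed('<a x="1" y="2"/>'))   # True
print(is_well_formed('<a x="1" x="2"/>'))   # False: attribute name given twice
print(is_well_formed('<a><b></a></b>'))     # False: elements not properly nested
```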
   
   
The XmlDoc API enforces the syntax rules of well-formed documents.
The XmlDoc API enforces the syntax rules of well-formed documents.
Line 810: Line 770:
In addition to this checking, an XML processor may also check to see that
In addition to this checking, an XML processor may also check to see that
the format of the document matches the structure and restrictions
the format of the document matches the structure and restrictions
declared for it in either
declared for it in either the Document Type Declaration or the document's Schema.
the Document Type Declaration or the document's Schema.
If the document matches the type structure and restrictions, it is called '''valid'''.
If the document matches the type structure and restrictions, it is
In the <i>W3C XML Recommendation</i>, this validation of a document is an optional feature of an XML processor.
called '''valid'''.
In the <i><b>W3C XML Recommendation</b></i>, this validation of a document is an optional feature of
an XML processor.
   
   
<!-- &NSCHVSN -->
<!-- &NSCHVSN -->
With the current version, the XmlDoc API does not validate the XML document.
With the current version, the XmlDoc API does not validate the XML document.
A later version will incorporate this feature.
Note that support of XML Schema is planned; Document Type Declarations
Note that support of XML Schema is planned; Document Type Declarations
have several shortcomings, including a limitation on the types of
have several shortcomings, including a limitation on the types of
constraints that can be placed on the document, a specialized baroque
constraints that can be placed on the document, a specialized baroque
syntax that doesn't conform to the element/attribute structure of
syntax that doesn't conform to the element/attribute structure of
XML, and incorporation of some features that have nothing to do with
XML, and incorporation of some features that have nothing to do with document validation.
document validation.
 
===Normalization during deserialization===
===Normalization during deserialization===
When an XML processor, in particular the XmlDoc API, parses an XML document from
When an XML processor, in particular the XmlDoc API, parses an XML document from
character form into an internal representation, it must make some transformations
character form into an internal representation, it must make some transformations of the document.
of the document.
The two most significant types of these transformations concern the following:
The two most significant types of these transformations concern the following:
<ul>
<ul>
Line 835: Line 790:
<li>Whitespace characters
<li>Whitespace characters
</ul>
</ul>
====Normalizing entity and character references====
====Normalizing entity and character references====
Entity and character references are replaced by their entity and character
Entity and character references are replaced by their entity and character counterparts before deserialization.
counterparts before deserialization.
For example, the entity reference <code>&amp;gt;</code> in the <code>content</code>
For example, the entity reference <tt>&amp;gt;</tt> in the <tt>content</tt>
of an element or in the <code>AttValue</code> of an Attribute, is handled exactly as if a greater-than symbol
of an element or in the <tt>AttValue</tt> of an Attribute,
(<tt>></tt>) occurred at that point in the document.
is handled exactly as if a greater-than symbol
Similarly, the character reference <code>&amp;#x5B;</code> is handled as if a left
(>) occurred at that point in the document.
square-bracket symbol (<tt>[</tt>) occurred at that point in the document.
Similarly, the character
reference <tt>&amp;#x5B;</tt> is handled as if a left
square-bracket symbol ( [ ) occurred at that point in the document.
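
The replacement of references described above is standard XML parser behavior, so it can be sketched with the Python standard library (this is not the XmlDoc API; the element and attribute names are made up):

```python
import xml.etree.ElementTree as ET

# &gt; and &#x5B; are replaced by ">" and "[" in both content and attribute values.
elem = ET.fromstring('<e a="1 &gt; 0">x&#x5B;0] &gt; y</e>')

print(elem.get('a'))   # 1 > 0
print(elem.text)       # x[0] > y
```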
   
   
This normalization occurs '''after''' whitespace normalization, which is
This normalization occurs '''after''' whitespace normalization, which is discussed in the next section.
discussed in the next section.
 
====Normalizing whitespace characters====
====Normalizing whitespace characters====
In the XML syntax, the whitespace characters are (in hexadecimal,
In the XML syntax, the whitespace characters are (in hexadecimal,
using ISO-10646 character codes):
using ISO-10646 character codes):
<dl>
<table class="thJustBold">
<dt>tab
<tr><th>tab</th>
<dd>x'09'
<td>x'09'</td></tr>
<dt>linefeed
<tr><th>linefeed</th>
<dd>x'0A'
<td>x'0A'</td></tr>
<dt>carriage return
<tr><th>carriage return</th>
<dd>x'0D'
<td>x'0D'</td></tr>
<dt>space
<tr><th>space</th>
<dd>x'20'
<td>x'20'</td></tr>
</dl>
</table>
In general, the whitespace characters can be used in the <tt>S</tt>
 
production (shown in
In general, the whitespace characters can be used in the <code>S</code> production (shown in
[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]]),
[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]]),
which must separate
which must separate many of the tokens in a document (for example, it must follow the element name, if the <code>STag</code>
many of the tokens in a document
contains an Attribute) and may optionally be used in many other places (for example, it may appear before or after the equal sign (<tt>=</tt>)
(for example, it must follow the element name, if the <tt>STag</tt>
contains an Attribute) and may optionally be used in many other
places
(for example, it may appear before or after the equal sign (=)
between an Attribute name and its value).
between an Attribute name and its value).
   
   
The interplay of three factors determines the normalization of whitespace
The interplay of three factors determines the normalization of whitespace characters during deserialization:
characters during deserialization:
<ul>
<ul>
<li>The <i><b>W3C XML Recommendation</b></i> specifies two normalizing transformations of whitespace:
<li>The <i>W3C XML Recommendation</i> specifies two normalizing transformations of whitespace:
<ol>
<ol>
<li>When a special combination of
 
line-end characters &mdash; carriage return and linefeed &mdash;
<li>When a special combination of line-end characters &mdash; carriage return and linefeed &mdash; occurs '''anywhere'''
occurs '''anywhere'''
in an XML document, it is replaced by a single linefeed character.
in an XML document, it is replaced by a single linefeed character.
Also, carriage returns not followed by a linefeed are
Also, carriage returns not followed by a linefeed are replaced by a single linefeed character.
replaced by a single linefeed character.
 
<li>When any whitespace character appears in the value of an attribute,
<li>When any whitespace character appears in the value of an attribute, it is replaced by a single space character.
it is replaced by a single space character.
</ol>
</ol>
   
   
The XmlDoc API always applies these transformations, and
The XmlDoc API always applies these transformations, and the following two sub-sections describe them in more detail.
the following two sub-sections describe them in
<li>In addition to the XML standard whitespace transformations, the XmlDoc API deserialization methods offer options to
more detail.
control normalization of whitespace characters that occur in the <code>content</code> of an element.
<li>In addition to the XML standard whitespace transformations,
Those options are described in these pages:
the XmlDoc API deserialization methods offer options to
control normalization of whitespace characters that
occur in the <tt>content</tt> of an element.
Those options are described in these sections:
<ul>
<ul>
<li>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]
<li>[[LoadXml (XmlDoc/XmlNode function)]]
<li>[[WebReceive (XmlDoc function)|WebReceive]]
<li>[[WebReceive (XmlDoc function)]]
</ul>
</ul>
<li>The XmlDoc API deserialization (and serialization) methods
 
honor the <tt>xml:space</tt> attribute:
<li>The XmlDoc API deserialization (and serialization) methods honor the <code>xml:space</code> attribute:
After the XML standard whitespace transformations,
After the XML standard whitespace transformations, any whitespace within the scope of <code>xml:space="preserve"</code>
any whitespace within the scope of <tt>xml:space="preserve"</tt>
is retained as is, regardless of the whitespace-handling option in effect for the deserialization method.
is retained as is, regardless of
Elements that are in the scope of <code>xml:space="default"</code> have whitespace handled
the whitespace-handling option in effect for the deserialization method.
Elements that are in the scope of <tt>xml:space="default"</tt>
have whitespace handled
according to the whitespace-handling option in effect for the deserialization.
according to the whitespace-handling option in effect for the deserialization.
The individual method descriptions cited above have more information.
The individual method descriptions cited above have more information.
</ul>
</ul>
=====Normalized line-end=====
=====Normalized line-end=====
As specified in &ldquo;2.11 End-of-Line Handling&rdquo; of the <i><b>W3C XML Recommendation</b></i>,
As specified in "2.11 End-of-Line Handling" of the <i>W3C XML Recommendation</i>,
all instances of a carriage return character followed by a linefeed character
all instances of a carriage return character followed by a linefeed character (CR-LF sequence),
(CR-LF sequence),
as well as all instances of a carriage return not followed by a linefeed,
as well as all instances of a carriage return not followed by a linefeed,
are converted to a single linefeed character.
are converted to a single linefeed character.
Line 919: Line 858:
This behavior only applies to deserialization: there is no modification
This behavior only applies to deserialization: there is no modification
of whitespace characters in values passed as the <i><b>value</b></i>
of whitespace characters in values passed as the <i><b>value</b></i>
argument of the XmlDoc API Add* and Insert* methods
argument of the XmlDoc API Add* and Insert* methods that allow a value argument.
that allow a value argument.
Therefore the values of the <code>FOO1</code> and <code>FOO2</code> elements
Therefore the values of the &ldquo;FOO1&rdquo; and &ldquo;FOO2&rdquo; elements
created by the <var>LoadXml</var> (deserialization) and <var>AddElement</var> invocations below are different:
created by the LoadXml (deserialization) and AddElement invocations below are different:
<p class="code">&#42; Get EBCDIC carriage return and linefeed:
<pre>
%cl = $X2C('0D25')
    * Get EBCDIC carriage return and linefeed:
    %cl = $X2C('0D25')
   
   
    * This Element value is linefeed:
&#42; This Element value is linefeed:
    %node = %doc:LoadXml('<top> <FOO1>' With %cl With '</FOO1> </top>')
%node = %doc:LoadXml('<top> <FOO1>' With %cl With '</FOO1> </top>')
   
   
    * This Element value is carriage return and linefeed:
&#42; This Element value is carriage return and linefeed:
    %node:AddElement('FOO2', %cl)
%node:AddElement('FOO2', %cl)
</pre>
</p>
   
   
Also, the normalization applies to the characters in the input
Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution.
serialized string, not the values after entity substitution.
Therefore the values of <code>FOO1</code> and <code>FOO2</code> created by the following two <var>LoadXml</var> invocations are different:
Therefore the values of &ldquo;FOO1&rdquo; and &ldquo;FOO2&rdquo; created by the following two
<p class="code">&#42; Get EBCDIC carriage return and linefeed:
LoadXml invocations are different:
%cl = $X2C('0D25')
<pre>
    * Get EBCDIC carriage return and linefeed:
    %cl = $X2C('0D25')
   
   
    * Element value is linefeed:
&#42; Element value is linefeed:
    %doc:LoadXml('<FOO1>' With %cl With '</FOO1>')
%doc:LoadXml('<FOO1>' With %cl With '</FOO1>')
   
   
    %doc = New
%doc = New
    * Element value is carriage return and linefeed
&#42; Element value is carriage return and linefeed
    * (note, character references are ISO-10646):
&#42; (note, character references are ISO-10646):
    %doc:LoadXml('<FOO2>&amp;#x0D;&amp;#x0A;' With '</FOO2>')
%doc:LoadXml('<FOO2>&amp;#x0D;&amp;#x0A;' With '</FOO2>')
</pre>
</p>
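
Line-end normalization is required of every conforming XML parser, so the same distinction can be sketched outside Model 204. The following Python fragment uses the standard library parser and ASCII/Unicode code points (<code>\r</code> = x'0D', <code>\n</code> = x'0A') rather than the EBCDIC values used above; it is an illustration, not XmlDoc API code.

```python
import xml.etree.ElementTree as ET

# Literal CR-LF in the serialized input is normalized to a single linefeed.
foo1 = ET.fromstring('<FOO1>\r\n</FOO1>')

# Character references are substituted after normalization, so CR survives.
foo2 = ET.fromstring('<FOO2>&#x0D;&#x0A;</FOO2>')

# A carriage return not followed by a linefeed also becomes a linefeed.
foo3 = ET.fromstring('<FOO3>a\rb</FOO3>')

print(repr(foo1.text))  # '\n'
print(repr(foo2.text))  # '\r\n'
print(repr(foo3.text))  # 'a\nb'
```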
   
   
Linefeed characters not removed by the normalization described above
Linefeed characters not removed by the normalization described above
and belonging to the Text node child of an element
and belonging to the Text node child of an element
(but not in any other type of node) can further be affected by the
(but not in any other type of node) can further be affected by the
whitespace-handling options of
whitespace-handling options of <var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var> and <var>[[WebReceive (XmlDoc function)|WebReceive]]</var>.
[[LoadXml (XmlDoc/XmlNode function)|LoadXml]] and [[WebReceive (XmlDoc function)|WebReceive]].
 
=====Normalized attribute value=====
=====Normalized attribute value=====
After replacing all CR-LF sequences, and all other CR instances,
After replacing all CR-LF sequences, and all other CR instances, by LF (as described in [[#Normalized line-end|Normalized line-end]]),
by LF (as described in [[#Normalized line-end|Normalized line-end]]),
attribute values have additional whitespace normalization.
attribute values have additional whitespace normalization.
As specified in &ldquo;3.3.3 Attribute-Value Normalization&rdquo; of the <i><b>W3C XML Recommendation</b></i>,
As specified in "3.3.3 Attribute-Value Normalization" of the <i>W3C XML Recommendation</i>,
after the CR-LF normalization, every instance of a
after the CR-LF normalization, every instance of a whitespace character (tab and linefeed)
whitespace character (tab and linefeed)
in an attribute value is converted to a space character.
in an attribute value is converted to a space character.
Leading and trailing spaces
Leading and trailing spaces are not stripped, nor are sequences of multiple spaces collapsed.
are not stripped, nor are sequences of multiple spaces
collapsed.
   
   
This behavior only applies to deserialization; that is, there is no modification
This behavior only applies to deserialization; that is, there is no modification
of whitespace characters in attribute values passed as the <i><b>value</b></i>
of whitespace characters in attribute values passed as the <var class="term">value</var>
argument of the [[AddAttribute (XmlNode function)|AddAttribute]] function..
argument of the <var>[[AddAttribute (XmlNode function)|AddAttribute]]</var> function.
Therefore the values of the &ldquo;FOO&rdquo; attribute created by the following two
Therefore the values of the <code>FOO</code> attribute created by the following two methods are different:
methods are different:
<p class="code">&#42; Get EBCDIC carriage return:
<pre>
%c = $X2C('0D')
    * Get EBCDIC carriage return:
    %c = $X2C('0D')
   
   
    * Attribute value is space:
&#42; Attribute value is space:
    %doc:LoadXml('<top FOO="' With %c With '"> <in/> </top>')
%doc:LoadXml('<top FOO="' With %c With '"> <in/> </top>')
   
   
    * Attribute value is carriage return:
&#42; Attribute value is carriage return:
    %doc:AddAttribute('FOO', %c, '/*/*')
%doc:AddAttribute('FOO', %c, '/*/*')
</pre>
</p>
   
   
Also, the normalization applies to the characters in the input
Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution.
serialized string, not the values after entity substitution.
Therefore the values of the <code>FOO</code> attribute created by the following two <var>LoadXml</var> invocations are different:
Therefore the values of the &ldquo;FOO&rdquo; attribute created by the following two
<p class="code">&#42; Get EBCDIC carriage return:
LoadXml invocations are different:
%c = $X2C('0D')
<pre>
    * Get EBCDIC carriage return:
    %c = $X2C('0D')
   
   
    * Attribute value is space:
&#42; Attribute value is space:
    %doc:LoadXml('<top FOO="' With %C With '"/>')
%doc:LoadXml('<top FOO="' With %C With '"/>')
    %doc = New
%doc = New
   
   
    * Attribute value is carriage return - note CR
&#42; Attribute value is carriage return - note CR
    * is the same in EBCDIC and ISO-10646:
&#42; is the same in EBCDIC and ISO-10646:
    %doc:LoadXml('<top FOO="&amp;#x0D;"/>')
%doc:LoadXml('<top FOO="&amp;#x0D;"/>')
</pre>
</p>
'''Note:'''
 
Whitespace in an attribute (and in any type of node other than
<p class="note">'''Note:''' Whitespace in an attribute (and in any type of node other than
a Text node child of an element) is '''not''' affected by the
a Text node child of an element) is '''not''' affected by the
whitespace-handling options of [[LoadXml (XmlDoc/XmlNode function)|LoadXml]],
whitespace-handling options of <var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var>,
[[WebReceive (XmlDoc function)|WebReceive]], and [[ParseXml (HttpResponse function)|ParseXml]].
<var>[[WebReceive (XmlDoc function)|WebReceive]]</var>, and <var>[[ParseXml (HttpResponse function)|ParseXml]]</var>. </p>
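
Attribute-value normalization can likewise be sketched with the Python standard library parser (an illustration only, not the XmlDoc API; ASCII/Unicode code points are used instead of EBCDIC):

```python
import xml.etree.ElementTree as ET

# A literal carriage return in the serialized attribute value becomes a space.
literal = ET.fromstring('<top FOO="\r"/>')

# A character reference is substituted after normalization, so CR is kept.
reference = ET.fromstring('<top FOO="&#x0D;"/>')

# Tab and CR-LF each become one space; existing spaces are neither
# stripped nor collapsed.
mixed = ET.fromstring('<top FOO=" a\t\r\nb  "/>')

print(repr(literal.get('FOO')))    # ' '
print(repr(reference.get('FOO')))  # '\r'
print(repr(mixed.get('FOO')))      # ' a  b  '
```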


===Language identification===
===Language identification===
From the <i><b>W3C XML Recommendation</b></i>:
From the <i><b>W3C XML Recommendation</b></i>:
&ldquo;A special attribute named xml:lang may be inserted in documents to
"A special attribute named xml:lang may be inserted in documents to
specify the language used in the contents
specify the language used in the contents
and attribute values of any element in an XML document.&rdquo;
and attribute values of any element in an XML document."
In versions of ''Janus SOAP'' prior to 6.8, the <tt>xml:lang=".."</tt>
attribute was accepted regardless of its value.
As of version 6.8, the only valid values of such attributes are
the language identifier tags specified in IETF RFC 3066
(http://www.w3.org/TR/REC-xml/#RFC1766).
   
   
The only valid values of the <code>xml:lang=".."</code> attribute that <var class="product">Janus SOAP</var> accepts are the language identifier tags specified in IETF RFC 3066 (http://www.w3.org/TR/REC-xml/#RFC1766).
==References==
==References==
As mentioned, the XML support in ''Janus SOAP'' is heavily oriented to the concepts and facilities defined by
As mentioned, the XML support in <var class="product">Janus SOAP</var> is heavily oriented to the concepts and facilities defined by
the XML standards.
the XML standards.
There are two key aspects of XML that application developers should understand at an appropriate level of detail:
There are two key aspects of XML that application developers should understand at an appropriate level of detail:
Line 1,036: Line 957:
<td>By Elliotte Rusty Harold and W. Scott Means (Second Edition: June, 2002, publisher O'Reilly & Associates), this book is one of many to cover XML, Namespaces, XML Schema, XSLT, XPath, XML processors, and more. It has the benefit of its smaller size; its good examples; and its good summary of the history of XML.
<td>By Elliotte Rusty Harold and W. Scott Means (Second Edition: June, 2002, publisher O'Reilly & Associates), this book is one of many to cover XML, Namespaces, XML Schema, XSLT, XPath, XML processors, and more. It has the benefit of its smaller size; its good examples; and its good summary of the history of XML.
<p>
<p>
For XML programming using ''Janus SOAP'' or other platforms, some of this book, and the others like it, may be irrelevant or even confusing (because its scope is so large), but it is accurate and probably easier to read than the more formalized W3C standards. </p></td></tr>
For XML programming using <var class="product">Janus SOAP</var> or other platforms, some of this book, and the others like it, may be irrelevant or even confusing (because its scope is so large), but it is accurate and probably easier to read than the more formalized W3C standards. </p></td></tr>
<tr><td>XML background</td>
<tr><td>XML background</td>
<td>http://www.w3.org/XML/1999/XML-in-10-points</td></tr>
<td>http://www.w3.org/XML/1999/XML-in-10-points</td></tr>
Line 1,054: Line 975:
This section lists some of the XML-related standards documents that are available.
This section lists some of the XML-related standards documents that are available.
   
   
The World Wide Web Consortium (or &ldquo;W3C&rdquo;) is the body that creates the XML
The World Wide Web Consortium (or "W3C") is the body that creates the XML
standards, along with other Internet standards, such as HTML, XHTML, and HTTP.
standards, along with other Internet standards, such as HTML, XHTML, and HTTP.
The term &ldquo;Recommendation,&rdquo; in W3C parlance, means that the
The term "Recommendation," in W3C parlance, means that the
standard has been approved by the W3C.
standard has been approved by the W3C.
   
   
Line 1,062: Line 983:
date on which that status was achieved,
date on which that status was achieved,
and the URL that can be used to obtain the document:
and the URL that can be used to obtain the document:
<table>
<table class="thJustBold">
<tr><th nowrap>Extensible Markup Language (XML) 1.0 (Third Edition) </th>
<tr><th nowrap>Extensible Markup Language (XML) 1.0 (Third Edition) </th>
<td>W3C Recommendation 04 February 2004: <br>http://www.w3.org/TR/REC-xml
<td>W3C Recommendation 04 February 2004: <br>http://www.w3.org/TR/REC-xml
<p>  
<p>  
This is referred to as the <i><b>W3C XML Recommendation</b></i> throughout this article. </p></td></tr>
This is referred to as the <i>W3C XML Recommendation</i> throughout this article. </p></td></tr>
<tr><th>Namespaces spec </th>
<tr><th>Namespaces spec </th>
<td>http://www.w3.org/TR/REC-xml-names
<td>http://www.w3.org/TR/REC-xml-names
Line 1,093: Line 1,014:


[[Category:Overviews]]
[[Category:Overviews]]
[[Category:Janus SOAP]]

Latest revision as of 19:17, 13 May 2016

Janus SOAP provides SOUL programmers with a substantial set of facilities for processing eXtensible Markup Language (XML) documents. Among other benefits, this enables rich and automated Web services based on a shared and open Web infrastructure. The design of this XML support is based on various standards, such as XML and XPath. Many sections in this article refer to these and other standards, for example, Simple Object Access Protocol (SOAP). However, it is important to recognize:

Janus SOAP enables you to process any XML document, whether or not you are using SOAP messages and envelopes.

XML support is provided in two disjoint sets of classes in Janus SOAP:

XmlDoc API
The methods in these classes allow you to convert a character stream XML document into an internal format (an XmlDoc object) or to programmatically create an XmlDoc, to access and modify an XmlDoc, and to convert an XmlDoc into a character stream XML document.
XmlParser API
This set of classes provides for event-based extraction of information from an XML document in its character stream form. This can be beneficial when only a relatively small part of the XML document is to be processed.

Standards relevant to Janus SOAP XML facilities

eXtensible Markup Language (XML)

XML is a standard (endorsed by the World Wide Web Consortium, or W3C) which can be used for structuring almost any kind of data. Although the word "markup" reveals that the roots of XML are from document processing, and indeed the outermost entity in XML is called a "document," XML is ideally suited to structuring almost any kind of data that is exchanged between or within applications, particularly (although by no means exclusively) if they are communicating on a network.

The syntax of XML provides for hierarchical structuring of data (again, the outer entity is called a document) into the principal type called an element. Elements and the other components of an XML document are described in XML.

One of the reasons that XML is so powerful is that there is no fixed vocabulary for XML documents. Every XML document can have its own set of names (subject to the rules for the characters that may occur in a name). Additionally, no structure is dictated for an XML document, except that it have a single top-level element and other elements must be completely contained within their parent elements. These characteristics allow XML to represent an extremely wide range of types of data very effectively.

An XML document can be considered an abstract object: when XML is used for interchange between applications, it is usually "serialized", or transmitted, completely in character form. The advantage of this is that it is human-readable and can be conveniently viewed using a generic XML editor, both of which can be huge benefits for debugging. Additionally, standard network protocols can be used to exchange documents between a wide variety of applications on a wide variety of platforms. As the World Wide Web has demonstrated, using characters as the basis for information interchange is extremely powerful and flexible.

Beyond these core properties which make XML very attractive for structuring data, it has become the basis for a large family of standards. Often these standards are referred to as the XML "family," in part because they are managed by the XML Working Group of the W3C. Some of these important standards are XML Schema, XML Stylesheet Transformations, XML Query, and Web Services Description Language (WSDL). See http://www.w3c.org for more information about these and other standards related to XML.

Quoting from XML in a Nutshell (2nd ed) (see References):

XML offers the tantalizing possibility of truly cross-platform, long term data formats. ... XML delivers portable data. In many ways, XML is the most portable ... format designed since the ASCII text file.

You can use XML strictly as an internal data structure in your application, or in Model 204 files, or with operating system files, or with other programs using some communication mechanism. The simple, character-based format of XML enhances such communication. You can communicate with the Web (HTTP), either as a server application (for example, using Janus Web Server) or making client XML requests (for example, using Janus Sockets HTTP Helper). You can use native Model 204 IODEV communication facilities, or Model 204 MQ Series, or any facility that can send and receive streams of characters.

Simple Object Access Protocol (SOAP)

The Simple Object Access Protocol (SOAP) is a lightweight protocol that supports the exchange of structured information between Web-based applications. SOAP employs XML to serialize the objects passed between applications. SOAP can be used in combination with a variety of existing firewall-friendly Internet protocols and formats including HTTP, SMTP, and MIME. SOAP supports a wide range of application paradigms, from messaging systems to Remote Procedure Call (RPC).

SOAP is an excellent standard for information exchange between applications, so good that it is the reason for the name Janus SOAP. It is important to recognize the following, however: Janus SOAP enables you to process any XML document, whether or not you are using SOAP messages and envelopes.

In fact, with the current version, although you can readily process formal SOAP messages, there are no features specially oriented toward that: all features are generalized for handling any kind of XML document. Later versions will add more functionality to incorporate the standard processing of SOAP messages, so your application will only need to deal with the application-specific parts of the messages.

Example SOAP request

This example SOAP message is a request to a SOAP server:

POST /StockQuote HTTP/1.1
Host: www.stockquoteserver.com
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn
SOAPAction: "Some-URI"

<SoapEnv:Envelope
  xmlns:SoapEnv="http://schemas.xmlsoap.org/soap/envelope/"
  SoapEnv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SoapEnv:Body>
    <m:GetLastTradePrice xmlns:m="http://sirius-software.com/samp/JSOAP/1">
      <symbol>EMC</symbol>
    </m:GetLastTradePrice>
  </SoapEnv:Body>
</SoapEnv:Envelope>

Example SOAP response

This example SOAP message could be a response to the above message:

HTTP/1.1 200 OK
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn

<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
    </m:GetLastTradePriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
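
To make the structure of such a response concrete, here is a minimal sketch in Python (standard library only, not SOUL and not the XmlDoc API) that extracts the price from a well-formed copy of the response body. The function name last_trade_price and the prefix map are illustrative assumptions; the point is only that elements are located by namespace URI, not by the prefix the sender happened to use.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# Body of the example response, without the HTTP headers.
response_xml = """\
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
    </m:GetLastTradePriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def last_trade_price(xml_text):
    """Return the text of the Price element inside the SOAP Body."""
    envelope = ET.fromstring(xml_text)
    # The prefixes used here ("env", "m") are local to this lookup;
    # only the namespace URIs have to match the document.
    namespaces = {"env": SOAP_ENV, "m": "Some-URI"}
    price = envelope.find(
        "env:Body/m:GetLastTradePriceResponse/Price", namespaces
    )
    return price.text


print(last_trade_price(response_xml))
```

The same lookup would succeed if the sender had used the SoapEnv prefix of the request example, because both prefixes are bound to the same envelope URI.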

XML Path Language (XPath) in the XmlDoc API

XPath is a language designed specifically to select nodes from an XML document. It is very powerful, yet it is based on familiar syntax that mimics an XML document's hierarchy. XPath is the general mechanism used in the XmlDoc API for selecting one or more nodes on which to operate. It is a key component of XSLT, XPointer, and XLink, and it has a common foundation with XML Query.

An introduction to the use of XPath is provided in An example of XmlDoc methods and XPath; a more complete description of XPath is contained in XPath.
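
As a rough illustration of how XPath expressions mirror a document's hierarchy, the following sketch uses Python's standard xml.etree.ElementTree module, which implements only a small subset of XPath. It is not the XmlDoc API, and the sample document is an abbreviated purchase order invented for this sketch.

```python
import xml.etree.ElementTree as ET

order = ET.fromstring(
    "<purchase_order>"
    "<pitm><partID>1234</partID><price per='12' amt='1.280'/><qty>36</qty></pitm>"
    "<pitm><price amt='.29'/><partID>5678</partID><qty>2</qty></pitm>"
    "</purchase_order>"
)

# Child steps follow the element hierarchy: every partID under a pitm.
part_ids = [e.text for e in order.findall("pitm/partID")]

# A predicate narrows the selection: price elements that have a 'per' attribute.
priced_per = [e.get("amt") for e in order.findall(".//price[@per]")]

# A positional predicate: the qty of the second pitm child.
second_qty = order.find("pitm[2]/qty").text

print(part_ids, priced_per, second_qty)
```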

XML

As explained above, XML provides the basis for a large number of varied standards. This section introduces the W3C XML Recommendation, that is, the XML standard. It gives you basic information about XML, explaining some of the concepts using the XmlDoc API (that is, the methods of the XmlDoc, XmlNodelist, and XmlNode classes). This approach gives you concrete examples which you can try in SOUL, and which may make the abstract concepts easier to understand.

The syntax of XML provides for:

  • Hierarchical structuring of data (the outer object is called a document) into elements.

    An element has a name, which need not be unique within the document. An element can have any number of attributes, each of which has a name (which must be unique within that element — but not within the document) and a value. Within an element can be a series of values and ("sub-") elements, which provides XML with its hierarchical nature.

  • Assigning unique identifiers to elements; this provides even more structuring possibilities than simple hierarchy.

    These identifiers are implemented with the element type definition features provided with either Document Type Declarations or with XML Schema. Element type definitions are omitted from our XML documentation; they are not supported in the current version.

An XML document has exactly one outer, or "top-level," element, and this element contains, as descendants, any other elements that may be in the document.

In addition to the data contained in elements and attributes, any number of comments may appear wherever an element may appear. There is also a component called a processing instruction, or PI, which is effectively a comment that has a name.

All names (element names, attribute names, entity references, and PI targets) are case-sensitive; for example, a less-than symbol (<) can be included in an attribute value if you use the characters &lt; — but not if you use &LT; or &Lt;.
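
The case-sensitivity rule can be checked with any conforming XML parser. The sketch below uses Python's standard library rather than the XmlDoc API; try_parse is a hypothetical helper that returns None when the parser rejects the text.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return the parsed top-level element, or None if the text is rejected."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


# &lt; is a predefined entity; &LT; is a different, undefined name.
ok = try_parse('<memo note="a &lt; b">text</memo>')
bad_entity = try_parse('<memo note="a &LT; b">text</memo>')

# An end tag must match the start tag exactly, including case.
bad_tag = try_parse("<memo>text</Memo>")

# Attribute names that differ only in case are two distinct attributes.
two_attrs = try_parse('<memo b="1" B="2"/>')
```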

The rest of this section explains the syntax of XML and various rules for XML documents, according to the W3C XML Recommendation (as mentioned in References, this includes both the XML specification per se, and the XML Namespaces specification). In XML syntax, below, and elsewhere as appropriate, you will find comments about limitations imposed by the XmlDoc API on the W3C XML Recommendation.

XML example

The next example illustrates the major components of an XML document. The formatting into separate, indented lines is provided for readability, but it is not significant for this and for most business data exchange applications. The letter labels on the left are not part of the document; they are for the explanation which follows:

X: <?xml version='1.1'?>
A: <!-- Purchase order follows -->
B: <purchase_order>
C:   <memo>Dave's order was "late"</memo>
D:   <?program-version 4.1?>
E:   <pitm>
       <partID>1234</partID>
F:     <price per="12" amt="1.280"/>
       <qty>36</qty>
G:   </pitm>
H:   <pitm>
I:     <price amt=".29"></price>
       <partID>5678</partID>
       <qty>2</qty>
     </pitm>
   </purchase_order>

In the following explanation of each of the labeled lines above, references of the form [cnn], like [B22], are to productions in Syntax of document, element, Attribute, Comment, PI below.

X: <?xml version='1.1'?>

The XML Declaration (XMLDecl, [C23]) is an optional part of the prolog ([B22]), which is the set of components preceding the top-level element. If XMLDecl is present it must:

  • Be the first markup in the document (only whitespace may precede it).
  • Specify at least the version (as of version 7.5 of the Sirius Mods, "1.0" and "1.1" are the only valid versions).
The clauses in XMLDecl are positional, that is, they must be given in the order shown in the syntax.

A: <!-- Purchase order follows -->

This is a comment at top-level. [A1], [B22], and [D27] allow zero or more comments and PIs before and after the top-level element.

B: <purchase_order>

This is the element start-tag or STag ([G40]) of the top-level element ([A1]).

C: <memo>Dave's order was "late"</memo>

With "leaf" elements (known in XML Schema as elements with simple content), that is, if the only thing between the STag and ETag is CharData ([P14]), you can usually implement the information either as an element (text) or as an attribute of the parent element. This text example highlights one small distinction, namely that AttValue ([M10]) has less flexibility:

  • If the value includes both apostrophes and quotation marks, either the apostrophes or the quotes must be escaped.
  • CharData not only allows quotes and apostrophes, but it also allows CDSect [Q18].

D: <?program-version 4.1?>

This is a PI [V16]. Presumably the name (actually, the target) "program-version" is used by the application reading this document.

E: <pitm>

This is the STag of an element which is contained within another element and which contains child elements; this allows you to group elements together.

F: <price per="12" amt="1.280"/>

This is an example of the EmptyElemTag ([I44]), which can be useful if an element contains no data (just the name can be meaningful to the application), or if it only contains data using attributes.

G: </pitm>

This is the ETag [H42] of an element. The name must exactly match the STag for the element (again, XML is case sensitive).

H: <pitm>

Here is another STag of an element; it is the "sibling" of another with the same name. The ability to have sub-elements and the ability to repeat elements with the same name in a given parent element are the important data modeling distinctions between elements and attributes.

I: <price amt=".29"></price>

Note that not all instances of a given element type (the price item is an element type) must have the same attributes, nor must they have the same sub-structure. Also, these are optional:

  • Whether an element has content.
  • Whether to use an STag immediately followed by an ETag (as is done here) or to use the EmptyElemTag (as is done above in item F).
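
The points made for items F, H, and I can be observed with a generic parser. The sketch below uses Python's standard library (not the XmlDoc API) on the purchase order shown above; the XML declaration is left out, on the assumption that the parser in use may not accept version 1.1, and this parser discards comments and PIs by default.

```python
import xml.etree.ElementTree as ET

doc = """\
<!-- Purchase order follows -->
<purchase_order>
  <memo>Dave's order was "late"</memo>
  <?program-version 4.1?>
  <pitm>
    <partID>1234</partID>
    <price per="12" amt="1.280"/>
    <qty>36</qty>
  </pitm>
  <pitm>
    <price amt=".29"></price>
    <partID>5678</partID>
    <qty>2</qty>
  </pitm>
</purchase_order>"""

root = ET.fromstring(doc)

# Sibling elements may repeat the same name (item H).
child_names = [child.tag for child in root]

# Instances of one element type need not carry the same attributes (item I).
prices = root.findall("pitm/price")
attribute_sets = [p.attrib for p in prices]

# An EmptyElemTag and an STag immediately followed by an ETag both
# produce an element with no content (items F and I).
contents = [(p.text, len(p)) for p in prices]
```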

XML syntax

This section contains a version of the XML syntax. It is taken from the W3C XML Recommendation, which is the authoritative reference:

http://www.w3.org/TR/REC-xml

The syntax below has been changed from the standard in these ways:

  • The only structure in the XML syntax not supported in the current version is the Document Type Declaration, or DTD, ("<!DOCTYPE...>"). A DTD can be tolerated if you use the DTD_IGNORE option of the deserialization functions (LoadXml, WebReceive, and ParseXml), but the information contained in the DTD is neither used nor made available to the SOUL program. Reflecting the absence of support for DTD, the productions in the syntax that follows are altered to remove those parts of an XML document introduced in the DTD.

    Note: Much of the functionality of document type declarations may be better provided using XML Schema, which is planned for a future version.

  • The Char, Name, NameStartChar, and NameChar productions are taken from the XML 1.1 recommendation. As explained in Char and Reference, only characters representable in 8-bit EBCDIC were handled prior to Sirius Mods version 7.6, so fewer characters were supported in the production for Char ([CA2]) in earlier Sirius Mods releases.
  • The maximum length of an XML name is 300 characters (prior to version 7.9, the maximum was 127, and prior to version 7.7, the maximum was 100).
  • The productions are re-ordered (to make it easier to read the grammar), and letters are added before them, so when [B22] is referred to in the text, you know that this is between [Ann] and [Cnn] in this grammar, and this is production [22] for the same non-terminal (in this case, prolog) in the W3C XML Recommendation.

The conventions used are:

'yyy' or "yyy" Enclosed item, yyy, must appear exactly as shown.
#xnn Specifies the character (in ISO-10646) with code value nn.

For example, #x09 #x0D #x0A #x20 specify the tab, carriage return, linefeed, and space characters, respectively.

[^abc] Specifies any character except a, b, or c.
[chars] Specifies any character within the set chars, where chars can be the concatenation of these sets:
  • y, meaning the single character y
  • y-z, meaning characters in the range from y to z, inclusive
The resulting set of chars is the union of the specified sets.
set1 - set2 ("-" not enclosed in [...]) The set of strings described by set1, with the set of strings described by set2 removed.
| Separates alternatives.
? Follows an optional item.
* Follows an item that can occur any number of times (even not at all).
+ Follows an item that can occur one or more times.
(abc) (parentheses) Groups items.
[rule] ("to the right") Marks an additional syntax rule.
/*comment*/ Marks a comment.

The syntax is shown in three sections:

  1. The major components
  2. The productions that describe individual characters
  3. The components of the "XML Declaration" (<?xml version=...?>)

Syntax of document, element, Attribute, Comment, PI

[A1]  document       ::= (prolog element Misc*) - (Char* RestrictedChar Char*)
[B22] prolog         ::= XMLDecl? Misc*
[C23] XMLDecl        ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[D27] Misc           ::= Comment | PI | S
[E3]  S              ::= (#x20 | #x9 | #xD | #xA)+  /* Whitespace */
[F39] element        ::= STag content ETag  [Element Type Match]
                       | EmptyElemTag
[G40] STag           ::= '<' Name (S Attribute)* S? '>'  [Unique Att]
[H42] ETag           ::= '</' Name S? '>'
[I44] EmptyElemTag   ::= '<' Name (S Attribute)* S? '/>'  [Unique Att]
[NSC] NameStartChar  ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6]
                       | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF]
                       | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                       | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
[NC]  NameChar       ::= NameStartChar | "-" | "." | [0-9] | #xB7
                       | [#x0300-#x036F] | [#x203F-#x2040]
[NA]  Name           ::= NameStartChar (NameChar)*

Within an XML document, the maximum length of a name (for example, each of the prefix part and the local part of an element name) is 300 characters (prior to version 7.9, the maximum was 127 characters, and prior to version 7.7, it was 100 characters). Element and attribute names are also subject to restrictions related to XML Namespaces; see Name and namespace syntax.
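
The Name production and the length limit can be expressed as a short check. The following Python sketch is an illustration only, not the XmlDoc API's implementation: the names NAME_RE, MAX_PART_LENGTH, and is_valid_name are assumptions, the character ranges are copied from productions [NSC] and [NC] above, and the 300-character limit is applied to each colon-separated part as described in the preceding paragraph.

```python
import re

# Character ranges from [NSC] NameStartChar (nothing above U+FFFD).
NAME_START = (
    r"A-Z_a-z:\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
)
# [NC] NameChar adds hyphen, period, digits, and a few combining ranges.
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# [NA] Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile("[" + NAME_START + "][" + NAME_CHAR + "]*")

MAX_PART_LENGTH = 300  # limit for the prefix part and for the local part


def is_valid_name(name):
    """True if name matches [NA] and no colon-separated part is too long."""
    if NAME_RE.fullmatch(name) is None:
        return False
    return all(len(part) <= MAX_PART_LENGTH for part in name.split(":"))


print(is_valid_name("purchase_order"), is_valid_name("1abc"))
```

This check covers only the Name production; the additional one-colon rule for element and attribute names comes from the Namespaces Recommendation and is not tested here.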

[L41] Attribute      ::= Name Eq AttValue
[M10] AttValue       ::= '"' ([^<&"] | Reference)* '"'
                       | "'" ([^<&'] | Reference)* "'"
[N25] Eq             ::= S? '=' S?
[O43] content        ::= CharData? ( (element | Reference | CDSect | PI | Comment) CharData? )*
[P14] CharData       ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
[Q18] CDSect         ::= CDStart CData CDEnd
[R19] CDStart        ::= '<![CDATA['
[S20] CData          ::= (Char* - (Char* ']]>' Char*))
[T21] CDEnd          ::= ']]>'
[U15] Comment        ::= '<!--' ( (Char - '-') | ('-' (Char - '-')) )* '-->'
[V16] PI             ::= '<?' PITarget (S (Char* - (Char* '?>' Char*) ))? '?>'
[W17] PITarget       ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

Char and Reference

[CA2]  Char           ::= [#x1-#xD7FF] | [#xE000-#xFFFD]
[CA2A] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
[CB67] Reference      ::= EntityRef | CharRef
[CD68] EntityRef      ::= '&' Name ';'
[CC66] CharRef        ::= '&#' [0-9]+ ';'
                        | '&#x' [0-9a-fA-F]+ ';'  [Legal Char]

ISO-10646 and EBCDIC characters
  • Through Sirius Mods version 7.5, XmlDocs were maintained in EBCDIC, and production [CA2] above did not allow the full range of ISO-10646 characters shown in the W3C XML Recommendation. (ISO-10646 is the standard for the universal character set, also known as Unicode.) The XmlDoc API might have rejected an XML document because it contained an ISO-10646 character that could not be represented in EBCDIC. As of Sirius Mods version 7.6, XmlDocs are maintained in Unicode as supported by the Sirius Mods. This is why production [CA2] shows that no Unicode characters greater than U+FFFD are allowed. In addition, deserialization (with default options) of an XML document fails if the document contains a Unicode character that is not translatable to EBCDIC. The AllowUntranslatable option of the deserialization methods lets you circumvent this restriction. The null character (#x0), normally restricted, is allowed in an XML document if the XmlDoc's AllowNull property is set to True.

    Note: Using the standard translation table provided with Sirius Mods versions prior to 7.3, many EBCDIC characters (such as X'FF'), in addition to the "control characters" that were explicitly prohibited, were not legal XML characters because they did not translate to any Unicode character.

    In Sirius Mods version 7.3, the standard translation table was modified significantly. For more information about supported characters and character translation issues as of version 7.3, see Support for the ASCII subset of Unicode and Corrected translations between ASCII/Unicode and EBCDIC.

  • As stated in Transport: sending and receiving XML, UTF-8, UTF-16, and ISO-8859-x encodings are accepted (note that these must be given in all-capital letters within the XML declaration).
  • XPath comparisons are performed using Unicode. As of version 7.3, this is the only type of ordered character comparison. Prior to Sirius Mods version 7.3, this was the default type of comparison, and the type could be controlled by the (now obsolete) XPathOrder property.
Entity references
  • One purpose of an EntityRef is to allow a sequence of characters that may be illegal in a particular context of an XML document. For example, within an element's content, the string ]]> is not allowed, so you may replace the greater-than symbol (>) with either its character code in a CharRef, or with the predefined entity &gt;:

    ]]&gt;

    A Reference (EntityRef or CharRef) is allowed only in an element's content ([O43]) or in AttValue ([M10]).

  • There is a facility for defining your own entities in a DTD, but since DTDs are not supported in Janus SOAP, the only entity references supported are the five predefined entities and, as of Model 204 7.6, two entities for the square brackets:
    &amp; ampersand (&)
    &apos; apostrophe (')
    &gt; greater than (>)
    &lt; less than (<)
    &quot; double quotation mark (")
    &lsqb; left square bracket ([) (as of Model 204 7.6)
    &rsqb; right square bracket (]) (as of Model 204 7.6)

    Note: You can use any of the XHTML entities (listed at http://www.w3.org/TR/xhtml1/dtds.html#h-A2) to represent Unicode characters when converting from EBCDIC to Unicode. Character decoding must be in effect, however: you must be using the U constant function or the CharacterDecode=True argument on the EbcdicToUnicode function.

    You can load into an XmlDoc a character represented by such an entity if you decode the entity reference before the character is processed by one of the XmlDoc API deserializing or direct storage methods.
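
When you build serialized XML text yourself, the five predefined entities are what you use to escape characters that are illegal in a given context. The following Python sketch (standard library, not the XmlDoc API) shows one way to do this; to_content and to_attribute_value are hypothetical helper names.

```python
from xml.sax.saxutils import escape, unescape

# escape() always handles &, <, and >; quotes are added for attribute values.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}
QUOTE_CHARS = {"&apos;": "'", "&quot;": '"'}


def to_content(text):
    """Escape text for use as element content (CharData)."""
    return escape(text)


def to_attribute_value(text):
    """Escape text for use inside a quoted AttValue."""
    return escape(text, QUOTE_ENTITIES)


content = to_content("a < b && c ]]> d")
attribute = to_attribute_value("Dave's order was \"late\"")

# Reversing the escaping recovers the original characters.
original = unescape(attribute, QUOTE_CHARS)

print(content)
print(attribute)
```

Escaping every greater-than symbol is more than the grammar requires, but it guarantees that the forbidden string ]]> never appears in content.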

Components of XMLDecl

[XA24] VersionInfo   ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[XB26] VersionNum    ::= ([a-zA-Z0-9_.:] | '-')+
[XC80] EncodingDecl  ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
[XD81] EncName       ::= [A-Za-z] ([A-Za-z0-9._] | '-')*  /* Only Latin chars */
[XE32] SDDecl        ::= S 'standalone' Eq ( ("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"') )

Names and namespaces

XML documents are allowed to contain elements and attributes that are defined by one organization, as well as other elements and attributes that are defined by another organization. In order to achieve this organizational "merging," the XML Namespaces Recommendation (http://www.w3.org/TR/REC-xml-names) provides for a way to qualify these merged names so that they will not conflict.

Also, the Namespaces Recommendation provides a way for an application to examine, in effect, the "defining organization" of a name in an XML document, so that various properties can be inferred, and names from the same "organization" can be grouped together.

Conceptually, the Namespaces Recommendation qualifies a name with a Uniform Resource Identifier (URI). There are various rules for various types of URIs; one familiar type is the same as URLs on the World Wide Web, such as:

http://www.w3.org/2001/XMLSchema

The important aspect of a URI, as far as the names in an XML document are concerned, is simply that it is a unique string for the names that are associated with it.

The characters that are valid in a URI (shown in Uniform Resource Identifier syntax) exceed the set of characters that are valid in an XML name. Therefore, the technique employed for XML Namespace qualification is to use a special kind of attribute — one that begins with "xmlns" — to associate a name prefix with a URI. Then attaching a prefix to a name effectively attaches the URI to a name.

The syntax for making this association, the namespace declaration, is explained in the next section.

Name and namespace syntax

The W3C XML Recommendation syntax rule for names is shown in Syntax of document, element, Attribute, Comment, PI (and repeated below) as the Name ([NA]), NameStartChar ([NSC]), and NameChar ([NC]) productions. The XML Namespaces Recommendation provides additional rules for Element and Attribute names (but not for PI targets). From the Namespaces Recommendation, element and attribute names are both instances of QName:

[NSC] NameStartChar  ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6]
                       | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF]
                       | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF]
                       | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
[NC]  NameChar       ::= NameStartChar | "-" | "." | [0-9] | #xB7
                       | [#x0300-#x036F] | [#x203F-#x2040]
[NA]  Name           ::= NameStartChar (NameChar)*
[NB5] NCName         ::= (NameStartChar - ':') (NameChar - ':')*
[NC6] QName          ::= (Prefix ':')? LocalPart
[ND7] Prefix         ::= NCName
[NE8] LocalPart      ::= NCName

Although the W3C XML Recommendation does not require that attribute and element names follow the XML Namespaces Recommendation, the operation of XPath requires it. Therefore, since XPath is so important for the XmlDoc API, its default operating mode is to require Namespaces conformance in the XML document. See the Namespace property.

The restrictions and changes to the XML Recommendation are as follows:

  • The NameStartChar and NameChar productions are taken from the XML 1.1 recommendation (http://www.w3.org/TR/xml11/). Starting with version 7.6 of the Sirius Mods, XmlDocs are maintained in Unicode, as supported by the Sirius Mods. That support excludes characters encoded in more than two bytes, so production [NSC], above, shows no Unicode characters greater than U+FFFD. By default, deserialization of an XML document fails if the document contains a Unicode character that is not translatable to EBCDIC. The AllowUntranslatable argument of the deserialization methods lets you circumvent this restriction.
  • A name can have at most one colon (:), which separates the name into a non-null prefix and a non-null local name.
  • A name without a prefix is simply a local name.
  • The prefix, if any, must be associated with a namespace URI using an attribute of the form:

    xmlns:prefix="URI"

    For example, all elements (and attributes of those elements) within the content of the definitions element below can use the prefix "xsd" to qualify their names to belong to the "http://www.w3.org/2001/XMLSchema" namespace:

    <definitions xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      ... content of definitions element ...
    </definitions>

  • The prefix xml is bound to the namespace URI http://www.w3.org/XML/1998/namespace. Neither can be used without the other.
  • An element can also have a default namespace attribute, which "declares" its namespace, of the form:

    xmlns="URI"

  • Another form of default namespace declaration allows an element to disable any default namespace with:

    xmlns=""

  • A namespace declaration is syntactically the same as an Attribute.
  • The scope of a non-default namespace declaration is the element containing it, its attributes, and all descendant elements and their attributes, until another declaration of the prefix.
  • The scope of a default namespace declaration is the element containing it (but not the attributes of that element) and its descendant elements (but not their attributes), until the occurrence of another default declaration.
  • The namespace URI associated with a name is
    1. the in-scope URI associated with the prefix of the name, if the name has a prefix
    2. for element names, the in-scope default namespace URI, if the name does not have a prefix and there is a default namespace URI in scope
    3. no namespace URI, otherwise
  • Two names are identical if they have the same local name and either they both do not have a namespace URI or they both have the same namespace URI.
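
The scoping rules listed above can be observed with a namespace-aware parser. The sketch below uses Python's standard library, which reports each name as {URI}local-name; it is not the XmlDoc API, and the urn:example URIs are invented for the illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<a xmlns="urn:example:one" p="1">'
    "<b/>"
    '<c xmlns=""/>'
    '<t:d xmlns:t="urn:example:two" t:q="2" r="3"/>'
    "</a>"
)

# The default namespace applies to element a and to descendant b,
# but not to the unprefixed attribute p.
top_name = root.tag
top_attributes = root.attrib

# xmlns="" disables the default for c; the prefix t qualifies d and
# the attribute t:q, while the unprefixed attribute r has no namespace.
child_names = [child.tag for child in root]
d_attributes = root[2].attrib

# Two names are identical when local name and URI agree; the prefix
# used to declare the URI does not matter.
same_name = (
    ET.fromstring('<x:e xmlns:x="urn:example:one"/>').tag
    == ET.fromstring('<y:e xmlns:y="urn:example:one"/>').tag
)
```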

Uniform Resource Identifier syntax

The form of a valid string used as a URI is specified in IETF RFC2396 (see http://www.faqs.org/rfcs/rfc2396.html). The rules are as follows:

  • Namespace URIs must be absolute: they must start with a non-null prefix (called a "scheme"), followed by a colon (:) and a non-null suffix.
  • The scheme must start with a letter, which may be followed by any combination of letters, digits, and the plus (+), hyphen (-), and period (.) characters.
  • The suffix can contain any of the following characters, in addition to letters and digits:

    ; (semicolon) - (hyphen) / (slash) _ (underscore) ? (question mark) . (period) : (colon)  ! (exclamation point) @ (at sign) ~ (tilde) & (ampersand) * (asterisk) = (equal sign) ' (apostrophe) + (plus sign) ( (open parenthesis) $ (dollar sign) ) (close parenthesis) , (comma)

    The suffix can also contain:

    • At most one number sign (#).
    • A percent (%) character followed by two hex digits to escape some other character. In this case:
      • The hex digits A-F may be uppercase or lowercase.
      • The hexadecimal values are not replaced when URI processing is performed.

        For example, even though the ASCII code for the number "4" is hexadecimal 34, the following two URIs are different and distinct:

        http://my.URI.number4
        http://my.URI.number%34

        Thus, for instance, the following fragment:

        %n = %d:AddElement('x', , 'http://my.URI.number4')
        %n:AddElement('x', , 'http://my.URI.number%34')
        %d:Print
        %d:SelectionPrefix('f') = 'http://my.URI.number4'
        Print %d:SelectCount('//f:x') And 'matching node(s)'

        Will have the following result:

        <x xmlns="http://my.URI.number4">
           <x xmlns="http://my.URI.number%34"/>
        </x>
        1 matching node(s)
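
The same literal comparison of namespace URIs can be seen outside SOUL. This Python sketch (standard library, not the XmlDoc API) builds an equivalent document and counts the x elements in the first namespace; because the parser compares URIs as plain strings, the escaped form %34 is not treated as the digit 4.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<x xmlns="http://my.URI.number4">'
    '<x xmlns="http://my.URI.number%34"/>'
    "</x>"
)

# iter() visits the top-level element and all of its descendants.
matching = list(doc.iter("{http://my.URI.number4}x"))
print(len(matching), "matching node(s)")
```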

Well-formed documents and validation

Before an XML document can be processed, its structure must match the rules expressed in the productions in Syntax of document, element, Attribute, Comment, PI, along with the extra rules alluded to in square brackets (for example, [Unique Att], indicating that a single attribute name may not be given twice in the list of attributes for an element). When the syntax is correct, including these rules, the document is called well-formed.

The XmlDoc API enforces the syntax rules of well-formed documents.
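
As an illustration of what enforcing well-formedness means, the following Python sketch (standard library, not the XmlDoc API) wraps a parser in a hypothetical is_well_formed helper and applies it to documents that break individual rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the parser accepts xml_text as a well-formed document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


checks = {
    "well-formed": is_well_formed('<a b="1" c="2"><d/></a>'),
    # [Unique Att]: an attribute name may appear only once per element.
    "duplicate attribute": is_well_formed('<a b="1" b="2"/>'),
    # [Element Type Match]: each ETag must match its STag.
    "overlapping tags": is_well_formed("<a><b></a></b>"),
    # [A1]: a document has exactly one top-level element.
    "two top-level elements": is_well_formed("<a/><b/>"),
    # [M10]: an attribute value must be quoted.
    "unquoted value": is_well_formed("<a b=1/>"),
}
```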

In addition to this checking, an XML processor may also check to see that the format of the document matches the structure and restrictions declared for it in either the Document Type Declaration or the document's Schema. If the document matches the type structure and restrictions, it is called valid. In the W3C XML Recommendation, this validation of a document is an optional feature of an XML processor.

With the current version, the XmlDoc API does not validate the XML document. Note that support of XML Schema is planned; Document Type Declarations have several shortcomings, including a limitation on the types of constraints that can be placed on the document, a specialized baroque syntax that doesn't conform to the element/attribute structure of XML, and incorporation of some features that have nothing to do with document validation.

Normalization during deserialization

When an XML processor, in particular the XmlDoc API, parses an XML document from character form into an internal representation, it must make some transformations of the document. The two most significant types of these transformations concern the following:

  • Entity and character references
  • Whitespace characters

Normalizing entity and character references

Entity and character references are replaced by their entity and character counterparts before deserialization. For example, the entity reference &gt; in the content of an element or in the AttValue of an Attribute, is handled exactly as if a greater-than symbol (>) occurred at that point in the document. Similarly, the character reference &#x5B; is handled as if a left square-bracket symbol ([) occurred at that point in the document.

This normalization occurs after whitespace normalization, which is discussed in the next section.
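
The replacement of references can be seen with any XML parser. In this Python sketch (standard library, not the XmlDoc API), the values delivered to the program contain the characters themselves, and a reference is indistinguishable from the literal character it stands for.

```python
import xml.etree.ElementTree as ET

elem = ET.fromstring('<e v="1 &gt; 0">&#x5B;x&#93; &amp; more</e>')

# References in an attribute value and in content arrive as characters.
attribute_value = elem.get("v")
content = elem.text

# &gt; and a literal > in content produce the same value.
same = ET.fromstring("<e>&gt;</e>").text == ET.fromstring("<e>></e>").text

print(attribute_value)
print(content)
```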

Normalizing whitespace characters

In the XML syntax, the whitespace characters are (in hexadecimal, using ISO-10646 character codes):

tab x'09'
linefeed x'0A'
carriage return x'0D'
space x'20'

In general, the whitespace characters can be used in the S production (shown in Syntax of document, element, Attribute, Comment, PI), which must separate many of the tokens in a document (for example, it must follow the element name, if the STag contains an Attribute) and may optionally be used in many other places (for example, it may appear before or after the equal sign (=) between an Attribute name and its value).

The interplay of three factors determines the normalization of whitespace characters during deserialization:

  • The W3C XML Recommendation specifies two normalizing transformations of whitespace:
    1. When the special line-end combination of a carriage return followed by a linefeed occurs anywhere in an XML document, it is replaced by a single linefeed character. Also, a carriage return not followed by a linefeed is replaced by a single linefeed character.
    2. When any whitespace character appears in the value of an attribute, it is replaced by a single space character.

    The XmlDoc API always applies these transformations, and the following two sub-sections describe them in more detail.

  • In addition to the XML standard whitespace transformations, the XmlDoc API deserialization methods offer options to control normalization of whitespace characters that occur in the content of an element. Those options are described in the individual descriptions of the deserialization methods, such as LoadXml and WebReceive.
  • The XmlDoc API deserialization (and serialization) methods honor the xml:space attribute: After the XML standard whitespace transformations, any whitespace within the scope of xml:space="preserve" is retained as is, regardless of the whitespace-handling option in effect for the deserialization method. Elements that are in the scope of xml:space="default" have whitespace handled according to the whitespace-handling option in effect for the deserialization. The individual method descriptions cited above have more information.
Normalized line-end

As specified in "2.11 End-of-Line Handling" of the W3C XML Recommendation, all instances of a carriage return character followed by a linefeed character (CR-LF sequence), as well as all instances of a carriage return not followed by a linefeed, are converted to a single linefeed character.

This behavior only applies to deserialization: there is no modification of whitespace characters in values passed as the value argument of the XmlDoc API Add* and Insert* methods that allow a value argument. Therefore the values of the FOO1 and FOO2 elements created by the LoadXml (deserialization) and AddElement invocations below are different:

* Get EBCDIC carriage return and linefeed:
%cl = $X2C('0D25')
* This Element value is linefeed:
%node = %doc:LoadXml('<top> <FOO1>' With %cl With '</FOO1> </top>')
* This Element value is carriage return and linefeed:
%node:AddElement('FOO2', %cl)

Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution. Therefore the values of FOO1 and FOO2 created by the following two LoadXml invocations are different:

* Get EBCDIC carriage return and linefeed:
%cl = $X2C('0D25')
* Element value is linefeed:
%doc:LoadXml('<FOO1>' With %cl With '</FOO1>')
%doc = New
* Element value is carriage return and linefeed
* (note, character references are ISO-10646):
%doc:LoadXml('<FOO2>&#x0D;&#x0A;' With '</FOO2>')

Linefeed characters not removed by the normalization described above and belonging to the Text node child of an element (but not in any other type of node) can further be affected by the whitespace-handling options of LoadXml and WebReceive.
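
Line-end normalization is part of the XML standard, so it can also be observed with a generic parser. This Python sketch (standard library, not the XmlDoc API, and using ASCII rather than EBCDIC control characters) contrasts literal line-end characters with character references; text_of is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def text_of(xml_text):
    """Parse xml_text and return the text content of its top-level element."""
    return ET.fromstring(xml_text).text


# A literal CR-LF pair in the serialized input becomes a single linefeed.
crlf = text_of("<FOO1>\r\n</FOO1>")

# A literal carriage return on its own also becomes a linefeed.
lone_cr = text_of("<FOO1>a\rb</FOO1>")

# Character references are substituted after line-end normalization,
# so the carriage return and linefeed both survive.
references = text_of("<FOO2>&#x0D;&#x0A;</FOO2>")
```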

Normalized attribute value

After replacing all CR-LF sequences, and all other CR instances, by LF (as described in Normalized line-end), attribute values have additional whitespace normalization. As specified in "3.3.3 Attribute-Value Normalization" of the W3C XML Recommendation, after the CR-LF normalization, every instance of a whitespace character (tab and linefeed) in an attribute value is converted to a space character. Leading and trailing spaces are not stripped, nor are sequences of multiple spaces collapsed.

This behavior only applies to deserialization; that is, there is no modification of whitespace characters in attribute values passed as the value argument of the AddAttribute function. Therefore the values of the FOO attribute created by the following two methods are different:

* Get EBCDIC carriage return:
%c = $X2C('0D')
* Attribute value is space:
%doc:LoadXml('<top FOO="' With %c With '"> <in/> </top>')
* Attribute value is carriage return:
%doc:AddAttribute('FOO', %c, '/*/*')

Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution. Therefore the values of the FOO attribute created by the following two LoadXml invocations are different:

    * Get EBCDIC carriage return:
    %c = $X2C('0D')
    * Attribute value is space:
    %doc:LoadXml('<top FOO="' With %c With '"/>')
    %doc = New
    * Attribute value is carriage return - note CR
    * is the same in EBCDIC and ISO-10646:
    %doc:LoadXml('<top FOO="&#x0D;"/>')

Note: Whitespace in an attribute (and in any type of node other than a Text node child of an element) is not affected by the whitespace-handling options of LoadXml, WebReceive, and ParseXml.

Language identification

From the W3C XML Recommendation: "A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document."

The only valid values of the xml:lang=".." attribute that Janus SOAP accepts are the language identifier tags specified in IETF RFC 3066 (http://www.w3.org/TR/REC-xml/#RFC1766).
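
As a minimal sketch of the attribute in use, the following request loads a document whose root element carries xml:lang and then prints the document. The element name, its text, and the language tag en-US are illustrative assumptions, and the sketch assumes that no declaration of the xml prefix is required, since the W3C Namespaces Recommendation binds that prefix by definition.

```
Begin
* Illustrative only: the element name and content are hypothetical
%doc is object XmlDoc
%doc = New
* "en-US" has the RFC 3066 form: language subtag, hyphen, region subtag
%doc:LoadXml('<greeting xml:lang="en-US">Hello</greeting>')
%doc:Print
End
```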

References

As mentioned, the XML support in Janus SOAP is heavily oriented to the concepts and facilities defined by the XML standards. There are two key aspects of XML that application developers should understand at an appropriate level of detail:

  • The syntax, structure, and nomenclature of an XML document.
  • For the XmlDoc API, the syntax, nomenclature, and meaning of an XPath expression.

Besides the standards themselves, the following shorter list of references should be useful in understanding the above key aspects:

http://en.wikipedia.org/wiki/XML The Wikipedia entry for XML.
XML in a Nutshell: A Desktop Quick Reference (2nd edition) By Elliotte Rusty Harold and W. Scott Means (O'Reilly & Associates, June 2002). This book is one of many that cover XML, Namespaces, XML Schema, XSLT, XPath, XML processors, and more. Its advantages are its smaller size, its good examples, and its good summary of the history of XML.

For XML programming using Janus SOAP or other platforms, some of this book, and of others like it, may be irrelevant or even confusing (because its scope is so large), but it is accurate and probably easier to read than the more formal W3C standards.

XML background http://www.w3.org/XML/1999/XML-in-10-points
http://en.wikipedia.org/wiki/XML_namespace The Wikipedia entry for XML namespace.
http://en.wikipedia.org/wiki/Xpath The Wikipedia entry for XPath.
http://msdn.microsoft.com/en-us/magazine/cc302158.aspx Microsoft's .NET Framework XML classes.
http://oreilly.com/catalog/9780596003975 .NET and XML, by Niel M. Bornstein, published 2004 by O'Reilly & Associates.

W3C standards

As discussed earlier in this manual, SOAP (Simple Object Access Protocol) is an Internet standard. This section lists some of the XML-related standards documents that are available.

The World Wide Web Consortium (or "W3C") is the body that creates the XML standards, along with other Internet standards, such as HTML, XHTML, and HTTP. The term "Recommendation," in W3C parlance, means that the standard has been approved by the W3C.

Each document is shown with its title, the status of the standard and the date on which that status was achieved, and the URL that can be used to obtain the document:

Extensible Markup Language (XML) 1.0 (Third Edition) W3C Recommendation 04 February 2004:
http://www.w3.org/TR/REC-xml

This is referred to as the W3C XML Recommendation throughout this article.

Namespaces spec http://www.w3.org/TR/REC-xml-names

This further constrains the form of element and attribute names in an XML document, and it provides a means for qualifying names so that different parts of a document can use different vocabularies.

XPath spec http://www.w3.org/TR/xpath

It is recommended that you start with section 5, “Data Model.”

XML Information Set W3C Recommendation 4 February 2004:
http://www.w3.org/TR/xml-infoset
XML Schema W3C Recommendation, 2 May 2001
SOAP Version 1.2 W3C Recommendation 24 June 2003

The above documents are among the rich set of documents available from the World Wide Web Consortium. To browse the complete set of its public publications and useful links, go to:

http://www.w3.org/