XML processing in Janus SOAP: Difference between revisions
Line 5: | Line 5: | ||
Comments with "&NSPRVSN" are places that could be revisited when "SOAP rule" support available | Comments with "&NSPRVSN" are places that could be revisited when "SOAP rule" support available | ||
--> | --> | ||
[[Janus SOAP]] provides | <var class="product">[[Janus SOAP]]</var> provides <var class="product">[[SOUL]]</var> programmers with a substantial set of facilities for processing eXtensible Markup | ||
Language (XML) documents. | Language (XML) documents. | ||
Among other benefits, | Among other benefits, this enables rich and automated Web services based on a shared and open Web infrastructure. | ||
this enables rich and automated Web services based on a shared and open Web infrastructure. | The design of this XML support is based on various standards, such as XML and [[XPath]]. | ||
The design of this XML support is based on various standards, such as XML and XPath. | |||
Many sections in this article refer to these and other standards, | Many sections in this article refer to these and other standards, | ||
for example, [[#Simple Object Access Protocol (SOAP)|Simple Object Access Protocol (SOAP)]]. | for example, [[#Simple Object Access Protocol (SOAP)|Simple Object Access Protocol (SOAP)]]. | ||
However, it is important to recognize: | However, it is important to recognize: | ||
< | <blockquote><var class="product">Janus SOAP</var> enables you to process <i><b>any XML document</b></i>, whether or not you are using SOAP messages and envelopes. | ||
< | </blockquote> | ||
SOAP messages and envelopes. | |||
</ | |||
XML support is provided in two disjoint sets of classes in | XML support is provided in two disjoint sets of classes in <var class="product">Janus SOAP</var>: | ||
<dl> | <dl> | ||
<dt>[[XmlDoc API]] | <dt>[[XmlDoc API]] | ||
<dd>The methods in these classes allow you to convert a character stream XML document into an | <dd>The methods in these classes allow you to convert a character stream XML document into an | ||
internal format (an [[XmlDoc class|XmlDoc | internal format (an <var>[[XmlDoc class|XmlDoc]]</var> object) or to programmatically create an <var>XmlDoc</var>, to access and modify an | ||
XmlDoc, and to convert an XmlDoc into a character stream XML document. | <var>XmlDoc</var>, and to convert an <var>XmlDoc</var> into a character stream XML document. | ||
<dt>[[XmlParser API]] | <dt>[[XmlParser API]] | ||
<dd>This set of classes provides for event-based extraction of information from an XML document in | <dd>This set of classes provides for event-based extraction of information from an XML document in | ||
Line 29: | Line 27: | ||
This can be beneficial when only a relatively small part of the XML document is to be processed. | This can be beneficial when only a relatively small part of the XML document is to be processed. | ||
</dl> | </dl> | ||
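The difference between the two styles can be sketched outside of <var class="product">Janus SOAP</var>. The following is a minimal illustration using Python's standard <code>xml.etree.ElementTree</code> module, not the XmlDoc or XmlParser API; the sample document and the function names are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
XML_TEXT = "<order><item id='1'>pen</item><item id='2'>ink</item></order>"


def load_whole_document(text):
    """Tree style: build the whole document in memory, then navigate it."""
    root = ET.fromstring(text)
    return [item.text for item in root.findall("item")]


def scan_for_first_item(text):
    """Event style: react to parse events and stop once the needed data is seen."""
    source = io.BytesIO(text.encode("utf-8"))
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":
            return elem.text  # the rest of the document is never examined
    return None
```

The tree style suits repeated access to and modification of a document; the event style suits cases where only a small part of the document is needed.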
==Standards relevant to Janus SOAP XML facilities== | ==Standards relevant to Janus SOAP XML facilities== | ||
===eXtensible Markup Language (XML)=== | ===eXtensible Markup Language (XML)=== | ||
XML is a standard (endorsed by the World Wide Web Consortium, or W3C) which can | XML is a standard (endorsed by the World Wide Web Consortium, or W3C) which can be used for structuring almost any kind of data. | ||
be used for structuring almost any kind of data. | Although the word "markup" reveals that the roots of XML are from | ||
Although the word | |||
document processing, and indeed the outermost entity in XML is called a | document processing, and indeed the outermost entity in XML is called a | ||
"document," XML is ideally suited to structuring almost any kind of | |||
data that is exchanged between or within applications, | data that is exchanged between or within applications, | ||
particularly (although by no means exclusively) if they are communicating on a network. | particularly (although by no means exclusively) if they are communicating on a network. | ||
The syntax of XML provides for hierarchical structuring of data (again, the outer | The syntax of XML provides for hierarchical structuring of data (again, the outer | ||
entity is called a document) | entity is called a document) into the principal type called an '''element'''. | ||
into the principal type called an '''element'''. | |||
Elements and the other components of an XML document are described in [[#XML|XML]]. | Elements and the other components of an XML document are described in [[#XML|XML]]. | ||
Line 56: | Line 54: | ||
An XML document can be considered an abstract object: when XML | An XML document can be considered an abstract object: when XML | ||
is used for interchange between applications, | is used for interchange between applications, | ||
it is usually | it is usually "serialized", or transmitted, completely | ||
in character form. | in character form. | ||
The advantage of this is that it is human-readable and can be | The advantage of this is that it is human-readable and can be | ||
Line 63: | Line 61: | ||
Additionally, standard network protocols can be used to exchange documents | Additionally, standard network protocols can be used to exchange documents | ||
between a wide variety of applications on a wide variety of platforms. | between a wide variety of applications on a wide variety of platforms. | ||
As the | As the World Wide Web has demonstrated, using characters as the basis for | ||
World Wide Web has demonstrated, using characters as the basis for | |||
information interchange is extremely powerful and flexible. | information interchange is extremely powerful and flexible. | ||
Beyond these core properties which make XML very attractive for structuring | Beyond these core properties which make XML very attractive for structuring | ||
data, it has become the basis for a large family of standards. | data, it has become the basis for a large family of standards. | ||
Often these standards are referred to as the XML | Often these standards are referred to as the XML "family," in part | ||
because they are managed by the XML Working Group of the W3C. | because they are managed by the XML Working Group of the W3C. | ||
Some of these important standards are | Some of these important standards are | ||
XML Schema, XSL Transformations (XSLT), XML Query, and Web Services | XML Schema, XSL Transformations (XSLT), XML Query, and Web Services | ||
Description Language (WSDL). | Description Language (WSDL). | ||
See http://www.w3c.org | See http://www.w3c.org for more information about these and other standards related to XML. | ||
for more information about these and other standards related to XML. | |||
Quoting from <i | Quoting from <i>XML in a Nutshell (2nd ed)</i> (see [[#References|References]]): | ||
< | <blockquote>XML offers the tantalizing possibility of truly cross-platform, long term | ||
data formats. ... | data formats. ... XML delivers portable data. | ||
XML delivers portable data. | |||
In many ways, XML is the most portable ... format designed since the ASCII text file. | In many ways, XML is the most portable ... format designed since the ASCII text file. | ||
</ | </blockquote> | ||
You can use XML strictly as an internal | You can use XML strictly as an internal data structure in your application, | ||
or in | or in <var class="product">Model 204</var> files, or with operating system files, or with other programs using some communication mechanism. | ||
some communication mechanism. | |||
The simple, character-based format of XML enhances such communication. | The simple, character-based format of XML enhances such communication. | ||
You can communicate with the Web (HTTP), either as a server application | You can communicate with the Web (HTTP), either as a server application (for example, | ||
(for example, | using <var class="product">[[Janus Web Server]]</var>) or making client XML requests (for example, using <var class="product">[[Janus Sockets]]</var> [[HTTP Helper]]). | ||
using [[Janus Web Server]]) or making client XML requests (for example, using [[Janus Sockets]] HTTP Helper). | You can use native <var class="product">Model 204</var> IODEV communication facilities, or <var class="product">Model 204</var> MQ Series, or | ||
You can use native | |||
any facility that can send and receive streams of characters. | any facility that can send and receive streams of characters. | ||
===Simple Object Access Protocol (SOAP)=== | ===Simple Object Access Protocol (SOAP)=== | ||
The Simple Object Access Protocol (SOAP) is a lightweight protocol that supports | The Simple Object Access Protocol (SOAP) is a lightweight protocol that supports the exchange of structured | ||
the exchange of structured | |||
information between Web-based applications. | information between Web-based applications. | ||
SOAP employs XML to serialize the objects passed between applications. | SOAP employs XML to serialize the objects passed between applications. | ||
SOAP can be used in combination with a variety of existing firewall-friendly | SOAP can be used in combination with a variety of existing firewall-friendly | ||
Internet protocols and formats including HTTP, SMTP, and MIME. | Internet protocols and formats including HTTP, SMTP, and MIME. | ||
SOAP supports a wide range of application paradigms, from messaging systems to | SOAP supports a wide range of application paradigms, from messaging systems to Remote Procedure Call (RPC). | ||
Remote Procedure Call (RPC). | |||
SOAP is an excellent standard for information exchange between applications, | SOAP is an excellent standard for information exchange between applications, | ||
so good that it is the reason for the name | so good that it is the reason for the name <var class="product">Janus SOAP</var>. | ||
It is important to recognize the following, however: | It is important to recognize the following, however: | ||
< | <var class="product">Janus SOAP</var> enables you to process <i><b>any XML document</b></i>, whether | ||
< | or not you are using SOAP messages and envelopes. | ||
or not you are using SOAP messages and envelopes | |||
<!-- &NSPRVSN --> | <!-- &NSPRVSN --> | ||
In fact, with the current version, | In fact, with the current version, although you can readily process formal SOAP | ||
although you can readily process formal SOAP | |||
messages, there are no features specially oriented toward that: all features are | messages, there are no features specially oriented toward that: all features are | ||
generalized for handling any kind of XML document. | generalized for handling any kind of XML document. | ||
Later versions will add more functionality to incorporate the standard processing | Later versions will add more functionality to incorporate the standard processing | ||
of SOAP messages, so your | of SOAP messages, so your application will only need to deal with the application-specific | ||
application will only need to deal with the application-specific | |||
parts of the messages. | parts of the messages. | ||
====Example SOAP request==== | ====Example SOAP request==== | ||
This example SOAP message is a request to a SOAP server: | This example SOAP message is a request to a SOAP server: | ||
< | <p class="code">POST /StockQuote HTTP/1.1 | ||
Host: www.stockquoteserver.com | |||
Content-Type: text/xml; charset="utf-8" | |||
Content-Length: nnnn | |||
SOAPAction: "Some-URI" | |||
<nowiki><SoapEnv:Envelope | |||
xmlns:SoapEnv="http://schemas.xmlsoap.org/soap/envelope/" | |||
SoapEnv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"> | |||
<SoapEnv:Body> | |||
<m:GetLastTradePrice | |||
xmlns:m="http://sirius-software.com/samp/JSOAP/1"> | |||
<symbol>EMC</symbol> | |||
</m:GetLastTradePrice> | |||
</SoapEnv:Body> | |||
</SoapEnv:Envelope></nowiki> | |||
</ | </p> | ||
====Example SOAP response==== | ====Example SOAP response==== | ||
This example SOAP message could be a response to | This example SOAP message could be a response to | ||
the above message: | the above message: | ||
< | <p class="code">HTTP/1.1 200 OK | ||
Content-Type: text/xml; charset="utf-8" | |||
Content-Length: nnnn | |||
<nowiki><SOAP-ENV:Envelope | |||
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" | |||
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"> | |||
<SOAP-ENV:Body> | |||
<m:GetLastTradePriceResponse xmlns:m="Some-URI"> | |||
<Price>34.5</Price> | |||
</m:GetLastTradePriceResponse> | |||
</SOAP-ENV:Body> | |||
</SOAP-ENV:Envelope></nowiki> | |||
</ | </p> | ||
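Because a SOAP message is an ordinary XML document, any XML processor can extract the application data from it. The following minimal sketch uses Python's standard <code>xml.etree.ElementTree</code> module rather than the <var class="product">Janus SOAP</var> API; the function name <code>last_trade_price</code> and the prefix mapping <code>NS</code> are assumptions made for the example, and the HTTP headers are omitted.

```python
import xml.etree.ElementTree as ET

# Body of the example response, without the HTTP headers.
SOAP_RESPONSE = """<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
    </m:GetLastTradePriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

# Prefixes used in the search path; they need not match those in the document,
# only the namespace URIs must match.
NS = {
    "env": "http://schemas.xmlsoap.org/soap/envelope/",
    "m": "Some-URI",
}


def last_trade_price(xml_text):
    """Return the Price value carried in the SOAP Body as a float."""
    envelope = ET.fromstring(xml_text)
    price = envelope.find("env:Body/m:GetLastTradePriceResponse/Price", NS)
    if price is None:
        raise ValueError("no Price element in the SOAP Body")
    return float(price.text)
```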
===XML Path Language (XPath) in the XmlDoc API=== | ===XML Path Language (XPath) in the XmlDoc API=== | ||
XPath is a language designed specifically to select nodes from an XML document. | XPath is a language designed specifically to select nodes from an XML document. | ||
It is very powerful, yet it is based on familiar syntax that mimics an | It is very powerful, yet it is based on familiar syntax that mimics an XML document's hierarchy. | ||
XML document's hierarchy. | XPath is the general mechanism used in the XmlDoc API for selecting one or more nodes on which to operate. | ||
XPath is the general mechanism used in the XmlDoc API for selecting one or more nodes | It is a key component of XSLT, XPointer, and XLink, and it has a common foundation with XML Query. | ||
on which to operate. | |||
It is a key component of XSLT, XPointer, and | |||
XLink, and it has a common foundation with XML Query. | |||
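To give a feel for how XPath expressions mirror a document's hierarchy, here is a minimal sketch using the limited XPath subset supported by Python's standard <code>xml.etree.ElementTree</code> module. It does not use the XmlDoc API, and the sample data is an abbreviated version of the purchase order used later in this article.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<purchase_order>"
    "<pitm><partID>1234</partID><price per='12' amt='1.280'/><qty>36</qty></pitm>"
    "<pitm><price amt='.29'/><partID>5678</partID><qty>2</qty></pitm>"
    "</purchase_order>"
)

# A path of child steps: every partID that is a child of a pitm child of the root.
part_ids = [e.text for e in DOC.findall("pitm/partID")]

# A predicate on an attribute: price elements anywhere below the root with per="12".
priced_per_dozen = DOC.findall(".//price[@per='12']")

# A positional predicate: the qty of the second pitm element.
second_qty = DOC.find("pitm[2]/qty").text
```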
An introduction to the use of XPath is provided in | An introduction to the use of XPath is provided in | ||
Line 173: | Line 159: | ||
==XML== | ==XML== | ||
As explained above, XML provides the basis for a large | As explained above, XML provides the basis for a large number of varied standards. | ||
number of varied standards. | |||
This section introduces the <i><b>W3C XML Recommendation</b></i>, that is, the XML standard. | This section introduces the <i><b>W3C XML Recommendation</b></i>, that is, the XML standard. | ||
It gives you basic information about XML, explaining some of the concepts using the | It gives you basic information about XML, explaining some of the concepts using the XmlDoc API (that is, | ||
XmlDoc API (that is, | the methods of the <var>XmlDoc</var>, <var>XmlNodelist</var>, and <var>XmlNode</var> classes). | ||
the methods of the XmlDoc, XmlNodelist, and XmlNode classes). | This approach gives you concrete examples which you can try in <var class="product">SOUL</var>, | ||
This approach gives you concrete examples which you can try in | |||
and which may make the abstract concepts easier to understand. | and which may make the abstract concepts easier to understand. | ||
The syntax of XML provides for | The syntax of XML provides for: | ||
object is called a document) into '''elements'''. | <ul> | ||
<li>Hierarchical structuring of data (the outer object is called a document) into '''elements'''. | |||
<p> | |||
An element has a name, which need not be unique within the document. | An element has a name, which need not be unique within the document. | ||
An element can have any number of '''attributes''', each of which | An element can have any number of '''attributes''', each of which | ||
has a name (which must be unique within that element — but not within the | has a name (which must be unique within that element — but not within the document) and a value. | ||
document) and a value. | Within an element can be a series of values and ("sub-") elements, which provides XML with its hierarchical nature.</p> | ||
Within an element can be a series of values and ( | |||
which provides XML with its hierarchical nature. | <li>Assigning unique identifiers to elements; | ||
< | |||
<li> | |||
this provides even more structuring possibilities than simple hierarchy. | this provides even more structuring possibilities than simple hierarchy. | ||
<p> | |||
These identifiers are implemented with the element type definition | These identifiers are implemented with the element type definition | ||
features provided with either Document Type Declarations or with XML Schema. | features provided with either Document Type Declarations or with XML Schema. | ||
Element type definitions are omitted from our XML documentation; they | Element type definitions are omitted from our XML documentation; they | ||
are not supported in the current version. | are not supported in the current version. </p> | ||
<!-- &NSCHVSN --> | <!-- &NSCHVSN --> | ||
</ul> | </ul> | ||
An XML document has exactly one outer, or | An XML document has exactly one outer, or "top-level," element, and this element | ||
contains, as descendants, any other elements that may be in the document. | |||
any other elements that may be in the document. | |||
In addition to the data contained in elements and attributes, any | In addition to the data contained in elements and attributes, any | ||
number of '''comments''' may appear wherever an element may appear. | number of '''comments''' may appear wherever an element may appear. | ||
There | There is also a component called a processing instruction, or '''PI''', | ||
is also a component called a processing instruction, or '''PI''', | |||
which is effectively a comment that has a name. | which is effectively a comment that has a name. | ||
All names (element names, attribute names, entity references, | All names (element names, attribute names, entity references, and PI targets) are case-sensitive; for example, a less-than symbol | ||
and PI targets) are case-sensitive; for example, a less-than symbol | (<tt><</tt>) can be included in an attribute value if you use the characters | ||
(<) can be included in an attribute value if you use the characters | <code>&lt;</code> — but not if you use <code>&LT;</code> or <code>&Lt;</code>. | ||
or | |||
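The case sensitivity described above can be observed with any conforming XML parser. The sketch below uses Python's standard <code>xml.etree.ElementTree</code> module, not the XmlDoc API; the helper names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def attribute_value(xml_text):
    """Parse a one-element document and return the value of its 'v' attribute."""
    return ET.fromstring(xml_text).get("v")


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The predefined entity reference is spelled in lowercase only.
lower_ok = attribute_value('<a v="x &lt; y"/>')
upper_ok = is_well_formed('<a v="x &LT; y"/>')

# Element names are case-sensitive too, so these tags do not match.
mixed_tags_ok = is_well_formed("<a></A>")
```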
The rest of this section explains the syntax of XML and various rules | The rest of this section explains the syntax of XML and various rules | ||
for XML documents, according to the <i | for XML documents, according to the <i>W3C XML Recommendation</i> (as mentioned in [[#References|References]], | ||
this includes both the XML specification per se, and the XML Namespaces | this includes both the XML specification per se, and the XML Namespaces specification). | ||
specification). | In ([[#XML syntax|XML syntax]], below) and elsewhere as appropriate, you will find | ||
In ([[#XML syntax|XML syntax]]) and elsewhere as appropriate, you will find | comments about limitations imposed by the XmlDoc API on the <i>W3C XML Recommendation</i>. | ||
comments about limitations imposed by the XmlDoc API on the <i | |||
===XML example=== | ===XML example=== | ||
The next example illustrates the major components of an XML document. | The next example illustrates the major components of an XML document. | ||
The formatting into separate, indented lines is | The formatting into separate, indented lines is provided for readability, but it is not significant for this and for most business data exchange applications. | ||
provided for readability, but it is not significant for this and for most | The letter labels on the left are not part of the document; they are for the explanation which follows: | ||
business data exchange applications. | <p class="code">X: <?xml version='1.1'?> | ||
The letter labels on the left are not part of the document; they | A: <!-- Purchase order follows --> | ||
are for the explanation which follows: | B: <purchase_order> | ||
< | C: <memo>Dave's order was "late"</memo> | ||
D: <?program-version 4.1?> | |||
E: <pitm> | |||
<partID>1234</partID> | |||
F: <price per="12" amt="1.280"/> | |||
<qty>36</qty> | |||
G: </pitm> | |||
H: <pitm> | |||
I: <price amt=".29"></price> | |||
<partID>5678</partID> | |||
<qty>2</qty> | |||
</pitm> | |||
</purchase_order> | |||
</p> | |||
</ | |||
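As a rough illustration of how an application might read this document, the following sketch parses it with Python's standard <code>xml.etree.ElementTree</code> module (not the XmlDoc API). The XML declaration (line X) is left out of the sample text, and with this parser's default settings the comment and the PI are skipped; the function name <code>summarize</code> is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

PURCHASE_ORDER = """<!-- Purchase order follows -->
<purchase_order>
  <memo>Dave's order was "late"</memo>
  <?program-version 4.1?>
  <pitm>
    <partID>1234</partID>
    <price per="12" amt="1.280"/>
    <qty>36</qty>
  </pitm>
  <pitm>
    <price amt=".29"></price>
    <partID>5678</partID>
    <qty>2</qty>
  </pitm>
</purchase_order>"""


def summarize(xml_text):
    """Return the top-level element name, the memo text, and one dict per pitm."""
    root = ET.fromstring(xml_text)
    items = []
    for pitm in root.findall("pitm"):  # repeated sibling elements
        price = pitm.find("price")
        items.append({
            "partID": pitm.findtext("partID"),
            "qty": int(pitm.findtext("qty")),
            "amt": price.get("amt"),
            "per": price.get("per"),  # None when the attribute is absent
        })
    return root.tag, root.findtext("memo"), items
```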
In the following explanation of each of the labeled lines above, | In the following explanation of each of the labeled lines above, references of the form '''['''<i>cnn</i>''']''', like <code>[B22]</code>, | ||
references of the form '''['''<i | are to productions in [[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]] below. | ||
are to productions in [[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]]. | <table class="thJustBold"> | ||
< | <tr><th>X: | ||
< | <td><code><?xml version='1.1'?></code> | ||
< | <p> | ||
< | |||
The XML Declaration (XMLDecl, [C23]) is an optional part of the prolog ([B22]), which | The XML Declaration (XMLDecl, [C23]) is an optional part of the prolog ([B22]), which | ||
is the set of components preceding the top-level element. | is the set of components preceding the top-level element. | ||
If XMLDecl is present it must: | If XMLDecl is present it must:</p> | ||
<ul> | <ul> | ||
<li>Be the first markup in the document (only whitespace may precede it). | <li>Be the first markup in the document (only whitespace may precede it). | ||
<li>Specify at least the version (as of version 7.5 of the <var class="product">Sirius Mods</var>, | <li>Specify at least the version (as of version 7.5 of the <var class="product">Sirius Mods</var>, | ||
"1.0" and "1.1" are the only valid versions). | |||
</ul> | </ul> | ||
The clauses in XMLDecl are positional, that is, they must be given in the order | The clauses in XMLDecl are positional, that is, they must be given in the order shown in the syntax.</td></tr> | ||
shown in the syntax. | |||
< | <tr><th>A:</th> | ||
< | <td><code><!-- Purchase order follows --></code> | ||
< | <p> | ||
This is a comment at top-level. | This is a comment at top-level. [A1], [B22], and [D27] allow | ||
[A1], [B22], and [D27] allow | zero or more comments and PIs before and after the top-level element.</p></td></tr> | ||
zero or more comments and PIs before and after the top-level element. | |||
< | <tr><th>B:</th> | ||
< | <td><code><purchase_order></code> | ||
< | <p> | ||
This is the element start-tag or STag ([G40]) of the top-level element ([A1]). | This is the element start-tag or STag ([G40]) of the top-level element ([A1]).</p></td></tr> | ||
< | |||
< | <tr><th>C:</th> | ||
< | <td><code><memo>Dave's order was "late"</memo></code> | ||
With | <p> | ||
With "leaf" elements (known in XML Schema as elements with simple content), | |||
that is, if the only thing between the STag and | that is, if the only thing between the STag and | ||
ETag is CharData ([P14]), you can usually implement the information either as an | ETag is CharData ([P14]), you can usually implement the information either as an | ||
element (text) or as an attribute of the parent element. | element (text) or as an attribute of the parent element. | ||
This text example highlights one small distinction, namely that | This text example highlights one small distinction, namely that | ||
AttValue ([M10]) has less flexibility: | AttValue ([M10]) has less flexibility:</p> | ||
<ul> | <ul> | ||
<li>If the value includes both apostrophes and quotation marks, either the | <li>If the value includes both apostrophes and quotation marks, either the | ||
apostrophes or the quotes must be escaped. | apostrophes or the quotes must be escaped. | ||
<li>CharData not only allows | <li>CharData not only allows | ||
quotes and apostrophes, but it also allows CDSect [Q18]. | quotes and apostrophes, but it also allows CDSect [Q18]. | ||
</ul> | </ul></td></tr> | ||
< | |||
< | <tr><th>D:</th> | ||
< | <td><code><?program-version 4.1?></code> | ||
<p> | |||
This is a PI [V16]. | This is a PI [V16]. | ||
Presumably the name (actually, the target) | Presumably the name (actually, the target) "program-version" is used by the application reading this document.</p></td></tr> | ||
is used by the application reading this document. | |||
< | <tr><th>E:</th> | ||
< | <td><code><pitm></code> | ||
< | <p> | ||
This is the STag of an element which is contained | This is the STag of an element which is contained | ||
within another element and which contains child elements; | within another element and which contains child elements; | ||
this allows you to group elements together. | this allows you to group elements together.</p></td></tr> | ||
< | |||
< | <tr><th>F:</th> | ||
< | <td><code><price per="12" amt="1.280"/></code> | ||
<p> | |||
This is an example of the EmptyElemTag ([I44]), which can be useful | This is an example of the EmptyElemTag ([I44]), which can be useful | ||
if an element contains no data (just the name can be meaningful to | if an element contains no data (just the name can be meaningful to | ||
the application), or if it only contains data using attributes. | the application), or if it only contains data using attributes.</p></td></tr> | ||
< | |||
< | <tr><th>G:</th> | ||
< | <td><code></pitm></code> | ||
<p> | |||
This is the ETag [H42] of an element. | This is the ETag [H42] of an element. | ||
The name must exactly match the STag for the element (again, XML is case sensitive). | The name must exactly match the STag for the element (again, XML is case sensitive).</p></td></tr> | ||
< | |||
< | <tr><th>H:</th> | ||
< | <td><code><pitm></code> | ||
Here is another STag of an element; | <p> | ||
it is the | Here is another STag of an element; it is the "sibling" of another with the same name. | ||
The ability to have sub-elements and the ability to repeat elements with the | The ability to have sub-elements and the ability to repeat elements with the | ||
same name in a given parent element are the important data modeling | same name in a given parent element are the important data modeling | ||
distinctions between elements and attributes. | distinctions between elements and attributes.</p></td></tr> | ||
< | |||
< | <tr><th>I:</th> | ||
< | <td><code><price amt=".29"></price></code> | ||
<p> | |||
Note that not all instances of a given element type (the price item | Note that not all instances of a given element type (the price item | ||
is an element type) must have the same attributes, nor must they have | is an element type) must have the same attributes, nor must they have | ||
the same sub-structure. | the same sub-structure. Also, these are optional:</p> | ||
Also, these are optional: | |||
<ul> | <ul> | ||
<li>Whether an element has content. | <li>Whether an element has content. | ||
<li>Whether to use an STag immediately followed by an ETag (as is done here) | <li>Whether to use an STag immediately followed by an ETag (as is done here) | ||
or to use the EmptyElemTag (as is done above in item F). | or to use the EmptyElemTag (as is done above in item F). | ||
</ul> | </ul></td></tr> | ||
</ | </table> | ||
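Two of the points above can be checked mechanically: an STag immediately followed by an ETag is equivalent to an EmptyElemTag, and attribute names must be unique within an element while child element names may repeat. The sketch below uses Python's standard <code>xml.etree.ElementTree</code> module, not the XmlDoc API; the helper names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def describe(xml_text):
    """Return (name, attributes, text, number of children) for a one-element document."""
    elem = ET.fromstring(xml_text)
    return elem.tag, dict(elem.attrib), elem.text, len(elem)


def parses(xml_text):
    """Return True if the text is accepted as a well-formed document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Items F and I: both spellings of an empty element give the same result.
same_element = describe('<price amt=".29"></price>') == describe('<price amt=".29"/>')

# An attribute name may appear only once in an element ...
duplicate_attribute_ok = parses('<price amt="1" amt="2"/>')

# ... but child elements with the same name may repeat.
repeated_children_ok = parses("<po><pitm/><pitm/></po>")
```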
===XML syntax=== | ===XML syntax=== | ||
This section contains a version of the XML syntax. | This section contains a version of the XML syntax. | ||
It is taken from the <i | It is taken from the <i>W3C XML Recommendation</i>, which is the authoritative reference: | ||
< | <p class="code"><nowiki>http://www.w3.org/TR/REC-xml</nowiki> | ||
</p> | |||
</ | |||
The syntax below has been changed from the standard in these ways: | The syntax below has been changed from the standard in these ways: | ||
<ul> | <ul> | ||
<li>The only structure in the XML syntax not supported | <li>The only structure in the XML syntax not supported | ||
in the current version is the | in the current version is the | ||
<!-- &NDTDVSN --> | <!-- &NDTDVSN --> | ||
Document Type Declaration, or DTD, ( | Document Type Declaration, or DTD, ("<!DOCTYPE...>"). | ||
Although a DTD can be tolerated if you use the DTD_IGNORE option | Although a DTD can be tolerated if you use the DTD_IGNORE option | ||
of the deserialization functions ([[LoadXml (XmlDoc/XmlNode function)|LoadXml]], | of the deserialization functions (<var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var>, | ||
[[WebReceive (XmlDoc function)|WebReceive]], and [[ParseXml (HttpResponse function)|ParseXml]]) | <var>[[WebReceive (XmlDoc function)|WebReceive]]</var>, and <var>[[ParseXml (HttpResponse function)|ParseXml]]</var>) | ||
— the information contained in the | — the information contained in the DTD is neither used nor made available to the <var class="product">SOUL</var> program. | ||
DTD is neither used nor made available to the | |||
Reflecting the absence of support for DTD, | Reflecting the absence of support for DTD, the productions in the syntax that follows are altered to remove those | ||
the productions in the syntax that follows are altered to remove those | |||
parts of an XML document introduced in the DTD. | parts of an XML document introduced in the DTD. | ||
'''Note:''' | <p class="note"> | ||
Much of the functionality of document type declarations may be better | '''Note:''' Much of the functionality of document type declarations may be better | ||
provided using XML Schema, which is planned for a future version. | provided using XML Schema, which is planned for a future version.</p> | ||
<li>The Char, Name, NameStartChar, and NameChar productions are taken from | |||
the | <li>The Char, Name, NameStartChar, and NameChar productions are taken from the [http://www.w3.org/TR/xml11/ XML 1.1 recommendation]. | ||
As explained in [[#Char and Reference|Char and Reference]], only characters representable in 8-bit | As explained in [[#Char and Reference|Char and Reference]], only characters representable in 8-bit | ||
EBCDIC were handled prior to <var class="product">Sirius Mods</var> version 7.6, | EBCDIC were handled prior to <var class="product">Sirius Mods</var> version 7.6, so fewer characters were supported in the production for | ||
so fewer characters were supported in the production for | |||
Char ([CA2]) in earlier <var class="product">Sirius Mods</var> releases. | Char ([CA2]) in earlier <var class="product">Sirius Mods</var> releases. | ||
<li>The maximum length of an XML name is 300 characters (prior to version 7.9, the maximum was 127, and prior to version | <li>The maximum length of an XML name is 300 characters (prior to version 7.9, the maximum was 127, and prior to version | ||
7.7, the maximum was 100). | 7.7, the maximum was 100). | ||
<li>The productions are re-ordered | |||
(to make it easier to read the grammar), and letters are added before them, | <li>The productions are re-ordered (to make it easier to read the grammar), and letters are added before them, | ||
so when [B22] is referred to in the text, you know that this is between | so when <code>[B22]</code> is referred to in the text, you know that this is between [A<i>nn</i>] and [C<i>nn</i>] in this grammar, and this is production [22] for the same | ||
[ | non-terminal (in this case, <code>prolog</code>) in the <i>W3C XML Recommendation</i>. | ||
non-terminal (in this case, < | |||
</ul> | </ul> | ||
The conventions used are: | The conventions used are: | ||
< | <table> | ||
< | <tr><th>'<i>yyy</i>' or "<i>yyy</i>"</th> | ||
< | <td>Enclosed item, <i><b>yyy</b></i>, must appear exactly as shown.</td></tr> | ||
< | |||
< | <tr><th>#x<i>nn</i></th> | ||
value <i><b>nn</b></i>. | <td>Specifies the character (in ISO-10646) with code value <i><b>nn</b></i>. | ||
For example, < | <p> | ||
tab, carriage return, linefeed, and space characters, respectively. | For example, <code>#x09 #x0D #x0A #x20</code> specify the | ||
< | tab, carriage return, linefeed, and space characters, respectively.</p></td></tr> | ||
< | |||
<i><b>a</b></i>, <i><b>b</b></i>, or <i><b>c</b></i>. | <tr><th>[^<i>abc</i>]</th> | ||
< | <td>Specifies any character except <i><b>a</b></i>, <i><b>b</b></i>, or <i><b>c</b></i>.</td></tr> | ||
< | |||
<i><b>chars</b></i>, where <i><b>chars</b></i> can be the concatenation of these sets: | <tr><th>[<i>chars</i>]</th> | ||
<td>Specifies any character within the set <i><b>chars</b></i>, where <i><b>chars</b></i> can be the concatenation of these sets: | |||
<ul> | <ul> | ||
<li><i><b>y</b></i>, meaning the single character <i><b>y</b></i> | <li><i><b>y</b></i>, meaning the single character <i><b>y</b></i> | ||
<li><i><b>y</b></i>'''-'''<i><b>z</b></i>, meaning characters in the range from | <li><i><b>y</b></i>'''-'''<i><b>z</b></i>, meaning characters in the range from | ||
<i><b>y</b></i> to <i><b>z</b></i>, inclusive | <i><b>y</b></i> to <i><b>z</b></i>, inclusive | ||
</ul> | </ul> | ||
The resulting set of | |||
<i><b>chars</b></i> is the union of the specified sets. | The resulting set of <i><b>chars</b></i> is the union of the specified sets.</td></tr> | ||
< | |||
< | <tr><th><i>set1</i> - <i>set2</i> ("-" not enclosed in [...])</th> | ||
described by <i><b>set2</b></i> removed. | <td>The set of strings described by <i><b>set1</b></i>, with the set of strings | ||
< | described by <i><b>set2</b></i> removed.</td></tr> | ||
< | |||
< | <tr><th>|</th> | ||
< | <td>Separates alternatives.</td></tr> | ||
< | |||
< | <tr><th>?</th> | ||
< | <td>Follows an optional item.</td></tr> | ||
< | |||
< | <tr><th>*</th> | ||
< | <td>Follows an item that can occur any number of times (even not at all).</td></tr> | ||
< | |||
< | <tr><th>+</th> | ||
< | <td>Follows an item that can occur one or more times.</td></tr> | ||
< | |||
</ | <tr><th>(<i>abc</i>) (parentheses)</th> | ||
<td>Groups items.</td></tr> | |||
<tr><th>[<i>rule</i>] ("to the right")</th> | |||
<td>Marks an additional syntax rule.</td></tr> | |||
<tr><th>/*<i>comment</i>*/</th> | |||
<td>Marks a comment.</td></tr> | |||
</table> | |||
The syntax is shown in three sections: | The syntax is shown in three sections: | ||
Line 418: | Line 409: | ||
<li>The major components | <li>The major components | ||
<li>The productions that describe individual characters | <li>The productions that describe individual characters | ||
<li>The components | <li>The components of the "XML Declaration" (<code><?xml version=...?></code>) | ||
of the | |||
</ol> | </ol> | ||
====Syntax of document, element, Attribute, Comment, PI==== | ====Syntax of document, element, Attribute, Comment, PI==== | ||
< | <p class="code">[A1] document ::= (prolog element Misc*) - (Char* RestrictedChar Char*) | ||
[A1] document | |||
[B22] prolog ::= XMLDecl? Misc* | |||
[B22] prolog | |||
[C23] XMLDecl | [C23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>' | ||
[D27] Misc | [D27] Misc ::= Comment | PI | S | ||
[E3] S | |||
[E3] S ::= (#x20 | #x9 /* Whitespace */ | #xD | #xA)+ | |||
[F39] element | |||
[F39] element ::= STag content ETag [Element Type Match] | EmptyElemTag | |||
[G40] STag | |||
[H42] ETag | [G40] STag ::= '<' Name (S Attribute)* S? '>' [Unique Att] | ||
[H42] ETag ::= '</' Name S? '>' | |||
[I44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' [Unique Att] | [I44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' [Unique Att] | ||
[NSC] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | | [NSC] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | | ||
[#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | | |||
[#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | |||
[NC] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | | [NC] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040] | ||
[NA] Name ::= NameStartChar (NameChar)* | [NA] Name ::= NameStartChar (NameChar)* | ||
</ | </p> | ||
Within an XML document, the maximum length of a name (for example, | Within an XML document, the maximum length of a name (for example, each of the prefix part and the local part of | ||
each of the prefix part and the local part of | |||
an element name) is 300 characters (prior to version 7.9, it was 127 characters, prior to version 7.7, | an element name) is 300 characters (prior to version 7.9, it was 127 characters, prior to version 7.7, | ||
the maximum length was 100 characters). | the maximum length was 100 characters). | ||
Element and attribute names are also subject to | Element and attribute names are also subject to | ||
restrictions related to XML Namespaces; see [[#Name and namespace syntax|Name and namespace syntax]]. | restrictions related to XML Namespaces; see [[#Name and namespace syntax|Name and namespace syntax]]. | ||
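The name productions and the length limit above can be sketched as a small check. This is only an illustration in Python, not part of the XmlDoc API: the function name `is_xml_name` and the constant `MAX_NAME_LENGTH` are hypothetical, and the limit is applied to whatever string is passed in, so a prefix and a local part would each be checked separately.

```python
import re

# Character classes transcribed from productions [NSC] and [NC] above.
# The \uXXXX escapes are interpreted by the re module.
NAME_START = (
    r":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F"
    r"\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
)
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# [NA] Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile("[" + NAME_START + "][" + NAME_CHAR + "]*")

# Hypothetical constant: the 300-character limit stated in the text.
MAX_NAME_LENGTH = 300


def is_xml_name(text):
    """Return True if text matches [NA] and is within the length limit."""
    return len(text) <= MAX_NAME_LENGTH and NAME_RE.fullmatch(text) is not None
```

For example, `is_xml_name("a-b.c")` is true, while `is_xml_name("1abc")` is false because a digit is a NameChar but not a NameStartChar.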
< | <p class="code">[L41] Attribute ::= Name Eq AttValue | ||
[L41] Attribute ::= Name Eq AttValue | |||
[M10] AttValue ::= '"' ([^<&"] | Reference)* '"' | [M10] AttValue ::= '"' ([^<&"] | Reference)* '"' | ||
| "'" ([^<&'] | Reference)* "'" | | "'" ([^<&'] | Reference)* "'" | ||
Line 463: | Line 453: | ||
[N25] Eq ::= S? '=' S? | [N25] Eq ::= S? '=' S? | ||
[O43] content ::= CharData? ( (element | [O43] content ::= CharData? ( (element | Reference | CDSect | PI | Comment) CharData? )* | ||
[P14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*) | [P14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*) | ||
[Q18] CDSect ::= CDStart CData CDEnd | [Q18] CDSect ::= CDStart CData CDEnd | ||
[R19] CDStart ::= '<![CDATA[' | [R19] CDStart ::= '<![CDATA[' | ||
[S20] CData ::= (Char* - (Char* ']]>' Char*)) | [S20] CData ::= (Char* - (Char* ']]>' Char*)) | ||
[T21] CDEnd ::= ']]>' | [T21] CDEnd ::= ']]>' | ||
[U15] Comment ::= ' | [U15] Comment ::= '<!--' ( (Char - '-') | ('-' (Char - '-')) )* '-->' | ||
[V16] PI ::= '<?' PITarget (S (Char* | [V16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*) ))? '?>' | ||
[W17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l')) | |||
</p> | |||
====Char and Reference==== | ====Char and Reference==== | ||
< | <p class="code">[CA2] Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | ||
[CA2] Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | |||
[CA2A] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [CA2A] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F] | ||
[CB67] Reference ::= EntityRef | CharRef | [CB67] Reference ::= EntityRef | CharRef | ||
[CD68] EntityRef ::= '&' Name ';' | [CD68] EntityRef ::= '&' Name ';' | ||
[CC66] CharRef ::= '&#' [0-9]+ ';' | |||
[CC66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';' [Legal Char] | |||
</ | </p> | ||
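As a rough illustration of productions [CA2] and [CA2A], the following Python sketch classifies single characters by code point. The function names are hypothetical and the code is independent of the XmlDoc API; it only mirrors the ranges listed in the grammar above.

```python
# Ranges transcribed from [CA2A] RestrictedChar.
RESTRICTED_RANGES = [
    (0x01, 0x08), (0x0B, 0x0C), (0x0E, 0x1F), (0x7F, 0x84), (0x86, 0x9F),
]


def is_char(ch):
    """True if ch is in [CA2] Char: [#x1-#xD7FF] | [#xE000-#xFFFD]."""
    cp = ord(ch)
    return 0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD


def is_restricted_char(ch):
    """True if ch falls in one of the [CA2A] RestrictedChar ranges."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in RESTRICTED_RANGES)
```

Note that tab, linefeed, carriage return (`#x9`, `#xA`, `#xD`) and `#x85` are in Char but fall outside the RestrictedChar ranges, and that any code point above `U+FFFD` is outside Char in this grammar.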
=====ISO-10646 and EBCDIC characters===== | =====ISO-10646 and EBCDIC characters===== | ||
<ul> | <ul> | ||
<li>Through <var class="product">Sirius Mods</var> version 7.5, | <li>Through <var class="product">Sirius Mods</var> version 7.5, XmlDocs were maintained in EBCDIC, and | ||
XmlDocs were maintained in EBCDIC, and | production <code>[CA2]</code> above did not allow the full range of ISO-10646 characters shown in the <i>W3C XML Recommendation</i>. | ||
production [CA2] above did not | (ISO-10646 is the standard for the universal character set, also known as Unicode.) | ||
allow the full range of ISO-10646 characters shown in the <i | |||
(ISO-10646 is the standard for the universal character set, also known as | |||
Unicode.) | |||
The XmlDoc API might have rejected an XML document | The XmlDoc API might have rejected an XML document | ||
because it contained an ISO-10646 character that could not be represented in EBCDIC. | because it contained an ISO-10646 character that could not be represented in EBCDIC. | ||
As of <var class="product">Sirius Mods</var> version 7.6, XmlDocs are maintained in Unicode | As of <var class="product">Sirius Mods</var> version 7.6, <var>XmlDocs</var> are maintained in Unicode | ||
as supported by the <var class="product">Sirius Mods</var>. | as supported by the <var class="product">Sirius Mods</var>. | ||
This is why production [CA2] shows that | This is why production <code>[CA2]</code> shows that no Unicode characters greater than <code>U+FFFD</code> are allowed. | ||
no Unicode characters greater than U+FFFD are allowed. | |||
In addition, deserialization (with default options) of an XML document fails if the document | In addition, deserialization (with default options) of an XML document fails if the document | ||
contains a Unicode character that is not translatable to EBCDIC. | contains a Unicode character that is not translatable to EBCDIC. | ||
The AllowUntranslatable option of the deserialization methods lets you | The <var>AllowUntranslatable</var> option of the deserialization methods lets you circumvent this restriction. | ||
circumvent this restriction | |||
The null character (<code>#x0</code>), normally restricted, is allowed in an XML | |||
document if the <var>XmlDoc</var>'s <var>AllowNull</var> property is set to <code>True</code>. | |||
<blockquote class="note"> | |||
<p>'''Note:''' Using the standard translation table provided with <var class="product">Sirius Mods</var> versions prior to 7.3, | |||
many EBCDIC characters (such as <code>X'FF'</code>), in addition to the "control characters" that were | |||
explicitly prohibited, were ''not'' legal XML characters because they did not translate to any Unicode character.</p> | |||
<p> | |||
In <var class="product">Sirius Mods</var> version 7.3, the standard translation table was modified significantly. | In <var class="product">Sirius Mods</var> version 7.3, the standard translation table was modified significantly. | ||
For more information about supported characters and character translation | For more information about supported characters and character translation | ||
issues as of version 7.3, see [[ | issues as of version 7.3, see [[Unicode#Support for the ASCII subset of Unicode|Support for the ASCII subset of Unicode]] and [[Unicode#Corrected translations between ASCII/Unicode and EBCDIC|Corrected translations between ASCII/Unicode and EBCDIC]].</p> </blockquote> | ||
<li>As stated in | |||
<li>As stated in [[XmlDoc API#Transport: sending and receiving XML|Transport: sending and receiving XML]], UTF-8, UTF-16, and ISO-8859-x | |||
encodings are accepted (note that these must be given in all-capital letters within the XML declaration). | encodings are accepted (note that these must be given in all-capital letters within the XML declaration). | ||
<li>XPath comparisons are performed using Unicode. | <li>XPath comparisons are performed using Unicode. | ||
As of version 7.3, it is the only type of ordered character comparison. | As of version 7.3, it is the only type of ordered character comparison. | ||
Line 535: | Line 518: | ||
and could be controlled by the (now obsolete) [[XPathOrder (obsolete XmlDoc property)|XPathOrder]] property. | and could be controlled by the (now obsolete) [[XPathOrder (obsolete XmlDoc property)|XPathOrder]] property. | ||
</ul> | </ul> | ||
=====Entity references===== | =====Entity references===== | ||
<ul> | <ul> | ||
<li>One purpose of an EntityRef is to allow a sequence of characters that | <li>One purpose of an EntityRef is to allow a sequence of characters that | ||
may be illegal in a particular context of an XML document. | may be illegal in a particular context of an XML document. | ||
For example, within an element's content, the string | For example, within an element's content, the string <code>]]></code> is not | ||
allowed, so you may replace the greater-than symbol (>) with | allowed, so you may replace the greater-than symbol (<tt>></tt>) with either its character code in a CharRef, or with the predefined entity <code>&gt;</code>: | ||
either its character code in a CharRef, or with | <p class="code">]]&gt; | ||
the predefined entity < | </p> | ||
< | <p> | ||
A <code>Reference</code> (<code>EntityRef</code> or <code>CharRef</code>) is allowed only in an element's content (<code>[O43]</code>) or in <code>AttValue</code> (<code>[M10]</code>).</p> | |||
</ | |||
A Reference (EntityRef or CharRef) | |||
is allowed only in an element's content ([O43]) or in AttValue ([M10]). | |||
<li>There is a facility for defining your own entities in a DTD, but | <li>There is a facility for defining your own entities in a DTD, but | ||
since DTDs are not supported in | since DTDs are not supported in <var class="product">Janus SOAP</var>, | ||
the only entity references supported are the five predefined entities: | the only entity references supported are the five predefined entities: | ||
< | |||
< | <table class="thJustBold"> | ||
< | <tr><th>&amp;</th> | ||
< | <td>ampersand (<tt>&</tt>)</td></tr> | ||
< | |||
< | <tr><th>&apos;</th> | ||
< | <td>apostrophe (<tt>'</tt>)</td></tr> | ||
< | |||
< | <tr><th>&gt;</th> | ||
< | <td>greater than (<tt>></tt>)</td></tr> | ||
< | |||
</ | <tr><th>&lt;</th> | ||
<td>less than (<tt><</tt>)</td></tr> | |||
(listed at http://www.w3.org/TR/xhtml1/dtds.html#h-A2) | <tr><th>&quot;</th> | ||
<td>double quotation mark (<tt>"</tt>)</td></tr> | |||
<tr><th>&lsqb; <br>&rsqb;</th> | |||
<td>left and right square brackets (<tt>[</tt> <tt>]</tt>) <br>(as of Model 204 7.6)</td></tr> | |||
</table> | |||
<blockquote class="note"> | |||
<p>'''Note:''' You can use any of the XHTML entities (listed at http://www.w3.org/TR/xhtml1/dtds.html#h-A2) | |||
to represent Unicode characters when converting from EBCDIC to Unicode. | to represent Unicode characters when converting from EBCDIC to Unicode. | ||
Character decoding must be in effect, however: you must be using | Character decoding must be in effect, however: you must be using | ||
the [[U (String function)|U]] constant function | the <var>[[U (String function)|U]]</var> constant function or the <code>CharacterDecode=True</code> argument on the <var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var> function. </p> | ||
or the < | |||
argument on the [[EbcdicToUnicode (String function)|EbcdicToUnicode]] function. | |||
You can load into an XmlDoc a character represented by such an entity | You can load into an <var>XmlDoc</var> a character represented by such an entity if you decode the entity reference before the character is processed by one of the XmlDoc API deserializing or direct storage methods. </blockquote> | ||
if you decode the entity reference before the character is processed | |||
by one of the XmlDoc API deserializing or direct storage methods. | |||
</ul> | </ul> | ||
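The handling of references described above can be sketched as follows. This is a minimal Python illustration, assuming only the five predefined entities and the two CharRef forms of [CC66]; the name `decode_references` is hypothetical, it is not how the XmlDoc API is implemented, and it omits the `&lsqb;`/`&rsqb;` entities listed in the table.

```python
import re

# The five predefined entities; no DTD-defined entities are considered.
PREDEFINED = {"amp": "&", "apos": "'", "gt": ">", "lt": "<", "quot": '"'}

# Hex CharRef, decimal CharRef, or a named EntityRef.
REFERENCE_RE = re.compile(r"&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([^;&\s]+));")


def decode_references(text):
    """Replace each Reference in text with the character it denotes."""
    def replace(match):
        hex_digits, dec_digits, name = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        if dec_digits is not None:
            return chr(int(dec_digits))
        if name in PREDEFINED:
            return PREDEFINED[name]
        # Anything else would need a DTD, which is not supported here.
        raise ValueError("unsupported entity reference: &" + name + ";")

    return REFERENCE_RE.sub(replace, text)
```

With this sketch, `decode_references("]]&gt;")` yields the three characters `]]>`, and a reference such as `&nbsp;` raises `ValueError`.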
====Components of XMLDecl==== | ====Components of XMLDecl==== | ||
< | <p class="code">[XA24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"') | ||
[XA24] VersionInfo ::= S 'version' Eq | |||
[XB26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+ | [XB26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+ | ||
[XC80] EncodingDecl ::= S 'encoding' Eq | [XC80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" ) | ||
[XD81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | [XD81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Only Latin chars */ | ||
[XE32] SDDecl ::= S 'standalone' Eq ( | [XE32] SDDecl ::= S 'standalone' Eq ( ("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"') ) | ||
</p> | |||
</ | |||
===Names and namespaces=== | ===Names and namespaces=== | ||
XML documents are allowed to contain | XML documents are allowed to contain elements and attributes that are defined by one organization, as well as | ||
elements and attributes that are defined by one organization, as well as | |||
other elements and attributes that are defined by another organization. | other elements and attributes that are defined by another organization. | ||
In order to achieve this organizational | In order to achieve this organizational "merging," the <i>XML Namespaces Recommendation</i> (http://www.w3.org/TR/REC-xml-names) | ||
the <i | |||
(http://www.w3.org/TR/REC-xml-names) | |||
provides for a way to qualify these merged names so that they will not conflict. | provides for a way to qualify these merged names so that they will not conflict. | ||
Also, the Namespaces Recommendation provides a way for an application | Also, the Namespaces Recommendation provides a way for an application | ||
to examine, in effect, the | to examine, in effect, the "defining organization" of a name | ||
in an XML document, so that various properties can be inferred, and | in an XML document, so that various properties can be inferred, and | ||
names from the same | names from the same "organization" can be grouped together. | ||
Conceptually, the Namespaces Recommendation qualifies a name with a | Conceptually, the Namespaces Recommendation qualifies a name with a Uniform Resource Identifier ('''URI'''). | ||
Uniform Resource Identifier ('''URI'''). | |||
There are various rules for various types of URIs; one familiar type | There are various rules for various types of URIs; one familiar type | ||
is the same as URLs on the World Wide Web, such as | is the same as URLs on the World Wide Web, such as: | ||
< | <p class="code"><nowiki>http://www.w3.org/2001/XMLSchema</nowiki> | ||
</p> | |||
</ | |||
The important aspect of a URI, as far as the names in an XML document | The important aspect of a URI, as far as the names in an XML document | ||
are concerned, is simply that it is a unique string for the names | are concerned, is simply that it is a unique string for the names that are associated with it. | ||
that are associated with it. | |||
The characters that are valid in a URI (shown in [[#Uniform Resource Identifier syntax|Uniform Resource Identifier syntax]]) | The characters that are valid in a URI (shown in [[#Uniform Resource Identifier syntax|Uniform Resource Identifier syntax]]) exceed the set of characters that are valid in an XML name. | ||
exceed the set of characters that are valid in an XML name. | Therefore, the technique employed for XML Namespace qualification is to use a special kind of attribute — one that begins with "xmlns" — to associate a name '''prefix''' with a URI. | ||
Therefore, the technique employed for XML Namespace qualification is to | |||
use a special kind of attribute — one that begins | |||
with | |||
Attaching a prefix to a name then effectively attaches the URI to that name. | Attaching a prefix to a name then effectively attaches the URI to that name. | ||
The syntax for making this association, the namespace declaration, is explained in the next section. | The syntax for making this association, the namespace declaration, is explained in the next section. | ||
====Name and namespace syntax==== | ====Name and namespace syntax==== | ||
The <i | The <i>W3C XML Recommendation</i> syntax rule for names is shown in | ||
[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]] (and | |||
repeated below) as the Name ([NA]), NameStartChar ([NSC]), | repeated below) as the <code>Name</code> (<code>[NA]</code>), <code>NameStartChar</code> (<code>[NSC]</code>), and <code>NameChar</code> (<code>[NC]</code>) productions. | ||
and NameChar ([NC]) productions. | The XML Namespaces Recommendation provides additional rules for Element and Attribute names (but not for PI targets). | ||
The XML Namespaces Recommendation provides additional | From the Namespaces Recommendation, element and attribute names are both instances of <code>QName</code>: | ||
rules for Element and Attribute names (but not for PI targets). | |||
From the Namespaces Recommendation, element and attribute names | |||
are both instances of < | |||
< | <p class="code">[NSC] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | | ||
[NSC] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | | ||
[#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | |||
[NC] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | | [NC] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040] | ||
[NA] Name ::= NameStartChar (NameChar)* | [NA] Name ::= NameStartChar (NameChar)* | ||
[NB5] NCName ::= (NameStartChar - ':') (NameChar - ':')* | [NB5] NCName ::= (NameStartChar - ':') (NameChar - ':')* | ||
[NC6] QName ::= (Prefix ':')? LocalPart | [NC6] QName ::= (Prefix ':')? LocalPart | ||
[ND7] Prefix ::= NCName | [ND7] Prefix ::= NCName | ||
[NE8] LocalPart ::= NCName | [NE8] LocalPart ::= NCName | ||
</ | </p> | ||
Although the <i | Although the <i>W3C XML Recommendation</i> does not require that attribute and element names | ||
follow the XML Namespaces Recommendation, the operation of XPath requires | follow the XML Namespaces Recommendation, the operation of XPath requires it. | ||
it. | |||
Therefore, since XPath is so important for the XmlDoc API, its default operating | Therefore, since XPath is so important for the XmlDoc API, its default operating | ||
mode is to require Namespaces conformance in the XML document. | mode is to require Namespaces conformance in the XML document. | ||
See the [[Namespace (XmlDoc property)|Namespace]] property. | See the <var>[[Namespace (XmlDoc property)|Namespace]]</var> property. | ||
The restrictions and changes to the XML Recommendation are as follows: | The restrictions and changes to the XML Recommendation are as follows: | ||
<ul> | <ul> | ||
<li>The < | <li>The <code>NameStartChar</code> and <code>NameChar</code> productions are taken | ||
from the XML 1.1 Recommendation (http://www.w3.org/TR/xml11/). | from the XML 1.1 Recommendation (http://www.w3.org/TR/xml11/). | ||
Starting with version 7.6 of the <var class="product">Sirius Mods</var>, XmlDocs are maintained in Unicode, as | Starting with version 7.6 of the <var class="product">Sirius Mods</var>, XmlDocs are maintained in Unicode, as | ||
supported by the <var class="product">Sirius Mods</var>. | supported by the <var class="product">Sirius Mods</var>. | ||
That support excludes characters encoded in more than two bytes, so production | That support excludes characters encoded in more than two bytes, so production | ||
[NSC], above, shows no Unicode characters greater than U+FFFD. | <code>[NSC]</code>, above, shows no Unicode characters greater than <code>U+FFFD</code>. | ||
By default, deserialization of an XML document fails if the document | By default, deserialization of an XML document fails if the document | ||
contains a Unicode character that is not translatable to EBCDIC. | contains a Unicode character that is not translatable to EBCDIC. | ||
The AllowUntranslatable argument of the deserialization methods lets you | The <var>AllowUntranslatable</var> argument of the deserialization methods lets you circumvent this restriction. | ||
circumvent this restriction. | |||
<li>A name can have at most one colon (:), which separates | <li>A name can have at most one colon (<tt>:</tt>), which separates | ||
the name into a non-null '''prefix''' and a non-null '''local name'''. | the name into a non-null '''prefix''' and a non-null '''local name'''. | ||
<li>A name without a prefix is simply a local name. | <li>A name without a prefix is simply a local name. | ||
<li>The prefix, if any, must be associated with a '''namespace | |||
URI''' using an attribute of the form: | <li>The prefix, if any, must be associated with a '''namespace URI''' using an attribute of the form: | ||
< | <p class="code">xmlns:<i><b>prefix</b></i>="<i><b>URI</b></i>" | ||
xmlns:<i><b>prefix</b></i>="<i><b>URI</b></i>" | </p> | ||
</ | |||
<!-- xmlns:prefix="URI" --> | <!-- xmlns:prefix="URI" --> | ||
<!--?? xmlns:<i><b>prefix</b></i>="<i><b>URI</b></i>"--> | <!--?? xmlns:<i><b>prefix</b></i>="<i><b>URI</b></i>"--> | ||
For example, all elements (and attributes of those elements) within | For example, all elements (and attributes of those elements) within | ||
the content of the < | the content of the <code>definitions</code> element below can use the prefix | ||
"xsd" to qualify their names to belong to the <nowiki>"http://www.w3.org/2001/XMLSchema"</nowiki> namespace: | |||
<nowiki>"http://www.w3.org/2001/XMLSchema"</nowiki> namespace: | <p class="code"><nowiki><definitions xmlns:xsd="http://www.w3.org/2001/XMLSchema"></nowiki> | ||
< | ... content of definitions element ... | ||
</definitions> | |||
</p> | |||
</ | <li>The prefix <code>xml</code> is bound to the namespace URI | ||
<li>The prefix < | <code><nowiki>http://www.w3.org/XML/1998/namespace</nowiki></code>. | ||
< | |||
Neither can be used without the other. | Neither can be used without the other. | ||
<li>An element can also have a '''default namespace''' attribute, | <li>An element can also have a '''default namespace''' attribute, which "declares" its namespace, of the form: | ||
which | <p class="code">xmlns="URI" | ||
< | </p> | ||
</ | |||
<!--?? xmlns="<i><b>URI</b></i>"--> | <!--?? xmlns="<i><b>URI</b></i>"--> | ||
<li>Another form of default namespace declaration allows | <li>Another form of default namespace declaration allows | ||
an element to disable any default namespace with: | an element to disable any default namespace with: | ||
< | <p class="code">xmlns="" | ||
</p> | |||
</ | |||
<li>A namespace declaration is syntactically the same as an Attribute. | <li>A namespace declaration is syntactically the same as an Attribute. | ||
<li>The scope of a non-default namespace declaration is the element containing it, its | <li>The scope of a non-default namespace declaration is the element containing it, its | ||
attributes, and all descendant elements and their attributes, until another declaration | attributes, and all descendant elements and their attributes, until another declaration | ||
of the prefix. | of the prefix. | ||
<li>The scope of a default namespace declaration is the element containing it (but not | <li>The scope of a default namespace declaration is the element containing it (but not | ||
the attributes of that element) and its descendant elements (but not their attributes), | the attributes of that element) and its descendant elements (but not their attributes), | ||
Line 722: | Line 688: | ||
<li>The namespace URI associated with a name is | <li>The namespace URI associated with a name is | ||
<ol> | <ol> | ||
<li>the in-scope URI associated with the prefix of the name, if the name has | <li>the in-scope URI associated with the prefix of the name, if the name has a prefix | ||
a prefix | |||
<li>for element names, | <li>for element names, the in-scope default namespace URI, if the name does not have a prefix | ||
the in-scope default namespace URI, if the name does not have a prefix | |||
and there is a default namespace URI in scope | and there is a default namespace URI in scope | ||
<li>no namespace URI, otherwise | <li>no namespace URI, otherwise | ||
</ol> | </ol> | ||
<li>Two names are identical if they have the same local name and either | <li>Two names are identical if they have the same local name and either | ||
they both do not have a namespace URI or they both have the same namespace | they both do not have a namespace URI or they both have the same namespace URI. | ||
URI. | |||
</ul> | </ul> | ||
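The rules above for splitting a name and finding its namespace URI can be sketched in Python. This is an illustration under stated assumptions, not the XmlDoc API: `split_qname` and `expanded_name` are hypothetical names, and `in_scope` is assumed to be a dictionary of the declarations currently in scope, with the default namespace stored under the key `None`.

```python
def split_qname(qname):
    """Split a QName into (prefix, local name); prefix is None if absent."""
    if not qname or qname.count(":") > 1:
        raise ValueError("a name may contain at most one colon: %r" % qname)
    prefix, colon, local = qname.partition(":")
    if not colon:
        return None, qname
    if not prefix or not local:
        raise ValueError("prefix and local name must be non-null: %r" % qname)
    return prefix, local


def expanded_name(qname, in_scope, is_element):
    """Return (namespace URI or None, local name) for a name."""
    prefix, local = split_qname(qname)
    if prefix is not None:
        if prefix not in in_scope:
            raise ValueError("undeclared prefix: %r" % prefix)
        return in_scope[prefix], local
    if is_element:
        # xmlns="" disables the default namespace, so "" maps to None.
        return in_scope.get(None) or None, local
    # An unprefixed attribute name has no namespace URI.
    return None, local
```

Two names are then identical when their `expanded_name` results compare equal, for example two differently prefixed names whose prefixes are bound to the same URI.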
====Uniform Resource Identifier syntax==== | ====Uniform Resource Identifier syntax==== | ||
The form of a valid string used as a URI is specified in IETF RFC2396 | The form of a valid string used as a URI is specified in IETF RFC2396 (see http://www.faqs.org/rfcs/rfc2396.html). | ||
(see http://www.faqs.org/rfcs/rfc2396.html) . | |||
The rules are as follows: | The rules are as follows: | ||
<ul> | <ul> | ||
<li>Namespace URIs must be '''absolute''': | <li>Namespace URIs must be '''absolute''': | ||
they must start with a non-null prefix (called a | they must start with a non-null prefix (called a | ||
"scheme"), followed by a colon (<tt>:</tt>) and a non-null suffix. | |||
<li>The scheme must start | |||
with a letter, which may be followed by any combination of letters, digits, and | <li>The scheme must start with a letter, which may be followed by any combination of letters, digits, and | ||
the plus (+), hyphen (-), and period (.) characters. | the plus (<tt>+</tt>), hyphen (<tt>-</tt>), and period (<tt>.</tt>) characters. | ||
<li>The suffix can contain any of | <li>The suffix can contain any of the following characters, in addition to letters and digits: | ||
the following characters, in addition to letters and digits: | <p class="code">; (semicolon) - (hyphen) | ||
< | / (slash) _ (underscore) | ||
? (question mark) . (period) | |||
: (colon) ! (exclamation point) | |||
@ (at sign) ~ (tilde) | |||
& (ampersand) * (asterisk) | |||
= (equal sign) ' (apostrophe) | |||
+ (plus sign) ( (open parenthesis) | |||
$ (dollar sign) ) (close parenthesis) | |||
, (comma) | |||
</p> | |||
</ | |||
The suffix can also contain: | The suffix can also contain: | ||
<ul> | <ul> | ||
<li>At most one number sign (#). | <li>At most one number sign (<tt>#</tt>). | ||
<li> | |||
<li>A percent (<tt>%</tt>) character followed by two hex digits | |||
to escape some other character. | to escape some other character. | ||
Line 770: | Line 734: | ||
<ul> | <ul> | ||
<li>The hex digits A-F may be uppercase or lowercase. | <li>The hex digits A-F may be uppercase or lowercase. | ||
<li>The hexadecimal values are not replaced when URI processing is performed. | <li>The hexadecimal values are not replaced when URI processing is performed. | ||
<p> | |||
For example, even though the ASCII code for the number | For example, even though the ASCII code for the number "4" is | ||
hexadecimal 34, the following two URIs are different and distinct: | hexadecimal 34, the following two URIs are different and distinct:</p> | ||
< | <p class="code"><nowiki>http://my.URI.number4 | ||
http://my.URI.number%34</nowiki> | |||
</p> | |||
</ | |||
Thus, for instance, the following fragment: | Thus, for instance, the following fragment: | ||
< | <p class="code">%n = %d:AddElement('x', , <nowiki>'http://my.URI.number4') | ||
%n:AddElement('x', , 'http://my.URI.number%34') | |||
%d:Print | |||
%d:SelectionPrefix('f') = 'http://my.URI.number4'</nowiki> | |||
Print %d:SelectCount('//f:x') And 'matching node(s)' | |||
</p> | |||
</ | |||
will have the following result: | will have the following result: | ||
< | <p class="code"><nowiki><x xmlns="http://my.URI.number4"> | ||
<x xmlns="http://my.URI.number%34"/></nowiki> | |||
</x> | |||
1 matching node(s) | |||
</p> | |||
</ | |||
</ul> | </ul> | ||
</ul> | </ul> | ||
</ul> | </ul> | ||
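The scheme rule and the "no unescaping" rule above can be illustrated with a short Python sketch. The function names are hypothetical, and the check covers only the scheme and the presence of a non-null suffix; it does not validate the characters of the suffix.

```python
import re

# A scheme starts with a letter, followed by letters, digits, "+", "-", ".".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")


def is_absolute_uri(uri):
    """True if uri has a valid scheme, a colon, and a non-null suffix."""
    scheme, colon, suffix = uri.partition(":")
    return bool(colon) and bool(suffix) and SCHEME_RE.fullmatch(scheme) is not None


def same_namespace(uri_a, uri_b):
    """Compare character for character; %xx escapes are not decoded."""
    return uri_a == uri_b
```

Because escapes are not decoded, `same_namespace("http://my.URI.number4", "http://my.URI.number%34")` is false, which matches the single matching node in the example above.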
===Well-formed documents and validation=== | ===Well-formed documents and validation=== | ||
Before an XML document can be processed, its structure must match the | Before an XML document can be processed, its structure must match the rules expressed in the productions in | ||
rules expressed in the productions in | [[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]], along with | ||
the extra rules alluded to in square brackets (for example, <code>[Unique Att]</code>, | |||
the extra rules alluded to in square brackets (for example, < | indicating that a single attribute name may not be given twice in the list of attributes for an element). | ||
indicating that a single attribute name may not be given twice in the list | When the syntax is correct, including these rules, the document is called '''well-formed'''. | ||
of attributes for an element). | |||
When the syntax is correct, including these rules, the document is | |||
called '''well-formed'''. | |||
The XmlDoc API enforces the syntax rules of well-formed documents. | The XmlDoc API enforces the syntax rules of well-formed documents. | ||
Line 810: | Line 770: | ||
In addition to this checking, an XML processor may also check to see that | In addition to this checking, an XML processor may also check to see that | ||
the format of the document matches the structure and restrictions | the format of the document matches the structure and restrictions | ||
declared for it in either | declared for it in either the Document Type Declaration or the document's Schema. | ||
the Document Type Declaration or the document's Schema. | If the document matches the type structure and restrictions, it is called '''valid'''. | ||
If the document matches the type structure and restrictions, it is | In the <i>W3C XML Recommendation</i>, this validation of a document is an optional feature of an XML processor. | ||
called '''valid'''. | |||
In the <i | |||
an XML processor. | |||
<!-- &NSCHVSN --> | <!-- &NSCHVSN --> | ||
With the current version, the XmlDoc API does not validate the XML document. | With the current version, the XmlDoc API does not validate the XML document. | ||
Note that support of XML Schema is planned; Document Type Declarations | Note that support of XML Schema is planned; Document Type Declarations | ||
have several shortcomings, including a limitation on the types of | have several shortcomings, including a limitation on the types of | ||
constraints that can be placed on the document, a specialized baroque | constraints that can be placed on the document, a specialized baroque | ||
syntax that doesn't conform to the element/attribute structure of | syntax that doesn't conform to the element/attribute structure of | ||
XML, and incorporation of some features that have nothing to do with | XML, and incorporation of some features that have nothing to do with document validation. | ||
document validation. | |||
===Normalization during deserialization=== | ===Normalization during deserialization=== | ||
When an XML processor, in particular the XmlDoc API, parses an XML document from | When an XML processor, in particular the XmlDoc API, parses an XML document from | ||
character form into an internal representation, it must make some transformations | character form into an internal representation, it must make some transformations of the document. | ||
of the document. | |||
The two most significant types of these transformations concern the following: | The two most significant types of these transformations concern the following: | ||
<ul> | <ul> | ||
Line 835: | Line 790: | ||
<li>Whitespace characters | <li>Whitespace characters | ||
</ul> | </ul> | ||
====Normalizing entity and character references==== | ====Normalizing entity and character references==== | ||
Entity and character references are replaced by their entity and character | Entity and character references are replaced by their entity and character counterparts before deserialization. | ||
counterparts before deserialization. | For example, the entity reference <code>&gt;</code> in the <code>content</code> | ||
For example, the entity reference < | of an element or in the <code>AttValue</code> of an Attribute, is handled exactly as if a greater-than symbol | ||
of an element or in the < | (<tt>></tt>) occurred at that point in the document. | ||
is handled exactly as if a greater-than symbol | Similarly, the character reference <code>&#x5B;</code> is handled as if a left | ||
(>) occurred at that point in the document. | square-bracket symbol (<tt>[</tt>) occurred at that point in the document. | ||
Similarly, the character | |||
reference < | |||
square-bracket symbol ( [ ) occurred at that point in the document. | |||
This normalization occurs '''after''' whitespace normalization, which is | This normalization occurs '''after''' whitespace normalization, which is discussed in the next section. | ||
discussed in the next section. | |||
====Normalizing whitespace characters==== | ====Normalizing whitespace characters==== | ||
In the XML syntax, the whitespace characters are (in hexadecimal, | In the XML syntax, the whitespace characters are (in hexadecimal, | ||
using ISO-10646 character codes): | using ISO-10646 character codes): | ||
< | <table class="thJustBold"> | ||
< | <tr><th>tab</th> | ||
< | <td>x'09'</td></tr> | ||
< | <tr><th>linefeed</th> | ||
< | <td>x'0A'</td></tr> | ||
< | <tr><th>carriage return</th> | ||
< | <td>x'0D'</td></tr> | ||
< | <tr><th>space</th> | ||
< | <td>x'20'</td></tr> | ||
</ | </table> | ||
In general, the whitespace characters can be used in the < | |||
production (shown in | In general, the whitespace characters can be used in the <code>S</code> production (shown in | ||
[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]]), | [[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]]), | ||
which must separate | which must separate many of the tokens in a document (for example, it must follow the element name, if the <code>STag</code> | ||
many of the tokens in a document | contains an Attribute) and may optionally be used in many other places (for example, it may appear before or after the equal sign (<tt>=</tt>) | ||
(for example, it must follow the element name, if the < | |||
contains an Attribute) and may optionally be used in many other | |||
places | |||
(for example, it may appear before or after the equal sign (=) | |||
between an Attribute name and its value). | between an Attribute name and its value). | ||
The interplay of three factors determines the normalization of whitespace | The interplay of three factors determines the normalization of whitespace characters during deserialization: | ||
characters during deserialization: | |||
<ul> | <ul> | ||
<li>The <i | <li>The <i>W3C XML Recommendation</i> specifies two normalizing transformations of whitespace: | ||
<ol> | <ol> | ||
<li>When a special combination of | |||
line-end characters — carriage return and linefeed — | <li>When a special combination of line-end characters — carriage return and linefeed — occur '''anywhere''' | ||
occur '''anywhere''' | |||
in an XML document, they are replaced by a single linefeed character. | in an XML document, they are replaced by a single linefeed character. | ||
Also, carriage returns not followed by a linefeed are | Also, carriage returns not followed by a linefeed are replaced by a single linefeed character. | ||
replaced by a single linefeed character. | |||
<li>When any whitespace character appears in the value of an attribute, | <li>When any whitespace character appears in the value of an attribute, it is replaced by a single space character. | ||
it is replaced by a single space character. | |||
</ol> | </ol> | ||
The XmlDoc API always applies these transformations, and | The XmlDoc API always applies these transformations, and the following two sub-sections describe them in more detail. | ||
the following two sub-sections describe them in | <li>In addition to the XML standard whitespace transformations, the XmlDoc API deserialization methods offer options to | ||
more detail. | control normalization of whitespace characters that occur in the <code>content</code> of an element. | ||
<li>In addition to the XML standard whitespace transformations, | Those options are described in these pages: | ||
the XmlDoc API deserialization methods offer options to | |||
control normalization of whitespace characters that | |||
occur in the < | |||
Those options are described in these | |||
<ul> | <ul> | ||
<li>[[LoadXml (XmlDoc/XmlNode function) | <li>[[LoadXml (XmlDoc/XmlNode function)]] | ||
<li>[[WebReceive (XmlDoc function) | <li>[[WebReceive (XmlDoc function)]] | ||
</ul> | </ul> | ||
<li>The XmlDoc API deserialization (and serialization) methods | |||
honor the < | <li>The XmlDoc API deserialization (and serialization) methods honor the <code>xml:space</code> attribute: | ||
After the XML standard whitespace transformations, | After the XML standard whitespace transformations, any whitespace within the scope of <code>xml:space="preserve"</code> | ||
any whitespace within the scope of < | is retained as is, regardless of the whitespace-handling option in effect for the deserialization method. | ||
is retained as is, regardless of | Elements that are in the scope of <code>xml:space="default"</code> have whitespace handled | ||
the whitespace-handling option in effect for the deserialization method. | |||
Elements that are in the scope of < | |||
have whitespace handled | |||
according to the whitespace-handling option in effect for the deserialization. | according to the whitespace-handling option in effect for the deserialization. | ||
The individual method descriptions cited above have more information. | The individual method descriptions cited above have more information. | ||
</ul> | </ul> | ||
=====Normalized line-end===== | =====Normalized line-end===== | ||
As specified in | As specified in "2.11 End-of-Line Handling" of the <i>W3C XML Recommendation</i>, | ||
all instances of a carriage return character followed by a linefeed character | all instances of a carriage return character followed by a linefeed character (CR-LF sequence), | ||
(CR-LF sequence), | |||
as well as all instances of a carriage return not followed by a linefeed, | as well as all instances of a carriage return not followed by a linefeed, | ||
are converted to a single linefeed character. | are converted to a single linefeed character. | ||
Line 919: | Line 858: | ||
This behavior only applies to deserialization: there is no modification | This behavior only applies to deserialization: there is no modification | ||
of whitespace characters in values passed as the <i><b>value</b></i> | of whitespace characters in values passed as the <i><b>value</b></i> | ||
argument of the XmlDoc API Add* and Insert* methods | argument of the XmlDoc API Add* and Insert* methods that allow a value argument. | ||
that allow a value argument. | Therefore the values of the <code>FOO1</code> and <code>FOO2</code> elements | ||
Therefore the values of the | created by the <var>LoadXml</var> (deserialization) and <var>AddElement</var> invocations below are different: | ||
created by the LoadXml (deserialization) and AddElement invocations below are different: | <p class="code">* Get EBCDIC carriage return and linefeed: | ||
< | %cl = $X2C('0D25') | ||
* This Element value is linefeed: | |||
%node = %doc:LoadXml('<top> <FOO1>' With %cl With '</FOO1> </top>') | |||
* This Element value is carriage return and linefeed: | |||
%node:AddElement('FOO2', %cl) | |||
</ | </p> | ||
Also, the normalization applies to the characters in the input | Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution. | ||
serialized string, not the values after entity substitution. | Therefore the values of <code>FOO1</code> and <code>FOO2</code> created by the following two <var>LoadXml</var> invocations are different: | ||
Therefore the values of | <p class="code">* Get EBCDIC carriage return and linefeed: | ||
LoadXml invocations are different: | %cl = $X2C('0D25') | ||
< | |||
* Element value is linefeed: | |||
%doc:LoadXml('<FOO1>' With %cl With '</FOO1>') | |||
%doc = New | |||
* Element value is carriage return and linefeed | |||
* (note, character references are ISO-10646): | |||
%doc:LoadXml('<FOO2>&#x0D;&#x0A;' With '</FOO2>') | |||
</ | </p> | ||
Linefeed characters not removed by the normalization described above | Linefeed characters not removed by the normalization described above | ||
and belonging to the Text node child of an element | and belonging to the Text node child of an element | ||
(but not in any other type of node) can further be affected by the | (but not in any other type of node) can further be affected by the | ||
whitespace-handling options of | whitespace-handling options of <var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var> and <var>[[WebReceive (XmlDoc function)|WebReceive]]</var>. | ||
[[LoadXml (XmlDoc/XmlNode function)|LoadXml]] and [[WebReceive (XmlDoc function)|WebReceive]]. | |||
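The line-end rule comes from the W3C XML Recommendation, so any conforming parser should show the same behavior. The following is a minimal sketch in Python, using the standard library's expat-based parser as a stand-in for <var>LoadXml</var> (it is not Janus SOAP code, and the variable names are illustrative). Python strings here are Unicode rather than EBCDIC, so carriage return and linefeed are written <code>\r</code> and <code>\n</code>.

```python
import xml.etree.ElementTree as ET

# Literal CR-LF in the serialized input is normalized to a single LF.
literal = ET.fromstring('<FOO1>\r\n</FOO1>').text

# A CR not followed by LF is also normalized to LF.
lone_cr = ET.fromstring('<FOO3>a\rb</FOO3>').text

# Character references are substituted after line-end normalization,
# so the CR and LF they produce are both kept.
reference = ET.fromstring('<FOO2>&#x0D;&#x0A;</FOO2>').text

print(repr(literal), repr(lone_cr), repr(reference))
```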
=====Normalized attribute value===== | =====Normalized attribute value===== | ||
After replacing all CR-LF sequences, and all other CR instances, | After replacing all CR-LF sequences, and all other CR instances, by LF (as described in [[#Normalized line-end|Normalized line-end]]), | ||
by LF (as described in [[#Normalized line-end|Normalized line-end]]), | |||
attribute values have additional whitespace normalization. | attribute values have additional whitespace normalization. | ||
As specified in | As specified in "3.3.3 Attribute-Value Normalization" of the <i>W3C XML Recommendation</i>, | ||
after the CR-LF normalization, every instance of a | after the CR-LF normalization, every instance of a whitespace character (tab and linefeed) | ||
whitespace character (tab and linefeed) | |||
in an attribute value is converted to a space character. | in an attribute value is converted to a space character. | ||
Leading and trailing spaces | Leading and trailing spaces are not stripped, nor are sequences of multiple spaces collapsed. | ||
are not stripped, nor are sequences of multiple spaces | |||
collapsed. | |||
This behavior only applies to deserialization; that is, there is no modification | This behavior only applies to deserialization; that is, there is no modification | ||
of whitespace characters in attribute values passed as the < | of whitespace characters in attribute values passed as the <var class="term">value</var> | ||
argument of the [[AddAttribute (XmlNode function)|AddAttribute]] function | argument of the <var>[[AddAttribute (XmlNode function)|AddAttribute]]</var> function. | ||
Therefore the values of the | Therefore the values of the <code>FOO</code> attribute created by the following two methods are different: | ||
methods are different: | <p class="code">* Get EBCDIC carriage return: | ||
< | %c = $X2C('0D') | ||
* Attribute value is space: | |||
%doc:LoadXml('<top FOO="' With %c With '"> <in/> </top>') | |||
* Attribute value is carriage return: | |||
%doc:AddAttribute('FOO', %c, '/*/*') | |||
</ | </p> | ||
Also, the normalization applies to the characters in the input | Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution. | ||
serialized string, not the values after entity substitution. | Therefore the values of the <code>FOO</code> attribute created by the following two <var>LoadXml</var> invocations are different: | ||
Therefore the values of the | <p class="code">* Get EBCDIC carriage return: | ||
LoadXml invocations are different: | %c = $X2C('0D') | ||
< | |||
* Attribute value is space: | |||
%doc:LoadXml('<top FOO="' With %c With '"/>')
%doc = New | |||
* Attribute value is carriage return - note CR | |||
* is the same in EBCDIC and ISO-10646: | |||
%doc:LoadXml('<top FOO="&#x0D;"/>')
</ | </p> | ||
'''Note:''' | |||
Whitespace in an attribute (and in any type of node other than | <p class="note">'''Note:''' Whitespace in an attribute (and in any type of node other than | ||
a Text node child of an element) is '''not''' affected by the | a Text node child of an element) is '''not''' affected by the | ||
whitespace-handling options of [[LoadXml (XmlDoc/XmlNode function)|LoadXml]], | whitespace-handling options of <var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var>, | ||
[[WebReceive (XmlDoc function)|WebReceive]], and [[ParseXml (HttpResponse function)|ParseXml]]. | <var>[[WebReceive (XmlDoc function)|WebReceive]]</var>, and <var>[[ParseXml (HttpResponse function)|ParseXml]]</var>. </p> | ||
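Attribute-value normalization is also defined by the W3C XML Recommendation, so it can be observed with any conforming parser. The following is a minimal sketch in Python, using the standard library's expat-based parser as a stand-in for <var>LoadXml</var> (it is not Janus SOAP code; the helper <code>attr</code> is hypothetical).

```python
import xml.etree.ElementTree as ET


def attr(doc):
    """Parse doc and return the value of the FOO attribute of its top element."""
    return ET.fromstring(doc).get('FOO')


# Literal tab and linefeed each become one space.
from_tab_lf = attr('<top FOO="a\tb\nc"/>')

# A literal carriage return becomes LF (line-end rule), then a space.
from_cr = attr('<top FOO="\r"/>')

# A carriage return written as a character reference is kept as is.
from_ref = attr('<top FOO="&#x0D;"/>')

# Leading, trailing, and repeated spaces are neither stripped nor collapsed.
kept_spaces = attr('<top FOO="  a   b  "/>')

print(repr(from_tab_lf), repr(from_cr), repr(from_ref), repr(kept_spaces))
```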
===Language identification=== | ===Language identification=== | ||
From the <i><b>W3C XML Recommendation</b></i>: | From the <i><b>W3C XML Recommendation</b></i>: | ||
"A special attribute named xml:lang may be inserted in documents to | |||
specify the language used in the contents | specify the language used in the contents | ||
and attribute values of any element in an XML document. | and attribute values of any element in an XML document." | ||
The only valid values of the <code>xml:lang=".."</code> attribute that <var class="product">Janus SOAP</var> accepts are the language identifier tags specified in IETF RFC 3066 (http://www.w3.org/TR/REC-xml/#RFC1766). | |||
==References== | ==References== | ||
As mentioned, the XML support in | As mentioned, the XML support in <var class="product">Janus SOAP</var> is heavily oriented to the concepts and facilities defined by | ||
the XML standards. | the XML standards. | ||
There are two key aspects of XML that application developers should understand at an appropriate level of detail: | There are two key aspects of XML that application developers should understand at an appropriate level of detail: | ||
Line 1,036: | Line 957: | ||
<td>By Elliotte Rusty Harold and W. Scott Means (Second Edition: June, 2002, publisher O'Reilly & Associates), this book is one of many to cover XML, Namespaces, XML Schema, XSLT, XPath, XML processors, and more. It has the benefit of its smaller size; its good examples; and its good summary of the history of XML. | <td>By Elliotte Rusty Harold and W. Scott Means (Second Edition: June, 2002, publisher O'Reilly & Associates), this book is one of many to cover XML, Namespaces, XML Schema, XSLT, XPath, XML processors, and more. It has the benefit of its smaller size; its good examples; and its good summary of the history of XML. | ||
<p> | <p> | ||
For XML programming using | For XML programming using <var class="product">Janus SOAP</var> or other platforms, some of this book, and the others like it, may be irrelevant or even confusing (because its scope is so large), but it is accurate and probably easier to read than the more formalized W3C standards. </p></td></tr> | ||
<tr><td>XML background</td> | <tr><td>XML background</td> | ||
<td>http://www.w3.org/XML/1999/XML-in-10-points</td></tr> | <td>http://www.w3.org/XML/1999/XML-in-10-points</td></tr> | ||
Line 1,054: | Line 975: | ||
This section lists some of the XML-related standards documents that are available. | This section lists some of the XML-related standards documents that are available. | ||
The World Wide Web Consortium (or | The World Wide Web Consortium (or "W3C") is the body that creates the XML | ||
standards, along with other Internet standards, such as HTML, XHTML, and HTTP. | standards, along with other Internet standards, such as HTML, XHTML, and HTTP. | ||
The term | The term "Recommendation," in W3C parlance, means that the | ||
standard has been approved by the W3C. | standard has been approved by the W3C. | ||
Line 1,062: | Line 983: | ||
date on which that status was achieved, | date on which that status was achieved, | ||
and the URL that can be used to obtain the document: | and the URL that can be used to obtain the document: | ||
<table> | <table class="thJustBold"> | ||
<tr><th nowrap>Extensible Markup Language (XML) 1.0 (Third Edition) </th> | <tr><th nowrap>Extensible Markup Language (XML) 1.0 (Third Edition) </th> | ||
<td>W3C Recommendation 04 February 2004: <br>http://www.w3.org/TR/REC-xml | <td>W3C Recommendation 04 February 2004: <br>http://www.w3.org/TR/REC-xml | ||
<p> | <p> | ||
This is referred to as the <i | This is referred to as the <i>W3C XML Recommendation</i> throughout this article. </p></td></tr> | ||
<tr><th>Namespaces spec </th> | <tr><th>Namespaces spec </th> | ||
<td>http://www.w3.org/TR/REC-xml-names | <td>http://www.w3.org/TR/REC-xml-names | ||
Line 1,093: | Line 1,014: | ||
[[Category:Overviews]] | [[Category:Overviews]] | ||
[[Category:Janus SOAP]] |
Latest revision as of 19:17, 13 May 2016
Janus SOAP provides SOUL programmers with a substantial set of facilities for processing eXtensible Markup Language (XML) documents. Among other benefits, this enables rich and automated Web services based on a shared and open Web infrastructure. The design of this XML support is based on various standards, such as XML and XPath. Many sections in this article refer to these and other standards, for example, Simple Object Access Protocol (SOAP). However, it is important to recognize:
Janus SOAP enables you to process any XML document, whether or not you are using SOAP messages and envelopes.
XML support is provided in two disjoint sets of classes in Janus SOAP:
- XmlDoc API
- The methods in these classes allow you to convert a character stream XML document into an internal format (an XmlDoc object) or to programmatically create an XmlDoc, to access and modify an XmlDoc, and to convert an XmlDoc into a character stream XML document.
- XmlParser API
- This set of classes provides for event-based extraction of information from an XML document in its character stream form. This can be beneficial when only a relatively small part of the XML document is to be processed.
Standards relevant to Janus SOAP XML facilities
eXtensible Markup Language (XML)
XML is a standard (endorsed by the World Wide Web Consortium, or W3C) which can be used for structuring almost any kind of data. Although the word "markup" reveals that the roots of XML are from document processing, and indeed the outermost entity in XML is called a "document," XML is ideally suited to structuring almost any kind of data that is exchanged between or within applications, particularly (although by no means exclusively) if they are communicating on a network.
The syntax of XML provides for hierarchical structuring of data (again, the outer entity is called a document) into the principal type called an element. Elements and the other components of an XML document are described in XML.
One of the reasons that XML is so powerful is that there is no fixed vocabulary for XML documents. Every XML document can have its own set of names (subject to the rules for the characters that may occur in a name). Additionally, no structure is dictated for an XML document, except that it have a single top-level element and other elements must be completely contained within their parent elements. These characteristics allow XML to represent an extremely wide range of types of data very effectively.
An XML document can be considered an abstract object: when XML is used for interchange between applications, it is usually "serialized", or transmitted, completely in character form. The advantage of this is that it is human-readable and can be conveniently viewed using a generic XML editor, both of which can be huge benefits for debugging. Additionally, standard network protocols can be used to exchange documents between a wide variety of applications on a wide variety of platforms. As the World Wide Web has demonstrated, using characters as the basis for information interchange is extremely powerful and flexible.
Beyond these core properties which make XML very attractive for structuring data, it has become the basis for a large family of standards. Often these standards are referred to as the XML "family," in part because they are managed by the XML Working Group of the W3C. Some of these important standards are XML Schema, XML Stylesheet Transformations, XML Query, and Web Services Description Language (WSDL). See http://www.w3c.org for more information about these and other standards related to XML.
Quoting from XML in a Nutshell (2nd ed) (see References):
XML offers the tantalizing possibility of truly cross-platform, long term
data formats. ... XML delivers portable data. In many ways, XML is the most portable ... format designed since the ASCII text file.
You can use XML strictly as an internal data structure in your application, or in Model 204 files, or with operating system files, or with other programs using some communication mechanism. The simple, character-based format of XML enhances such communication. You can communicate with the Web (HTTP), either as a server application (for example, using Janus Web Server) or making client XML requests (for example, using Janus Sockets HTTP Helper). You can use native Model 204 IODEV communication facilities, or Model 204 MQ Series, or any facility that can send and receive streams of characters.
Simple Object Access Protocol (SOAP)
The Simple Object Access Protocol (SOAP) is a lightweight protocol that supports the exchange of structured information between Web-based applications. SOAP employs XML to serialize the objects passed between applications. SOAP can be used in combination with a variety of existing firewall-friendly Internet protocols and formats including HTTP, SMTP, and MIME. SOAP supports a wide range of application paradigms, from messaging systems to Remote Procedure Call (RPC).
SOAP is an excellent standard for information exchange between applications, so good that it is the reason for the name Janus SOAP. It is important to recognize the following, however: Janus SOAP enables you to process any XML document, whether or not you are using SOAP messages and envelopes.
In fact, with the current version, although you can readily process formal SOAP messages, there are no features specially oriented toward that: all features are generalized for handling any kind of XML document. Later versions will add more functionality to incorporate the standard processing of SOAP messages, so your application will only need to deal with the application-specific parts of the messages.
Example SOAP request
This example SOAP message is a request to a SOAP server:
POST /StockQuote HTTP/1.1
Host: www.stockquoteserver.com
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn
SOAPAction: "Some-URI"

<SoapEnv:Envelope
    xmlns:SoapEnv="http://schemas.xmlsoap.org/soap/envelope/"
    SoapEnv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SoapEnv:Body>
    <m:GetLastTradePrice xmlns:m="http://sirius-software.com/samp/JSOAP/1">
      <symbol>EMC</symbol>
    </m:GetLastTradePrice>
  </SoapEnv:Body>
</SoapEnv:Envelope>
Example SOAP response
This example SOAP message could be a response to the above message:
HTTP/1.1 200 OK
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn

<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
    </m:GetLastTradePriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
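Because a SOAP message is an ordinary XML document, a general-purpose XML parser can read it. The following is a minimal sketch in Python, using the standard library rather than Janus SOAP, that extracts the stock symbol from the body of the example request. The names <code>SOAP_ENV</code>, <code>APP_NS</code>, and <code>request_body</code> are illustrative, and the HTTP headers are assumed to have been removed already.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the example request.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
APP_NS = "http://sirius-software.com/samp/JSOAP/1"

request_body = """\
<SoapEnv:Envelope
    xmlns:SoapEnv="http://schemas.xmlsoap.org/soap/envelope/"
    SoapEnv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SoapEnv:Body>
    <m:GetLastTradePrice xmlns:m="http://sirius-software.com/samp/JSOAP/1">
      <symbol>EMC</symbol>
    </m:GetLastTradePrice>
  </SoapEnv:Body>
</SoapEnv:Envelope>"""

envelope = ET.fromstring(request_body)

# Prefixes in the path are local to this lookup; only the URIs must match.
prefixes = {"env": SOAP_ENV, "m": APP_NS}
symbol = envelope.find("env:Body/m:GetLastTradePrice/symbol", prefixes).text
print(symbol)
```

The lookup matches on namespace URIs, so it works whether the sender chose the prefix <code>SoapEnv</code> or <code>SOAP-ENV</code>.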
XML Path Language (XPath) in the XmlDoc API
XPath is a language designed specifically to select nodes from an XML document. It is very powerful, yet it is based on familiar syntax that mimics an XML document's hierarchy. XPath is the general mechanism used in the XmlDoc API for selecting one or more nodes on which to operate. It is a key component of XSLT, XPointer, and XLink, and it has a common foundation with XML Query.
An introduction to the use of XPath is provided in An example of XmlDoc methods and XPath; a more complete description of XPath is contained in XPath.
XML
As explained above, XML provides the basis for a large number of varied standards. This section introduces the W3C XML Recommendation, that is, the XML standard. It gives you basic information about XML, explaining some of the concepts using the XmlDoc API (that is, the methods of the XmlDoc, XmlNodelist, and XmlNode classes). This approach gives you concrete examples which you can try in SOUL, and which may make the abstract concepts easier to understand.
The syntax of XML provides for:
- Hierarchical structuring of data (the outer object is called a document) into elements.
An element has a name, which need not be unique within the document. An element can have any number of attributes, each of which has a name (which must be unique within that element — but not within the document) and a value. Within an element can be a series of values and ("sub-") elements, which provides XML with its hierarchical nature.
- Assigning unique identifiers to elements;
this provides even more structuring possibilities than simple hierarchy.
These identifiers are implemented with the element type definition features provided with either Document Type Declarations or with XML Schema. Element type definitions are omitted from our XML documentation; they are not supported in the current version.
An XML document has exactly one outer, or "top-level," element, and this element contains, as descendants, any other elements that may be in the document.
In addition to the data contained in elements and attributes, any number of comments may appear wherever an element may appear. There is also a component called a processing instruction, or PI, which is effectively a comment that has a name.
All names (element names, attribute names, entity references, and PI targets) are case-sensitive; for example, a less-than symbol (<) can be included in an attribute value if you use the characters &lt; but not if you use &LT; or &Lt;.
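Case sensitivity of names can be checked with any conforming XML parser. The following is a minimal sketch in Python, using the standard library's expat-based parser as a stand-in for the XmlDoc API (it is not Janus SOAP code; the helper <code>parses</code> is hypothetical). It compares the lowercase and uppercase spellings of the predefined less-than entity, and an element whose start and end tags differ only in case.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The predefined entity is named "lt"; it yields a less-than symbol.
lower_value = ET.fromstring('<a b="&lt;"/>').get('b')

# "LT" is a different, undeclared name, so the reference is rejected.
upper_accepted = parses('<a b="&LT;"/>')

# Element names are case-sensitive too: "Tag" and "tag" do not match.
mixed_case_accepted = parses('<Tag></tag>')

print(repr(lower_value), upper_accepted, mixed_case_accepted)
```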
The rest of this section explains the syntax of XML and various rules for XML documents, according to the W3C XML Recommendation (as mentioned in References, this includes both the XML specification per se, and the XML Namespaces specification). In (XML syntax, below) and elsewhere as appropriate, you will find comments about limitations imposed by the XmlDoc API on the W3C XML Recommendation.
XML example
The next example illustrates the major components of an XML document. The formatting into separate, indented lines is provided for readability, but it is not significant for this and for most business data exchange applications. The letter labels on the left are not part of the document; they are for the explanation which follows:
X: <?xml version='1.1'?>
A: <!-- Purchase order follows -->
B: <purchase_order>
C:   <memo>Dave's order was "late"</memo>
D:   <?program-version 4.1?>
E:   <pitm>
       <partID>1234</partID>
F:     <price per="12" amt="1.280"/>
       <qty>36</qty>
G:   </pitm>
H:   <pitm>
I:     <price amt=".29"></price>
       <partID>5678</partID>
       <qty>2</qty>
     </pitm>
   </purchase_order>
In the following explanation of each of the labeled lines above, references of the form [cnn], like [B22], are to productions in Syntax of document, element, Attribute, Comment, PI below.
X: | <?xml version='1.1'?>
The XML Declaration (XMLDecl, [C23]) is an optional part of the prolog ([B22]), which is the set of components preceding the top-level element. If XMLDecl is present it must:
|
---|---|
A: | <!-- Purchase order follows -->
This is a comment at top-level. [A1], [B22], and [D27] allow zero or more comments and PIs before and after the top-level element. |
B: | <purchase_order>
This is the element start-tag or STag ([G40]) of the top-level element ([A1]). |
C: | <memo>Dave's order was "late"</memo>
With "leaf" elements (known in XML Schema as elements with simple content), that is, if the only thing between the STag and Etag is CharData ([P14]), you can usually implement the information either as an element (text) or as an attribute of the parent element. This text example highlights one small distinction, namely that AttValue ([M10]) has less flexibility:
|
D: | <?program-version 4.1?>
This is a PI [V16]. Presumably the name (actually, the target) "program-version" is used by the application reading this document. |
E: | <pitm>
This is the STag of an element which is contained within another element and which contains child elements; this allows you to group elements together. |
F: | <price per="12" amt="1.280"/>
This is an example of the EmptyElemTag ([I44]), which can be useful if an element contains no data (just the name can be meaningful to the application), or if it only contains data using attributes. |
G: | </pitm>
This is the ETag [H42] of an element. The name must exactly match the STag for the element (again, XML is case sensitive). |
H: | <pitm>
Here is another STag of an element; it is the "sibling" of another with the same name. The ability to have sub-elements and the ability to repeat elements with the same name in a given parent element are the important data modeling distinctions between elements and attributes. |
I: | <price amt=".29"></price>
Note that not all instances of a given element type (the price item is an element type) must have the same attributes, nor must they have the same sub-structure. Also, these are optional:
|
XML syntax
This section contains a version of the XML syntax. It is taken from the W3C XML Recommendation, which is the authoritative reference:
http://www.w3.org/TR/REC-xml
The syntax below has been changed from the standard in these ways:
- The only structure in the XML syntax not supported
in the current version is the
Document Type Declaration, or DTD, ("<!DOCTYPE...>").
Although a DTD can be tolerated if you use the DTD_IGNORE option
of the deserialization functions (LoadXml,
WebReceive, and ParseXml)
— the information contained in the DTD is not used nor made available to the SOUL program.
Reflecting the absence of support for DTD, the productions in the syntax that follows are altered to remove those
parts of an XML document introduced in the DTD.
Note: Much of the functionality of document type declarations may be better provided using XML Schema, which is planned for a future version.
- The Char, Name, NameStartChar, and NameChar productions are taken from the XML 1.1 recommendation. As explained in Char and Reference, only characters representable in 8-bit EBCDIC were handled prior to Sirius Mods version 7.6, so fewer characters were supported in the production for Char ([CA2]) in earlier Sirius Mods releases.
- The maximum length of an XML name is 300 characters (prior to version 7.9, the maximum was 127, and prior to version 7.7, the maximum was 100).
- The productions are re-ordered (to make the grammar easier to read), and letters are added before them, so when [B22] is referred to in the text, you know that this is between [Ann] and [Cnn] in this grammar, and this is production [22] for the same non-terminal (in this case, prolog) in the W3C XML Recommendation.
The conventions used are:
'yyy' or "yyy" | Enclosed item, yyy, must appear exactly as shown. |
---|---|
#xnn | Specifies the character (in ISO-10646) with code value nn.
For example, |
[^abc] | Specifies any character except a, b, or c. |
[chars] | Specifies any character within the set chars, where chars can be the concatenation of these sets:
|
set1 - set2 ("-" not enclosed in [...]) | The set of strings described by set1, with the set of strings described by set2 removed. |
| | Separates alternatives. |
? | Follows an optional item. |
* | Follows an item that can occur any number of times (even not at all). |
+ | Follows an item that can occur one or more times. |
(abc) (parentheses) | Groups items. |
[rule] ("to the right") | Marks an additional syntax rule. |
/*comment*/ | Marks a comment. |
The syntax is shown in three sections:
- The major components
- The productions that describe individual characters
- The components of the "XML Declaration" (<?xml version=...?>)
Syntax of document, element, Attribute, Comment, PI
[A1] document ::= (prolog element Misc*) - (Char* RestrictedChar Char*)
[B22] prolog ::= XMLDecl? Misc*
[C23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[D27] Misc ::= Comment | PI | S
[E3] S ::= (#x20 | #x9 | #xD | #xA)+ /* Whitespace */
[F39] element ::= STag content ETag [Element Type Match] | EmptyElemTag
[G40] STag ::= '<' Name (S Attribute)* S? '>' [Unique Att]
[H42] ETag ::= '</' Name S? '>'
[I44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' [Unique Att]
[NSC] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
[NC] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[NA] Name ::= NameStartChar (NameChar)*
Within an XML document, the maximum length of a name (for example, each of the prefix part and the local part of an element name) is 300 characters (prior to version 7.9, the maximum was 127 characters, and prior to version 7.7, it was 100 characters). Element and attribute names are also subject to restrictions related to XML Namespaces; see Name and namespace syntax.
[L41] Attribute ::= Name Eq AttValue
[M10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[N25] Eq ::= S? '=' S?
[O43] content ::= CharData? ( (element | Reference | CDSect | PI | Comment) CharData? )*
[P14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
[Q18] CDSect ::= CDStart CData CDEnd
[R19] CDStart ::= '<![CDATA['
[S20] CData ::= (Char* - (Char* ']]>' Char*))
[T21] CDEnd ::= ']]>'
[U15] Comment ::= '<!--' ( (Char - '-') | ('-' (Char - '-')) )* '-->'
[V16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*) ))? '?>'
[W17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
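As an illustration of the Name production ([NA]) and the name-length limit, here is a minimal sketch in Python (not SOUL, and not part of the XmlDoc API). The function name is_valid_name and the constant MAX_NAME_LENGTH are hypothetical; the character ranges are copied from productions [NSC] and [NC], and the limit of 300 is the figure quoted above for version 7.9 and later. The check applies the limit to whatever string it is given, so the prefix and the local part of a qualified name would each be checked separately.

```python
import re

# Character ranges copied from productions [NSC] and [NC].
# The re module expands the \uXXXX escapes inside the character class.
NAME_START = (
    r"A-Za-z_:"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD"
)
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# [NA] Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile("[%s][%s]*" % (NAME_START, NAME_CHAR))

# Assumption: the 300-character limit quoted in the text.
MAX_NAME_LENGTH = 300


def is_valid_name(name, max_length=MAX_NAME_LENGTH):
    """Return True if name matches production [NA] and fits the length limit."""
    return len(name) <= max_length and NAME_RE.fullmatch(name) is not None
```

For example, is_valid_name("price") is true, while is_valid_name("1abc") is false because a digit is a NameChar but not a NameStartChar.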
Char and Reference
[CA2] Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD]
[CA2A] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
[CB67] Reference ::= EntityRef | CharRef
[CD68] EntityRef ::= '&' Name ';'
[CC66] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';' [Legal Char]
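The Char ([CA2]) and RestrictedChar ([CA2A]) productions can be read as simple range tests on code points. The following Python sketch (an illustration only, with hypothetical function names) finds the first character of a string that could not appear literally in a document under production [A1], which excludes any RestrictedChar.

```python
def is_char(ch):
    """Production [CA2]: Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD]."""
    cp = ord(ch)
    return 0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD


def is_restricted_char(ch):
    """Production [CA2A]: control characters that may not appear literally."""
    cp = ord(ch)
    return (0x1 <= cp <= 0x8 or 0xB <= cp <= 0xC or 0xE <= cp <= 0x1F
            or 0x7F <= cp <= 0x84 or 0x86 <= cp <= 0x9F)


def first_illegal_char(text):
    """Return the index of the first character that is not a Char or is a
    RestrictedChar, or -1 if every character is acceptable."""
    for index, ch in enumerate(text):
        if not is_char(ch) or is_restricted_char(ch):
            return index
    return -1
```

Tab, linefeed, and carriage return pass both tests, which is consistent with their role in the S production.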
ISO-10646 and EBCDIC characters
- Through Sirius Mods version 7.5, XmlDocs were maintained in EBCDIC, and production [CA2] above did not allow the full range of ISO-10646 characters shown in the W3C XML Recommendation. (ISO-10646 is the standard for the universal character set, also known as Unicode.) The XmlDoc API might have rejected an XML document because it contained an ISO-10646 character that could not be represented in EBCDIC. As of Sirius Mods version 7.6, XmlDocs are maintained in Unicode as supported by the Sirius Mods. This is why production [CA2] shows that no Unicode characters greater than U+FFFD are allowed. In addition, deserialization (with default options) of an XML document fails if the document contains a Unicode character that is not translatable to EBCDIC. The AllowUntranslatable option of the deserialization methods lets you circumvent this restriction. The null character (#x0), normally restricted, is allowed in an XML document if the XmlDoc's AllowNull property is set to True.
Note: Using the standard translation table provided with Sirius Mods versions prior to 7.3, many EBCDIC characters (such as X'FF'), in addition to the "control characters" that were explicitly prohibited, were not legal XML characters because they did not translate to any Unicode character. In Sirius Mods version 7.3, the standard translation table was modified significantly. For more information about supported characters and character translation issues as of version 7.3, see Support for the ASCII subset of Unicode and Corrected translations between ASCII/Unicode and EBCDIC.
- As stated in Transport: sending and receiving XML, UTF-8, UTF-16, and ISO-8859-x encodings are accepted (note that these must be given in all-capital letters within the XML declaration).
- XPath comparisons are performed using Unicode. As of version 7.3, it is the only type of ordered character comparison. Prior to Sirius Mods version 7.3, this was the default type of comparison performed, and it could be controlled by the (now obsolete) XPathOrder property.
Entity references
- One purpose of an EntityRef is to allow a sequence of characters that may be illegal in a particular context of an XML document. For example, within an element's content, the string ]]> is not allowed, so you may replace the greater-than symbol (>) with either its character code in a CharRef, or with the predefined entity &gt;:
]]&gt;
A Reference (EntityRef or CharRef) is allowed only in an element's content ([O43]) or in AttValue ([M10]).
- There is a facility for defining your own entities in a DTD, but since DTDs are not supported in Janus SOAP, the only entity references supported are the five predefined entities:
&amp; ampersand (&)
&apos; apostrophe (')
&gt; greater than (>)
&lt; less than (<)
&quot; double quotation mark (")
[ ] left and right square brackets ([ ]) (as of Model 204 7.6)
Note: You can use any of the XHTML entities (listed at http://www.w3.org/TR/xhtml1/dtds.html#h-A2) to represent Unicode characters when converting from EBCDIC to Unicode. Character decoding must be in effect, however: you must be using the U constant function or the CharacterDecode=True argument on the EbcdicToUnicode function.
You can load into an XmlDoc a character represented by such an entity if you decode the entity reference before the character is processed by one of the XmlDoc API deserializing or direct storage methods.
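To make the role of the five predefined entities concrete, here is a small Python sketch (an illustration, not SOUL and not the XmlDoc API) built on the standard library's xml.sax.saxutils module. The three wrapper function names are hypothetical. It shows text being escaped for use in element content or in an AttValue, and the reverse substitution.

```python
from xml.sax.saxutils import escape, unescape

# escape() and unescape() handle &amp;, &lt; and &gt; on their own;
# the two quote entities are supplied as extra mappings.
QUOTE_TO_ENTITY = {'"': "&quot;", "'": "&apos;"}
ENTITY_TO_QUOTE = {"&quot;": '"', "&apos;": "'"}


def escape_content(text):
    """Escape text for element content: &, < and > become entity references."""
    return escape(text)


def escape_att_value(text):
    """Escape text for an attribute value, including both quote characters."""
    return escape(text, QUOTE_TO_ENTITY)


def unescape_predefined(text):
    """Replace the five predefined entity references by their characters."""
    return unescape(text, ENTITY_TO_QUOTE)
```

With these helpers, escape_content("a ]]> b") produces a string in which the forbidden sequence ]]> no longer appears, because the greater-than symbol has been replaced by &gt;.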
Components of XMLDecl
[XA24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[XB26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
[XC80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
[XD81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Only Latin chars */
[XE32] SDDecl ::= S 'standalone' Eq ( ("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"') )
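The XMLDecl productions translate fairly directly into a regular expression. The following Python sketch is one possible transcription of [C23], [XA24], [XB26], [XC80], [XD81], and [XE32]; the function name parse_xml_decl is hypothetical, and the sketch does not model any additional checks a real processor applies to the encoding name.

```python
import re

# [E3] S and [N25] Eq, written out so that only the four XML
# whitespace characters are accepted.
S = r"[\x20\x09\x0D\x0A]"
EQ = S + "*=" + S + "*"

# Each value must be closed by the same quote character that opened it.
XML_DECL_RE = re.compile(
    r"<\?xml"
    + S + r"+version" + EQ
    + r"(?P<q1>['\"])(?P<version>[a-zA-Z0-9_.:-]+)(?P=q1)"
    + r"(?:" + S + r"+encoding" + EQ
    + r"(?P<q2>['\"])(?P<encoding>[A-Za-z][A-Za-z0-9._-]*)(?P=q2))?"
    + r"(?:" + S + r"+standalone" + EQ
    + r"(?P<q3>['\"])(?P<standalone>yes|no)(?P=q3))?"
    + S + r"*\?>"
)


def parse_xml_decl(text):
    """Return the XMLDecl fields at the start of text, or None if absent."""
    match = XML_DECL_RE.match(text)
    if match is None:
        return None
    return {
        "version": match.group("version"),
        "encoding": match.group("encoding"),
        "standalone": match.group("standalone"),
    }
```

Because VersionInfo is mandatory in [C23], a declaration that carries only an encoding is rejected by this pattern.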
Names and namespaces
XML documents are allowed to contain elements and attributes that are defined by one organization, as well as other elements and attributes that are defined by another organization. In order to achieve this organizational "merging," the XML Namespaces Recommendation (http://www.w3.org/TR/REC-xml-names) provides for a way to qualify these merged names so that they will not conflict.
Also, the Namespaces Recommendation provides a way for an application to examine, in effect, the "defining organization" of a name in an XML document, so that various properties can be inferred, and names from the same "organization" can be grouped together.
Conceptually, the Namespaces Recommendation qualifies a name with a Uniform Resource Identifier (URI). There are various rules for various types of URIs; one familiar type is the same as URLs on the World Wide Web, such as:
http://www.w3.org/2001/XMLSchema
The important aspect of a URI, as far as the names in an XML document are concerned, is simply that it is a unique string for the names that are associated with it.
The characters that are valid in a URI (shown in Uniform Resource Identifier syntax) exceed the set of characters that are valid in an XML name. Therefore, the technique employed for XML Namespace qualification is to use a special kind of attribute — one that begins with "xmlns" — to associate a name prefix with a URI. Then attaching a prefix to a name effectively attaches the URI to a name.
The syntax for making this association, the namespace declaration, is explained in the next section.
Name and namespace syntax
The W3C XML Recommendation syntax rule for names is shown in Syntax of document, element, Attribute, Comment, PI (and repeated below) as the Name ([NA]), NameStartChar ([NSC]), and NameChar ([NC]) productions.
The XML Namespaces Recommendation provides additional rules for Element and Attribute names (but not for PI targets).
From the Namespaces Recommendation, element and attribute names are both instances of QName:
[NSC] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
[NC] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[NA] Name ::= NameStartChar (NameChar)*
[NB5] NCName ::= (NameStartChar - ':') (NameChar - ':')*
[NC6] QName ::= (Prefix ':')? LocalPart
[ND7] Prefix ::= NCName
[NE8] LocalPart ::= NCName
Although the W3C XML Recommendation does not require that attribute and element names follow the XML Namespaces Recommendation, the operation of XPath requires it. Therefore, since XPath is so important for the XmlDoc API, its default operating mode is to require Namespaces conformance in the XML document. See the Namespace property.
The restrictions and changes to the XML Recommendation are as follows:
- The NameStartChar and NameChar productions are taken from the XML 1.1 recommendation (http://www.w3.org/TR/xml11/). Starting with version 7.6 of the Sirius Mods, XmlDocs are maintained in Unicode, as supported by the Sirius Mods. That support excludes characters encoded in more than two bytes, so production [NSC], above, shows no Unicode characters greater than U+FFFD. By default, deserialization of an XML document fails if the document contains a Unicode character that is not translatable to EBCDIC. The AllowUntranslatable argument of the deserialization methods lets you circumvent this restriction.
- A name can have at most one colon (:), which separates the name into a non-null prefix and a non-null local name.
- A name without a prefix is simply a local name.
- The prefix, if any, must be associated with a namespace URI using an attribute of the form:
xmlns:prefix="URI"
For example, all elements (and attributes of those elements) within the content of the definitions element below can use the prefix "xsd" to qualify their names to belong to the "http://www.w3.org/2001/XMLSchema" namespace:
<definitions xmlns:xsd="http://www.w3.org/2001/XMLSchema">
... content of definitions element ...
</definitions>
- The prefix xml is bound to the namespace URI http://www.w3.org/XML/1998/namespace. Neither can be used without the other.
- An element can also have a default namespace attribute, which "declares" its namespace, of the form:
xmlns="URI"
- Another form of default namespace declaration allows an element to disable any default namespace with:
xmlns=""
- A namespace declaration is syntactically the same as an Attribute.
- The scope of a non-default namespace declaration is the element containing it, its attributes, and all descendant elements and their attributes, until another declaration of the prefix.
- The scope of a default namespace declaration is the element containing it (but not the attributes of that element) and its descendant elements (but not their attributes), until the occurrence of another default declaration.
- The namespace URI associated with a name is:
  - the in-scope URI associated with the prefix of the name, if the name has a prefix
  - for element names, the in-scope default namespace URI, if the name does not have a prefix and there is a default namespace URI in scope
  - no namespace URI, otherwise
- Two names are identical if they have the same local name and either they both do not have a namespace URI or they both have the same namespace URI.
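The scoping rules above can be observed with any namespace-aware parser. The following sketch uses Python's standard xml.etree.ElementTree module (not the XmlDoc API), which reports each name in the form {URI}local. The sample document, the URI urn:example:default, and the helper name expanded are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace, a prefixed namespace,
# and a child that disables the default namespace with xmlns="".
DOC = """<definitions xmlns="urn:example:default"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema"
             kind="demo">
  <xsd:element xsd:form="qualified" name="price"/>
  <plain xmlns=""/>
</definitions>"""


def expanded(name):
    """Split ElementTree's '{uri}local' form into (uri or None, local)."""
    if name.startswith("{"):
        uri, local = name[1:].split("}", 1)
        return uri, local
    return None, name


root = ET.fromstring(DOC)
prefixed, plain = list(root)

# The default namespace applies to the element name...
root_name = expanded(root.tag)
# ...but not to its unprefixed attribute.
root_attr_names = sorted(expanded(key) for key in root.attrib)
# A prefix qualifies both element and attribute names.
prefixed_name = expanded(prefixed.tag)
# xmlns="" leaves the element with no namespace URI.
plain_name = expanded(plain.tag)
```

Here root_name carries the default namespace URI, while the unprefixed attribute kind has no namespace URI, matching the rule that a default declaration does not apply to attributes.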
Uniform Resource Identifier syntax
The form of a valid string used as a URI is specified in IETF RFC2396 (see http://www.faqs.org/rfcs/rfc2396.html). The rules are as follows:
- Namespace URIs must be absolute: they must start with a non-null prefix (called a "scheme"), followed by a colon (:) and a non-null suffix.
- The scheme must start with a letter, which may be followed by any combination of letters, digits, and the plus (+), hyphen (-), and period (.) characters.
- The suffix can contain any of the following characters, in addition to letters and digits:
; (semicolon) | - (hyphen) |
/ (slash) | _ (underscore) |
? (question mark) | . (period) |
: (colon) | ! (exclamation point) |
@ (at sign) | ~ (tilde) |
& (ampersand) | * (asterisk) |
= (equal sign) | ' (apostrophe) |
+ (plus sign) | ( (open parenthesis) |
$ (dollar sign) | ) (close parenthesis) |
, (comma) | |
The suffix can also contain:
- At most one number sign (#).
- A percent (%) character followed by two hex digits to escape some other character. In this case:
  - The hex digits A-F may be uppercase or lowercase.
  - The hexadecimal values are not replaced when URI processing is performed. For example, even though the ASCII code for the number "4" is hexadecimal 34, the following two URIs are different and distinct:
    http://my.URI.number4
    http://my.URI.number%34
Thus, for instance, the following fragment:
%n = %d:AddElement('x', , 'http://my.URI.number4')
%n:AddElement('x', , 'http://my.URI.number%34')
%d:Print
%d:SelectionPrefix('f') = 'http://my.URI.number4'
Print %d:SelectCount('//f:x') And 'matching node(s)'
has the following result:
<x xmlns="http://my.URI.number4">
   <x xmlns="http://my.URI.number%34"/>
</x>
1 matching node(s)
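The same two points, the shape of an absolute URI and the literal comparison of namespace URIs, can be sketched in Python (an illustration with hypothetical function names, using the standard xml.etree.ElementTree module rather than the XmlDoc API). The scheme pattern follows the rule stated above; the suffix is only checked for being non-empty.

```python
import re
import xml.etree.ElementTree as ET

# Scheme: a letter followed by letters, digits, plus, hyphen, or period.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

DOC = ('<x xmlns="http://my.URI.number4">'
       '<x xmlns="http://my.URI.number%34"/>'
       '</x>')


def split_absolute_uri(uri):
    """Return (scheme, suffix) if uri has a valid scheme, a colon, and a
    non-empty suffix; otherwise return None."""
    scheme, colon, suffix = uri.partition(":")
    if not colon or not suffix or SCHEME_RE.fullmatch(scheme) is None:
        return None
    return scheme, suffix


def count_in_namespace(xml_text, uri, local="x"):
    """Count elements named local whose namespace URI equals uri exactly.
    No %xx decoding is applied, so the comparison is literal."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter("{" + uri + "}" + local))
```

Counting x elements in the namespace http://my.URI.number4 finds only the outer element, which agrees with the "1 matching node(s)" result of the SOUL fragment.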
Well-formed documents and validation
Before an XML document can be processed, its structure must match the rules expressed in the productions in Syntax of document, element, Attribute, Comment, PI, along with the extra rules alluded to in square brackets (for example, [Unique Att], indicating that a single attribute name may not be given twice in the list of attributes for an element). When the syntax is correct, including these rules, the document is called well-formed. The XmlDoc API enforces the syntax rules of well-formed documents.
In addition to this checking, an XML processor may also check to see that the format of the document matches the structure and restrictions declared for it in either the Document Type Declaration or the document's Schema. If the document matches the type structure and restrictions, it is called valid. In the W3C XML Recommendation, this validation of a document is an optional feature of an XML processor.
With the current version, the XmlDoc API does not validate the XML document. Note that support of XML Schema is planned; Document Type Declarations have several shortcomings, including a limitation on the types of constraints that can be placed on the document, a specialized baroque syntax that doesn't conform to the element/attribute structure of XML, and incorporation of some features that have nothing to do with document validation.
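As a rough illustration of well-formedness checking without validation, the following Python sketch uses the standard xml.etree.ElementTree parser, which (like the behavior described in this section) rejects documents that break the syntax rules but does not validate against a DTD or Schema. The function name well_formedness_error is hypothetical.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed, else the parser's message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None
```

A document such as <a x="1" x="2"/> violates the [Unique Att] rule, and <a></A> violates [Element Type Match] because XML names are case sensitive; both produce an error message rather than None.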
Normalization during deserialization
When an XML processor, in particular the XmlDoc API, parses an XML document from character form into an internal representation, it must make some transformations of the document. The two most significant types of these transformations concern the following:
- Entity and character references
- Whitespace characters
Normalizing entity and character references
Entity and character references are replaced by their entity and character counterparts during deserialization. For example, the entity reference &gt; in the content of an element or in the AttValue of an Attribute is handled exactly as if a greater-than symbol (>) occurred at that point in the document. Similarly, the character reference &#x5B; is handled as if a left square-bracket symbol ([) occurred at that point in the document.
This normalization occurs after whitespace normalization, which is discussed in the next section.
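The effect of reference normalization can be seen with any conforming parser. This short Python sketch (standard xml.etree.ElementTree, not the XmlDoc API; the function name is hypothetical) parses an element whose attribute value contains the entity reference &gt; and whose content contains the character reference &#x5B;.

```python
import xml.etree.ElementTree as ET


def parsed_values(xml_text, att_name):
    """Return (attribute value, element text) of the document's root."""
    root = ET.fromstring(xml_text)
    return root.get(att_name), root.text


# The references in the serialized form...
with_refs = parsed_values('<a b="1 &gt; 0">&#x5B;x]</a>', "b")
# ...produce the same values as the literal characters.
literal = parsed_values('<a b="1 > 0">[x]</a>', "b")
```

Both calls return the same pair of values, since the references are replaced by the characters they stand for.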
Normalizing whitespace characters
In the XML syntax, the whitespace characters are (in hexadecimal, using ISO-10646 character codes):
tab | x'09' |
---|---|
linefeed | x'0A' |
carriage return | x'0D' |
space | x'20' |
In general, the whitespace characters can be used in the S production (shown in Syntax of document, element, Attribute, Comment, PI), which must separate many of the tokens in a document (for example, it must follow the element name, if the STag contains an Attribute) and may optionally be used in many other places (for example, it may appear before or after the equal sign (=) between an Attribute name and its value).
The interplay of three factors determines the normalization of whitespace characters during deserialization:
- The W3C XML Recommendation specifies two normalizing transformations of whitespace:
  - When the line-end combination of a carriage return followed by a linefeed occurs anywhere in an XML document, it is replaced by a single linefeed character. Also, carriage returns not followed by a linefeed are replaced by a single linefeed character.
  - When any whitespace character appears in the value of an attribute, it is replaced by a single space character.
  The XmlDoc API always applies these transformations, and the following two sub-sections describe them in more detail.
- In addition to the XML standard whitespace transformations, the XmlDoc API deserialization methods offer options to control normalization of whitespace characters that occur in the content of an element. Those options are described in these pages:
- The XmlDoc API deserialization (and serialization) methods honor the xml:space attribute: After the XML standard whitespace transformations, any whitespace within the scope of xml:space="preserve" is retained as is, regardless of the whitespace-handling option in effect for the deserialization method. Elements that are in the scope of xml:space="default" have whitespace handled according to the whitespace-handling option in effect for the deserialization. The individual method descriptions cited above have more information.
Normalized line-end
As specified in "2.11 End-of-Line Handling" of the W3C XML Recommendation, all instances of a carriage return character followed by a linefeed character (CR-LF sequence), as well as all instances of a carriage return not followed by a linefeed, are converted to a single linefeed character.
This behavior only applies to deserialization: there is no modification of whitespace characters in values passed as the value argument of the XmlDoc API Add* and Insert* methods that allow a value argument. Therefore the values of the FOO1 and FOO2 elements created by the LoadXml (deserialization) and AddElement invocations below are different:
* Get EBCDIC carriage return and linefeed:
%cl = $X2C('0D25')
* This Element value is linefeed:
%node = %doc:LoadXml('<top> <FOO1>' With %cl With '</FOO1> </top>')
* This Element value is carriage return and linefeed:
%node:AddElement('FOO2', %cl)
Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution. Therefore the values of FOO1 and FOO2 created by the following two LoadXml invocations are different:
* Get EBCDIC carriage return and linefeed:
%cl = $X2C('0D25')
* Element value is linefeed:
%doc:LoadXml('<FOO1>' With %cl With '</FOO1>')
%doc = New
* Element value is carriage return and linefeed
* (note, character references are ISO-10646):
%doc:LoadXml('<FOO2>&#x0D;&#x0A;' With '</FOO2>')
Linefeed characters not removed by the normalization described above and belonging to the Text node child of an element (but not in any other type of node) can further be affected by the whitespace-handling options of LoadXml and WebReceive.
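Line-end normalization is defined by the W3C XML Recommendation, so it can be demonstrated with any conforming parser. The following Python sketch (standard xml.etree.ElementTree, not the XmlDoc API, and working in Unicode rather than EBCDIC) contrasts literal line-end characters with the same characters written as character references. The function name element_text is hypothetical.

```python
import xml.etree.ElementTree as ET


def element_text(xml_text):
    """Parse xml_text and return the text content of its root element."""
    return ET.fromstring(xml_text).text


# A literal CR-LF pair in the serialized input becomes a single linefeed.
literal_crlf = element_text("<FOO1>\r\n</FOO1>")

# A literal CR that is not followed by LF also becomes a linefeed.
lone_cr = element_text("<FOO1>x\ry</FOO1>")

# Character references are substituted after line-end normalization,
# so both characters survive.
by_reference = element_text("<FOO2>&#x0D;&#x0A;</FOO2>")
```

The first value is a single linefeed, while the last value keeps both the carriage return and the linefeed, mirroring the difference between FOO1 and FOO2 described above.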
Normalized attribute value
After replacing all CR-LF sequences, and all other CR instances, by LF (as described in Normalized line-end), attribute values have additional whitespace normalization. As specified in "3.3.3 Attribute-Value Normalization" of the W3C XML Recommendation, after the CR-LF normalization, every instance of a whitespace character (tab and linefeed) in an attribute value is converted to a space character. Leading and trailing spaces are not stripped, nor are sequences of multiple spaces collapsed.
This behavior only applies to deserialization; that is, there is no modification of whitespace characters in attribute values passed as the value argument of the AddAttribute function. Therefore the values of the FOO attribute created by the following two methods are different:
* Get EBCDIC carriage return:
%c = $X2C('0D')
* Attribute value is space:
%doc:LoadXml('<top FOO="' With %c With '"> <in/> </top>')
* Attribute value is carriage return:
%doc:AddAttribute('FOO', %c, '/*/*')
Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution. Therefore the values of the FOO attribute created by the following two LoadXml invocations are different:
* Get EBCDIC carriage return:
%c = $X2C('0D')
* Attribute value is space:
%doc:LoadXml('<top FOO="' With %c With '"/>')
%doc = New
* Attribute value is carriage return - note CR
* is the same in EBCDIC and ISO-10646:
%doc:LoadXml('<top FOO="&#x0D;"/>')
Note: Whitespace in an attribute (and in any type of node other than a Text node child of an element) is not affected by the whitespace-handling options of LoadXml, WebReceive, and ParseXml.
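Attribute-value normalization can be illustrated in the same way. This Python sketch (standard xml.etree.ElementTree, not the XmlDoc API; the function name att_value is hypothetical) assumes an attribute with no DTD declaration, which is the only case relevant here since DTDs are not used.

```python
import xml.etree.ElementTree as ET


def att_value(xml_text, name="FOO"):
    """Parse xml_text and return the named attribute of its root element."""
    return ET.fromstring(xml_text).get(name)


# A literal carriage return or tab in the serialized value becomes a space.
literal_cr = att_value('<top FOO="\r"/>')
literal_tab = att_value('<top FOO="a\tb"/>')

# A carriage return written as a character reference is kept.
referenced_cr = att_value('<top FOO="&#x0D;"/>')

# Leading, trailing, and repeated spaces are left alone.
spaces = att_value('<top FOO=" a  b "/>')
```

Only the literal whitespace characters are converted; the character reference yields a real carriage return, and existing spaces are neither stripped nor collapsed.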
Language identification
From the W3C XML Recommendation: "A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document."
The only valid values of the xml:lang=".." attribute that Janus SOAP accepts are the language identifier tags specified in IETF RFC 3066 (http://www.w3.org/TR/REC-xml/#RFC1766).
References
As mentioned, the XML support in Janus SOAP is heavily oriented to the concepts and facilities defined by the XML standards. There are two key aspects of XML that application developers should understand at an appropriate level of detail:
- The syntax, structure, and nomenclature of an XML document.
- For the XmlDoc API, the syntax, nomenclature, and meaning of an XPath expression.
In addition to, and as a subset of, those standards, the following shorter list of references should be useful in understanding the above key aspects:
http://en.wikipedia.org/wiki/XML | The Wikipedia entry for XML. |
XML in a Nutshell: A Desktop Quick Reference (2nd edition) | By Elliotte Rusty Harold and W. Scott Means (Second Edition: June, 2002, publisher O'Reilly & Associates), this book is one of many to cover XML, Namespaces, XML Schema, XSLT, XPath, XML processors, and more. It has the benefit of its smaller size; its good examples; and its good summary of the history of XML.
For XML programming using Janus SOAP or other platforms, some of this book, and the others like it, may be irrelevant or even confusing (because its scope is so large), but it is accurate and probably easier to read than the more formalized W3C standards. |
XML background | http://www.w3.org/XML/1999/XML-in-10-points |
http://en.wikipedia.org/wiki/XML_namespace | The Wikipedia entry for XML namespace. |
http://en.wikipedia.org/wiki/Xpath | The Wikipedia entry for XPath. |
http://msdn.microsoft.com/en-us/magazine/cc302158.aspx | Microsoft's .NET Framework XML classes. |
http://oreilly.com/catalog/9780596003975 | .NET and XML, by Niel M. Bornstein, published 2004 by O'Reilly & Associates. |
W3C standards
As discussed earlier in this manual, SOAP (Simple Object Access Protocol) is an Internet standard. This section lists some of the XML-related standards documents that are available.
The World Wide Web Consortium (or "W3C") is the body that creates the XML standards, along with other Internet standards, such as HTML, XHTML, and HTTP. The term "Recommendation," in W3C parlance, means that the standard has been approved by the W3C.
Each document is shown with its title, the status of the standard and the date on which that status was achieved, and the URL that can be used to obtain the document:
Extensible Markup Language (XML) 1.0 (Third Edition) | W3C Recommendation 04 February 2004: http://www.w3.org/TR/REC-xml This is referred to as the W3C XML Recommendation throughout this article. |
---|---|
Namespaces spec | http://www.w3.org/TR/REC-xml-names
This further constrains the form of element and attribute names in an XML document, and it provides a means for qualifying names so that different parts of a document can use different vocabularies. |
XPath spec | http://www.w3.org/TR/xpath
It is recommended that you start with section 5, “Data Model.” |
XML Information Set | W3C Recommendation 4 February 2004: http://www.w3.org/TR/xml-infoset |
XML Schema | W3C Recommendation, 2 May 2001
|
SOAP Version 1.2 | W3C Recommendation 24 June 2003
|
The above documents are among the rich set of documents available from the World Wide Web Consortium. To browse for their complete public set of publications and useful links, go to: