XML processing in Janus SOAP

Latest revision as of 19:17, 13 May 2016

Janus SOAP provides SOUL programmers with a substantial set of facilities for processing eXtensible Markup Language (XML) documents. Among other benefits, this enables rich and automated Web services based on a shared and open Web infrastructure. The design of this XML support is based on various standards, such as XML and XPath. Many sections in this article refer to these and other standards, for example, Simple Object Access Protocol (SOAP). However, it is important to recognize:

Janus SOAP enables you to process any XML document, whether or not you are using SOAP messages and envelopes.

XML support is provided in two disjoint sets of classes in Janus SOAP:

XmlDoc API: The methods in these classes allow you to convert a character stream XML document into an internal format (an XmlDoc object) or to programmatically create an XmlDoc, to access and modify an XmlDoc, and to convert an XmlDoc into a character stream XML document.
XmlParser API: This set of classes provides for event-based extraction of information from an XML document in its character stream form. This can be beneficial when only a relatively small part of the XML document is to be processed.
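The difference between the two styles can be sketched in Python, whose standard library is used here only as a stand-in: `xml.etree.ElementTree` plays the role of a tree-building API such as the XmlDoc API, and `xml.parsers.expat` plays the role of an event-based API such as the XmlParser API. The document text and the function names are assumptions made for illustration; none of them are Janus SOAP methods.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

# Hypothetical document used only for this illustration.
DOC = "<order><item id='1'>bolt</item><item id='2'>nut</item><note>rush</note></order>"


def item_ids_from_tree(text):
    """Tree style: build the whole document in memory, then navigate it."""
    root = ET.fromstring(text)
    return [item.get("id") for item in root.findall("item")]


def item_ids_from_events(text):
    """Event style: react to start tags as the parser reports them."""
    ids = []

    def on_start(name, attrs):
        # Only the elements of interest are looked at; nothing else is stored.
        if name == "item":
            ids.append(attrs["id"])

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(text, True)
    return ids
```

Both functions return the same list; the event-based version never holds the full document structure, which is the benefit described above when only a small part of the document matters.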

Standards relevant to Janus SOAP XML facilities

eXtensible Markup Language (XML)

XML is a standard (endorsed by the World Wide Web Consortium, or W3C) which can be used for structuring almost any kind of data. Although the word "markup" reveals that the roots of XML are from document processing, and indeed the outermost entity in XML is called a "document," XML is ideally suited to structuring almost any kind of data that is exchanged between or within applications, particularly (although by no means exclusively) if they are communicating on a network.

The syntax of XML provides for hierarchical structuring of data (again, the outer entity is called a document) into the principal type called an element. Elements and the other components of an XML document are described in XML.

One of the reasons that XML is so powerful is that there is no fixed vocabulary for XML documents. Every XML document can have its own set of names (subject to the rules for the characters that may occur in a name). Additionally, no structure is dictated for an XML document, except that it have a single top-level element and other elements must be completely contained within their parent elements. These characteristics allow XML to represent an extremely wide range of types of data very effectively.

An XML document can be considered an abstract object: when XML is used for interchange between applications, it is usually "serialized", or transmitted, completely in character form. The advantage of this is that it is human-readable and can be conveniently viewed using a generic XML editor, both of which can be huge benefits for debugging. Additionally, standard network protocols can be used to exchange documents between a wide variety of applications on a wide variety of platforms. As the World Wide Web has demonstrated, using characters as the basis for information interchange is extremely powerful and flexible.

Beyond these core properties which make XML very attractive for structuring data, it has become the basis for a large family of standards. Often these standards are referred to as the XML "family," in part because they are managed by the XML Working Group of the W3C. Some of these important standards are XML Schema, Extensible Stylesheet Language Transformations (XSLT), XML Query, and Web Services Description Language (WSDL). See http://www.w3c.org for more information about these and other standards related to XML.

Quoting from XML in a Nutshell (2nd ed) (see References):

XML offers the tantalizing possibility of truly cross-platform, long term
data formats. ... XML delivers portable data. In many ways, XML is the most portable ... format designed since the ASCII text file.

You can use XML strictly as an internal data structure in your application, or in Model 204 files, or with operating system files, or with other programs using some communication mechanism. The simple, character-based format of XML enhances such communication. You can communicate with the Web (HTTP), either as a server application (for example, using Janus Web Server) or making client XML requests (for example, using Janus Sockets HTTP Helper). You can use native Model 204 IODEV communication facilities, or Model 204 MQ Series, or any facility that can send and receive streams of characters.

Simple Object Access Protocol (SOAP)

The Simple Object Access Protocol (SOAP) is a lightweight protocol that supports the exchange of structured information between Web-based applications. SOAP employs XML to serialize the objects passed between applications. SOAP can be used in combination with a variety of existing firewall-friendly Internet protocols and formats including HTTP, SMTP, and MIME. SOAP supports a wide range of application paradigms, from messaging systems to Remote Procedure Call (RPC).

SOAP is an excellent standard for information exchange between applications, so good that it is the reason for the name Janus SOAP. It is important to recognize the following, however: Janus SOAP enables you to process any XML document, whether or not you are using SOAP messages and envelopes.

In fact, with the current version, although you can readily process formal SOAP messages, there are no features specially oriented toward that: all features are generalized for handling any kind of XML document. Later versions will add more functionality to incorporate the standard processing of SOAP messages, so your application will only need to deal with the application-specific parts of the messages.

Example SOAP request

This example SOAP message is a request to a SOAP server:

POST /StockQuote HTTP/1.1
Host: www.stockquoteserver.com
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn
SOAPAction: "Some-URI"

<SoapEnv:Envelope
  xmlns:SoapEnv="http://schemas.xmlsoap.org/soap/envelope/"
  SoapEnv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SoapEnv:Body>
    <m:GetLastTradePrice xmlns:m="http://sirius-software.com/samp/JSOAP/1">
      <symbol>EMC</symbol>
    </m:GetLastTradePrice>
  </SoapEnv:Body>
</SoapEnv:Envelope>

Example SOAP response

This example SOAP message could be a response to the above message:

HTTP/1.1 200 OK
Content-Type: text/xml; charset="utf-8"
Content-Length: nnnn

<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
    </m:GetLastTradePriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
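As a rough illustration of "dealing with the application-specific parts" of such a response, the following Python sketch extracts the price from the message body. Python's `xml.etree.ElementTree` stands in for the XmlDoc API here; the function name `last_trade_price` and the prefix map are assumptions for this example. The Envelope start-tag is written as an ordinary start-tag so that it can contain the Body element.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

# The XML part of the example response (HTTP headers omitted).
RESPONSE = """\
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <m:GetLastTradePriceResponse xmlns:m="Some-URI">
      <Price>34.5</Price>
    </m:GetLastTradePriceResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""


def last_trade_price(envelope_text):
    """Return the Price carried in the SOAP Body as a float."""
    root = ET.fromstring(envelope_text)
    # The prefixes used for selection are local to this function; only the
    # namespace URIs have to match those declared in the message.
    ns = {"env": SOAP_ENV, "m": "Some-URI"}
    price = root.find("env:Body/m:GetLastTradePriceResponse/Price", ns)
    return float(price.text)
```

Note that the selection prefixes (`env`, `m`) differ from the prefixes in the message (`SOAP-ENV`, `m`); what identifies an element is its namespace URI and local name, not the prefix text.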

XML Path Language (XPath) in the XmlDoc API

XPath is a language designed specifically to select nodes from an XML document. It is very powerful, yet it is based on familiar syntax that mimics an XML document's hierarchy. XPath is the general mechanism used in the XmlDoc API for selecting one or more nodes on which to operate. It is a key component of XSLT, XPointer, and XLink, and it has a common foundation with XML Query.

An introduction to the use of XPath is provided in An example of XmlDoc methods and XPath; a more complete description of XPath is contained in XPath.
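To give a flavor of path-based selection, here is a small Python sketch. It uses the limited XPath subset implemented by Python's `xml.etree.ElementTree`, not the XmlDoc API, so it should be read only as an analogy; the helper names `select_text` and `select_attr` are invented for this example.

```python
import xml.etree.ElementTree as ET

DOC = """<purchase_order>
  <pitm><partID>1234</partID><price per="12" amt="1.280"/><qty>36</qty></pitm>
  <pitm><price amt=".29"/><partID>5678</partID><qty>2</qty></pitm>
</purchase_order>"""

root = ET.fromstring(DOC)


def select_text(path):
    """Return the text of every node selected by path, in document order."""
    return [node.text for node in root.findall(path)]


def select_attr(path, name):
    """Return the named attribute of every node selected by path."""
    return [node.get(name) for node in root.findall(path)]
```

For instance, `select_text("pitm/partID")` walks the hierarchy step by step, while `select_text("pitm[partID='1234']/qty")` adds a predicate that keeps only the `pitm` element whose `partID` child has the given text.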

XML

As explained above, XML provides the basis for a large number of varied standards. This section introduces the W3C XML Recommendation, that is, the XML standard. It gives you basic information about XML, explaining some of the concepts using the XmlDoc API (that is, the methods of the XmlDoc, XmlNodelist, and XmlNode classes). This approach gives you concrete examples which you can try in SOUL, and which may make the abstract concepts easier to understand.

The syntax of XML provides for:

Hierarchical structuring of data (the outer object is called a document) into elements.
An element has a name, which need not be unique within the document. An element can have any number of attributes, each of which has a name (which must be unique within that element — but not within the document) and a value. Within an element can be a series of values and ("sub-") elements, which provides XML with its hierarchical nature.
Assigning unique identifiers to elements; this provides even more structuring possibilities than simple hierarchy.
These identifiers are implemented with the element type definition features provided with either Document Type Declarations or with XML Schema. Element type definitions are omitted from our XML documentation; they are not supported in the current version.

An XML document has exactly one outer, or "top-level," element, and this element contains, as descendants, any other elements that may be in the document.

In addition to the data contained in elements and attributes, any number of comments may appear wherever an element may appear. There is also a component called a processing instruction, or PI, which is effectively a comment that has a name.

All names (element names, attribute names, entity references, and PI targets) are case-sensitive; for example, a less-than symbol (<) can be included in an attribute value if you use the characters &lt;, but not if you use &LT; or &Lt;.
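Case sensitivity can be checked with any conforming XML parser. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in for the XmlDoc API; the helper name `attribute_value` and the attribute name `v` are assumptions for this example.

```python
import xml.etree.ElementTree as ET


def attribute_value(doc_text):
    """Return the value of attribute 'v' on the top-level element,
    or None if the document is not well-formed."""
    try:
        return ET.fromstring(doc_text).get("v")
    except ET.ParseError:
        return None
```

The lowercase entity reference is resolved to a less-than symbol, whereas the uppercase spelling is an undefined entity and the whole document is rejected. The same applies to element names: an end-tag `</A>` does not close a start-tag `<a>`.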

The rest of this section explains the syntax of XML and various rules for XML documents, according to the W3C XML Recommendation (as mentioned in References, this includes both the XML specification per se, and the XML Namespaces specification). In (XML syntax, below) and elsewhere as appropriate, you will find comments about limitations imposed by the XmlDoc API on the W3C XML Recommendation.

XML example

The next example illustrates the major components of an XML document. The formatting into separate, indented lines is provided for readability, but it is not significant for this and for most business data exchange applications. The letter labels on the left are not part of the document; they are for the explanation which follows:

X: <?xml version='1.1'?>
A: <!-- Purchase order follows -->
B: <purchase_order>
C:   <memo>Dave's order was "late"</memo>
D:   <?program-version 4.1?>
E:   <pitm>
       <partID>1234</partID>
F:     <price per="12" amt="1.280"/>
       <qty>36</qty>
G:   </pitm>
H:   <pitm>
I:     <price amt=".29"></price>
       <partID>5678</partID>
       <qty>2</qty>
     </pitm>
   </purchase_order>

In the following explanation of each of the labeled lines above, references of the form [cnn], like [B22], are to productions in Syntax of document, element, Attribute, Comment, PI below.

X:	`<?xml version='1.1'?>` The XML Declaration (XMLDecl, [C23]) is an optional part of the prolog ([B22]), which is the set of components preceding the top-level element. If XMLDecl is present, it must be the first markup in the document (only whitespace may precede it), and it must specify at least the version (as of version 7.5 of the Sirius Mods, "1.0" and "1.1" are the only valid versions). The clauses in XMLDecl are positional; that is, they must be given in the order shown in the syntax.
A:	`<!-- Purchase order follows -->` This is a comment at top-level. [A1], [B22], and [D27] allow zero or more comments and PIs before and after the top-level element.
B:	`<purchase_order>` This is the element start-tag or STag ([G40]) of the top-level element ([A1]).
C:	`<memo>Dave's order was "late"</memo>` With "leaf" elements (known in XML Schema as elements with simple content), that is, if the only thing between the STag and ETag is CharData ([P14]), you can usually implement the information either as an element (text) or as an attribute of the parent element. This text example highlights one small distinction, namely that AttValue ([M10]) has less flexibility: If the value includes both apostrophes and quotation marks, either the apostrophes or the quotes must be escaped. CharData not only allows quotes and apostrophes, but it also allows CDSect ([Q18]).
D:	`<?program-version 4.1?>` This is a PI [V16]. Presumably the name (actually, the target) "program-version" is used by the application reading this document.
E:	`<pitm>` This is the STag of an element which is contained within another element and which contains child elements; this allows you to group elements together.
F:	`<price per="12" amt="1.280"/>` This is an example of the EmptyElemTag ([I44]), which can be useful if an element contains no data (just the name can be meaningful to the application), or if it only contains data using attributes.
G:	`</pitm>` This is the ETag [H42] of an element. The name must exactly match the STag for the element (again, XML is case sensitive).
H:	`<pitm>` Here is another STag of an element; it is the "sibling" of another with the same name. The ability to have sub-elements and the ability to repeat elements with the same name in a given parent element are the important data modeling distinctions between elements and attributes.
I:	`<price amt=".29"></price>` Note that not all instances of a given element type (the price item is an element type) must have the same attributes, nor must they have the same sub-structure. Two further things are optional: whether an element has content, and whether to use an STag immediately followed by an ETag (as is done here) or the EmptyElemTag (as is done above in item F).
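Several of the points made for items C through I can be observed by parsing the example. The following sketch uses Python's `xml.etree.ElementTree` rather than the XmlDoc API, and it declares version 1.0 because that is an assumption that keeps the example within what the Python parser is known to accept. This parser also discards the comment and the PI by default, so only elements, attributes, and text are inspected.

```python
import xml.etree.ElementTree as ET

ORDER = """<?xml version='1.0'?>
<!-- Purchase order follows -->
<purchase_order>
  <memo>Dave's order was "late"</memo>
  <?program-version 4.1?>
  <pitm>
    <partID>1234</partID>
    <price per="12" amt="1.280"/>
    <qty>36</qty>
  </pitm>
  <pitm>
    <price amt=".29"></price>
    <partID>5678</partID>
    <qty>2</qty>
  </pitm>
</purchase_order>"""

order = ET.fromstring(ORDER)

# Sibling elements may share a name (items E and H).
items = order.findall("pitm")

# Instances of the same element type may carry different attributes (F and I).
prices = [item.find("price") for item in items]
```

After parsing, the `price` written with an EmptyElemTag and the one written as an STag immediately followed by an ETag are indistinguishable: both have no text and no children.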

XML syntax

This section contains a version of the XML syntax. It is taken from the W3C XML Recommendation, which is the authoritative reference:

http://www.w3.org/TR/REC-xml

The syntax below has been changed from the standard in these ways:

The only structure in the XML syntax not supported in the current version is the Document Type Declaration, or DTD ("<!DOCTYPE...>"). A DTD can be tolerated if you use the DTD_IGNORE option of the deserialization functions (LoadXml, WebReceive, and ParseXml), but the information contained in the DTD is neither used nor made available to the SOUL program. Reflecting the absence of support for DTD, the productions in the syntax that follows are altered to remove those parts of an XML document introduced in the DTD.
Note: Much of the functionality of document type declarations may be better provided using XML Schema, which is planned for a future version.
The Char, Name, NameStartChar, and NameChar productions are taken from the XML 1.1 recommendation. As explained in Char and Reference, only characters representable in 8-bit EBCDIC were handled prior to Sirius Mods version 7.6, so fewer characters were supported in the production for Char ([CA2]) in earlier Sirius Mods releases.
The maximum length of an XML name is 300 characters (prior to version 7.9, the maximum was 127, and prior to version 7.7, the maximum was 100).
The productions are re-ordered (to make it easier to read the grammar), and letters are added before them, so when [B22] is referred to in the text, you know that this is between [Ann] and [Cnn] in this grammar, and this is production [22] for the same non-terminal (in this case, prolog) in the W3C XML Recommendation.

The conventions used are:

'yyy' or "yyy"	Enclosed item, yyy, must appear exactly as shown.
#xnn	Specifies the character (in ISO-10646) with code value nn. For example, `#x09 #x0D #x0A #x20` specify the tab, carriage return, linefeed, and space characters, respectively.
[^abc]	Specifies any character except a, b, or c.
[chars]	Specifies any character within the set chars, where chars can be the concatenation of these sets: y, meaning the single character y; and y-z, meaning characters in the range from y to z, inclusive. The resulting set of chars is the union of the specified sets.
set1 - set2 ("-" not enclosed in [...])	The set of strings described by set1, with the set of strings described by set2 removed.
|	Separates alternatives.
?	Follows an optional item.
*	Follows an item that can occur any number of times (even not at all).
+	Follows an item that can occur one or more times.
(abc) (parentheses)	Groups items.
[rule] ("to the right")	Marks an additional syntax rule.
/* comment */	Marks a comment.

The syntax is shown in three sections:

The major components
The productions that describe individual characters
The components of the "XML Declaration" (<?xml version=...?>)

Syntax of document, element, Attribute, Comment, PI

[A1] document ::= (prolog element Misc*) - (Char* RestrictedChar Char*)
[B22] prolog ::= XMLDecl? Misc*
[C23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[D27] Misc ::= Comment | PI | S
[E3] S ::= (#x20 | #x9 /* Whitespace */ | #xD | #xA)+
[F39] element ::= STag content ETag [Element Type Match] | EmptyElemTag
[G40] STag ::= '<' Name (S Attribute)* S? '>' [Unique Att]
[H42] ETag ::= '</' Name S? '>'
[I44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' [Unique Att]
[NSC] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
[NC] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[NA] Name ::= NameStartChar (NameChar)*

Within an XML document, the maximum length of a name (for example, each of the prefix part and the local part of an element name) is 300 characters (prior to version 7.9, it was 127 characters, and prior to version 7.7, it was 100 characters). Element and attribute names are also subject to restrictions related to XML Namespaces; see Name and namespace syntax.

[L41] Attribute ::= Name Eq AttValue
[M10] AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
[N25] Eq ::= S? '=' S?
[O43] content ::= CharData? ( (element | Reference | CDSect | PI | Comment) CharData? )*
[P14] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
[Q18] CDSect ::= CDStart CData CDEnd
[R19] CDStart ::= '<![CDATA['
[S20] CData ::= (Char* - (Char* ']]>' Char*))
[T21] CDEnd ::= ']]>'
[U15] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
[V16] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[W17] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

Char and Reference

ISO-10646 and EBCDIC characters

Through Sirius Mods version 7.5, XmlDocs were maintained in EBCDIC, and production [CA2] above did not allow the full range of ISO-10646 characters shown in the W3C XML Recommendation. (ISO-10646 is the standard for the universal character set, also known as Unicode.) The XmlDoc API might have rejected an XML document because it contained an ISO-10646 character that could not be represented in EBCDIC. As of Sirius Mods version 7.6, XmlDocs are maintained in Unicode as supported by the Sirius Mods. This is why production [CA2] shows that no Unicode characters greater than U+FFFD are allowed. In addition, deserialization (with default options) of an XML document fails if the document contains a Unicode character that is not translatable to EBCDIC. The AllowUntranslatable option of the deserialization methods lets you circumvent this restriction. The null character (#x0), normally restricted, is allowed in an XML document if the XmlDoc's AllowNull property is set to True.

Note: Using the standard translation table provided with Sirius Mods versions prior to 7.3, many EBCDIC characters (such as X'FF'), in addition to the "control characters" that were explicitly prohibited, were not legal XML characters because they did not translate to any Unicode character.

In Sirius Mods version 7.3, the standard translation table was modified significantly. For more information about supported characters and character translation issues as of version 7.3, see Support for the ASCII subset of Unicode and Corrected translations between ASCII/Unicode and EBCDIC.
As stated in Transport: sending and receiving XML, UTF-8, UTF-16, and ISO-8859-x encodings are accepted (note that these must be given in all-capital letters within the XML declaration).
XPath comparisons are performed using Unicode. As of version 7.3, it is the only type of ordered character comparison. Prior to Sirius Mods version 7.3, this was the default type of comparison performed, and it could be controlled by the (now obsolete) XPathOrder property.

Entity references

One purpose of an EntityRef is to allow a sequence of characters that may be illegal in a particular context of an XML document. For example, within an element's content, the string ]]> is not allowed, so you may replace the greater-than symbol (>) with either its character code in a CharRef, or with the predefined entity &gt;:
]]&gt;

A Reference (EntityRef or CharRef) is allowed only in an element's content ([O43]) or in AttValue ([M10]).
There is a facility for defining your own entities in a DTD, but since DTDs are not supported in Janus SOAP, the only entity references supported are the five predefined entities:

&amp; ampersand (&)

&apos; apostrophe (')

&gt; greater than (>)

&lt; less than (<)

&quot; double quotation mark (")

[
] left and right square brackets ([ ])
(as of Model 204 7.6)
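The way a Reference is resolved can be sketched in a few lines of Python. This is an illustrative stand-alone function, not how the XmlDoc API is implemented; the names `PREDEFINED` and `resolve_references` are assumptions, and only the five predefined entities plus numeric character references are handled.

```python
import re

# The five predefined entities; the names are case-sensitive.
PREDEFINED = {"amp": "&", "apos": "'", "gt": ">", "lt": "<", "quot": '"'}

# A reference is '&', then a hex CharRef, a decimal CharRef, or a name, then ';'.
_REFERENCE = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[^;&\s]+);")


def resolve_references(text):
    """Replace each EntityRef or CharRef in text with the character it names."""

    def replace(match):
        body = match.group(1)
        if body.startswith("#x"):
            return chr(int(body[2:], 16))
        if body.startswith("#"):
            return chr(int(body[1:]))
        if body in PREDEFINED:
            return PREDEFINED[body]
        raise ValueError("undefined entity: &%s;" % body)

    # A single left-to-right pass: replacement text is not scanned again,
    # so '&amp;lt;' yields the four characters '&lt;'.
    return _REFERENCE.sub(replace, text)
```

For example, `resolve_references("]]&gt;")` yields the three characters `]]>`, which is how the otherwise prohibited sequence can be carried in element content.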

Note: You can use any of the XHTML entities (listed at http://www.w3.org/TR/xhtml1/dtds.html#h-A2) to represent Unicode characters when converting from EBCDIC to Unicode. Character decoding must be in effect, however: you must be using the U constant function or the CharacterDecode=True argument on the EbcdicToUnicode function.
You can load into an XmlDoc a character represented by such an entity if you decode the entity reference before the character is processed by one of the XmlDoc API deserializing or direct storage methods.

Components of XMLDecl

[XA24] VersionInfo ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[XB26] VersionNum ::= ([a-zA-Z0-9_.:] | '-')+
[XC80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
[XD81] EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')* /* Only Latin chars */
[XE32] SDDecl ::= S 'standalone' Eq ( ("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"') )

Names and namespaces

XML documents are allowed to contain elements and attributes that are defined by one organization, as well as other elements and attributes that are defined by another organization. In order to achieve this organizational "merging," the XML Namespaces Recommendation (http://www.w3.org/TR/REC-xml-names) provides for a way to qualify these merged names so that they will not conflict.

Also, the Namespaces Recommendation provides a way for an application to examine, in effect, the "defining organization" of a name in an XML document, so that various properties can be inferred, and names from the same "organization" can be grouped together.

Conceptually, the Namespaces Recommendation qualifies a name with a Uniform Resource Identifier (URI). There are various rules for various types of URIs; one familiar type is the same as URLs on the World Wide Web, such as:

http://www.w3.org/2001/XMLSchema

The important aspect of a URI, as far as the names in an XML document are concerned, is simply that it is a unique string for the names that are associated with it.

The characters that are valid in a URI (shown in Uniform Resource Identifier syntax) exceed the set of characters that are valid in an XML name. Therefore, the technique employed for XML Namespace qualification is to use a special kind of attribute — one that begins with "xmlns" — to associate a name prefix with a URI. Then attaching a prefix to a name effectively attaches the URI to a name.

The syntax for making this association, the namespace declaration, is explained in the next section.

Name and namespace syntax

The W3C XML Recommendation syntax rule for names is shown in Syntax of document, element, Attribute, Comment, PI (and repeated below) as the Name ([NA]), NameStartChar ([NSC]), and NameChar ([NC]) productions. The XML Namespaces Recommendation provides additional rules for Element and Attribute names (but not for PI targets). From the Namespaces Recommendation, element and attribute names are both instances of QName:

[NSC] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
[NC] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
[NA] Name ::= NameStartChar (NameChar)*
[NB5] NCName ::= (NameStartChar - ':') (NameChar - ':')*
[NC6] QName ::= (Prefix ':')? LocalPart
[ND7] Prefix ::= NCName
[NE8] LocalPart ::= NCName
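The QName productions translate fairly directly into a regular expression. The Python sketch below is one possible reading of productions [NSC] through [NE8], with the 300-character limit applied to each of the prefix and local part; the names `split_qname`, `is_qname`, and `MAX_NAME_LENGTH` are assumptions for this example and are not part of the XmlDoc API.

```python
import re

MAX_NAME_LENGTH = 300  # per-part limit stated in the text

# NameStartChar without ':' (that is, the first character of an NCName).
NAME_START = (
    r"A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
)
# NameChar without ':' (the remaining characters of an NCName).
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
NCNAME = re.compile("[%s][%s]*" % (NAME_START, NAME_CHAR))


def split_qname(name):
    """Split a QName into (prefix, local_part); prefix is None if absent.
    Raises ValueError if name does not match the QName production."""
    prefix, colon, local = name.rpartition(":")
    parts = [prefix, local] if colon else [local]
    for part in parts:
        if len(part) > MAX_NAME_LENGTH or not NCNAME.fullmatch(part):
            raise ValueError("not a QName: %r" % name)
    return (prefix if colon else None, local)


def is_qname(name):
    """Return True if name matches the QName production."""
    try:
        split_qname(name)
    except ValueError:
        return False
    return True
```

Because an NCName excludes the colon, a name with two colons, an empty prefix, or an empty local part is rejected.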

Although the W3C XML Recommendation does not require that attribute and element names follow the XML Namespaces Recommendation, the operation of XPath requires it. Therefore, since XPath is so important for the XmlDoc API, its default operating mode is to require Namespaces conformance in the XML document. See the Namespace property.

The restrictions and changes to the XML Recommendation are as follows:

The NameStartChar and NameChar productions are taken from the XML 1.1 recommendation (http://www.w3.org/TR/xml11/). Starting with version 7.6 of the Sirius Mods, XmlDocs are maintained in Unicode, as supported by the Sirius Mods. That support excludes characters encoded in more than two bytes, so production [NSC], above, shows no Unicode characters greater than U+FFFD. By default, deserialization of an XML document fails if the document contains a Unicode character that is not translatable to EBCDIC. The AllowUntranslatable argument of the deserialization methods lets you circumvent this restriction.
A name can have at most one colon (:), which separates the name into a non-null prefix and a non-null local name.
A name without a prefix is simply a local name.
The prefix, if any, must be associated with a namespace URI using an attribute of the form:
xmlns:prefix="URI"

For example, all elements (and attributes of those elements) within the content of the definitions element below can use the prefix "xsd" to qualify their names to belong to the "http://www.w3.org/2001/XMLSchema" namespace:

<definitions xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  ... content of definitions element ...
</definitions>
The prefix xml is bound to the namespace URI http://www.w3.org/XML/1998/namespace. Neither can be used without the other.
An element can also have a default namespace attribute, which "declares" its namespace, of the form:
xmlns="URI"
Another form of default namespace declaration allows an element to disable any default namespace with:
xmlns=""
A namespace declaration is syntactically the same as an Attribute.
The scope of a non-default namespace declaration is the element containing it, its attributes, and all descendant elements and their attributes, until another declaration of the prefix.
The scope of a default namespace declaration is the element containing it (but not the attributes of that element) and its descendant elements (but not their attributes), until the occurrence of another default declaration.
The namespace URI associated with a name is
1. the in-scope URI associated with the prefix of the name, if the name has a prefix
2. for element names, the in-scope default namespace URI, if the name does not have a prefix and there is a default namespace URI in scope
3. no namespace URI, otherwise
Two names are identical if they have the same local name and either they both do not have a namespace URI or they both have the same namespace URI.
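The scoping rules above can be observed with any namespace-aware parser. This Python sketch uses `xml.etree.ElementTree`, which reports each name as `{URI}local` when the name has a namespace URI and as plain `local` when it has none. The URIs `urn:example:one` and `urn:example:two` and the helper `names` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<a xmlns="urn:example:one" xmlns:p="urn:example:two" id="1" p:id="2">
  <b><p:c/></b>
  <d xmlns=""><e/></d>
</a>"""

root = ET.fromstring(DOC)


def names(element):
    """Return the expanded names of the child elements of element."""
    return [child.tag for child in element]
```

The unprefixed attribute `id` has no namespace URI even though a default namespace is in scope, while the element `b` inherits the default namespace. The `xmlns=""` declaration on `d` removes the default namespace for `d` and its descendant `e`.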

Uniform Resource Identifier syntax

The form of a valid string used as a URI is specified in IETF RFC2396 (see http://www.faqs.org/rfcs/rfc2396.html). The rules are as follows:

Namespace URIs must be absolute: they must start with a non-null prefix (called a "scheme"), followed by a colon (:) and a non-null suffix.
The scheme must start with a letter, which may be followed by any combination of letters, digits, and the plus (+), hyphen (-), and period (.) characters.
The suffix can contain any of the following characters, in addition to letters and digits:
; (semicolon) - (hyphen) / (slash) _ (underscore) ? (question mark) . (period) : (colon) ! (exclamation point) @ (at sign) ~ (tilde) & (ampersand) * (asterisk) = (equal sign) ' (apostrophe) + (plus sign) ( (open parenthesis) $ (dollar sign) ) (close parenthesis) , (comma)

The suffix can also contain:
- At most one number sign (#).
- A percent (%) character followed by two hex digits to escape some other character. In this case:
  - The hex digits A-F may be uppercase or lowercase.
  - The hexadecimal values are not replaced when URI processing is performed.
    For example, even though the ASCII code for the number "4" is hexadecimal 34, the following two URIs are different and distinct:
    
    http://my.URI.number4
    http://my.URI.number%34
    
    Thus, for instance, the following fragment:
    
    %n = %d:AddElement('x', , 'http://my.URI.number4')
    %n:AddElement('x', , 'http://my.URI.number%34')
    %d:Print
    %d:SelectionPrefix('f') = 'http://my.URI.number4'
    Print %d:SelectCount('//f:x') And 'matching node(s)'
    
    has the following result:
    
    <x xmlns="http://my.URI.number4">
       <x xmlns="http://my.URI.number%34"/>
    </x>
    1 matching node(s)
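The same literal, character-by-character comparison of namespace URIs can be seen outside SOUL. The following Python sketch, using `xml.etree.ElementTree` as a stand-in, counts the elements named `x` in a given namespace; `count_x` is a hypothetical helper for this example.

```python
import xml.etree.ElementTree as ET

DOC = ('<x xmlns="http://my.URI.number4">'
       '<x xmlns="http://my.URI.number%34"/>'
       '</x>')


def count_x(namespace_uri):
    """Count elements with local name 'x' in exactly the given namespace."""
    root = ET.fromstring(DOC)
    wanted = "{%s}x" % namespace_uri
    # The URI is compared as a plain string; '%34' is never decoded to '4'.
    return sum(1 for node in root.iter() if node.tag == wanted)
```

Each of the two URIs matches exactly one of the two `x` elements, which mirrors the count of 1 in the result above.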

Well-formed documents and validation

Before an XML document can be processed, its structure must match the rules expressed in the productions in Syntax of document, element, Attribute, Comment, PI, along with the extra rules alluded to in square brackets (for example, [Unique Att], indicating that a single attribute name may not be given twice in the list of attributes for an element). When the syntax is correct, including these rules, the document is called well-formed.

The XmlDoc API enforces the syntax rules of well-formed documents.
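A quick way to get a feel for well-formedness rules is to feed small documents to a parser and see which ones it rejects. The sketch below uses Python's `xml.etree.ElementTree` as a stand-in for the XmlDoc API; `is_well_formed` is a hypothetical helper, and error reporting in the XmlDoc API will differ.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True
```

A repeated attribute name violates [Unique Att], overlapping tags violate [Element Type Match], and a second top-level element violates the document production, so all three are rejected before any processing can take place.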

In addition to this checking, an XML processor may also check to see that the format of the document matches the structure and restrictions declared for it in either the Document Type Declaration or the document's Schema. If the document matches the type structure and restrictions, it is called valid. In the W3C XML Recommendation, this validation of a document is an optional feature of an XML processor.

With the current version, the XmlDoc API does not validate the XML document. Note that support of XML Schema is planned; Document Type Declarations have several shortcomings, including a limitation on the types of constraints that can be placed on the document, a specialized baroque syntax that doesn't conform to the element/attribute structure of XML, and incorporation of some features that have nothing to do with document validation.

Normalization during deserialization

When an XML processor, in particular the XmlDoc API, parses an XML document from character form into an internal representation, it must make some transformations of the document. The two most significant types of these transformations concern the following:

Entity and character references
Whitespace characters

Normalizing entity and character references

Entity and character references are replaced by their entity and character counterparts before deserialization. For example, the entity reference &gt; in the content of an element or in the AttValue of an Attribute is handled exactly as if a greater-than symbol (>) occurred at that point in the document. Similarly, the character reference &#x5B; is handled as if a left square-bracket symbol ([) occurred at that point in the document.

This normalization occurs after whitespace normalization, which is discussed in the next section.

Normalizing whitespace characters

In the XML syntax, the whitespace characters are (in hexadecimal, using ISO-10646 character codes):

tab	x'09'
linefeed	x'0A'
carriage return	x'0D'
space	x'20'

In general, the whitespace characters can be used in the S production (shown in Syntax of document, element, Attribute, Comment, PI), which must separate many of the tokens in a document (for example, it must follow the element name, if the STag contains an Attribute) and may optionally be used in many other places (for example, it may appear before or after the equal sign (=) between an Attribute name and its value).

The interplay of three factors determines the normalization of whitespace characters during deserialization:

The W3C XML Recommendation specifies two normalizing transformations of whitespace:
1. When a special combination of line-end characters — carriage return and linefeed — occur anywhere in an XML document, they are replaced by a single linefeed character. Also, carriage returns not followed by a linefeed are replaced by a single linefeed character.
2. When any whitespace character appears in the value of an attribute, it is replaced by a single space character.
The XmlDoc API always applies these transformations, and the following two sub-sections describe them in more detail.
In addition to the XML standard whitespace transformations, the XmlDoc API deserialization methods offer options to control normalization of whitespace characters that occur in the content of an element. Those options are described in these pages:
- LoadXml (XmlDoc/XmlNode function)
- WebReceive (XmlDoc function)
The XmlDoc API deserialization (and serialization) methods honor the xml:space attribute: After the XML standard whitespace transformations, any whitespace within the scope of xml:space="preserve" is retained as is, regardless of the whitespace-handling option in effect for the deserialization method. Elements that are in the scope of xml:space="default" have whitespace handled according to the whitespace-handling option in effect for the deserialization. The individual method descriptions cited above have more information.
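The scoping of xml:space can be sketched as follows. This is an illustration in Python with the standard xml.etree.ElementTree module, not the XmlDoc API: the function strip_text and its policy of stripping leading and trailing whitespace are hypothetical stand-ins for whatever whitespace-handling option a deserialization method applies. Only the inheritance of the "preserve" and "default" settings reflects the rule described above.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml" prefix to its namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def strip_text(elem, preserve=False):
    """Strip text unless xml:space="preserve" is in scope (hypothetical policy)."""
    setting = elem.get(XML_SPACE)
    if setting == "preserve":
        preserve = True
    elif setting == "default":
        preserve = False
    if not preserve and elem.text is not None:
        elem.text = elem.text.strip()
    for child in elem:
        strip_text(child, preserve)
        # Text after a child belongs to this element's content,
        # so this element's setting governs it.
        if not preserve and child.tail is not None:
            child.tail = child.tail.strip()

doc = ET.fromstring(
    '<top><a>  one  </a>'
    '<b xml:space="preserve">  two  <c xml:space="default">  three  </c></b></top>'
)
strip_text(doc)

a_text = doc.find("a").text
b_text = doc.find("b").text
c_text = doc.find("b/c").text
print(repr(a_text), repr(b_text), repr(c_text))
```

The setting is inherited by descendants until an inner element overrides it, which is why the text of c is stripped even though its parent b is preserved.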

Normalized line-end

As specified in "2.11 End-of-Line Handling" of the W3C XML Recommendation, all instances of a carriage return character followed by a linefeed character (CR-LF sequence), as well as all instances of a carriage return not followed by a linefeed, are converted to a single linefeed character.

This behavior only applies to deserialization: there is no modification of whitespace characters in values passed as the value argument of the XmlDoc API Add* and Insert* methods that allow a value argument. Therefore the values of the FOO1 and FOO2 elements created by the LoadXml (deserialization) and AddElement invocations below are different:

* Get EBCDIC carriage return and linefeed:
%cl = $X2C('0D25')
* This Element value is linefeed:
%node = %doc:LoadXml('<top> <FOO1>' With %cl With '</FOO1> </top>')
* This Element value is carriage return and linefeed:
%node:AddElement('FOO2', %cl)

Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution. Therefore the values of FOO1 and FOO2 created by the following two LoadXml invocations are different:

* Get EBCDIC carriage return and linefeed:
%cl = $X2C('0D25')
* Element value is linefeed:
%doc:LoadXml('<FOO1>' With %cl With '</FOO1>')
%doc = New
* Element value is carriage return and linefeed
* (note, character references are ISO-10646):
%doc:LoadXml('<FOO2>&#x0D;&#x0A;' With '</FOO2>')

Linefeed characters not removed by the normalization described above and belonging to the Text node child of an element (but not in any other type of node) can further be affected by the whitespace-handling options of LoadXml and WebReceive.
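The difference between literal line-end characters and line-end characters written as character references follows from the W3C rule itself, so it can be reproduced with any conforming parser. The following minimal sketch uses Python's standard xml.etree.ElementTree module rather than the XmlDoc API, and works in Unicode rather than EBCDIC; the element names simply mirror the examples above.

```python
import xml.etree.ElementTree as ET

CR, LF = "\r", "\n"

# Literal CR-LF in the serialized input is normalized to a single LF.
literal = ET.fromstring("<FOO1>" + CR + LF + "</FOO1>")

# A literal CR that is not followed by LF is also normalized to LF.
lone_cr = ET.fromstring("<FOO1>a" + CR + "b</FOO1>")

# CR and LF written as character references are substituted after
# line-end normalization, so both characters survive in the value.
referenced = ET.fromstring("<FOO2>&#x0D;&#x0A;</FOO2>")

print(repr(literal.text), repr(lone_cr.text), repr(referenced.text))
```

A document that must carry a carriage return in element content therefore has to write it as a character reference.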

Normalized attribute value

After replacing all CR-LF sequences, and all other CR instances, by LF (as described in Normalized line-end), attribute values have additional whitespace normalization. As specified in "3.3.3 Attribute-Value Normalization" of the W3C XML Recommendation, after the CR-LF normalization, every remaining instance of a whitespace character other than space (that is, tab or linefeed) in an attribute value is converted to a space character. Leading and trailing spaces are not stripped, nor are sequences of multiple spaces collapsed.

This behavior only applies to deserialization; that is, there is no modification of whitespace characters in attribute values passed as the value argument of the AddAttribute function. Therefore the values of the FOO attribute created by the following two methods are different:

* Get EBCDIC carriage return:
%c = $X2C('0D')
* Attribute value is space:
%doc:LoadXml('<top FOO="' With %c With '"> <in/> </top>')
* Attribute value is carriage return:
%doc:AddAttribute('FOO', %c, '/*/*')

Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution. Therefore the values of the FOO attribute created by the following two LoadXml invocations are different:

* Get EBCDIC carriage return:
%c = $X2C('0D')
* Attribute value is space:
%doc:LoadXml('<top FOO="' With %c With '"/>')
%doc = New
* Attribute value is carriage return - note CR
* is the same in EBCDIC and ISO-10646:
%doc:LoadXml('<top FOO="&#x0D;"/>')

Note: Whitespace in an attribute (and in any type of node other than a Text node child of an element) is not affected by the whitespace-handling options of LoadXml, WebReceive, and ParseXml.
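Attribute-value normalization is likewise defined by the W3C XML Recommendation, so a conforming parser other than the XmlDoc API shows the same results. The sketch below uses Python's standard xml.etree.ElementTree module, again in Unicode rather than EBCDIC; the attribute content is made up to exercise each whitespace character.

```python
import xml.etree.ElementTree as ET

TAB, CR, LF = "\t", "\r", "\n"

# Literal tab, LF, and CR-LF inside the attribute value each become one
# space; the leading space and the two trailing spaces are kept as is.
literal = ET.fromstring(
    '<top FOO=" a' + TAB + "b" + LF + "c" + CR + LF + 'd  "/>'
)
literal_value = literal.get("FOO")

# A carriage return written as a character reference is not normalized.
referenced_value = ET.fromstring('<top FOO="&#x0D;"/>').get("FOO")

print(repr(literal_value), repr(referenced_value))
```

Note that the CR-LF pair yields one space, not two, because line-end normalization has already reduced it to a single linefeed before the attribute rule is applied.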

Language identification

From the W3C XML Recommendation: "A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document."

The only valid values of the xml:lang=".." attribute that Janus SOAP accepts are the language identifier tags specified in IETF RFC 3066 (http://www.w3.org/TR/REC-xml/#RFC1766).
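The syntactic form of an RFC 3066 language tag is a primary subtag of one to eight letters followed by zero or more hyphen-separated subtags of one to eight letters or digits. The sketch below checks that form in Python and reads xml:lang values with the standard xml.etree.ElementTree module. The helper is_rfc3066_tag is hypothetical, and whether Janus SOAP applies exactly this check is an assumption; the sketch only shows the shape of value the RFC defines.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml" prefix to its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# RFC 3066: 1*8ALPHA *( "-" 1*8(ALPHA / DIGIT) )
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def is_rfc3066_tag(value):
    """Return True if value has the syntactic form of an RFC 3066 tag."""
    return RFC3066_TAG.fullmatch(value) is not None

doc = ET.fromstring('<memo xml:lang="en-US"><p xml:lang="fr">Bonjour</p></memo>')

# Collect each element's own xml:lang value, in document order.
languages = [(elem.tag, elem.get(XML_LANG)) for elem in doc.iter()]
print(languages)
```

For example, "en-US" and "fr" have the required form, while "en_US" does not because the separator must be a hyphen.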

References

As mentioned, the XML support in Janus SOAP is heavily oriented to the concepts and facilities defined by the XML standards. There are two key aspects of XML that application developers should understand at an appropriate level of detail:

The syntax, structure, and nomenclature of an XML document.
For the XmlDoc API, the syntax, nomenclature, and meaning of an XPath expression.

In addition to the standards themselves, the following shorter list of references should be useful in understanding the above key aspects:

http://en.wikipedia.org/wiki/XML	The Wikipedia entry for XML.
XML in a Nutshell: A Desktop Quick Reference (2nd edition)	By Elliotte Rusty Harold and W. Scott Means (Second Edition: June, 2002, publisher O'Reilly & Associates), this book is one of many to cover XML, Namespaces, XML Schema, XSLT, XPath, XML processors, and more. It has the benefit of its smaller size, its good examples, and its good summary of the history of XML. For XML programming using Janus SOAP or other platforms, some of this book, and the others like it, may be irrelevant or even confusing (because its scope is so large), but it is accurate and probably easier to read than the more formalized W3C standards.
XML background	http://www.w3.org/XML/1999/XML-in-10-points
http://en.wikipedia.org/wiki/XML_namespace	The Wikipedia entry for XML namespace.
http://en.wikipedia.org/wiki/Xpath	The Wikipedia entry for XPath.
http://msdn.microsoft.com/en-us/magazine/cc302158.aspx	Microsoft's .NET Framework XML classes.
http://oreilly.com/catalog/9780596003975	.NET and XML, by Niel M. Bornstein, published 2004 by O'Reilly & Associates.

W3C standards

As discussed earlier in this manual, SOAP (Simple Object Access Protocol) is an Internet standard. This section lists some of the XML-related standards documents that are available.

The World Wide Web Consortium (or "W3C") is the body that creates the XML standards, along with other Internet standards, such as HTML, XHTML, and HTTP. The term "Recommendation," in W3C parlance, means that the standard has been approved by the W3C.

Each document is shown with its title, the status of the standard and the date on which that status was achieved, and the URL that can be used to obtain the document:

Extensible Markup Language (XML) 1.0 (Third Edition)	W3C Recommendation 04 February 2004: http://www.w3.org/TR/REC-xml This is referred to as the W3C XML Recommendation throughout this article.
Namespaces spec	http://www.w3.org/TR/REC-xml-names This further constrains the form of element and attribute names in an XML document, and it provides a means for qualifying names so that different parts of a document can use different vocabularies.
XPath spec	http://www.w3.org/TR/xpath It is recommended that you start with section 5, “Data Model.”
XML Information Set	W3C Recommendation 4 February 2004: http://www.w3.org/TR/xml-infoset
XML Schema	W3C Recommendation, 2 May 2001 XML Schema Part 0: Primer: http://www.w3.org/TR/xmlschema-0/ XML Schema Part 1: Structures: http://www.w3.org/TR/xmlschema-1/ XML Schema Part 2: Datatypes: http://www.w3.org/TR/xmlschema-2/
SOAP Version 1.2	W3C Recommendation 24 June 2003 SOAP Version 1.2 Part 1: Messaging Framework: http://www.w3.org/TR/soap12-part1/ SOAP Version 1.2 Part 2: Adjuncts: http://www.w3.org/TR/soap12-part2/

The above documents are among the rich set of documents available from the World Wide Web Consortium. To browse for their complete public set of publications and useful links, go to:

http://www.w3.org/




@@ Line 5: / Line 5: @@
 Comments with "&NSPRVSN" are places that could be revisited when "SOAP rule" support available
 -->
-[[Janus SOAP]] provides User Language programmers with a substantial set of facilities for processing eXtensible Markup
+<var class="product">[[Janus SOAP]]</var> provides <var class="product">[[SOUL]]</var> programmers with a substantial set of facilities for processing eXtensible Markup
 Language (XML) documents.
-Among other benefits,
+Among other benefits, this enables rich and automated Web services based on a shared and open Web infrastructure.
-this enables rich and automated Web services based on a shared and open Web infrastructure.
+The design of this XML support is based on various standards, such as XML and [[XPath]].
-The design of this XML support is based on various standards, such as XML and XPath.
 Many sections in this article refer to these and other standards,
 for example, [[#Simple Object Access Protocol (SOAP)|Simple Object Access Protocol (SOAP)]].
 However, it is important to recognize:
-<ul>
+<blockquote><var class="product">Janus SOAP</var> enables you to process <i><b>any XML document</b></i>, whether or not you are using SOAP messages and envelopes.
-<li>''Janus SOAP'' enables you to process <i><b>any XML document</b></i>, whether or not you are using
+</blockquote>
-SOAP messages and envelopes.
-</ul>
-XML support is provided in two disjoint sets of classes in ''Janus SOAP'':
+XML support is provided in two disjoint sets of classes in <var class="product">Janus SOAP</var>:
 <dl>
 <dt>[[XmlDoc API]]
 <dd>The methods in these classes allow you to convert a character stream XML document into an
-internal format (an [[XmlDoc class|XmlDoc object]]) or to programmatically create an XmlDoc, to access and modify an
+internal format (an <var>[[XmlDoc class|XmlDoc]]</var> object) or to programmatically create an <var>XmlDoc</var>, to access and modify an
-XmlDoc, and to convert an XmlDoc into a character stream XML document.
+<var>XmlDoc</var>, and to convert an <var>XmlDoc</var> into a character stream XML document.
 <dt>[[XmlParser API]]
 <dd>This set of classes provides for event-based extraction of information from an XML document in
@@ Line 29: / Line 27: @@
 This can be beneficial when only a relatively small part of the XML document is to be processed.
 </dl>
 ==Standards relevant to Janus SOAP XML facilities==
 ===eXtensible Markup Language (XML)===
-XML is a standard (endorsed by the World Wide Web Consortium, or W3C) which can
+XML is a standard (endorsed by the World Wide Web Consortium, or W3C) which can be used for structuring almost any kind of data.
-be used for structuring almost any kind of data.
+Although the word "markup" reveals that the roots of XML are from
-Although the word &ldquo;markup&rdquo; reveals that the roots of XML are from
 document processing, and indeed the outermost entity in XML is called a
-&ldquo;document,&rdquo; XML is ideally suited to structuring almost any kind of
+"document," XML is ideally suited to structuring almost any kind of
 data that is exchanged between or within applications,
 particularly (although by no means exclusively) if they are communicating on a network.
 The syntax of XML provides for hierarchical structuring of data (again, the outer
-entity is called a document)
+entity is called a document) into the principle type called an '''element'''.
-into the principle type called an '''element'''.
 Elements and the other components of an XML document are described in [[#XML|XML]].
@@ Line 56: / Line 54: @@
 An XML document can be considered an abstract object: when XML
 is used for interchange between applications,
-it is usually &ldquo;serialized&ldquo;, or transmitted, completely
+it is usually "serialized", or transmitted, completely
 in character form.
 The advantage of this is that it is human-readable and can be
@@ Line 63: / Line 61: @@
 Additionally, standard network protocols can be used to exchange documents
 between a wide variety of applications on a wide variety of platforms.
-As the
+As the World Wide Web has demonstrated, using characters as the basis for
-World Wide Web has demonstrated, using characters as the basis for
 information interchange is extremely powerful and flexible.
 Beyond these core properties which make XML very attractive for structuring
 data, it has become the basis for a large family of standards.
-Often these standards are referred to as the XML &ldquo;family,&rdquo; in part
+Often these standards are referred to as the XML "family," in part
 because they are managed by the XML Working Group of the W3C.
 Some of these important standards are
 XML Schema, XML Stylesheet Transformations, XML Query, and Web Services
 Description Language (WSDL).
-See http://www.w3c.org
+See http://www.w3c.org for more information about these and other standards related to XML.
-for more information about these and other standards related to XML.
-Quoting from <i><b>XML in a Nutshell (2nd ed</b></i>) (see [[#References|References]]),
+Quoting from <i>XML in a Nutshell (2nd ed)</i> (see [[#References|References]]):
-<ul>
-<li>XML offers the tantalizing possibility of truly cross-platform, long term
+<blockquote>XML offers the tantalizing possibility of truly cross-platform, long term
-data formats. ...
+data formats. ... XML delivers portable data.
-XML delivers portable data.
 In many ways, XML is the most portable ... format designed since the ASCII text file.
-</ul>
+</blockquote>
-You can use XML strictly as an internal datastructure in your application,
+You can use XML strictly as an internal data structure in your application,
-or in ''Model 204'' files, or with operating system files, or with other programs using
+or in <var class="product">Model 204</var> files, or with operating system files, or with other programs using some communication mechanism.
-some communication mechanism.
 The simple, character-based format of XML enhances such communication.
-You can communicate with the Web (HTTP), either as a server application
+You can communicate with the Web (HTTP), either as a server application (for example,
-(for example,
+using <var class="product">[[Janus Web Server]]</var>) or making client XML requests (for example, using <var class="product">[[Janus Sockets]]</var> [[HTTP Helper]]).
-using [[Janus Web Server]]) or making client XML requests (for example, using [[Janus Sockets]] HTTP Helper).
+You can use native <var class="product">Model 204</var> IODEV communication facilities, or <var class="product">Model 204</var> MQ Series, or
-You can use native ''Model 204'' IODEV communication facilities, or ''Model 204'' MQ Series, or
 any facility that can send and receive streams of characters.
 ===Simple Object Access Protocol (SOAP)===
-The Simple Object Access Protocol (SOAP) is a lightweight protocol that supports
+The Simple Object Access Protocol (SOAP) is a lightweight protocol that supports the exchange of structured
-the exchange of structured
 information between Web-based applications.
 SOAP employs XML to serialize the objects passed between applications.
 SOAP can be used in combination with a variety of existing firewall-friendly
 Internet protocols and formats including HTTP, SMTP, and MIME.
-SOAP supports a wide range of application paradigms, from messaging systems to
+SOAP supports a wide range of application paradigms, from messaging systems to Remote Procedure Call (RPC).
-Remote Procedure Call (RPC).
 SOAP is an excellent standard for information exchange between applications,
-so good that it is the reason for the name ''Janus SOAP''.
+so good that it is the reason for the name <var class="product">Janus SOAP</var>.
 It is important to recognize the following, however:
-<ul>
+<var class="product">Janus SOAP</var> enables you to process <i><b>any XML document</b></i>, whether
-<li>''Janus SOAP'' enables you to process <i><b>any XML document</b></i>''', whether
+or not you are using SOAP messages and envelopes.
-or not you are using SOAP messages and envelopes'''.
-</ul>
 <!-- &NSPRVSN -->
-In fact, with the current version,
+In fact, with the current version, although you can readily process formal SOAP
-although you can readily process formal SOAP
 messages, there are no features specially oriented toward that: all features are
 generalized for handling any kind of XML document.
 Later versions will add more functionality to incorporate the standard processing
-of SOAP messages, so your
+of SOAP messages, so your application will only need to deal with the application-specific
-application will only need to deal with the application-specific
 parts of the messages.
 ====Example SOAP request====
 This example SOAP message is a request to a SOAP server:
-<pre>
+<p class="code">POST /StockQuote HTTP/1.1
-    POST /StockQuote HTTP/1.1
+Host: www.stockquoteserver.com
-    Host: www.stockquoteserver.com
+Content-Type: text/xml; charset="utf-8"
-    Content-Type: text/xml; charset="utf-8"
+Content-Length: nnnn
-    Content-Length: nnnn
+SOAPAction: "Some-URI"
-    SOAPAction: "Some-URI"
-    <SoapEnv:Envelope
+<nowiki><SoapEnv:Envelope
-      xmlns:SoapEnv="http://schemas.xmlsoap.org/soap/envelope/"
+  xmlns:SoapEnv="http://schemas.xmlsoap.org/soap/envelope/"
-      SoapEnv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
+  SoapEnv:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
-      <SoapEnv:Body>
+  <SoapEnv:Body>
-          <m:GetLastTradePrice
+      <m:GetLastTradePrice
-             xmlns:m="http://sirius-software.com/samp/JSOAP/1">
+         xmlns:m="http://sirius-software.com/samp/JSOAP/1">
-             <symbol>EMC</symbol>
+         <symbol>EMC</symbol>
-          </m:GetLastTradePrice>
+      </m:GetLastTradePrice>
-      </SoapEnv:Body>
+  </SoapEnv:Body>
-    </SoapEnv:Envelope>
+</SoapEnv:Envelope></nowiki>
-</pre>
+</p>
 ====Example SOAP response====
 This example SOAP message could be a response to
 the above message:
-<pre>
+<p class="code">HTTP/1.1 200 OK
-    HTTP/1.1 200 OK
+Content-Type: text/xml; charset="utf-8"
-    Content-Type: text/xml; charset="utf-8"
+Content-Length: nnnn
-    Content-Length: nnnn
-    <SOAP-ENV:Envelope
+<nowiki><SOAP-ENV:Envelope
-      xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
+  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
-      SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/>
+  SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/"/></nowiki>
-       <SOAP-ENV:Body>
+   <SOAP-ENV:Body>
-           <m:GetLastTradePriceResponse xmlns:m="Some-URI">
+       <m:GetLastTradePriceResponse xmlns:m="Some-URI">
-               <Price>34.5</Price>
+           <Price>34.5</Price>
-           </m:GetLastTradePriceResponse>
+       </m:GetLastTradePriceResponse>
-       </SOAP-ENV:Body>
+   </SOAP-ENV:Body>
-    </SOAP-ENV:Envelope>
+</SOAP-ENV:Envelope>
-</pre>
+</p>
 ===XML Path Language (XPath) in the XmlDoc API===
 XPath is a language designed specifically to select nodes from an XML document.
-It is very powerful, yet it is based on familiar syntax that mimics an
+It is very powerful, yet it is based on familiar syntax that mimics an XML document's hierarchy.
-XML document's hierarchy.
+XPath is the general mechanism used in the XmlDoc API for selecting one or more nodes on which to operate.
-XPath is the general mechanism used in the XmlDoc API for selecting one or more nodes
+It is a key component of XSLT, XPointer, and XLink, and it has a common foundation with XML Query.
-on which to operate.
-It is a key component of XSLT, XPointer, and
-XLink, and it has a common foundation with XML Query.
 An introduction to the use of XPath is provided in
@@ Line 173: / Line 159: @@
 ==XML==
-As explained above, XML provides the basis for a large
+As explained above, XML provides the basis for a large number of varied standards.
-number of varied standards.
 This section introduces the <i><b>W3C XML Recommendation</b></i>, that is, the XML standard.
-It gives you basic information about XML, explaining some of the concepts using the
+It gives you basic information about XML, explaining some of the concepts using the XmlDoc API (that is,
-XmlDoc API (that is,
+the methods of the <var>XmlDoc</var>, <var>XmlNodelist</var>, and <var>XmlNode</var> classes).
-the methods of the XmlDoc, XmlNodelist, and XmlNode classes).
+This approach gives you concrete examples which you can try in <var class="product">SOUL</var>,
-This approach gives you concrete examples which you can try in User Language,
 and which may make the abstract concepts easier to understand.
-The syntax of XML provides for hierarchical structuring of data (the outer
+The syntax of XML provides for:
-object is called a document) into '''elements'''.
+<ul>
+<li>Hierarchical structuring of data (the outer object is called a document) into '''elements'''.
+<p>
 An element has a name, which need not be unique within the document.
 An element can have any number of '''attributes''', each of which
-has a name (which must be unique within that element &mdash; but not within the
+has a name (which must be unique within that element &mdash; but not within the document) and a value.
-document) and a value.
+Within an element can be a series of values and ("sub-") elements, which provides XML with its hierarchical nature.</p>
-Within an element can be a series of values and (&ldquo;sub-&rdquo;) elements,
-which provides XML with its hierarchical nature.
+<li>Assigning unique identifiers to elements;
-<ul>
-<li>There is also a provision for assigning unique identifiers to elements;
 this provides even more structuring possibilities than simple hierarchy.
+<p>
 These identifiers are implemented with the element type definition
 features provided with either Document Type Declarations or with XML Schema.
 Element type definitions are omitted from our XML documentation; they
-are not supported in the current version.
+are not supported in the current version. </p>
 <!-- &NSCHVSN -->
 </ul>
-An XML document has exactly one outer, or &ldquo;top-level&rdquo; element,
+An XML document has exactly one outer, or "top-level," element, and this element
-which contains, as descendants,
+contains, as descendants, any other elements that may be in the document.
-any other elements that may be in the document.
 In addition to the data contained in elements and attributes, any
 number of '''comments''' may appear wherever an element may appear.
-There
+There is also a component called a processing instruction, or '''PI''',
-is also a component called a processing instruction, or '''PI''',
 which is effectively a comment that has a name.
-All names (element names, attribute names, entity references,
+All names (element names, attribute names, entity references, and PI targets) are case-sensitive; for example, a less-than symbol
-and PI targets) are case-sensitive; for example, a less-than symbol
+(<tt><</tt>) can be included in an attribute value if you use the characters
-(<) can be included in an attribute value if you use the characters
+<code>&amp;lt;</code> &mdash; but not if you use <code>&amp;LT;</code> or <code>&amp;Lt;</code>.
-&ldquo;<tt>&amp;lt;</tt>&rdquo; &mdash; but not if you use &ldquo;<tt>&amp;LT;</tt>&rdquo;
-or &ldquo;<tt>&amp;Lt;</tt>&rdquo;.
 The rest of this section explains the syntax of XML and various rules
-for XML documents, according to the <i><b>W3C XML Recommendation</b></i> (as mentioned in [[#References|References]],
+for XML documents, according to the <i>W3C XML Recommendation</i> (as mentioned in [[#References|References]],
-this includes both the XML specification per se, and the XML Namespaces
+this includes both the XML specification per se, and the XML Namespaces specification).
-specification).
+In ([[#XML syntax|XML syntax]], below) and elsewhere as appropriate, you will find
-In ([[#XML syntax|XML syntax]]) and elsewhere as appropriate, you will find
+comments about limitations imposed by the XmlDoc API on the <i>W3C XML Recommendation</i>.
-comments about limitations imposed by the XmlDoc API on the <i><b>W3C XML Recommendation</b></i>.
 ===XML example===
 The next example illustrates the major components of an XML document.
-The formatting into separate, indented lines is
+The formatting into separate, indented lines is provided for readability, but it is not significant for this and for most business data exchange applications.
-provided for readability, but it is not significant for this and for most
+The letter labels on the left are not part of the document; they are for the explanation which follows:
-business data exchange applications.
+<p class="code">X: <?xml version='1.1'?>
-The letter labels on the left are not part of the document; they
+A: &lt;!-- Purchase order follows -->
-are for the explanation which follows:
+B: <purchase_order>
-<pre>
+C:   <memo>Dave's order was "late"</memo>
-    X: <?xml version='1.1'?>
+D:   <?program-version 4.1?>
-    A: <!-- Purchase order follows -->
+E:   <pitm>
-    B: <purchase_order>
+       <partID>1234</partID>
-    C:   <memo>Dave's order was "late"</memo>
+F:     <price per="12" amt="1.280"/>
-    D:   <?program-version 4.1?>
+       <qty>36</qty>
-    E:   <pitm>
+G:   </pitm>
-           <partID>1234</partID>
+H:   <pitm>
-    F:     <price per="12" amt="1.280"/>
+I:     <price amt=".29"></price>
-           <qty>36</qty>
+       <partID>5678</partID>
-    G:   </pitm>
+       <qty>2</qty>
-    H:   <pitm>
+     </pitm>
-    I:     <price amt=".29"></price>
+   </purchase_order>
-           <partID>5678</partID>
+</p>
-           <qty>2</qty>
-         </pitm>
-       </purchase_order>
-</pre>
-In the following explanation of each of the labeled lines above,
+In the following explanation of each of the labeled lines above, references of the form '''['''<i>cnn</i>''']''', like <code>[B22]</code>,
-references of the form '''['''<i><b>cnn</b></i>''']'''
+are to productions in [[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]] below.
-are to productions in [[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]].
+<table class="thJustBold">
-<dl>
+<tr><th>X:
-<dt>X:
+<td><code><?xml version='1.1'?></code>
-<dd><tt><?xml version='1.1'?></tt>
+<p>
-<br>
 The XML Declaration (XMLDecl, [C23]) is an optional part of the prolog ([B22]), which
 is the set of components preceding the top-level element.
-If XMLDecl is present it must:
+If XMLDecl is present it must:</p>
 <ul>
 <li>Be the first markup in the document (only whitespace may precede it).
 <li>Specify at least the version (as of version 7.5 of the <var class="product">Sirius Mods</var>,
-&ldquo;1.0&rdquo; and &ldquo;1.1&rdquo; are the only valid versions).
+"1.0" and "1.1" are the only valid versions).
 </ul>
-The clauses in XMLDecl are positional, that is, they must be given in the order
+The clauses in XMLDecl are positional, that is, they must be given in the order shown in the syntax.</td></tr>
-shown in the syntax.
-<dt>A:
+<tr><th>A:</th>
-<dd><tt>&lt;!-- Purchase order follows --></tt>
+<td><code>&lt;!-- Purchase order follows --></code>
-<br>
+<p>
-This is a comment at top-level.
+This is a comment at top-level. [A1], [B22], and [D27] allow
-[A1], [B22], and [D27] allow
+zero or more comments and PIs before and after the top-level element.</p></td></tr>
-zero or more comments and PIs before and after the top-level element.
-<dt>B:
+<tr><th>B:</th>
-<dd><tt><purchase_order></tt>
+<td><code><purchase_order></code>
-<br>
+<p>
-This is the element start-tag or STag ([G40]) of the top-level element ([A1]).
+This is the element start-tag or STag ([G40]) of the top-level element ([A1]).</p></td></tr>
-<dt>C:
-<dd><tt><memo>Dave's order was "late"</memo></tt>
+<tr><th>C:</th>
-<br>
+<td><code><memo>Dave's order was "late"</memo></code>
-With &ldquo;leaf&rdquo; elements (known in XML Schema as elements with simple content),
+<p>
+With "leaf" elements (known in XML Schema as elements with simple content),
 that is, if the only thing between the STag and
 Etag is CharData ([P14]), you can usually implement the information either as an
 element (text) or as an attribute of the parent element.
 This text example highlights one small distinction, namely that
-AttValue ([M10]) has less flexibility:
+AttValue ([M10]) has less flexibility:</p>
 <ul>
 <li>If the value includes both apostrophes and quotation marks, either the
 apostrophes or the quotes must be escaped.
 <li>CharData not only allows
 quotes and apostrophes, but it also allows CDSect [Q18].
-</ul>
+</ul></td></tr>
-<dt>D:
-<dd><tt><?program-version 4.1?></tt>
+<tr><th>D:</th>
-<br>
+<td><code><?program-version 4.1?></code>
+<p>
 This is a PI [V16].
-Presumably the name (actually, the target) &ldquo;program-version&rdquo;
+Presumably the name (actually, the target) "program-version" is used by the application reading this document.</p></td></tr>
-is used by the application reading this document.
-<dt>E:
+<tr><th>E:</th>
-<dd><tt><pitm></tt>
+<td><code><pitm></code>
-<br>
+<p>
 This is the STag of an element which is contained
 within another element and which contains child elements;
-this allows you to group elements together.
+this allows you to group elements together.</p></td></tr>
-<dt>F:
-<dd><tt><price per="12" amt="1.280"/></tt>
+<tr><th>F:</th>
-<br>
+<td><code><price per="12" amt="1.280"/></code>
+<p>
 This is an example of the EmptyElemTag ([I44]), which can be useful
 if an element contains no data (just the name can be meaningful to
-the application), or if it only contains data using attributes.
+the application), or if it only contains data using attributes.</p></td></tr>
-<dt>G:
-<dd><tt></pitm></tt>
+<tr><th>G:</th>
-<br>
+<td><code></pitm></code>
+<p>
 This is the ETag [H42] of an element.
-The name must exactly match the STag for the element (again, XML is case sensitive).
+The name must exactly match the STag for the element (again, XML is case sensitive).</p></td></tr>
-<dt>H:
-<dd><tt><pitm></tt>
+<tr><th>H:</th>
-<br>
+<td><code><pitm></code>
-Here is another STag of an element;
+<p>
-it is the &ldquo;sibling&rdquo; of another with the same name.
+Here is another STag of an element; it is the "sibling" of another with the same name.
 The ability to have sub-elements and the ability to repeat elements with the
 same name in a given parent element are the important data modeling
-distinctions between elements and attributes.
+distinctions between elements and attributes.</p></td></tr>
-<dt>I:
-<dd><tt><price amt=".29"></price></tt>
+<tr><th>I:</th>
-<br>
+<td><code><price amt=".29"></price></code>
+<p>
 Note that not all instances of a given element type (the price item
 is an element type) must have the same attributes, nor must they have
-the same sub-structure.
+the same sub-structure. Also, these are optional:</p>
-Also, these are optional:
 <ul>
 <li>Whether an element has content.
 <li>Whether to use an STag immediately followed by an ETag (as is done here)
 or to use the EmptyElemTag (as is done above in item F).
-</ul>
+</ul></td></tr>
-</dl>
+</table>
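The points made in items D through I can be checked with any conforming XML parser. The following is a minimal sketch in Python's standard <code>xml.etree.ElementTree</code> module, not in SOUL; the root element name <code>inventory</code> and the helper <code>is_well_formed</code> are assumptions made only for illustration. It shows that an EmptyElemTag and an STag immediately followed by an ETag both produce an element with no content, that two instances of the same element type can carry different attributes, and that an ETag whose case differs from its STag is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical document assembled from items D through I; the root element
# name "inventory" is an assumption, not part of the original example.
doc = (
    '<inventory>'
    '<?program-version 4.1?>'
    '<pitm><price per="12" amt="1.280"/></pitm>'
    '<pitm><price amt=".29"></price></pitm>'
    '</inventory>'
)

root = ET.fromstring(doc)

# Sibling elements with the same name (items E and H).
pitms = root.findall("pitm")
prices = [p.find("price") for p in pitms]

# Instances of one element type need not share the same attributes (item I).
attribute_sets = [dict(p.attrib) for p in prices]

# EmptyElemTag (item F) and STag + ETag (item I) both mean "no content".
both_empty = all(p.text is None and len(p) == 0 for p in prices)


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

Because XML is case sensitive, <code>is_well_formed('<pitm></PITM>')</code> returns <code>False</code> in this sketch, while <code>is_well_formed('<pitm></pitm>')</code> returns <code>True</code>.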
 ===XML syntax===
 This section contains a version of the XML syntax.
-It is taken from the <i><b>W3C XML Recommendation</b></i>, which is the authoritative reference:
+It is taken from the <i>W3C XML Recommendation</i>, which is the authoritative reference:
-<pre>
+<p class="code"><nowiki>http://www.w3.org/TR/REC-xml</nowiki>
-    http://www.w3.org/TR/REC-xml
+</p>
-</pre>
 The syntax below has been changed from the standard in these ways:
 <ul>
 <li>The only structure in the XML syntax not supported
 in the current version is the
 <!-- &NDTDVSN -->
-Document Type Declaration, or DTD, (&ldquo;<!DOCTYPE...>&rdquo;).
+Document Type Declaration, or DTD, ("<!DOCTYPE...>").
 Although a DTD can be tolerated if you use the DTD_IGNORE option
-of the deserialization functions ([[LoadXml (XmlDoc/XmlNode function)|LoadXml]],
+of the deserialization functions (<var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var>,
-[[WebReceive (XmlDoc function)|WebReceive]], and [[ParseXml (HttpResponse function)|ParseXml]])
+<var>[[WebReceive (XmlDoc function)|WebReceive]]</var>, and <var>[[ParseXml (HttpResponse function)|ParseXml]]</var>)
-&mdash; the information contained in the
+&mdash; the information contained in the DTD is not used nor made available to the <var class="product">SOUL</var> program.
-DTD is not used nor made available to the User Language program.
-Reflecting the absence of support for DTD,
+Reflecting the absence of support for DTD, the productions in the syntax that follows are altered to remove those
-the productions in the syntax that follows are altered to remove those
 parts of an XML document introduced in the DTD.
-'''Note:'''
+<p class="note">
-Much of the functionality of document type declarations may be better
+'''Note:''' Much of the functionality of document type declarations may be better
-provided using XML Schema, which is planned for a future version.
+provided using XML Schema, which is planned for a future version.</p>
-<li>The Char, Name, NameStartChar, and NameChar productions are taken from
-the XML 1.1 recommendation (http://www.w3.org/TR/xml11/) .
+<li>The Char, Name, NameStartChar, and NameChar productions are taken from the [http://www.w3.org/TR/xml11/ XML 1.1 recommendation].
 As explained in [[#Char and Reference|Char and Reference]], only characters representable in 8-bit
-EBCDIC were handled prior to <var class="product">Sirius Mods</var> version 7.6,
+EBCDIC were handled prior to <var class="product">Sirius Mods</var> version 7.6, so fewer characters were supported in the production for
-so fewer characters were supported in the production for
 Char ([CA2]) in earlier <var class="product">Sirius Mods</var> releases.
 <li>The maximum length of an XML name is 300 characters (prior to version 7.9, the maximum was 127, and prior to version
 7.7, the maximum was 100).
-<li>The productions are re-ordered
-(to make it easier to read the grammar), and letters are added before them,
+<li>The productions are re-ordered (to make it easier to read the grammar), and letters are added before them,
-so when [B22] is referred to in the text, you know that this is between
+so when <code>[B22]</code> is referred to in the text, you know that this is between [A<i>nn</i>] and [C<i>nn</i>] in this grammar, and this is production [22] for the same
-[Ann] and [Cnn] in this grammar, and this is production [22] for the same
+non-terminal (in this case, <code>prolog</code>) in the <i>W3C XML Recommendation</i>.
-non-terminal (in this case, <tt>prolog</tt>) in the <i><b>W3C XML Recommendation</b></i>.
 </ul>
 The conventions used are:
-<dl>
+<table>
-<dt>'<i>yyy</i>' (apostrophes) or "<i>yyy</i>" (quotes)
+<tr><th>'<i>yyy</i>' or "<i>yyy</i>"</th>
-<dd>Enclose an item <i><b>xxx</b></i> that must appear exactly as shown.
+<td>Enclosed item, <i><b>yyy</b></i>, must appear exactly as shown.</td></tr>
-<dt>#x<i>nn</i>
-<dd>Specifies the character (in ISO-10646) with code
+<tr><th>#x<i>nn</i></th>
-value <i><b>nn</b></i>.
+<td>Specifies the character (in ISO-10646) with code value <i><b>nn</b></i>.
-For example, <tt>#x09 #x0D #x0A #x20</tt> specify the
+<p>
-tab, carriage return, linefeed, and space characters, respectively.
+For example, <code>#x09 #x0D #x0A #x20</code> specify the
-<dt>[^<i>abc</i>]
+tab, carriage return, linefeed, and space characters, respectively.</p></td></tr>
-<dd>Specifies any character except
-<i><b>a</b></i>, <i><b>b</b></i>, or <i><b>c</b></i>.
+<tr><th>[^<i>abc</i>]</th>
-<dt>[<i>chars</i>]
+<td>Specifies any character except <i><b>a</b></i>, <i><b>b</b></i>, or <i><b>c</b></i>.</td></tr>
-<dd>Specifies any character within the set
-<i><b>chars</b></i>, where <i><b>chars</b></i> can be the concatenation of these sets:
+<tr><th>[<i>chars</i>]</th>
+<td>Specifies any character within the set <i><b>chars</b></i>, where <i><b>chars</b></i> can be the concatenation of these sets:
 <ul>
 <li><i><b>y</b></i>, meaning the single character <i><b>y</b></i>
 <li><i><b>y</b></i>'''-'''<i><b>z</b></i>, meaning characters in the range from
 <i><b>y</b></i> to <i><b>z</b></i>, inclusive
 </ul>
-The resulting set of
-<i><b>chars</b></i> is the union of the specified sets.
+The resulting set of <i><b>chars</b></i> is the union of the specified sets.</td></tr>
-<dt><i>set1</i> - <i>set2</i> (&ldquo;-&rdquo; not enclosed in [...])
-<dd>The set of strings described by <i><b>set1</b></i>, with the set of strings
+<tr><th><i>set1</i> - <i>set2</i> ("-" not enclosed in [...])</th>
-described by <i><b>set2</b></i> removed.
+<td>The set of strings described by <i><b>set1</b></i>, with the set of strings
-<dt>|
+described by <i><b>set2</b></i> removed.</td></tr>
-<dd>Separates alternatives.
-<dt>?
+<tr><th>|</th>
-<dd>Follows an optional item.
+<td>Separates alternatives.</td></tr>
-<dt>*
-<dd>Follows an item that can occur any number of times (even not at all).
+<tr><th>?</th>
-<dt>+
+<td>Follows an optional item.</td></tr>
-<dd>Follows an item that can occur one or more times.
-<dt>(<i>abc</i>) (parentheses)
+<tr><th>*</th>
-<dd>Group items
+<td>Follows an item that can occur any number of times (even not at all).</td></tr>
-<dt>[<i>rule</i>] (&ldquo;to the right&rdquo;)
-<dd>Marks an additional syntax rule.
+<tr><th>+</th>
-<dt>/*<i>comment</i>*/
+<td>Follows an item that can occur one or more times.</td></tr>
-<dd>Marks a comment.
-</dl>
+<tr><th>(<i>abc</i>) (parentheses)</th>
+<td>Groups items.</td></tr>
+<tr><th>[<i>rule</i>] ("to the right")</th>
+<td>Marks an additional syntax rule.</td></tr>
+<tr><th>/*<i>comment</i>*/</th>
+<td>Marks a comment.</td></tr>
+</table>
 The syntax is shown in three sections:
@@ Line 418: / Line 409: @@
 <li>The major components
 <li>The productions that describe individual characters
-<li>The components
+<li>The components of the "XML Declaration" (<code><?xml version=...?></code>)
-of the &ldquo;XML Declaration&rdquo; (<tt><?xml version=...?></tt>)
 </ol>
 ====Syntax of document, element, Attribute, Comment, PI====
-<pre style="skel">
+<p class="code">[A1]  document      ::= (prolog element Misc*) - (Char* RestrictedChar Char*)
-[A1]  document     ::= (prolog element Misc*)
-                        - (Char* RestrictedChar Char*)
+[B22] prolog        ::= XMLDecl? Misc*
-[B22] prolog       ::= XMLDecl? Misc*
-[C23] XMLDecl      ::= '<?xml' VersionInfo
+[C23] XMLDecl       ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
-                       EncodingDecl? SDDecl? S? '?>'
-[D27] Misc         ::= Comment | PI | S
+[D27] Misc          ::= Comment | PI | S
-[E3]  S            ::= (#x20 | #x9    /* Whitespace */
-                       | #xD | #xA)+
+[E3]  S             ::= (#x20 | #x9 | #xD | #xA)+  /* Whitespace */
-[F39] element      ::= STag content ETag  [Element Type Match]
-                       | EmptyElemTag
+[F39] element       ::= STag content ETag  [Element Type Match] | EmptyElemTag
-[G40] STag         ::= '<' Name (S Attribute)* S? '>'  [Unique Att]
-[H42] ETag         ::= '</' Name S? '>'
+[G40] STag          ::= '<' Name (S Attribute)* S? '>'  [Unique Att]
+[H42] ETag          ::= '</' Name S? '>'
-[I44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>' [Unique Att]
+[I44] EmptyElemTag  ::= '<' Name (S Attribute)* S? '/>' [Unique Att]
-[NSC] NameStartChar ::=   ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
+[NSC] NameStartChar ::=   ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] |
-      [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
+                            [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
-      [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
+                            [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
-      [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
-[NC]  NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 |
+[NC]  NameChar      ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
-                         [#x0300-#x036F] | [#x203F-#x2040]
 [NA]  Name          ::= NameStartChar (NameChar)*
-</pre>
+</p>
-Within an XML document, the maximum length of a name (for example,
+Within an XML document, the maximum length of a name (for example, each of the prefix part and the local part of
-each of the prefix part the the local part of
 an element name) is 300 characters (prior to version 7.9, it was 127 characters, prior to version 7.7,
 the maximum length was 100 characters).
 Element and attribute names are also subject to
-restrictions related to XML Namespaces; see [[#Name and namespace syntax|Name and namespace syntax]].)
+restrictions related to XML Namespaces; see [[#Name and namespace syntax|Name and namespace syntax]].
-<pre style="skel">
+<p class="code">[L41] Attribute     ::= Name Eq AttValue
-[L41] Attribute     ::= Name Eq AttValue
 [M10] AttValue      ::= '"' ([^<&"] | Reference)* '"'
                          | "'" ([^<&'] | Reference)* "'"
@@ Line 463: / Line 453: @@
 [N25] Eq            ::= S? '=' S?
-[O43] content       ::= CharData? ( (element
+[O43] content       ::= CharData? ( (element | Reference | CDSect | PI | Comment) CharData? )*
-                        | Reference | CDSect | PI
-                        | Comment) CharData? )*
 [P14] CharData      ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
 [Q18] CDSect        ::= CDStart CData CDEnd
 [R19] CDStart       ::= '<![CDATA['
 [S20] CData         ::= (Char* - (Char* ']]>' Char*))
 [T21] CDEnd         ::= ']]>'
-[U15] Comment       ::= '<!--' ( (Char - '-')
+[U15] Comment       ::= '&lt;!--' ( (Char - '-') | ('-' (Char - '-')) )* '-->'
-                        | ('-' (Char - '-')) )* '-->'
-[V16] PI            ::= '<?' PITarget (S (Char* -
+[V16] PI            ::= '<?' PITarget (S (Char* - (Char* '?>' Char*) ))? '?>'
-                        (Char* '?>' Char*) ))? '?>'
-[W17] PITarget      ::= Name - (('X' | 'x') ('M' | 'm')
-                        ('L' | 'l'))
-</pre>
+[W17] PITarget      ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
+</p>
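The <code>Name</code>, <code>NameStartChar</code>, <code>NameChar</code>, and <code>PITarget</code> productions translate fairly directly into regular expressions. The following Python sketch is one possible reading of productions [NSC], [NC], [NA], and [W17], combined with the 300-character limit described above. The function names are hypothetical, and this is not a description of how Janus SOAP itself validates names.

```python
import re

# Character ranges copied from production [NSC] (NameStartChar).
NAME_START = (
    r":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D\u2070-\u218F"
    r"\u2C00-\u2FEF\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD"
)

# Production [NC] (NameChar) adds hyphen, period, digits, and a few ranges.
NAME_CHAR = NAME_START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Production [NA]: Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile("[" + NAME_START + "][" + NAME_CHAR + "]*")

# Length limit stated in the text for current versions.
MAX_NAME_LENGTH = 300


def is_name(s):
    """Return True if s matches production [NA] and the length limit."""
    return len(s) <= MAX_NAME_LENGTH and NAME_RE.fullmatch(s) is not None


def is_pi_target(s):
    """Production [W17]: any Name except 'xml' in any mix of case."""
    return is_name(s) and re.fullmatch("[Xx][Mm][Ll]", s) is None
```

Under these assumptions, <code>is_name("program-version")</code> is true, <code>is_name("4price")</code> is false because a digit is not a NameStartChar, and <code>is_pi_target("XmL")</code> is false.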
 ====Char and Reference====
-<pre style="skel">
+<p class="code">[CA2]  Char           ::= [#x1-#xD7FF] | [#xE000-#xFFFD]
-[CA2]  Char           ::= [#x1-#xD7FF] | [#xE000-#xFFFD]
-[CA2A] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F]
+[CA2A] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
-                          | [#x7F-#x84] | [#x86-#x9F]
 [CB67] Reference      ::= EntityRef | CharRef
 [CD68] EntityRef      ::= '&' Name ';'
-[CC66] CharRef        ::= '&amp;#' [0-9]+ ';'
-                          | '&amp;#x' [0-9a-fA-F]+ ';'  [Legal Char]
+[CC66] CharRef        ::= '&amp;#' [0-9]+ ';' | '&amp;#x' [0-9a-fA-F]+ ';'  [Legal Char]
-</pre>
+</p>
 =====ISO-10646 and EBCDIC characters=====
 <ul>
-<li>Through <var class="product">Sirius Mods</var> version 7.5,
+<li>Through <var class="product">Sirius Mods</var> version 7.5, XmlDocs were maintained in EBCDIC, and
-XmlDocs were maintained in EBCDIC, and
+production <code>[CA2]</code> above did not allow the full range of ISO-10646 characters shown in the <i>W3C XML Recommendation</i>.
-production [CA2] above did not
+(ISO-10646 is the standard for the universal character set, also known as Unicode.)
-allow the full range of ISO-10646 characters shown in the <i><b>W3C XML Recommendation</b></i>.
-(ISO-10646 is the standard for the universal character set, also known as
-Unicode.)
 The XmlDoc API might have rejected an XML document
 because it contained an ISO-10646 character that could not be represented in EBCDIC.
-As of <var class="product">Sirius Mods</var> version 7.6, XmlDocs are maintained in Unicode
+As of <var class="product">Sirius Mods</var> version 7.6, <var>XmlDocs</var> are maintained in Unicode
 as supported by the <var class="product">Sirius Mods</var>.
-This is why production [CA2] shows that
+This is why production <code>[CA2]</code> shows that no Unicode characters greater than <code>U+FFFD</code> are allowed.
-no Unicode characters greater then U+FFFD are allowed.
 In addition, deserialization (with default options) of an XML document fails if the document
 contains a Unicode character that is not translatable to EBCDIC.
-The AllowUntranslatable option of the deserialization methods lets you
+The <var>AllowUntranslatable</var> option of the deserialization methods lets you circumvent this restriction.
-circumvent this restriction.
-The null character (#x0), normally restricted, is allowed in an XML
-document if the XmlDoc's AllowNull property is set to <tt>True</tt>.
-'''Note:'''
-Using the standard
-translation table provided with <var class="product">Sirius Mods</var> versions prior to 7.3,
-many EBCDIC characters (such as X'FF'),
-in addition to the &ldquo;control characters&rdquo; that were
-explicitly prohibited,
-were ''not'' legal XML characters
-because they did not translate to any Unicode character.
+The null character (<code>#x0</code>), normally restricted, is allowed in an XML
+document if the <var>XmlDoc</var>'s <var>AllowNull</var> property is set to <code>True</code>.
+<blockquote class="note">
+<p>'''Note:''' Using the standard translation table provided with <var class="product">Sirius Mods</var> versions prior to 7.3,
+many EBCDIC characters (such as <code>X'FF'</code>), in addition to the "control characters" that were
+explicitly prohibited, were ''not'' legal XML characters because they did not translate to any Unicode character.</p>
+<p>
 In <var class="product">Sirius Mods</var> version 7.3, the standard translation table was modified significantly.
 For more information about supported characters and character translation
-issues as of version 7.3, see [[??]] refid=u80. and [[??]] refid=cxe2u..
+issues as of version 7.3, see [[Unicode#Support for the ASCII subset of Unicode|Support for the ASCII subset of Unicode]] and [[Unicode#Corrected translations between ASCII/Unicode and EBCDIC|Corrected translations between ASCII/Unicode and EBCDIC]].</p> </blockquote>
-<li>As stated in "[[XmlDoc API#Transport: sending and receiving XML|Transport: sending and receiving XML]]", UTF-8, UTF-16, and ISO-8859-x
+<li>As stated in [[XmlDoc API#Transport: sending and receiving XML|Transport: sending and receiving XML]], UTF-8, UTF-16, and ISO-8859-x
 encodings are accepted (note that these must be given in all-capital letters within the XML declaration).
 <li>XPath comparisons are performed using Unicode.
 As of version 7.3, it is the only type of ordered character comparison.
@@ Line 535: / Line 518: @@
 and could be controlled by the (now obsolete) [[XPathOrder (obsolete XmlDoc property)|XPathOrder]] property.
 </ul>
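Productions [CA2] and [CA2A] can be read as simple range checks on code points. The Python sketch below is an illustration only: the function names are hypothetical, and the <code>allow_null</code> parameter is loosely modeled on the effect described for the <var>AllowNull</var> property rather than on its actual implementation. It reports the position of the first character that may not appear literally in a document under production [A1].

```python
# Ranges from production [CA2A] (RestrictedChar).
RESTRICTED_RANGES = [
    (0x01, 0x08), (0x0B, 0x0C), (0x0E, 0x1F), (0x7F, 0x84), (0x86, 0x9F),
]


def is_char(cp):
    """Production [CA2]: nothing above U+FFFD, no surrogates, no null."""
    return 0x1 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD


def is_restricted_char(cp):
    """Production [CA2A]: control characters excluded from literal text."""
    return any(low <= cp <= high for low, high in RESTRICTED_RANGES)


def first_illegal(text, allow_null=False):
    """Return the index of the first disallowed character, or -1 if none."""
    for index, ch in enumerate(text):
        cp = ord(ch)
        if cp == 0 and allow_null:
            continue  # null tolerated only when explicitly permitted
        if not is_char(cp) or is_restricted_char(cp):
            return index
    return -1
```

In this sketch, tab, carriage return, and linefeed pass the check, while <code>#x1</code> and any character above <code>U+FFFD</code> do not.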
 =====Entity references=====
 <ul>
 <li>One purpose of an EntityRef is to allow a sequence of characters that
 may be illegal in a particular context of an XML document.
-For example, within an element's content, the string &ldquo;]]>&rdquo; is not
+For example, within an element's content, the string <code>]]></code> is not
-allowed, so you may replace the greater-than symbol (>) with
+allowed, so you may replace the greater-than symbol (<tt>></tt>) with either its character code in a CharRef, or with the predefined entity <code>&amp;gt;</code>:
-either its character code in a CharRef, or with
+<p class="code">]]&amp;gt;
-the predefined entity <tt>&amp;gt;</tt>:
+</p>
-<pre>
+<p>
-    ]]&amp;gt;
+A <code>Reference</code> (<code>EntityRef</code> or <code>CharRef</code>) is allowed only in an element's content (<code>[O43]</code>) or in <code>AttValue</code> (<code>[M10]</code>).</p>
-</pre>
-A Reference (EntityRef or CharRef)
-is allowed only in an element's content ([O43]) or in AttValue ([M10]).
 <li>There is a facility for defining your own entities in a DTD, but
-since DTDs are not supported in ''Janus SOAP'',
+since DTDs are not supported in <var class="product">Janus SOAP</var>,
 the only entity references supported are the five predefined entities:
-<dl>
-<dt>&amp;amp;
+<table class="thJustBold">
-<dd>ampersand (&)
+<tr><th>&amp;amp;</th>
-<dt>&amp;apos;
+<td>ampersand (<tt>&</tt>)</td></tr>
-<dd>apostrophe (')
-<dt>&amp;gt;
+<tr><th>&amp;apos;</th>
-<dd>greater than (>)
+<td>apostrophe (<tt>'</tt>)</td></tr>
-<dt>&amp;lt;
-<dd>less than (<)
+<tr><th>&amp;gt;</th>
-<dt>&amp;quot;
+<td>greater than (<tt>></tt>)</td></tr>
-<dd>double quotation mark (")
-</dl>
+<tr><th>&amp;lt;</th>
-'''Note:'''
+<td>less than (<tt><</tt>)</td></tr>
-As of <var class="product">Sirius Mods</var> version 7.6, you can use any of the XHTML entities
-(listed at http://www.w3.org/TR/xhtml1/dtds.html#h-A2)
+<tr><th>&amp;quot;</th>
+<td>double quotation mark (<tt>"</tt>)</td></tr>
+<tr><th>&amp;lsqb; <br>&amp;rsqb;</th>
+<td>left and right square brackets (<tt>[</tt> <tt>]</tt>) <br>(as of Model&nbsp;204 7.6)</td></tr>
+</table>
+<blockquote class="note">
+<p>'''Note:''' You can use any of the XHTML entities (listed at http://www.w3.org/TR/xhtml1/dtds.html#h-A2)
 to represent Unicode characters when converting from EBCDIC to Unicode.
 Character decoding must be in effect, however: you must be using
-the [[U (String function)|U]] constant function
+the <var>[[U (String function)|U]]</var> constant function or the <code>CharacterDecode=True</code> argument on the <var>[[EbcdicToUnicode (String function)|EbcdicToUnicode]]</var> function. </p>
-or the <tt>CharacterDecode=True</tt>
-argument on the [[EbcdicToUnicode (String function)|EbcdicToUnicode]] function.
-You can load into an XmlDoc a character represented by such an entity
+You can load into an <var>XmlDoc</var> a character represented by such an entity if you decode the entity reference before the character is processed by one of the XmlDoc API deserializing or direct storage methods. </blockquote>
-if you decode the entity reference before the character is processed
-by one of the XmlDoc API deserializing or direct storage methods.
 </ul>
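The five predefined entities can be demonstrated outside of SOUL. The following sketch uses Python's standard <code>xml.sax.saxutils</code> and <code>xml.etree.ElementTree</code> modules; the helper names <code>encode_text</code> and <code>decode_text</code> are hypothetical. It encodes text so that the string <code>]]></code> never appears literally in element content, and it shows that a CharRef and the predefined entity yield the same character once parsed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# escape() always handles &, <, and >; the apostrophe and the quotation
# mark are supplied here so that all five predefined entities are covered.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def encode_text(s):
    """Replace the five special characters with predefined entities."""
    return escape(s, EXTRA_ENTITIES)


def decode_text(s):
    """Reverse encode_text."""
    return unescape(s, {v: k for k, v in EXTRA_ENTITIES.items()})


# The predefined entity and a decimal CharRef both stand for ">".
via_entity = ET.fromstring("<a>]]&gt;</a>").text
via_charref = ET.fromstring("<a>]]&#62;</a>").text
```

With these definitions, <code>encode_text("]]>")</code> produces <code>]]&amp;gt;</code>, matching the replacement shown above.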
 ====Components of XMLDecl====
-<pre style="skel">
+<p class="code">[XA24] VersionInfo  ::= S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
-[XA24] VersionInfo  ::= S 'version' Eq
-                        ("'" VersionNum "'"
-                        | '"' VersionNum '"')
 [XB26] VersionNum   ::= ([a-zA-Z0-9_.:] | '-')+
-[XC80] EncodingDecl ::= S 'encoding' Eq
+[XC80] EncodingDecl ::= S 'encoding' Eq ('"' EncName '"' | "'" EncName "'" )
-                        ('"' EncName '"'
-                        | "'" EncName "'" )
-[XD81] EncName      ::= [A-Za-z] ([A-Za-z0-9._]
+[XD81] EncName      ::= [A-Za-z] ([A-Za-z0-9._] | '-')*   /* Only Latin chars */
-                        | '-')*   /* Only Latin chars */
-[XE32] SDDecl       ::= S 'standalone' Eq (
+[XE32] SDDecl       ::= S 'standalone' Eq ( ("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"') )
-                        ("'" ('yes' | 'no') "'")
+</p>
-                        | ('"' ('yes' | 'no') '"') )
-</pre>
 ===Names and namespaces===
-XML documents are allowed to contain
+XML documents are allowed to contain elements and attributes that are defined by one organization, as well as
-elements and attributes that are defined by one organization, as well as
 other elements and attributes that are defined by another organization.
-In order to achieve this organizational &ldquo;merging,&rdquo;
+In order to achieve this organizational "merging," the <i>XML Namespaces Recommendation</i> (http://www.w3.org/TR/REC-xml-names)
-the <i><b>XML Namespaces Recommendation</b></i>
-(http://www.w3.org/TR/REC-xml-names)
 provides for a way to qualify these merged names so that they will not conflict.
 Also, the Namespaces Recommendation provides a way for an application
-to examine, in effect, the &ldquo;defining organization&rdquo; of a name
+to examine, in effect, the "defining organization" of a name
 in an XML document, so that various properties can be inferred, and
-names from the same &ldquo;organization&rdquo; can be grouped together.
+names from the same "organization" can be grouped together.
-Conceptually, the Namespaces Recommendation qualifies a name with a
+Conceptually, the Namespaces Recommendation qualifies a name with a Uniform Resource Identifier ('''URI''').
-Uniform Resource Identifier ('''URI''').
 There are various rules for various types of URIs; one familiar type
-is the same as URLs on the World Wide Web, such as
+is the same as URLs on the World Wide Web, such as:
-<pre>
+<p class="code"><nowiki>http://www.w3.org/2001/XMLSchema</nowiki>
-    http://www.w3.org/2001/XMLSchema
+</p>
-</pre>
 The important aspect of a URI, as far as the names in an XML document
-are concerned, is simply that it is a unique string for the names
+are concerned, is simply that it is a unique string for the names that are associated with it.
-that are associated with it.
-The characters that are valid in a URI (shown in [[#Uniform Resource Identifier syntax|Uniform Resource Identifier syntax]])
+The characters that are valid in a URI (shown in [[#Uniform Resource Identifier syntax|Uniform Resource Identifier syntax]]) exceed the set of characters that are valid in an XML name.
-exceed the set of characters that are valid in an XML name.
+Therefore, the technique employed for XML Namespace qualification is to use a special kind of attribute &mdash; one that begins with "xmlns" &mdash; to associate a name '''prefix''' with a URI.
-Therefore, the technique employed for XML Namespace qualification is to
-use a special kind of attribute &mdash; one that begins
-with &ldquo;xmlns&rdquo; &mdash; to associate a name '''prefix''' with a URI.
 Attaching a prefix to a name then effectively attaches the URI to that name.
 The syntax for making this association, the namespace declaration, is explained in the next section.
 ====Name and namespace syntax====
-The <i><b>W3C XML Recommendation</b></i> syntax rule for names is shown in
+The <i>W3C XML Recommendation</i> syntax rule for names is shown in
-"[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]]" (and
+[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]] (and
-repeated below) as the Name ([NA]), NameStartChar ([NSC]),
+repeated below) as the <code>Name</code> (<code>[NA]</code>), <code>NameStartChar</code> (<code>[NSC]</code>), and <code>NameChar</code> (<code>[NC]</code>) productions.
-and NameChar ([NC]) productions.
+The XML Namespaces Recommendation provides additional rules for Element and Attribute names (but not for PI targets).
-The XML Namespaces Recommendation provides additional
+From the Namespaces Recommendation, element and attribute names are both instances of <code>QName</code>:
-rules for Element and Attribute names (but not for PI targets).
-From the Namespaces Recommendation, element and attribute names
-are both instances of <tt>QName</tt>:
-<pre style="skel">
+<p class="code">[NSC] NameStartChar   ::=   ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] |
-[NSC] NameStartChar   ::=   ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] |
+                              [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] |
-      [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
+                              [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
-      [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
-      [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD]
-[NC]  NameChar   ::= NameStartChar | "-" | "." | [0-9] | #xB7 |
+[NC]  NameChar   ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
-                     [#x0300-#x036F] | [#x203F-#x2040]
 [NA]  Name       ::= NameStartChar (NameChar)*
 [NB5] NCName     ::= (NameStartChar - ':') (NameChar - ':')*
 [NC6] QName      ::= (Prefix ':')? LocalPart
 [ND7] Prefix     ::= NCName
 [NE8] LocalPart  ::= NCName
-</pre>
+</p>
-Although the <i><b>W3C XML Recommendation</b></i> does not require that attribute and element names
+Although the <i>W3C XML Recommendation</i> does not require that attribute and element names
-follow the XML Namespaces Recommendation, the operation of XPath requires
+follow the XML Namespaces Recommendation, the operation of XPath requires it.
-it.
 Therefore, since XPath is so important for the XmlDoc API, its default operating
 mode is to require Namespaces conformance in the XML document.
-See the [[Namespace (XmlDoc property)|Namespace]] property.
+See the <var>[[Namespace (XmlDoc property)|Namespace]]</var> property.
 The restrictions and changes to the XML Recommendation are as follows:
 <ul>
-<li>The <tt>NameStartChar</tt> and <tt>NameChar</tt> productions are taken
+<li>The <code>NameStartChar</code> and <code>NameChar</code> productions are taken
 from the XML 1.1 recommendation (http://www.w3.org/TR/xml11/).
 Starting with version 7.6 of the <var class="product">Sirius Mods</var>, XmlDocs are maintained in Unicode, as
 supported by the <var class="product">Sirius Mods</var>.
 That support excludes characters encoded in more than two bytes, so production
-[NSC], above, shows no Unicode characters greater than U+FFFD.
+<code>[NSC]</code>, above, shows no Unicode characters greater than <code>U+FFFD</code>.
 By default, deserialization of an XML document fails if the document
 contains a Unicode character that is not translatable to EBCDIC.
-The AllowUntranslatable argument of the deserialization methods lets you
+The <var>AllowUntranslatable</var> argument of the deserialization methods lets you circumvent this restriction.
-circumvent this restriction.
-<li>A name can have at most one colon (:), which separates
+<li>A name can have at most one colon (<tt>:</tt>), which separates
 the name into a non-null '''prefix''' and a non-null '''local name'''.
 <li>A name without a prefix is simply a local name.
-<li>The prefix, if any, must be associated with a '''namespace
-URI''' using an attribute of the form:
+<li>The prefix, if any, must be associated with a '''namespace URI''' using an attribute of the form:
-<blockquote style="xmp">
+<p class="code">xmlns:<i><b>prefix</b></i>="<i><b>URI</b></i>"
-xmlns:<i><b>prefix</b></i>="<i><b>URI</b></i>"
+</p>
-</blockquote>
 <!--   xmlns:prefix="URI" -->
 <!--?? xmlns:<i><b>prefix</b></i>="<i><b>URI</b></i>"-->
 For example, all elements (and attributes of those elements) within
-the content of the <tt>definitions</tt> element below can use the prefix
+the content of the <code>definitions</code> element below can use the prefix
-&ldquo;xsd&rdquo; to qualify their names to belong to the
+"xsd" to qualify their names to belong to the <nowiki>"http://www.w3.org/2001/XMLSchema"</nowiki> namespace:
-<nowiki>"http://www.w3.org/2001/XMLSchema"</nowiki> namespace:
+<p class="code"><nowiki><definitions xmlns:xsd="http://www.w3.org/2001/XMLSchema"></nowiki>
-<pre style="xmp">
+  ... content of definitions element ...
-   <definitions xmlns:xsd="http://www.w3.org/2001/XMLSchema">
+</definitions>
-     ... content of definitions element ...
+</p>
-   </definitions>
-</pre>
+<li>The prefix <code>xml</code> is bound to the namespace URI
-<li>The prefix <tt>xml</tt> is bound to the namespace URI
+<code><nowiki>http://www.w3.org/XML/1998/namespace</nowiki></code>.
-<tt><nowiki>http://www.w3.org/XML/1998/namespace</nowiki></tt>.
 The prefix cannot be bound to any other URI, and the URI cannot be bound to any other prefix.
-<br>
-<li>An element can also have a '''default namespace''' attribute,
+<li>An element can also have a '''default namespace''' attribute, which "declares" its namespace, of the form:
-which &ldquo;declares&rdquo; its namespace, of the form:
+<p class="code">xmlns="URI"
-<pre style="xmp">
+</p>
-    xmlns="URI"
-</pre>
 <!--??  xmlns="<i><b>URI</b></i>"-->
-<br>
 <li>Another form of default namespace declaration allows
 an element to disable any default namespace with:
-<pre>
+<p class="code">xmlns=""
-    xmlns=""
+</p>
-</pre>
 <li>A namespace declaration is syntactically the same as an Attribute.
 <li>The scope of a non-default namespace declaration is the element containing it, its
 attributes, and all descendant elements and their attributes, until another declaration
 of the prefix.
 <li>The scope of a default namespace declaration is the element containing it (but not
 the attributes of that element) and its descendant elements (but not their attributes),
@@ Line 722: / Line 688: @@
 <li>The namespace URI associated with a name is
 <ol>
-<li>the in-scope URI associated with the prefix of the name, if the name has
+<li>the in-scope URI associated with the prefix of the name, if the name has a prefix
-a prefix
-<li>for element names,
+<li>for element names, the in-scope default namespace URI, if the name does not have a prefix
-the in-scope default namespace URI, if the name does not have a prefix
 and there is a default namespace URI in scope
 <li>no namespace URI, otherwise
 </ol>
 <li>Two names are identical if they have the same local name and either
-they both do not have a namespace URI or they both have the same namespace
+they both do not have a namespace URI or they both have the same namespace URI.
-URI.
 </ul>
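The scoping and identity rules above can be observed with a namespace-aware parser. The following sketch uses Python's standard <code>xml.etree.ElementTree</code> module, which reports each name as <code>{URI}local</code>; the <code>example.com</code> URIs and the element names are assumptions chosen for illustration. It shows that two different prefixes bound to the same URI give identical names, that a default namespace applies to unprefixed element names but not to unprefixed attribute names, and that <code>xmlns=""</code> disables the default namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the URIs and element names are placeholders.
doc = (
    '<a:root xmlns:a="http://example.com/ns1"'
    ' xmlns="http://example.com/default">'
    '<child attr="1" a:attr="2"/>'
    '<b:root xmlns:b="http://example.com/ns1"/>'
    '<plain xmlns=""/>'
    '</a:root>'
)

root = ET.fromstring(doc)
child, other, plain = list(root)

# Prefixes "a" and "b" differ, but the URI and the local name are the same.
same_name = root.tag == other.tag

# Unprefixed element name: takes the in-scope default namespace.
child_tag = child.tag

# Unprefixed attribute name: no namespace URI; prefixed one: URI of prefix.
child_attributes = dict(child.attrib)

# xmlns="" removes the default namespace for this element.
plain_tag = plain.tag
```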
 ====Uniform Resource Identifier syntax====
-The form of a valid string used as a URI is specified in IETF RFC2396
+The form of a valid string used as a URI is specified in IETF RFC2396 (see http://www.faqs.org/rfcs/rfc2396.html).
-(see http://www.faqs.org/rfcs/rfc2396.html) .
 The rules are as follows:
 <ul>
 <li>Namespace URIs must be '''absolute''':
 they must start with a non-null prefix (called a
-&ldquo;scheme&rdquo;), followed by a colon (:) and a non-null suffix.
+"scheme"), followed by a colon (<tt>:</tt>) and a non-null suffix.
-<li>The scheme must start
-with a letter, which may be followed by any combination of letters, digits, and
+<li>The scheme must start with a letter, which may be followed by any combination of letters, digits, and
-the plus (+), hyphen (-), and period (.) characters.
+the plus (<tt>+</tt>), hyphen (<tt>-</tt>), and period (<tt>.</tt>) characters.
-<br>
-<li>The suffix can contain any of
+<li>The suffix can contain any of the following characters, in addition to letters and digits:
-the following characters, in addition to letters and digits:
+<p class="code">; (semicolon)                - (hyphen)
-<pre>
+/ (slash)                    _ (underscore)
-    ; (semicolon)                - (hyphen)
+? (question mark)            . (period)
-    / (slash)                    _ (underscore)
+&#58; (colon)                    ! (exclamation point)
-    ? (question mark)            . (period)
+@ (at sign)                  ~ (tilde)
-    : (colon)                    ! (exclamation point)
+& (ampersand)                * (asterisk)
-    @ (at sign)                  ~ (tilde)
+= (equal sign)               ' (apostrophe)
-    & (ampersand)                * (asterisk)
++ (plus sign)                ( (open parenthesis)
-    = (equal sign)               ' (apostrophe)
+$ (dollar sign)              ) (close parenthesis)
-    + (plus sign)                ( (open parenthesis)
+, (comma)
-    $ (dollar sign)              ) (close parenthesis)
+</p>
-    , (comma)
-</pre>
 The suffix can also contain:
 <ul>
-<li>At most one number sign (#).
+<li>At most one number sign (<tt>#</tt>).
-<li>As of <var class="product">Sirius Mods</var> 7.2, a percent (%) character followed by two hex digits
+<li>A percent (<tt>%</tt>) character followed by two hex digits
 to escape some other character.
@@ Line 770: / Line 734: @@
 <ul>
 <li>The hex digits A-F may be uppercase or lowercase.
 <li>The hexadecimal values are not replaced when URI processing is performed.
+<p>
-For example, even though the ASCII code for the number &ldquo;4&rdquo; is
+For example, even though the ASCII code for the number "4" is
-hexadecimal 34, the following two URIs are different and distinct:
+hexadecimal 34, the following two URIs are different and distinct:</p>
-<pre>
+<p class="code"><nowiki>http://my.URI.number4
-    http://my.URI.number4
+http://my.URI.number%34</nowiki>
-    http://my.URI.number%34
+</p>
-</pre>
 Thus, for instance, the following fragment:
-<pre>
+<p class="code">%n = %d:AddElement('x', , <nowiki>'http://my.URI.number4')
-    %n = %d:AddElement('x', , 'http://my.URI.number4')
+     %n:AddElement('x', , 'http://my.URI.number%34')
-         %n:AddElement('x', , 'http://my.URI.number%34')
+%d:Print
-    %d:Print
+%d:SelectionPrefix('f') = 'http://my.URI.number4'</nowiki>
-    %d:SelectionPrefix('f') = 'http://my.URI.number4'
+Print %d:SelectCount('//f:x') And 'matching node(s)'
-    Print %d:SelectCount('//f:x') And 'matching node(s)'
+</p>
-</pre>
 will have the following result:
-<pre>
+<p class="code"><nowiki><x xmlns="http://my.URI.number4">
-    <x xmlns="http://my.URI.number4">
+  <x xmlns="http://my.URI.number%34"/></nowiki>
-      <x xmlns="http://my.URI.number%34"/>
+</x>
-    </x>
+1 matching node(s)
-matching node(s)
+</p>
-</pre>
 </ul>
 </ul>
 </ul>
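The URI rules listed in this section can be approximated by a short check. This is a sketch in Python, not the validation code used by the XmlDoc API; the function name <code>is_valid_namespace_uri</code> is hypothetical, and the character class is transcribed from the list above.

```python
import re

# Scheme: a letter, then any mix of letters, digits, plus, hyphen, period.
SCHEME_RE = r"[A-Za-z][A-Za-z0-9+.\-]*"

# Suffix: letters, digits, the listed punctuation, or a percent sign
# followed by two hex digits (either case).
SUFFIX_RE = r"(?:[A-Za-z0-9;/?:@&=+$,\-_.!~*'()]|%[0-9A-Fa-f]{2})*"


def is_valid_namespace_uri(uri: str) -> bool:
    """Hypothetical helper: apply the rules described in this section."""
    scheme, colon, suffix = uri.partition(":")
    if not colon or not suffix:
        return False                 # must be absolute, with a non-null suffix
    if re.fullmatch(SCHEME_RE, scheme) is None:
        return False
    if suffix.count("#") > 1:
        return False                 # at most one number sign
    return re.fullmatch(SUFFIX_RE, suffix.replace("#", "", 1)) is not None


print(is_valid_namespace_uri("http://my.URI.number4"))
print(is_valid_namespace_uri("http://my.URI.number%34"))
print(is_valid_namespace_uri("my.URI.number4"))
```

Because percent escapes are not decoded, comparing two URIs is a plain string comparison: <code>"http://my.URI.number4" == "http://my.URI.number%34"</code> is false.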
 ===Well-formed documents and validation===
-Before an XML document can be processed, its structure must match the
+Before an XML document can be processed, its structure must match the rules expressed in the productions in
-rules expressed in the productions in
+[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]], along with
-"[[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]]", along with
+the extra rules alluded to in square brackets (for example, <code>[Unique Att]</code>,
-the extra rules alluded to in square brackets (for example, <tt>[Unique Att]</tt>,
+indicating that a single attribute name may not be given twice in the list of attributes for an element).
-indicating that a single attribute name may not be given twice in the list
+When the syntax is correct, including these rules, the document is called '''well-formed'''.
-of attributes for an element).
-When the syntax is correct, including these rules, the document is
-called '''well-formed'''.
 The XmlDoc API enforces the syntax rules of well-formed documents.
@@ Line 810: / Line 770: @@
 In addition to this checking, an XML processor may also check to see that
 the format of the document matches the structure and restrictions
-declared for it in either
+declared for it in either the Document Type Declaration or the document's Schema.
-the Document Type Declaration or the document's Schema.
+If the document matches the type structure and restrictions, it is called '''valid'''.
-If the document matches the type structure and restrictions, it is
+In the <i>W3C XML Recommendation</i>, this validation of a document is an optional feature of an XML processor.
-called '''valid'''.
-In the <i><b>W3C XML Recommendation</b></i>, this validation of a document is an optional feature of
-an XML processor.
 <!-- &NSCHVSN -->
 With the current version, the XmlDoc API does not validate the XML document.
-A later version will incorporate this feature.
 Note that support of XML Schema is planned; Document Type Declarations
 have several shortcomings, including a limitation on the types of
 constraints that can be placed on the document, a specialized baroque
 syntax that doesn't conform to the element/attribute structure of
-XML, and incorporation of some features that have nothing to do with
+XML, and incorporation of some features that have nothing to do with document validation.
-document validation.
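The difference between a well-formed and a malformed document can be demonstrated with any conforming parser. The sketch below uses Python's standard library parser (which, like the XmlDoc API as described here, checks well-formedness but does not validate); <code>is_well_formed</code> is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed('<top a="1" b="2"/>'))   # distinct attribute names
print(is_well_formed('<top a="1" a="2"/>'))   # violates the [Unique Att] rule
print(is_well_formed('<top><in></top>'))      # mismatched start and end tags
```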
 ===Normalization during deserialization===
 When an XML processor, in particular the XmlDoc API, parses an XML document from
-character form into an internal representation, it must make some transformations
+character form into an internal representation, it must make some transformations of the document.
-of the document.
 The two most significant types of these transformations concern the following:
 <ul>
@@ Line 835: / Line 790: @@
 <li>Whitespace characters
 </ul>
 ====Normalizing entity and character references====
-Entity and character references are replaced by their entity and character
+Entity and character references are replaced by their entity and character counterparts before deserialization.
-counterparts before deserialization.
+For example, the entity reference <code>&amp;gt;</code> in the <code>content</code>
-For example, the entity reference <tt>&amp;gt;</tt> in the <tt>content</tt>
+of an element or in the <code>AttValue</code> of an Attribute, is handled exactly as if a greater-than symbol
-of an element or in the <tt>AttValue</tt> of an Attribute,
+(<tt>></tt>) occurred at that point in the document.
-is handled exactly as if a greater-than symbol
+Similarly, the character reference <code>&amp;#x5B;</code> is handled as if a left
-(>) occurred at that point in the document.
+square-bracket symbol (<tt>[</tt>) occurred at that point in the document.
-Similarly, the character
-reference <tt>&amp;#x5B;</tt> is handled as if a left
-square-bracket symbol ( [ ) occurred at that point in the document.
-This normalization occurs '''after''' whitespace normalization, which is
+This normalization occurs '''after''' whitespace normalization, which is discussed in the next section.
-discussed in the next section.
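The replacement of entity and character references described above is standard parser behavior. As an illustration only, here is the same substitution performed by Python's standard library parser rather than by the XmlDoc API; the element and attribute names are invented.

```python
import xml.etree.ElementTree as ET

# &gt; is an entity reference; &#x5B; is a character reference for "[".
elem = ET.fromstring('<top att="a&gt;b">x&#x5B;y</top>')

print(elem.get("att"))  # the attribute value contains a literal ">"
print(elem.text)        # the element content contains a literal "["
```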
 ====Normalizing whitespace characters====
 In the XML syntax, the whitespace characters are (in hexadecimal,
 using ISO-10646 character codes):
-<dl>
+<table class="thJustBold">
-<dt>tab
+<tr><th>tab</th>
-<dd>x'09'
+<td>x'09'</td></tr>
-<dt>linefeed
+<tr><th>linefeed</th>
-<dd>x'0A'
+<td>x'0A'</td></tr>
-<dt>carriage return
+<tr><th>carriage return</th>
-<dd>x'0D'
+<td>x'0D'</td></tr>
-<dt>space
+<tr><th>space</th>
-<dd>x'20'
+<td>x'20'</td></tr>
-</dl>
+</table>
-In general, the whitespace characters can be used in the <tt>S</tt>
-production (shown in
+In general, the whitespace characters can be used in the <code>S</code> production (shown in
 [[#Syntax of document, element, Attribute, Comment, PI|Syntax of document, element, Attribute, Comment, PI]]),
-which must separate
+which must separate many of the tokens in a document (for example, it must follow the element name, if the <code>STag</code>
-many of the tokens in a document
+contains an Attribute) and may optionally be used in many other places (for example, it may appear before or after the equal sign (<tt>=</tt>)
-(for example, it must follow the element name, if the <tt>STag</tt>
-contains an Attribute) and may optionally be used in many other
-places
-(for example, it may appear before or after the equal sign (=)
 between an Attribute name and its value).
-The interplay of three factors determine the normalization of whitespace
+The interplay of three factors determines the normalization of whitespace characters during deserialization:
-characters during deserialization:
 <ul>
-<li>The <i><b>W3C XML Recommendation</b></i> specifies two normalizing transformations of whitespace:
+<li>The <i>W3C XML Recommendation</i> specifies two normalizing transformations of whitespace:
 <ol>
-<li>When a special combination of
-line-end characters &mdash; carriage return and linefeed &mdash;
+<li>When a special combination of line-end characters &mdash; carriage return and linefeed &mdash; occur '''anywhere'''
-occur '''anywhere'''
 in an XML document, they are replaced by a single linefeed character.
-Also, carriage returns not followed by a linefeed are
+Also, carriage returns not followed by a linefeed are replaced by a single linefeed character.
-replaced by a single linefeed character.
-<li>When any whitespace character appears in the value of an attribute,
+<li>When any whitespace character appears in the value of an attribute, it is replaced by a single space character.
-it is replaced by a single space character.
 </ol>
-The XmlDoc API always applies these transformations, and
+The XmlDoc API always applies these transformations, and the following two sub-sections describe them in more detail.
-the following two sub-sections describe them in
+<li>In addition to the XML standard whitespace transformations, the XmlDoc API deserialization methods offer options to
-more detail.
+control normalization of whitespace characters that occur in the <code>content</code> of an element.
-<li>In addition to the XML standard whitespace transformations,
+Those options are described in these pages:
-the XmlDoc API deserialization methods offer options to
-control normalization of whitespace characters that
-occur in the <tt>content</tt> of an element.
-Those options are described in these sections:
 <ul>
-<li>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]
+<li>[[LoadXml (XmlDoc/XmlNode function)]]
-<li>[[WebReceive (XmlDoc function)|WebReceive]]
+<li>[[WebReceive (XmlDoc function)]]
 </ul>
-<li>The XmlDoc API deserialization (and serialization) methods
-honor the <tt>xml:space</tt> attribute:
+<li>The XmlDoc API deserialization (and serialization) methods honor the <code>xml:space</code> attribute:
-After the XML standard whitespace transformations,
+After the XML standard whitespace transformations, any whitespace within the scope of <code>xml:space="preserve"</code>
-any whitespace within the scope of <tt>xml:space="preserve"</tt>
+is retained as is, regardless of the whitespace-handling option in effect for the deserialization method.
-is retained as is, regardless of
+Elements that are in the scope of <code>xml:space="default"</code> have whitespace handled
-the whitespace-handling option in effect for the deserialization method.
-Elements that are in the scope of <tt>xml:space="default"</tt>
-have whitespace handled
 according to the whitespace-handling option in effect for the deserialization.
 The individual method descriptions cited above have more information.
 </ul>
 =====Normalized line-end=====
-As specified in &ldquo;2.11 End-of-Line Handling&rdquo; of the <i><b>W3C XML Recommendation</b></i>,
+As specified in "2.11 End-of-Line Handling" of the <i>W3C XML Recommendation</i>,
-all instances of a carriage return character followed by a linefeed character
+all instances of a carriage return character followed by a linefeed character (CR-LF sequence),
-(CR-LF sequence),
 as well as all instances of a carriage return not followed by a linefeed,
 are converted to a single linefeed character.
@@ Line 919: / Line 858: @@
 This behavior only applies to deserialization: there is no modification
 of whitespace characters in values passed as the <i><b>value</b></i>
-argument of the XmlDoc API Add* and Insert* methods
+argument of the XmlDoc API Add* and Insert* methods that allow a value argument.
-that allow a value argument.
+Therefore the values of the <code>FOO1</code> and <code>FOO2</code> elements
-Therefore the values of the &ldquo;FOO1&rdquo; and &ldquo;FOO2&rdquo; elements
+created by the <var>LoadXml</var> (deserialization) and <var>AddElement</var> invocations below are different:
-created by the LoadXml (deserialization) and AddElement invocations below are different:
+<p class="code">&#42; Get EBCDIC carriage return and linefeed:
-<pre>
+%cl = $X2C('0D25')
-    * Get EBCDIC carriage return and linefeed:
-    %cl = $X2C('0D25')
-    * This Element value is linefeed:
+&#42; This Element value is linefeed:
-    %node = %doc:LoadXml('<top> <FOO1>' With %cl With '</FOO1> </top>')
+%node = %doc:LoadXml('<top> <FOO1>' With %cl With '</FOO1> </top>')
-    * This Element value is carriage return and linefeed:
+&#42; This Element value is carriage return and linefeed:
-    %node:AddElement('FOO2', %cl)
+%node:AddElement('FOO2', %cl)
-</pre>
+</p>
-Also, the normalization applies to the characters in the input
+Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution.
-serialized string, not the values after entity substitution.
+Therefore the values of <code>FOO1</code> and <code>FOO2</code> created by the following two <var>LoadXml</var> invocations are different:
-Therefore the values of &ldquo;FOO1&rdquo; and &ldquo;FOO2&rdquo; created by the following two
+<p class="code">&#42; Get EBCDIC carriage return and linefeed:
-LoadXml invocations are different:
+%cl = $X2C('0D25')
-<pre>
-    * Get EBCDIC carriage return and linefeed:
-    %cl = $X2C('0D25')
-    * Element value is linefeed:
+&#42; Element value is linefeed:
-    %doc:LoadXml('<FOO1>' With %cl With '</FOO1>')
+%doc:LoadXml('<FOO1>' With %cl With '</FOO1>')
-    %doc = New
+%doc = New
-    * Element value is carriage return and linefeed
+&#42; Element value is carriage return and linefeed
-    * (note, character references are ISO-10646):
+&#42; (note, character references are ISO-10646):
-    %doc:LoadXml('<FOO2>&amp;#x0D;&amp;#x0A;' With '</FOO2>')
+%doc:LoadXml('<FOO2>&amp;#x0D;&amp;#x0A;' With '</FOO2>')
-</pre>
+</p>
 Linefeed characters not removed by the normalization described above
 and belonging to the Text node child of an element
 (but not in any other type of node) can further be affected by the
-whitespace-handling options of
+whitespace-handling options of <var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var> and <var>[[WebReceive (XmlDoc function)|WebReceive]]</var>.
-[[LoadXml (XmlDoc/XmlNode function)|LoadXml]] and [[WebReceive (XmlDoc function)|WebReceive]].
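The line-end rule from section 2.11 of the Recommendation is not specific to the XmlDoc API. The following sketch shows the same distinction between literal line-end characters and character references using Python's standard library parser. It works with ASCII/Unicode strings, so the EBCDIC values in the examples above do not apply; <code>FOO3</code> is an extra element added for illustration.

```python
import xml.etree.ElementTree as ET

CRLF = "\r\n"

# Literal CR-LF in the serialized input is normalized to a single linefeed.
literal = ET.fromstring("<FOO1>" + CRLF + "</FOO1>")

# Character references are substituted after line-end normalization,
# so both characters survive.
referenced = ET.fromstring("<FOO2>&#x0D;&#x0A;</FOO2>")

# A carriage return not followed by a linefeed also becomes a linefeed.
lone_cr = ET.fromstring("<FOO3>a\rb</FOO3>")

print(repr(literal.text), repr(referenced.text), repr(lone_cr.text))
```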
 =====Normalized attribute value=====
-After replacing all CR-LF sequences, and all other CR instances,
+After replacing all CR-LF sequences, and all other CR instances, by LF (as described in [[#Normalized line-end|Normalized line-end]]),
-by LF (as described in [[#Normalized line-end|Normalized line-end]]),
 attribute values have additional whitespace normalization.
-As specified in &ldquo;3.3.3 Attribute-Value Normalization&rdquo; of the <i><b>W3C XML Recommendation</b></i>,
+As specified in "3.3.3 Attribute-Value Normalization" of the <i>W3C XML Recommendation</i>,
-after the CR-LF normalization, every instance of a
+after the CR-LF normalization, every instance of a whitespace character (tab and linefeed)
-whitespace character (tab and linefeed)
 in an attribute value is converted to a space character.
-Leading and trailing spaces
+Leading and trailing spaces are not stripped, nor are sequences of multiple spaces collapsed.
-are not stripped, nor are sequences of multiple spaces
-collapsed.
 This behavior only applies to deserialization; that is, there is no modification
-of whitespace characters in attribute values passed as the <i><b>value</b></i>
+of whitespace characters in attribute values passed as the <var class="term">value</var>
-argument of the [[AddAttribute (XmlNode function)|AddAttribute]] function..
+argument of the <var>[[AddAttribute (XmlNode function)|AddAttribute]]</var> function.
-Therefore the values of the &ldquo;FOO&rdquo; attribute created by the following two
+Therefore the values of the <code>FOO</code> attribute created by the following two methods are different:
-methods are different:
+<p class="code">&#42; Get EBCDIC carriage return:
-<pre>
+%c = $X2C('0D')
-    * Get EBCDIC carriage return:
-    %c = $X2C('0D')
-    * Attribute value is space:
+&#42; Attribute value is space:
-    %doc:LoadXml('<top FOO="' With %c With '"> <in/> </top>')
+%doc:LoadXml('<top FOO="' With %c With '"> <in/> </top>')
-    * Attribute value is carriage return:
+&#42; Attribute value is carriage return:
-    %doc:AddAttribute('FOO', %c, '/*/*')
+%doc:AddAttribute('FOO', %c, '/*/*')
-</pre>
+</p>
-Also, the normalization applies to the characters in the input
+Also, the normalization applies to the characters in the input serialized string, not the values after entity substitution.
-serialized string, not the values after entity substitution.
+Therefore the values of the <code>FOO</code> attribute created by the following two <var>LoadXml</var> invocations are different:
-Therefore the values of the &ldquo;FOO&rdquo; attribute created by the following two
+<p class="code">&#42; Get EBCDIC carriage return:
-LoadXml invocations are different:
+%c = $X2C('0D')
-<pre>
-    * Get EBCDIC carriage return:
-    %c = $X2C('0D')
-    * Attribute value is space:
+&#42; Attribute value is space:
-    %doc:LoadXml('<top FOO="' With %C With '"/>')
+%doc:LoadXml('<top FOO="' With %C With '"/>')
-    %doc = New
+%doc = New
-    * Attribute value is carriage return - note CR
+&#42; Attribute value is carriage return - note CR
-    * is the same in EBCDIC and ISO-10646:
+&#42; is the same in EBCDIC and ISO-10646:
-    %doc:LoadXml('<top FOO="#x0D;"/>')
+%doc:LoadXml('<top FOO="&amp;#x0D;"/>')
-</pre>
+</p>
-'''Note:'''
-Whitespace in an attribute (and in any type of node other than
+<p class="note">'''Note:''' Whitespace in an attribute (and in any type of node other than
 a Text node child of an element) is '''not''' affected by the
-whitespace-handling options of [[LoadXml (XmlDoc/XmlNode function)|LoadXml]],
+whitespace-handling options of <var>[[LoadXml (XmlDoc/XmlNode function)|LoadXml]]</var>,
-[[WebReceive (XmlDoc function)|WebReceive]], and [[ParseXml (HttpResponse function)|ParseXml]].
+<var>[[WebReceive (XmlDoc function)|WebReceive]]</var>, and <var>[[ParseXml (HttpResponse function)|ParseXml]]</var>. </p>
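Attribute-value normalization, as specified in section 3.3.3 of the Recommendation, can be illustrated the same way. This is a sketch with Python's standard library parser, not the XmlDoc API, and it uses ASCII/Unicode strings rather than EBCDIC.

```python
import xml.etree.ElementTree as ET

# A literal carriage return becomes a linefeed (line-end normalization),
# and the linefeed then becomes a space (attribute-value normalization).
literal = ET.fromstring('<top FOO="\r"/>')

# Each whitespace character maps to one space; nothing is stripped or collapsed.
tab_lf = ET.fromstring('<top FOO=" \t\n "/>')

# A character reference is substituted after normalization, so the
# carriage return is kept.
referenced = ET.fromstring('<top FOO="&#x0D;"/>')

print(repr(literal.get("FOO")), repr(tab_lf.get("FOO")), repr(referenced.get("FOO")))
```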
 ===Language identification===
 From the <i><b>W3C XML Recommendation</b></i>:
-&ldquo;A special attribute named xml:lang may be inserted in documents to
+"A special attribute named xml:lang may be inserted in documents to
 specify the language used in the contents
-and attribute values of any element in an XML document.&rdquo;
+and attribute values of any element in an XML document."
-In versions of ''Janus SOAP'' prior to 6.8, the <tt>xml:lang=".."</tt>
-attribute was accepted regardless of its value.
-As of version 6.8, the only valid values of such attributes are
-the language identifier tags specified in IETF RFC 3066
-(http://www.w3.org/TR/REC-xml/#RFC1766).
+The only valid values of the <code>xml:lang=".."</code> attribute that <var class="product">Janus SOAP</var> accepts are the language identifier tags specified in IETF RFC 3066 (http://www.w3.org/TR/REC-xml/#RFC1766).
 ==References==
-As mentioned, the XML support in ''Janus SOAP'' is heavily oriented to the concepts and facilities defined by
+As mentioned, the XML support in <var class="product">Janus SOAP</var> is heavily oriented to the concepts and facilities defined by
 the XML standards.
 There are two key aspects of XML that application developers should understand at an appropriate level of detail:
@@ Line 1,036: / Line 957: @@
 <td>By Elliotte Rusty Harold and W. Scott Means (Second Edition: June, 2002, publisher O'Reilly & Associates), this book is one of many to cover XML, Namespaces, XML Schema, XSLT, XPath, XML processors, and more. It has the benefit of its smaller size; its good examples; and its good summary of the history of XML.
 <p>
-For XML programming using ''Janus SOAP'' or other platforms, some of this book, and the others like it, may be irrelevant or even confusing (because it's scope is so large), but it is accurate and probably easier to read than the more formalized W3C standards. </p></td></tr>
+For XML programming using <var class="product">Janus SOAP</var> or other platforms, some of this book, and the others like it, may be irrelevant or even confusing (because its scope is so large), but it is accurate and probably easier to read than the more formalized W3C standards. </p></td></tr>
 <tr><td>XML background</td>
 <td>http://www.w3.org/XML/1999/XML-in-10-points</td></tr>
@@ Line 1,054: / Line 975: @@
 This section lists some of the XML-related standards documents that are available.
-The World Wide Web Consortion (or &ldquo;W3C&rdquo;) is the body that creates the XML
+The World Wide Web Consortium (or "W3C") is the body that creates the XML
 standards, along with other Internet standards, such as HTML, XHTML, and HTTP.
-The term &ldquo;Recommendation,&rdquo; in W3C parlance, means that the
+The term "Recommendation," in W3C parlance, means that the
 standard has been approved by the W3C.
@@ Line 1,062: / Line 983: @@
 date on which that status was achieved,
 and the URL that can be used to obtain the document:
-<table>
+<table class="thJustBold">
 <tr><th nowrap>Extensible Markup Language (XML) 1.0 (Third Edition) </th>
 <td>W3C Recommendation 04 February 2004: <br>http://www.w3.org/TR/REC-xml
 <p>
-This is referred to as the <i><b>W3C XML Recommendation</b></i> throughout this article. </p></td></tr>
+This is referred to as the <i>W3C XML Recommendation</i> throughout this article. </p></td></tr>
 <tr><th>Namespaces spec </th>
 <td>http://www.w3.org/TR/REC-xml-names
@@ Line 1,093: / Line 1,014: @@
 [[Category:Overviews]]
+[[Category:Janus SOAP]]

&	ampersand (`&`)
'	apostrophe (`'`)
>	greater than (`>`)
<	less than (`<`)
"	double quotation mark (`"`)
[ ]	left and right square brackets (`[` `]`) (as of Model 204 7.6)
