Encoding (XmlDoc property)
Latest revision as of 21:15, 2 September 2015
Encoding of XML document (XmlDoc class)
The Encoding property indicates the value of encoding in the "XML declaration", if any, at the beginning of the serialized XML document.
Syntax
%currentString = doc:Encoding
doc:Encoding = newString
Syntax terms
%currentString | The string value of doc's Encoding property. |
---|---|
doc | An XmlDoc object expression. |
newString | The string value to assign to doc's Encoding property. The only values to which you may set the Encoding property are:
|
Usage notes
- The setting of the Encoding property concerns the serialization of an XmlDoc. It is related to, but not equivalent to, the value specified for encoding in the XML declaration of an XML document that is being deserialized (for example, <?xml version="1.0" encoding="UTF-16"...).
An XML document that is the input to deserialization might have one of the following as the value of encoding in the XML declaration:
- UTF-8
- This encoding, part of the Unicode standard, specifies that Unicode characters with codes less than X'80' are represented by one byte, and additional bytes are used for other characters. This is also allowed in the input to deserialization if the document is represented in EBCDIC or User Language Unicode.
- UTF-16
- This encoding, part of the Unicode standard, specifies that most Unicode characters with codes less than X'10000' are represented by two bytes, and additional byte pairs are used for other characters.
This is allowed as the input to deserialization only if the document is represented in UTF-16, which also requires that the first two bytes of the document input stream are a Byte Order Mark (either X'FEFF' for Big Endian or X'FFFE' for Little Endian).
Conversely, if the input is represented in UTF-16 and the XML declaration is present with encoding specified, its value must be UTF-16. However, encoding may also be omitted.
- ISO-8859-n
- This encoding (where n may range from 1 through 9) is the encoding used for ASCII characters in the full range up to X'FF'. All nine of these encodings (ISO-8859-1, ISO-8859-2, etc.) are treated the same.
This is allowed if the input contains ASCII, EBCDIC, or User Language Unicode characters.
This is required if the input contains ASCII characters greater than X'7F'.
- none
- If there is no XML declaration containing an encoding value, the input is examined and processed using the character code set contained in the input, whether it is UTF-8, UTF-16, ASCII, EBCDIC, or User Language Unicode.
All of the above values of encoding in the XML declaration must be specified in all-uppercase letters.
The allowed XML declaration encoding values and input character sets are also described in two tables shown in "Usage Notes for LoadXML".
- If a document is deserialized into an XmlDoc, the value of the Encoding property of the XmlDoc remains at or changes to UTF-8 if either UTF-8 or UTF-16 is specified as the encoding value in the "XML declaration" of the document. Otherwise, the value of the Encoding property does not change.
- Encoding may be a non-null string only if the value of the Version property is also a non-null string.
- The only impact of the Encoding property is the presence or absence of an encoding specification in the serialized form of the XmlDoc.
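The input-side rules in the usage notes (uppercase encoding values, and the pairing of a Byte Order Mark with a UTF-16 declaration) can be illustrated with a small Python sketch. This is not the XmlDoc deserializer and not User Language code: the names check_input and declared_encoding are hypothetical, and the sketch assumes the input is either UTF-16 with a Byte Order Mark or an ASCII-compatible 8-bit stream, so EBCDIC input is not modeled.

```python
import re

# Byte Order Marks that open a UTF-16 byte stream.
BOM_BIG_ENDIAN = b"\xfe\xff"
BOM_LITTLE_ENDIAN = b"\xff\xfe"

# Captures the encoding value inside an XML declaration, in either quote style.
_DECL_ENCODING = re.compile(r"""^<\?xml[^>]*?\bencoding\s*=\s*["']([^"']+)["']""")


def declared_encoding(text):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = _DECL_ENCODING.match(text)
    return match.group(1) if match else None


def check_input(raw):
    """Hypothetical checker applying the documented UTF-16 and uppercase rules."""
    if raw[:2] == BOM_BIG_ENDIAN:
        text, is_utf16 = raw[2:].decode("utf-16-be"), True
    elif raw[:2] == BOM_LITTLE_ENDIAN:
        text, is_utf16 = raw[2:].decode("utf-16-le"), True
    else:
        # Assumption: without a BOM, read the declaration as 8-bit characters.
        text, is_utf16 = raw.decode("latin-1"), False

    declared = declared_encoding(text)
    if declared is not None and declared != declared.upper():
        return "rejected: encoding value must be uppercase"
    if is_utf16 and declared not in (None, "UTF-16"):
        return "rejected: UTF-16 input must declare UTF-16 or omit encoding"
    if not is_utf16 and declared == "UTF-16":
        return "rejected: UTF-16 declared but no Byte Order Mark found"
    return "accepted"
```

For instance, a stream that starts with X'FEFF' and declares UTF-16 is accepted by this sketch, while 8-bit input that declares UTF-16 is rejected because the Byte Order Mark is missing.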
Examples
The following example sets the Version and Encoding properties:
begin
  %doc object xmlDoc
  %doc = new
  %doc:loadXml('<a/>')
  %doc:Version = '1.0'
  %doc:encoding = 'UTF-8'
  %doc:print
end
The example result follows:
<?xml version="1.0" encoding="UTF-8"?>
<a/>
Request-cancellation errors (for set method)
This list is not exhaustive: it does not include all the errors that are request cancelling.
- Version property is the null string, and newString argument is not the null string.
- newString is invalid.
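
The first error in this list, together with the example earlier on this page, can be modeled with a short Python sketch. It is only an illustration of the rules, not the XmlDoc implementation: the class DocSketch and its members are hypothetical, the check on which newString values are valid is left out because the list of allowed values is not reproduced here, and the sketch assumes that no XML declaration is written when Version is the null string.

```python
class DocSketch:
    """Hypothetical stand-in for the Version and Encoding rules; not the XmlDoc API."""

    def __init__(self, body):
        self.body = body
        self.version = ""    # null string: no version set
        self._encoding = ""  # null string: no encoding set

    @property
    def encoding(self):
        return self._encoding

    @encoding.setter
    def encoding(self, new_string):
        # Documented rule: Encoding may be non-null only if Version is non-null.
        if new_string and not self.version:
            raise ValueError("Version is null, so Encoding must be null")
        self._encoding = new_string

    def serialize(self):
        # Assumption: with a null Version, no XML declaration is emitted.
        if not self.version:
            return self.body
        declaration = '<?xml version="%s"' % self.version
        # Encoding only controls whether the encoding specification appears.
        if self._encoding:
            declaration += ' encoding="%s"' % self._encoding
        return declaration + "?>\n" + self.body
```

Setting version to '1.0' and then encoding to 'UTF-8' on a DocSketch holding <a/> yields the same two lines as the example result above, while setting encoding first raises the error, mirroring the request cancellation.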