XPath


This chapter has information to help you use XPath arguments to various XmlDoc API methods. Most of the information is taken from the XPath 1 standard, which is the authoritative reference:

    http://www.w3.org/TR/xpath

References to Version 2 of the XPath standard in this manual are to the XPath 2 standard, which became a W3C Recommendation on January 23, 2007:

    http://www.w3.org/TR/xpath20

The five sections in this chapter explain, respectively, the following:

  1. How to understand the components of an XPath expression, which is composed of steps.
  2. The syntax of XPath.
  3. Some subtle aspects of XPath.
  4. Specific XPath axis combinations to avoid.
  5. The subset of XPath supported by the current version of the XmlDoc API.

XPath operation

The purpose of XPath is to select a subset of nodes from a document. This selection is done using an expression, as described by the PathExpr production (these syntax productions for XPath are shown in XPath syntax). The simple form (that is, without parentheses) of a PathExpr expression is called a Location Path (the LocPath production ([1]) in the syntax).

A Location Path consists of a series of Steps (Step ([4]) production). Each Step operates by taking an input set of nodes from the preceding step, and creating an output set of nodes. The output of the last step is the set of nodes selected by the XPath expression.

An example XPath expression is:

    pitm[2]/partnum

This expression contains two Steps (the slash symbol ( / ) is used to separate the Steps in a Location Path).

Often a Step will start with an element name, which selects all the child elements with that name. In the above example, partnum children of pitm elements are selected. The child relationship is one kind of relationship between the input to a Step and the nodes generated by the first part of the Step algorithm; the kinds of relationships, or Axes, are shown in the AxisName ([6]) production.

The element names in the above example are a form of NodeTest, described by the NodeTest ([7]) production. The NodeTest is used to restrict the set of nodes.

Square brackets ([ ... ]) in a Step surround another form of restriction, which is called a Predicate, given by the production ([8]) with the same name. A Predicate is a much more open-ended type of restriction, allowing various functions and operations, including Booleans.

The operation of a Step is as follows:

  1. A Step consists of an Axis, NodeTest, and zero or more Predicates.
  2. The input to a Step is a set of context nodes.
  3. The Axis produces sets of nodes, one set for each context node.
  4. Each of these sets is filtered by the NodeTest.
  5. Each of the resulting sets is filtered by the first Predicate.
  6. Each of the sets which are output by a Predicate is filtered by the following Predicate, if any.
  7. The final filtered sets are combined (using set union), and the result is the set of nodes which becomes input to the following Step.
  8. The result of the final Step is the result of the Location Path.
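
The operation of a Step listed above can be sketched in Python. This is only an illustration of the algorithm, not the XmlDoc API implementation: the Node class, child_axis, evaluate_step, and the sample order document are all hypothetical names introduced here.

```python
class Node:
    """A minimal element node: a name, optional text, and child links."""

    def __init__(self, name, text=""):
        self.name = name
        self.text = text
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child


def child_axis(node):
    """Axis: the children of one context node, in document order."""
    return list(node.children)


def evaluate_step(context_nodes, axis, node_test, predicates):
    """Apply one Step to a set of context nodes (items 2 to 7 of the list)."""
    result = []
    for ctx in context_nodes:
        # The Axis produces one set per context node; the NodeTest filters it.
        candidates = [n for n in axis(ctx) if node_test(n)]
        # Each Predicate filters the output of the one before it.
        for predicate in predicates:
            size = len(candidates)
            candidates = [
                n for pos, n in enumerate(candidates, start=1)
                if predicate(n, pos, size)
            ]
        # Set union: add each surviving node once.
        for n in candidates:
            if all(n is not seen for seen in result):
                result.append(n)
    return result


# Hypothetical document: <order> with three <pitm> children,
# each holding one <partnum>.
order = Node("order")
for part in ("P-100", "P-200", "P-300"):
    order.add(Node("pitm")).add(Node("partnum", part))

# pitm[2]/partnum, using the <order> element as the initial context node.
step1 = evaluate_step([order], child_axis,
                      lambda n: n.name == "pitm",
                      [lambda n, pos, size: pos == 2])
step2 = evaluate_step(step1, child_axis,
                      lambda n: n.name == "partnum", [])
selected = [n.text for n in step2]
print(selected)
```

Each predicate receives the node, its position, and the set size, which is the information the position() and last() functions need.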

Axes

The various forms of the AxisName ([6]) production generate nodes based on a context node using the simple tree relationships described by the name. For example, attribute:: (abbreviated as @, the at sign) generates the set of all attributes of a context node.

The XmlDoc API supports the following axes (be sure to also read Performance considerations: Document order, certain axes):

ancestor
Contains the parent of the context node, its parent, and so on, to and including the Root node of the XmlDoc.
ancestor-or-self
The same contents as ancestor, except that it also includes the context node.
attribute
Contains the attributes of the context node, which must be an element.
child
Contains the children of the context node.
descendant
Contains the context node's children, their children, and so on. Since the children of a node do not include its attributes, this axis does not include any attributes (so, this is not equivalent to a “sub-tree”).
descendant-or-self
The same contents as descendant, except that it also includes the context node. Specified by //, an abbreviation for the step consisting of descendant-or-self::node(), this axis can be used, for example, to locate an element by its name, if the “path” to it is not known:
    //foo
following
Contains all the nodes that are after the context node in document order, excluding any descendants and excluding attribute nodes.
following-sibling
Contains all the siblings of the context node that are positioned after that node. Contains no siblings if the context node is an attribute.
parent
Contains the parent of the context node. Each node has only one parent, except the root node, which has no parent.
preceding-sibling
Contains all the siblings of the context node that are positioned before that node. Contains no siblings if the context node is an attribute.
self
Contains the context node itself.
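
As a rough illustration of the axis definitions above, the following Python sketch computes a few of them with the standard xml.etree.ElementTree module. It is not XmlDoc API code. The sample document and the helper names are assumptions made for this example, and ElementTree has no separate Root node, so the ancestor helper stops at the top element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<top><a><x/><y/></a><b/><c/></top>")
# ElementTree keeps no parent links, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}
doc_order = list(root.iter())  # all elements, in document order


def ancestor(node):
    """Parent, grandparent, and so on (nearest ancestor first)."""
    out = []
    while node in parent_of:
        node = parent_of[node]
        out.append(node)
    return out


def descendant(node):
    """Children, their children, and so on; attributes are never included."""
    return [n for n in node.iter() if n is not node]


def following_sibling(node):
    """Siblings positioned after the node."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    index = next(i for i, s in enumerate(siblings) if s is node)
    return siblings[index + 1:]


def following(node):
    """Everything after the node in document order, minus its descendants."""
    below = {id(n) for n in node.iter()}
    index = next(i for i, n in enumerate(doc_order) if n is node)
    return [n for n in doc_order[index + 1:] if id(n) not in below]


def names(nodes):
    return [n.tag for n in nodes]


a = root.find("a")
x = root.find("a/x")
print(names(ancestor(x)))           # ancestors of <x>
print(names(descendant(a)))         # descendants of <a>
print(names(following_sibling(a)))  # following siblings of <a>
print(names(following(a)))          # following axis of <a>
```

Note that following(a) omits x and y because they are descendants of a, which matches the exclusion described for the following axis.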

The following XPath axes are not supported in the current version:

namespace
In keeping with the XPath 2 recommendation, Sirius does not plan to support this axis at any time. You can obtain the information provided by namespace declarations by using certain XmlDoc API methods, for example, the URI function on an XmlNode.
preceding
Support for this axis may be added in a later version.

NodeTests

The various forms of the NodeTest ([7]) production filter nodes as follows:

NodeType '(' ')'
This selects any node that has the respective node type, for example comment() selects all Comment nodes in a node set.
'processing-instruction' '(' Lit ')'
This selects any Processing Instruction node if the target name is equal to the value of Lit.
'*' | NCName ':' '*' | QName
These forms test the name of a node, after restricting the type of node to the “principal node type” of the Axis, as follows:
  • Name tests in the attribute:: Axis restrict to Attribute nodes.
  • Name tests in any other Axis restrict to Element nodes.

The name tests then filter the resulting nodes as follows:

'*'
This selects a node of the selected type regardless of the node's name. The “selected type” is the principal node type of the node subset selected by the preceding Axis. The default Axis type is child, so the default node type is Element.
NCName ':' '*'
This selects a node of the selected type if the node has an associated namespace equal to the URI associated with NCName.
QName
This selects a node of the selected type if the node has the same name as QName.

Namespace URI associations and QName equality are discussed in Names and namespaces.

Predicates

Each nodeSet that is the result of NodeTest filtering is input to the series of Predicates in the Step. Each Predicate's result sets are passed to the following one, and a union of the results of the last Predicate (or the NodeTest, if there are no Predicates) forms the result of the Step.

There are a variety of Predicates, and except for a numeric Predicate, a Predicate selects a node if the value of the Predicate, converted to a Boolean, is true.

Within a single step:

  • Multiple Predicates are allowed (as of Sirius Mods 7.2); prior to version 7.2, only some cases of multiple predicates in a step were valid.
  • Location paths within Predicates can themselves contain Predicates (as of Sirius Mods 7.2); prior to version 7.2, nested predicates are not allowed.

Common forms of predicates use Location Path expressions, XPath functions, or a combination of these. For example, the following path selects all contact children of the second cust element that have a fax child element:

    /active/cust[position()=2]/contact[fax]
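
To see what such a path selects, here is a small experiment using Python's standard xml.etree.ElementTree module rather than the XmlDoc API. ElementTree supports only a limited XPath subset, so the path is written relative to the active element and the position test uses the abbreviated form cust[2]. The sample document is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: three cust elements; only the second has two contacts.
active = ET.fromstring("""
<active>
  <cust><contact name="A"><fax>555-0101</fax></contact></cust>
  <cust>
    <contact name="B"><fax>555-0102</fax></contact>
    <contact name="C"><phone>555-0103</phone></contact>
  </cust>
  <cust><contact name="D"><fax>555-0104</fax></contact></cust>
</active>
""")

# contact children of the second cust element that have a fax child
selected = [c.get("name") for c in active.findall("cust[2]/contact[fax]")]
print(selected)
```

Contact C is skipped because it has no fax child, and contacts A and D are skipped because their cust parents are not the second one.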

Location Paths (described in XPath operation) in a predicate can be any supported Location Path, including multi-step and absolute expressions.

You can also use a location path or the number() function, followed by a comparison operator and literal, as a predicate of an XPath expression, as described in Comparison tests in predicates.

For a description of the XPath predicates currently supported in the XmlDoc API, see Predicates supported in the current version.

Functions

There are many functions defined for XPath; this section merely gives a sample of some of them. Furthermore, as of the current version, many of these are not supported. See XPath functions supported in the current version for a list of the XPath functions currently supported.

In the XmlDoc API, XPath functions are used only in predicates.

Here are some XPath functions that return a numeric result:

last()
Returns the size (number of nodes) of the set that the predicate is filtering.
position()
Returns the position of the node in the set that the predicate is filtering. The position in the set is dependent on the Axis in effect for the Predicate. The following axes use the reverse of document order to arrange the nodes in a node set:
  • ancestor
  • ancestor-or-self
  • preceding
  • preceding-sibling

All other axes use document order to arrange the nodes in a node set. Note: If a PathExpr is parenthesized and followed by Predicates, and if position() is used in those predicates, the axis in effect is the child axis.

count(nodeSet)
Returns the number of nodes in the argument nodeSet. Notice that this function uses a nodeSet type argument. As in this example, a LocPath can be passed:
    /book/chapter[count(section) >= 3]

This expression will select all chapters that have three or more sections.
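
The effect of the count() predicate can be imitated in Python. The standard xml.etree.ElementTree module does not implement count(), so this sketch applies the test by hand; the sample book document is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book>"
    "<chapter id='1'><section/><section/></chapter>"
    "<chapter id='2'><section/><section/><section/></chapter>"
    "<chapter id='3'><section/><section/><section/><section/></chapter>"
    "</book>"
)

# Equivalent in spirit to /book/chapter[count(section) >= 3]:
# for each chapter, evaluate the LocPath "section" with that chapter
# as the context node and count the nodes in the result.
selected = [
    chapter.get("id")
    for chapter in book.findall("chapter")
    if len(chapter.findall("section")) >= 3
]
print(selected)
```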

number(object?)
Returns the numeric value of the argument (after stripping leading and trailing blanks if it is a string object). If the argument cannot be converted to a number, it returns the special value “NaN” (“Not A Number”), which is not equal to any other object (including another expression whose result is NaN). The default argument is the context node (of the “containing” expression, which, for our purposes, is the context node of the predicate containing the number() function). If the argument is a nodeSet, the value of the first node (in document order) of the argument nodeSet is converted to a numeric value as above. If the argument is the empty nodeSet, NaN is returned.
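
The conversion rules for number() can be sketched as follows. This Python function is a hypothetical helper, not part of any API described here. It models a node set as a list of string values in document order and accepts only XPath 1 decimal syntax with an optional leading minus sign (no E-format).

```python
import math
import re

# XPath 1 Number syntax, plus an optional leading minus sign.
_NUMBER = re.compile(r"-?(?:\d+(?:\.\d*)?|\.\d+)")


def xpath_number(value):
    """Convert a string, or a node set given as a list of strings, to a number."""
    if isinstance(value, list):
        if not value:
            return math.nan        # empty node set
        value = value[0]           # first node in document order
    text = value.strip(" \t\r\n")  # strip leading and trailing blanks
    if _NUMBER.fullmatch(text):
        return float(text)
    return math.nan                # not convertible


print(xpath_number("  42 "))
print(xpath_number(["3.5", "oops"]))
print(xpath_number("abc") == xpath_number("abc"))  # NaN never equals NaN
```

The last line shows why a comparison between two NaN results is never true, as noted above.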

Here are some XPath functions that return a string result:

string(object?)
This function, like several other XPath functions, allows different kinds of arguments: for example, it can be used to convert a number to a string. The string() function is implicitly used when a comparison is made between a node and a string value, for example:
    /book/chapter[@title = 'Introduction']

In this case, each node in the node set that is the result of the @title PathExpr is converted using string(), then compared to the string literal Introduction.

The default argument of the string() function is the context node of the expression. The string() function, when given a node set argument, uses the string value of the first node, in document order, of that node set.

substring(string, number, number?)
Returns the substring of the first argument, starting at the position specified by the second argument, and for the number of characters specified by the third argument (or the remainder of the string, if the third argument is omitted). As with most XPath expressions, conversions are freely done, so if the first argument is not a manifest string type, it is converted to one using the string() function.
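
As a sketch of the XPath 1 definition of substring(), the Python function below uses 1-based character positions and rounds non-integer arguments to the nearest integer, as the XPath 1 Recommendation specifies. The function names are hypothetical, NaN and infinite arguments are not handled, and the function may not be among those supported by the XmlDoc API.

```python
import math


def xpath_round(x):
    """Round to the nearest integer, with halves rounded up."""
    return math.floor(x + 0.5)


def xpath_substring(s, start, length=None):
    """Characters at 1-based positions p with start <= p < start + length."""
    first = xpath_round(start)
    if length is None:
        last = len(s) + 1  # exclusive bound: through the end of the string
    else:
        last = first + xpath_round(length)
    return "".join(ch for pos, ch in enumerate(s, start=1)
                   if first <= pos < last)


print(xpath_substring("12345", 2, 3))
print(xpath_substring("12345", 2))
print(xpath_substring("12345", 1.5, 2.6))
print(xpath_substring("12345", 0, 3))
```

The last call shows that a start position before the first character shortens the result, because the positions 0 through 2 include only two real characters.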

This XPath function returns a boolean result:

not(boolean)
Returns true if its argument is false, and false otherwise. For example, not(paragraph[3]) selects a node if it is not the third paragraph child.

XPath syntax

This section contains a version of the XPath syntax. See XML syntax for an explanation of the syntax conventions. The syntax below has been changed from that in the XPath Recommendation in these ways:

  • Names of some non-terminals have been changed (for example, “Lit” rather than “Literal”).
  • Some productions have been collapsed. This introduces superficial ambiguity that is dealt with as needed, for example, showing the precedence of operators.
  • A few of the productions have been moved, to illustrate that the PathExpr is the syntax goal. It is the form of expression that selects a set of nodes, which is the purpose of XPath in the XmlDoc API. In other places in this manual, an “XPath expression” means a PathExpr.

For a “cross-reference” to the productions as contained in the XPath Recommendation, see XPath syntax cross-reference.

See also XPath supported in the current version.

[A19] PathExpr  ::= LocPath
        | PrimaryExpr Predicate+
        | PrimaryExpr Predicate* '/' RelativeLocPath
        | PrimaryExpr Predicate* '//' RelativeLocPath
 
[B15] PrimaryExpr ::= Variable  |  Lit
        | Number  |  FunctionCall
        | '(' UnaryExpr ')'  |  '(' Expr ')'
 
[C27] UnaryExpr ::= PathExpr ('|' PathExpr)*
        | '-' UnaryExpr  |  PrimaryExpr
 
 [1]  LocPath ::= RelativeLocPath
        | AbsoluteLocPath
 [2]  AbsoluteLocPath ::= '/' RelativeLocPath?
        | '//' RelativeLocPath
 [3]  RelativeLocPath ::= Step ('/' Step)*
        | RelativeLocPath '//' Step
 
 [4]  Step ::=  '.'              /* self::node()    */
        | '..'                   /* parent::node()  */
        | (AxisName '::' | '@')? NodeTest Predicate*
 
 [6]  AxisName ::= 'ancestor'
        | 'ancestor-or-self'    | 'attribute'
        | 'child'               | 'descendant'
        | 'descendant-or-self'  | 'following'
        | 'following-sibling'   | 'namespace'
        | 'parent'              | 'preceding'
        | 'preceding-sibling'   | 'self'
 
 [7]  NodeTest ::=  '*'  |  NCName ':' '*'  |  QName
        | NodeType '(' ')'
        | 'processing-instruction' '(' Lit ')'
 
 [8]  Predicate ::= '[' Expr ']'
 
 [16] FunctionCall ::= FunctionName
        '(' ( Expr ( ',' Expr )*)? ')'
 
 [21] Expr ::= EqExpr ( ('and' | 'or') EqExpr )*
 [23] EqExpr ::= RelExpr ( ('=' | '!=') RelExpr )*
 [24] RelExpr ::= NumExpr
        ( ('<' | '>' | '<=' | '>=') NumExpr )*
 
 [25] NumExpr ::= UnaryExpr
        ( ('+' | '-' | '*' | 'div' | 'mod') UnaryExpr )*
 
 [29] Lit ::= '"' [^"]* '"'
        | "'" [^']* "'"
 [30] Number ::= [0-9]+ ('.' [0-9]*)?
 [35] FunctionName ::= QName - NodeType
 [36] Variable ::= '$' QName
 [38] NodeType ::= 'comment'    |  'text'
        | 'processing-instruction'  | 'node'

For information about XPath functions, see Functions.

Syntax notes:

  • In [A19], [2], and [3], the double slash (//) is an abbreviation for
        /descendant-or-self::node()/
    
  • When an at sign (@) is used in a Step ([4]), it is an abbreviation for
        attribute::
    
  • The syntax for Step ([4]) notes that it may begin directly with a NodeTest. In that case, child:: is implied before the NodeTest.
  • The syntax for QName and NCName is given in Name and namespace syntax.
  • The precedence of expressions from Expr ([21]), EqExpr ([23]), RelExpr ([24]), and NumExpr ([25]) is as follows (lowest precedence first):
    1. or (Short-circuit evaluation)
    2. and (Short-circuit evaluation)
    3. =, != (On node sets geared to one operand singleton)
    4. <=, <, >=, > (Node sets geared to one operand singleton; no string ordering)
    5. +, -
    6. *, div, mod

    The operators are all left-associative. For example, 3>2>1 is the same as (3>2)>1, which evaluates to false.

  • The only forms of PrimaryExpr ([B15]) that can create node sets, and so be used in a PathExpr ([A19]), are the id() function and parenthesized LocPath ([1]) (or parenthesized unions of them, with '|' ([C27])).
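
The left-associativity note above (3>2>1 evaluating to false) can be checked with a short Python sketch. The helper names are hypothetical. The sketch assumes the XPath 1 rule that, when neither operand of a relational operator is a node set, both operands are converted to numbers, with true becoming 1 and false becoming 0.

```python
def to_number(value):
    """XPath-style conversion: booleans become 1 or 0."""
    if isinstance(value, bool):
        return 1.0 if value else 0.0
    return float(value)


def xpath_gt(left, right):
    """The XPath > operator for non-node-set operands."""
    return to_number(left) > to_number(right)


# XPath: 3>2>1 parses as (3>2)>1, that is, true>1, that is, 1>1.
left_assoc = xpath_gt(xpath_gt(3, 2), 1)

# Python itself chains comparisons: 3 > 2 > 1 means (3 > 2) and (2 > 1).
python_chained = 3 > 2 > 1

print(left_assoc, python_chained)
```

The contrast with Python's chained comparison is a reminder that the same text can mean different things in different expression languages.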

See also the notes in XPath supported in the current version.

XPath syntax cross-reference

Here is a listing of all first productions contained in various numbered sections and unnumbered subsections from the XPath Recommendation. This may be helpful if you want to cross-reference the productions shown in XPath syntax with those in the XPath Recommendation.

Section in XPath Recommendation
Production number and name
2 Location Paths
[1] LocationPath
2.1 Location Steps
[4] Step
2.2 Axes
[6] AxisName
2.3 Node Tests
[7] NodeTest
2.4 Predicates
[8] Predicate
2.5 Abbreviated Syntax
[10] AbbreviatedAbsoluteLocationPath
3.1 Basics
[14] Expr
3.2 Function Calls
[16] FunctionCall
3.3 NodeSets
[18] UnionExpr
3.5 Numbers
[25] AdditiveExpr
3.7 Lexical Structure
[28] ExprToken
. . .
[39] ExprWhitespace (last production)

Some notes on XPath usage

The following subsections describe some subtle issues in XPath, which the XmlDoc API implements exactly as specified in the recommendations.

“//” and “.”

As mentioned in Performance considerations: Document order, certain axes, the descendant-or-self axis (commonly appearing in an XPath expression with the // abbreviation) should generally be avoided, because it can incur extra CPU and DKRD overhead. Even if you have resolved the performance questions, be sure that you understand the meaning of //. For example:

  • The expression //chapter[1] is not the first chapter in the document; it is the first chapter child of each element in the document.
  • An XPath expression that begins with // is an absolute expression; if you want to use it at the start of a relative XPath expression, make use of the . step:
        %chapter = %doc:SelectSingleNode('/book/chapter[title="Aerobics"]')
        %sections = %chapter:SelectNodes('.//section')
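
The meaning of a position predicate after // can be observed with Python's standard xml.etree.ElementTree module, used here only as a stand-in for the XmlDoc API. The sample document is invented, and the path is written with a leading . because ElementTree evaluates paths relative to an element.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    "<book>"
    "<part><chapter id='1'/><chapter id='2'/></part>"
    "<part><chapter id='3'/><chapter id='4'/></part>"
    "</book>"
)

# .//chapter[1]: the first chapter child of EACH element that has one.
first_of_each = [c.get("id") for c in book.findall(".//chapter[1]")]

# The first chapter in the whole document is a different question.
first_in_doc = book.find(".//chapter").get("id")

print(first_of_each, first_in_doc)
```

Two chapters are selected by the predicate form, one for each part element.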
    

Attributes: not children, and excluded from most axes

One subtle point to observe is that attributes are not children of their parents! As stated in the XPath Recommendation:

  • 2.2 Axes
  • . . .
    • the descendant axis contains the descendants of the context node; a descendant is a child or a child of a child and so on; thus the descendant axis never contains attribute or namespace nodes

Also, Attributes cannot be parents, and they are explicitly excluded from some axes.

The only axes which may include Attribute nodes are:

  • ancestor-or-self
  • attribute (abbreviated @)
  • descendant-or-self (abbreviated //)
  • self

Note that the child axis is not in the above list, and so it will never include an Attribute node; thus, since child is the default axis, an Attribute node will never be the result of a step which does not explicitly include one of the above (abbreviated or unabbreviated) axes.

Order of nodes: node sets versus nodelists

The order of nodes in an XML document is the order in which the node (or its start-tag, in the case of an Element node) first occurs in the serial form of the document. Thus, in document order, the Root node is first, an Element node occurs before its Attribute nodes and Namespace declarations, which appear before the children of the Element, and so on.

For the sake of simplicity and to be consistent with XSLT, the order of nodes in an XmlNodelist in the XmlDoc API is also document order.

In XPath, however, document order is not always used. In an XPath expression step, the axis implies an order. This order is important for the position() and last() predicate functions, which filter a node based on its order in a node set. This order is the same as document order, except for the following axes (for which the order is the reverse of document order):

  • ancestor
  • ancestor-or-self
  • preceding (not supported by the XmlDoc API)
  • preceding-sibling

For example, consider the following document:

    <top>
      <a/>
      <b/>
      <c/>
      <d/>
      <e/>
    </top>

Using XPath to select the first of the “following” siblings (/*/c/following-sibling::*[1]) yields the equivalent of the first element in document order: d. However, selecting the first of the preceding siblings (/*/c/preceding-sibling::*[1]) yields the first element in reverse document order: b.

This reverse ordering is apparent in some contexts but not in others. For example, the XPath expression in the following statement is used against the document above (call it %doc) to select nodes into an XmlNodelist:

    %nodelist = %doc:SelectNodes('/*/c/preceding-sibling::*')

The XmlDoc API (re)arranges the two found nodes in %nodelist into document order: a and b, in that order, and the following statement selects b, the second of the nodes:

    Print %nodelist:Item(2):LocalName

Yet, if you use the following statement (instead of the previous two) in an attempt to directly select the “second” preceding sibling, the result is node a:

    Print %doc:LocalName('/*/c/preceding-sibling::*[2]')

The Item method, above, selects the second node in the set of document-ordered nodes in the XmlNodelist. The position() function above (the 2 within the brackets) selects the second node in the set of reverse-document-ordered nodes passed from the preceding-sibling axis after filtering by the * NodeTest.
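
The difference between axis order and nodelist order can be modeled with plain Python lists. This is only an illustration of the ordering rules described above. The element names come from the sample document, and position_predicate is a hypothetical helper.

```python
siblings = ["a", "b", "c", "d", "e"]  # children of <top>, in document order
ctx = siblings.index("c")             # the context node is <c>

following_sibling = siblings[ctx + 1:]    # axis order = document order
preceding_sibling = siblings[:ctx][::-1]  # axis order = reverse document order


def position_predicate(axis_nodes, n):
    """Emulate the predicate [n]: the n-th node in axis order (1-based)."""
    return axis_nodes[n - 1]


# An XmlNodelist holds the same nodes, rearranged into document order.
nodelist = sorted(preceding_sibling, key=siblings.index)

print(position_predicate(following_sibling, 1))  # following-sibling::*[1]
print(position_predicate(preceding_sibling, 1))  # preceding-sibling::*[1]
print(position_predicate(preceding_sibling, 2))  # preceding-sibling::*[2]
print(nodelist[1])                               # Item(2) of the nodelist
```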

Performance considerations: Document order, certain axes

This section discusses the performance implications of evaluating certain XPath expressions. The expressions of concern have a common characteristic — they are not simple XPath expressions.

Simple XPath expressions, which have no special performance considerations, are any of these:

  • One or more steps containing only child, attribute, or self axes.
  • A parent axis used in the first step, after which may be one or more steps containing only child, attribute, or self axes.
  • A following-sibling axis used alone.
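
The three cases above can be expressed as a small check. This Python function is a hypothetical helper, not an XmlDoc API facility. It takes the axis name of each step in order and reads "used alone" as meaning a one-step expression, which is an interpretation of the list rather than something it states.

```python
SIMPLE_AXES = {"child", "attribute", "self"}


def is_simple(axes):
    """Return True if the per-step axis names form a 'simple' expression."""
    if not axes:
        return False
    # Case 1: only child, attribute, or self axes.
    if all(axis in SIMPLE_AXES for axis in axes):
        return True
    # Case 2: parent in the first step, then only child, attribute, or self.
    if axes[0] == "parent" and all(axis in SIMPLE_AXES for axis in axes[1:]):
        return True
    # Case 3: a following-sibling axis used alone.
    return axes == ["following-sibling"]


print(is_simple(["child", "child", "attribute"]))  # e.g. /book/chapter/@title
print(is_simple(["parent", "child"]))              # e.g. ../chapter
print(is_simple(["descendant-or-self", "child"]))  # e.g. //chapter
```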

The rest of this section considers XPath expressions that are not simple and therefore might have negative performance implications. If your use of XPath is confined to the simple expressions defined above, the following discussion is not your concern.

The XPath expression arguments of methods like SelectNodes and UnionSelected in the XmlNodelist class designate a set of nodes. In addition to these “set-valued” methods, XPath expressions can be used in many XmlDoc API methods (SelectSingleNode, Value, DeleteSubtree, QName, and more) to operate on a single node that satisfies the expression. For a simple XPath expression (described above), the “single node” methods may be able to determine the desired node by scanning fewer nodes, giving better performance than the set-valued methods for the same expression.

The single-node XPath selection by the XmlDoc API returns the first node in document order, but with non-simple XPath expressions, this is not the same as the first node found by the XPath internal selection algorithm, which may visit nodes in a different order. In those cases, an entire subtree is examined to determine the first node, in document order, that the XPath expression selects.

In other words, given an XPath expression expr that uses any of the axis cases described below in "The extra-processing expressions" and given any single-node selection method XMeth, this expression:

%obj:XMeth(expr)

scans as many nodes as:

%obj:SelectNodes(expr):Item(1):XMeth

For all other XPath expressions, the number of nodes scanned by the first of these two approaches may be significantly lower, because the first node internally selected will also be the first node in document order.

For example, consider the following document:

    <top>
       <a>
         <b x="1"/>
       </a>
       <b x="2"/>
    </top>

When, say, Value('/*/*/b/@x') is evaluated, the document search ends when the first match is found (and the Value method returns 1).

But when Value('//b/@x') is evaluated, the document search first finds the match x=2, then it continues searching the entire document for all matches, to ensure that the match which is lowest in document order (x=1) is the result.
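
One way to picture why x=2 is found first is to evaluate //b/@x step by step, applying the child::b step to each node produced by descendant-or-self::node() in turn. The Python sketch below does this with the standard xml.etree.ElementTree module. It assumes this evaluation strategy for illustration only and does not describe the actual XmlDoc API internals.

```python
import xml.etree.ElementTree as ET

top = ET.fromstring('<top><a><b x="1"/></a><b x="2"/></top>')

# Position of every element in document order.
position = {id(node): i for i, node in enumerate(top.iter())}

# descendant-or-self::node() visits top, a, b, b; child::b is then
# applied to each of those context nodes in turn.
matches = [
    child
    for ctx in top.iter()
    for child in ctx
    if child.tag == "b"
]

internal_order = [b.get("x") for b in matches]
document_order = [
    b.get("x") for b in sorted(matches, key=lambda n: position[id(n)])
]

print(internal_order)   # order in which matches are found
print(document_order)   # order the result must be reported in
```

Because the first match found is not the first in document order, every match has to be collected before the single-node result is known.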

The performance implications of the expressions that involve extra processing apply to the set-valued methods as well. The set methods must produce their results in document order, but the nodes selected during XPath evaluation may be selected in an order (due to the selection algorithm) that differs from document order.

The extra-processing expressions

Extra processing can occur in the following cases:

  1. The presence of the preceding-sibling axis
  2. The presence of the ancestor axis
  3. The presence of the ancestor-or-self axis
  4. The presence of the following axis, if it is not the first step in the expression
  5. The presence of any of these axis-combinations:
    • One of the following axes:
      • descendant
      • descendant-or-self
      • following
      • parent, if it is not the first step in the expression

      followed, in a subsequent step, by any of these axes:

      • parent
      • child
      • following-sibling
      • descendant
      • descendant-or-self
    • The parent axis, if it is not the first step in the expression, followed, in a subsequent step, by the
      • attribute axis.

In addition to the cost of the actual XPath search performed with the above expressions, they can incur an additional cost for XPath evaluation. If the document has been modified in such a way that the internal order of the nodes cannot be guaranteed to be the same as document order (this will always happen with any of the XML Insert..Before methods, and usually will happen with any of the Add.. methods), then the entire document (not only the subtree being searched) must be scanned so that the order is adjusted. This does not involve any internal movement of the nodes, but it does require a full scan.

Note:

  1. One important exception to the above rules is the descendant-or-self::node() step followed immediately by the child axis without any predicate. An example of the usual way to specify this is:
        //chapter
    

    In this case, the internal node selection algorithm operates in document order, and no extra processing is incurred.

    Even with this special case, it is better to avoid the descendant-or-self step (specified explicitly or by using //) if your document structure lends itself to explicitly specifying the “intermediate” elements with “*” (or even better, with their names) that should be matched.

  2. The considerations described in this section only apply to the “outer” XPath expression; they do not apply to any expression within a predicate. Although it is still better, for the sake of efficiency, to prune the search by explicitly specifying “intermediate” elements rather than using //, there is no efficiency concern due to the internal order of node selection with an XPath predicate such as the following:
        Print %d:Value('/book/chapter' With -
           '[.//credit/details/@auth="Dave"]')
    
  3. In conclusion, except when you must use the “//chapter” exception discussed in Note 1, above, avoid these extra-processing axes and axis combinations (especially in outer XPath expressions) if your documents are relatively large and performance is a consideration.

XPath supported in the current version

This section contains a condensed excerpt of the XPath syntax, showing only those parts of XPath used in the current version. It also explains any differences in the result of XPath expressions in the XmlDoc API versus that specified in the XPath 1 standard.

See also:

In most cases, the syntax below is a subset of the XPath 1 standard; the syntax of NumericLit ([30]), however, is an extension of XPath 1, whose comparable production (Number, [30]) is limited to the production below of DecimalNumber ([30d]).

 [1]  LocPath ::= RelativeLocPath
         | '/' RelativeLocPath?
 
 [3]  RelativeLocPath ::= Step ('/' Step)*
 
 [4]  Step ::= '.'     /* self::node()  */
         | '..'        /* parent::node() */
         | (AxisName '::' | '@')? NodeTest Predicate*
 
 [6]  AxisName ::= 'ancestor'
        | 'ancestor-or-self'    | 'attribute'
        | 'child'               | 'descendant'
        | 'descendant-or-self'  | 'following'
        | 'following-sibling'   | 'parent'
        | 'preceding-sibling'   | 'self'
 
 [7]  NodeTest ::=  '*'  |  NCName ':' '*'  |  QName
         | 'node()' | 'comment()'  | 'text()'
         | 'processing-instruction(' StringLit? ')'
 
 [8]  Predicate ::= '[' PredExpr ']' | 'not' '(' PredExpr ')'
 
      PredExpr ::= 'position()' CmpOp PositiveInteger
         | PositiveInteger  /* “simple position test” */
         | PathExpr    /* “Existence test” */
         | Comparison
         | PredExpr ('and' | 'or') PredExpr
         | '(' PredExpr ')'
 
      Comparison ::= PathExpr CmpOp StringLit
         | PathExpr CmpOp NumericLit
         | 'number' '(' PathExpr? ')' CmpOp NumericLit
 
      CmpOp ::= '=' | '!=' | '<' | '>' | '<=' | '>='
 
 [29] StringLit ::= '"' [^"]* '"' | "'" [^']* "'"
 
 [30] NumericLit ::= DecimalNumber ('E' ('+' | '-')? NonNegInteger)?
 
 [30d] DecimalNumber ::= ('+' | '-')? NonNegInteger Fraction?
         | ('+' | '-')? NonNegInteger '.'
         | ('+' | '-')? Fraction
 
 [30f] Fraction ::= '.' NonNegInteger
 
 [30p] PositiveInteger ::= [1-9] [0-9]*
 
 [30n] NonNegInteger ::= [0-9]+
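
The NumericLit, DecimalNumber, and Fraction productions above translate directly into a regular expression. The following Python sketch is one such translation, with a hypothetical helper name. It follows the grammar as written, so it accepts only an uppercase E in the exponent; whether the XmlDoc API also accepts a lowercase e is not stated here.

```python
import re

NUMERIC_LIT = re.compile(
    r"[+-]?"             # optional sign
    r"(?:\d+(?:\.\d+)?"  # NonNegInteger Fraction?
    r"|\d+\."            # NonNegInteger '.'
    r"|\.\d+)"           # Fraction
    r"(?:E[+-]?\d+)?"    # optional exponent: 'E' sign? NonNegInteger
)


def is_numeric_lit(text):
    """Return True if text matches the NumericLit production in full."""
    return NUMERIC_LIT.fullmatch(text) is not None


for sample in ("200", "1.003E-5", "5.", ".5", "-.5E+3", ".", "E5"):
    print(sample, is_numeric_lit(sample))
```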

Explanatory notes

The following notes are intended only to explain the above syntax; they do not present any limitations on XPath support in the XmlDoc API:

  • When the at sign (@) is used in a Step ([4]), it is an abbreviation for
        attribute::
    
  • The syntax for Step ([4]) notes that it may begin directly with a NodeTest. In that case, child:: is implied before the NodeTest.
  • The syntax for QName is given in Name and namespace syntax.
  • A node is selected by a PositiveInteger predicate if the position of the node, in the set which the predicate is filtering, is equal to that PositiveInteger.
  • A node is selected by an Existence test if the result of the PathExpr, using that node as the context node, is non-empty.
  • A node is selected by a Comparison if any node in the result of the PathExpr, using that node as the context node, holds the specified relationship to the Lit.

Restrictions and limitations

The following notes concern limitations on XPath support in the XmlDoc API:

  • One way to summarize the XPath productions that are not supported in the current version is to list the XPath operators that are not supported, as shown in the following table:
    Unsupported operators     Meaning
    +  -  *  div  mod         arithmetic
    |                         union
    

    Note: As of Sirius Mods version 7.2, parentheses are allowed for grouping within Boolean expressions within predicates, but this is the only place they are supported.

  • As of the current version, XPath function support is limited enough that it can be shown in the syntax above. However, for clarity, see XPath functions supported in the current version for a list of supported functions and for any differences between the XmlDoc API implementation and the XPath 1 definition of a function.
  • The Boolean operators (and, or) and relational operators (= != < <= > >=) are supported only in (some) predicates (see Comparison tests in predicates).
  • XPath variables ($<var_name>) are not supported.
  • The numeric constant +/- infinity is not supported.
  • The size of an XPath expression is limited to approximately 26 steps, if each has an NCName NodeTest.
  • In the XmlDoc API, a numeric value (either a literal or a node value) may be of any form available in User Language. In particular, “E-format” literals, such as 1.003E-5 (even though they are not very common in XML documents) may be specified. The same form of numbers is available in XPath 2. XPath 1 only allows decimal numbers; it does not allow E-format literals nor node values.
  • The precision used in the XmlDoc API XPath support is that provided by User Language — namely, 15 decimal digits.
  • If the XPath support in the XmlDoc API attempts to convert a long string (that is, longer than 255 bytes) or a number whose absolute value is beyond the capabilities of User Language (maximum absolute value approximately 7.237E75), the request is cancelled.

Predicates supported in the current version

The XmlDoc API supports these predicates:

  • Two types of Location-Path-expression predicates:
    • A Location Path (that is, production [1], LocPath in XPath supported in the current version) used as an existence test. If the nodeSet that results from the Location Path is non-empty, the predicate evaluates as true. The usual (but not only) purpose of this predicate is to select a node if it has at least one attribute or element with a given name. For example, the following expression selects all contact children of cust elements, if the cust element has an invoice child element and contact has a fax child element:
          /active/cust[invoice]/contact[fax]
      
    • A Location Path expression with a comparison operator and literal. For example, @price > 200 selects a node if the numeric value of the node's price Attribute is greater than 200. See Comparison tests in predicates for further discussion.
  • These types of function predicates:
    • A “simple” position test using a numeric literal n. This test is equivalent to the implicit use of the position() function in the predicate term position()=n. For example:
          /book/chapter[2]/section[9]/paragraph[3]
      
    • The number() function with a location path argument, followed by a comparison to a numeric literal. For example, number(@size) > 30 selects a node if the numeric value of the node's size Attribute is greater than 30. This predicate differs from the similar Location Path example above (@price > 200) primarily in that it allows the Attribute value to be non-numeric. The previous Location Path example cancels the request if a numeric comparison is performed with a price Attribute whose value is non-numeric. See Comparison tests in predicates for further discussion.
    • The position() function, followed by a comparison operator, followed by an integer literal, which may be negative or zero. For example:
          /book/chapter[position()>1]/section[2]
      
    • The not() function, which returns the opposite boolean value of its boolean argument. For example:
          /book/chapter[2]/section[9][not(paragraph[3])]
      
  • Nested predicates. For example, this statement selects Chapters whose first Section has a Racy attribute:
        %lis = %bk:SelectNodes('Chapter[Section[1 and @Racy]]')
    
  • Multiple predicates in a single step. For example, using the position() function to filter based on the position of nodes from the preceding predicate, rather than from the step's NodeTest:
        /book/chapter[author="Alex"] [2]
    

    The preceding two-predicate step selects the second chapter child that is authored by Alex, while the following expression selects the second chapter child of the book, if its author is Alex:

        /book/chapter[author="Alex" and 2]
    

    Parentheses for grouping in Boolean expressions are supported as of Sirius Mods version 7.2. For example:

        chapter[@type="methods" and
          (@class="Stringlist" or @class="Daemon")]
    
    Note: Prior to version 7.2, you could only simulate this parenthetical grouping by using a technique like the following:

        chapter[@type="methods"]
          [@class="Stringlist" or @class="Daemon"]
    

    And some Boolean parenthesized expressions could not use this technique, for example:

        [@a="w" or @a="x" and (@a="y" or @a="z")]
    
  • Combination predicates. Predicates may combine any of these supported functions and supported Location Path expressions using the and and or Boolean operators. For example:
        /active/cust[invoice and position()>1]
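
The difference between chaining two predicates in one step and combining them with and can be modeled outside the XmlDoc API. The following Python sketch is only an illustration: the list of dictionaries is made-up data standing in for the chapter children of a book, and the helper names by_author and by_position are hypothetical. It treats a bare number in a predicate as a position test, as described above.

```python
# Made-up stand-in data: the chapter children of one book, in document order.
chapters = [
    {"title": "Intro", "author": "Kim"},
    {"title": "Paths", "author": "Alex"},
    {"title": "Axes", "author": "Sam"},
    {"title": "Steps", "author": "Alex"},
]


def by_author(nodes, name):
    """Model of the predicate [author="..."]: keep the matching nodes."""
    return [n for n in nodes if n["author"] == name]


def by_position(nodes, pos):
    """Model of the predicate [n]: keep the node at 1-based position n."""
    return nodes[pos - 1:pos]


# chapter[author="Alex"][2]: position is counted among Alex's chapters only.
chained = by_position(by_author(chapters, "Alex"), 2)

# chapter[author="Alex" and 2]: position is counted among all chapters.
combined = [n for i, n in enumerate(chapters, start=1)
            if n["author"] == "Alex" and i == 2]

print([n["title"] for n in chained])   # ['Steps']
print([n["title"] for n in combined])  # ['Paths']
```

In the chained form, the second predicate sees only the nodes that survived the first one, which is why the two expressions can select different chapters.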
    

XPath functions supported in the current version

The following XPath functions are supported in the current version, and "Functions" gives their XPath 1 definitions. Any differences between the XPath 1 definition and the XmlDoc API implementation are shown below. Note: In discussing XPath functions, the name of the function followed by an empty pair of parentheses (for example, number()) is sometimes used to name the function, whether or not the particular function being discussed takes arguments.

  • position(), XPath
  • not(bool): The function argument is a Boolean expression; the function result is true if the value of the argument is false, and false otherwise. Notes:
    • The result of the not() function applied to a comparison expression is different from the result of the same expression with the complementary comparison. For example, this statement selects children that have the value of the status attribute equal to “pending”:
          %lis = %nod:SelectNodes('*[@status="pending"]')
      

      This statement selects children that have the value of the status attribute equal to something other than “pending”:

          %lis = %nod:SelectNodes('*[@status!="pending"]')
      

      This statement selects children that have the value of the status attribute equal to something other than “pending” or that have no status attribute:

          %lis = %nod:SelectNodes('*[not(@status="pending")]')
      
  • number(nodeset?): The XmlDoc API number() function differs from the XPath 1 definition as follows:
    1. XPath 1 allows a variety of argument types (for example, a string literal); the XmlDoc API allows only a nodeSet argument.
    2. In XPath 1, if a nodeSet argument to number() contains more than one node, the first node (in document order) is converted to a number and returned.
      In the XmlDoc API, if the argument result contains more than one node, the request is cancelled, which is consistent with the XPath 2 standard.
    3. The definition of a numeric value for number() (after stripping leading and trailing whitespace) is the same as the NumericLit production ([30]) in XPath supported in the current version. This is consistent with the XPath 2 standard, and is an extension of the XPath 1 definition of number(), which only accepts numbers of the form DecimalNumber ([30d]).
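
The not() note above (the difference between not(@status="pending") and @status!="pending") follows from the rule that a comparison is true only if at least one node in the nodeSet satisfies it. The Python sketch below models that rule with made-up data; the names children, status_nodes, eq, and ne are hypothetical and are not part of the XmlDoc API.

```python
# Made-up children of the context node; element "c" has no status attribute.
children = [
    {"name": "a", "attrs": {"status": "pending"}},
    {"name": "b", "attrs": {"status": "shipped"}},
    {"name": "c", "attrs": {}},
]


def status_nodes(child):
    """Model of the Location Path @status: a list of zero or one values."""
    attrs = child["attrs"]
    return [attrs["status"]] if "status" in attrs else []


def eq(values, literal):
    """True if at least one node value equals the literal."""
    return any(v == literal for v in values)


def ne(values, literal):
    """True if at least one node value differs from the literal."""
    return any(v != literal for v in values)


equal = [c["name"] for c in children if eq(status_nodes(c), "pending")]
not_equal = [c["name"] for c in children if ne(status_nodes(c), "pending")]
negated = [c["name"] for c in children if not eq(status_nodes(c), "pending")]

print(equal)      # ['a']       models *[@status="pending"]
print(not_equal)  # ['b']       models *[@status!="pending"]
print(negated)    # ['b', 'c']  models *[not(@status="pending")]
```

An empty nodeSet makes both = and != false, so only the not() form selects the element that has no status attribute.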

Comparison tests in predicates

In the XPath standard (XPath 1 and XPath 2), either operand in a comparison test in a predicate may be any form of XPath expression. The predicate evaluates as true if the comparison is true of at least one node in the resulting nodeSet, and typically the purpose of the predicate is to select the nodes for which the comparison is true.

For example, the following expression selects all item children of order elements that have a price attribute whose value is greater than 9.99:

    order/item[@price > 9.99]

XmlDoc API predicate comparisons differ from XPath 1 comparisons:

  • In XmlDoc API predicates, comparison operands are more restricted than in the XPath standard. As is explained in the syntax discussion below, you can use only a Location Path or the number() function, followed by a comparison operator, followed by a literal.
  • The XmlDoc API uses XPath 2 comparisons. The XPath 1 standard does not provide for exception conditions and it does not provide for ordered string comparisons. This is also true for Microsoft .Net, which follows the XPath 1 standard. The XmlDoc API follows the XPath 2 standard by providing for exceptions (implemented as request cancellation conditions) and providing for ordered string comparisons.

As of the current version, the only forms of comparisons are these:

   position() relOp integer

   LocPath relOp "stringLiteral"

   LocPath relOp numericLiteral

   number([LocPath]) relOp numericLiteral


Where:

position()
This function is discussed in Predicates supported in the current version.
relOp
One of the comparisons: =, !=, <, <=, >, >=
integer
An integer, whose precision is limited to 15 decimal digits (as in User Language).
LocPath
An XPath location expression (that is, production [1], LocPath in XPath supported in the current version). As of the current version, such an expression still has the limitation that it may not contain a predicate.
"stringLiteral"
A quoted string literal value, which must not exceed 255 bytes. The XmlDoc API has always supported ordered string comparisons, but the XPath 1 standard does not. For more information about these comparisons, see Ordered string comparisons.
numericLiteral
A numeric literal value, whose precision is limited to 15 decimal digits (as in User Language). For additional format and size limitations, see "Restrictions and limitations". For more information about support for comparisons of a Location Path to a numeric literal, see Direct numeric comparison. Numeric literals in predicate comparisons are supported in the XmlDoc API as of Sirius Mods version 7.0.
number([LocPath])
The number() function with an optional Location Path argument. A comparison using the number() function is very similar to comparison of a Location Path to a numeric literal (Direct numeric comparison). Comparing the result of number() to a literal gives a result according to their relative values and to the comparison operator. For example, shirt[number(@size) > 30] selects nodes that have a size greater than 30. The significant difference between using a Location Path and using a number() function in a numeric comparison is that the request is cancelled in the former case if a node in the comparison is non-numeric. This difference is discussed briefly below and in greater detail in "number(LocPath) comparisons for non-numeric data". These are the effects of the function's LocPath argument (they are consistent with the XPath 2 standard):
  • If the result of the LocPath argument is a single node, number() converts the value of the node, after stripping leading and trailing whitespace, to a number, or to the special value NaN (“Not a Number“) if the stripped value of the node is not numeric.
  • If the result of the argument has more than one node, the request is cancelled. For further details, see number() comparisons that cause request cancellation. Note: In XPath 1 (which has no exception conditions), if there is more than one node in the nodeSet argument, the value of the first node (in document order) is used.
  • If the LocPath argument is the empty nodeSet, the result of the number() function is NaN. See number(LocPath) != n, LocPath result is empty node-set for examples.
  • If you omit the LocPath argument, the default argument “.” (the context node) is used; that is, the node that is being filtered by the predicate gets converted to a numeric value. For example, in the following XPath expression, the number() function converts the value of the size Attribute to a number:
        /*/shirt/@size[number() > 10]
    

If the number() result that is compared to a literal is NaN, the comparison is always false (or, in the case of the != operator, is always true). This is important to note, because it means number() can be used to avoid the request cancellation to which numeric comparisons are subject if the nodes evaluated by a predicate may be non-numeric. For further discussion, see number(LocPath) comparisons for non-numeric data.
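
The rules above for number() and for comparing NaN can be sketched in Python, whose floating-point NaN follows the same comparison rule (every comparison with NaN is false except !=). This is a model under stated assumptions, not the XmlDoc API itself: RequestCancelled is a hypothetical stand-in for request cancellation, the argument is a plain list of node values, and Python's float() accepts a few forms (such as "inf") that the API's numeric format does not.

```python
import math


class RequestCancelled(Exception):
    """Hypothetical stand-in for request cancellation."""


def number(values):
    """Model of number(LocPath); values holds the text of each selected node."""
    if len(values) > 1:
        raise RequestCancelled("number() argument has more than one node")
    if not values:
        return math.nan  # empty nodeSet
    try:
        return float(values[0].strip())  # strip whitespace, then convert
    except ValueError:
        return math.nan  # non-numeric node value


size = number(["M"])
print(size < 40, size >= 40, size != 40)  # False False True
print(number([]) != 40)                   # True
print(number([" 32 "]) > 30)              # True

try:
    number(["32", "33"])
except RequestCancelled as err:
    print("cancelled:", err)
```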

If you are using number() to avoid request cancellation for a numeric comparison because the nodes evaluated by a predicate may be non-numeric, and you are using the “not equals” comparison (!=), remember that an empty nodeSet argument will give a true comparison result (as a consequence of the rule for comparing NaN). You can filter out the nodes included by empty nodeSet comparisons by expanding the locationPath expression from number(LocPath) to LocPath and number(LocPath), as described in "number(LocPath) != n, LocPath result is empty node-set".

The number() function is supported as of Sirius Mods version 7.0. Note: In the XmlDoc API, the number() function must be immediately followed by a comparison operator and a numeric literal. This limitation is not required by XPath 1 or XPath 2.

Ordered string comparisons

If an XPath expression contains a locationPath subexpression and a quoted string with an ordered comparison (that is, a comparison other than “=” and “!=”), the result is based on a byte-by-byte ordered comparison between each item of the nodeSet result of the subexpression and the literal string value.

Consider the following example:

    %nlis = %nod:SelectNodes('order[@date>"2007-01-01"]')

If the value of the date Attribute node of an order Element child is, for example, "2007-05-17", that order Element node will be included in the result.

This behavior has always been available in the XmlDoc API: if a comparison literal is bracketed in double or single quotation marks, a string comparison is performed, whether or not the literal has a numeric format (this is consistent with XPath 2 string comparisons). Note, however, that most practical ordered comparisons involve numeric values, which are supported as of Sirius Mods version 7.0.

In XPath 1, any ordered comparison is done by first converting each operand to a numeric value and then performing the comparison, and the result of any ordered comparison of a non-numeric node value is false. Therefore, the XPath 1 result of the above example would always be empty, because the literal is a non-numeric value.
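
The contrast between an ordered string comparison and the XPath 1 rule (convert both operands to numbers first) can be sketched in Python. The helper names are hypothetical, and the sketch compares UTF-8 bytes purely for illustration; the byte values that the XmlDoc API actually compares depend on its own character encoding, which this sketch does not try to reproduce.

```python
import math


def to_number(text):
    """XPath 1 style conversion: NaN when the text is not a number."""
    try:
        return float(text.strip())
    except ValueError:
        return math.nan


def string_greater(value, literal):
    """Byte-by-byte ordered comparison (UTF-8 bytes, for illustration)."""
    return value.encode("utf-8") > literal.encode("utf-8")


def numeric_greater(value, literal):
    """Both operands are converted to numbers before they are compared."""
    return to_number(value) > to_number(literal)


print(string_greater("2007-05-17", "2007-01-01"))   # True
print(numeric_greater("2007-05-17", "2007-01-01"))  # False: both are NaN
print(string_greater("9", "10"))                    # True: "9" sorts after "1"
print(numeric_greater("9", "10"))                   # False: 9 is less than 10
```

The last two lines show why quoting a numeric literal matters: the same two values can compare differently as strings and as numbers.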

Direct numeric comparison

If an XPath expression contains a Location Path subexpression compared to a literal numeric value, the result is true if any node in the subexpression result, converted to a numeric value, has the specified relationship (“=”, “<”, etc.) to the literal value.

In the following example, an order child is in the result if it has an item child whose price Attribute node is greater than 99.99:

    %nlis = %nod:SelectNodes('order[item/@price > 99.99]')

For more examples, see "Successful direct numeric comparisons" below.

If any node value used in the direct numeric comparison is non-numeric, the request is cancelled. For examples, see "Direct numeric comparisons that cause request cancellation" below.

Numeric comparisons are supported as of Sirius Mods version 7.0.

The discussion that follows makes references to the following “Clothes” document:

    <Clothes>
       <shirt size="32"  type="dress"  sku="100"/>
       <shirt size="33"  type="sport"  sku="101"/>
       <shirt size="M"   type="sport"  sku="102"/>
       <shirt size="34"  type="frilly" sku="103"/>
    </Clothes>
  1. Successful direct numeric comparisons: If a predicate contains a comparison of a Location Path to a numeric literal, the comparison is true if the numeric value, after stripping leading and trailing whitespace, of any of the nodes in the locationPath result has the specified relationship to the numeric literal. If none of the nodes has the relationship (which includes the case that the Location Path result is empty), the result of the comparison is false. For example, using the Clothes document described above, the following statement prints sku="100":
        %doc:Print('/*/shirt[@size<040]/@sku')
    

    Note that the numeric value of the node and the numeric value of the literal are compared, so the leading zero in 040 here is ignored. An equivalent comparison could be @size<040.00, etc. The following statement prints sku="101":

        %doc:Print('/*/shirt[@size<40 and @type="sport"]/@sku')
    

    The following statement prints sku="103"; the comparison of the size Attribute is processed for only one Element, which has a numeric size. As discussed below, this request would fail if the order of the attribute subexpressions were reversed.

        %doc:Print('/*/shirt[@type="frilly" and @size<40]/@sku')
    
  2. Direct numeric comparisons that cause request cancellation: If any of the nodes used in a direct numeric comparison has a value that is non-numeric after stripping leading and trailing whitespace, the request is cancelled. For example, using the Clothes document described above, the following statement causes the request to be cancelled when the size attribute ("M") of the second sport shirt is compared to the number 40 (note the difference between this XPath expression and the last one in "Successful direct numeric comparisons" above):
        %doc:Print('/*/shirt[@size<40 and @type="frilly"]/@sku')
    

    The following statement also causes the request to be cancelled (at the same Element), because SelectNodes continues after the first selection, unlike the Print example with the same XPath expression in "Successful direct numeric comparisons" above:

        %sh = %doc:SelectNodes( -
           '/*/shirt[@size<40 and @type="sport"]/@sku')
    

    If you want the request cancellation to be avoided in statements like these, consider using the number() function (see number(LocPath) comparisons for non-numeric data).
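
The successful and cancelled cases above can be reproduced with a small Python model of the Clothes document. The model rests on assumptions drawn from the examples rather than on any statement of the implementation: the operands of and are evaluated left to right, evaluation stops at the first false operand, and Print stops at the first selected node while SelectNodes visits every node. RequestCancelled, direct_number, first_sku, and all_skus are hypothetical names.

```python
import xml.etree.ElementTree as ET

CLOTHES = """<Clothes>
   <shirt size="32" type="dress" sku="100"/>
   <shirt size="33" type="sport" sku="101"/>
   <shirt size="M" type="sport" sku="102"/>
   <shirt size="34" type="frilly" sku="103"/>
</Clothes>"""


class RequestCancelled(Exception):
    """Hypothetical stand-in for request cancellation."""


def direct_number(text):
    """Model of a direct numeric comparison operand: non-numeric cancels."""
    try:
        return float(text.strip())
    except ValueError:
        raise RequestCancelled(f"non-numeric value: {text!r}") from None


shirts = ET.fromstring(CLOTHES).findall("shirt")


def first_sku(predicate):
    """Model of Print: stop at the first shirt the predicate selects."""
    for shirt in shirts:
        if predicate(shirt):
            return shirt.get("sku")
    return None


def all_skus(predicate):
    """Model of SelectNodes: evaluate the predicate for every shirt."""
    return [s.get("sku") for s in shirts if predicate(s)]


def size_then_sport(s):      # [@size<40 and @type="sport"]
    return direct_number(s.get("size")) < 40 and s.get("type") == "sport"


def frilly_then_size(s):     # [@type="frilly" and @size<40]
    return s.get("type") == "frilly" and direct_number(s.get("size")) < 40


print(first_sku(size_then_sport))   # 101
print(first_sku(frilly_then_size))  # 103
try:
    all_skus(size_then_sport)
except RequestCancelled as err:
    print("cancelled:", err)
```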

number(LocPath) comparisons for non-numeric data

The number() function can often be used to avoid request cancellation due to the presence of non-numeric data in a direct numeric comparison.

For example, both statements from "Direct numeric comparisons that cause request cancellation" above can avoid request cancellation (when used with the particular document discussed in that section) if the locationPath @size is “converted” using the number() function.

The following statement prints sku="103", even though the size Attribute equal to "M" is processed before the selected Element:

    %doc:Print('/*/shirt[number(@size)<40 and -
       @type="frilly"]/@sku')

Similarly, the following statement succeeds even though the size Attribute equal to "M" is processed (and not selected):

    %sh = %doc:SelectNodes( -
       '/*/shirt[number(@size)<40 and @type="sport"]/@sku')

Note that comparisons with the number() function are always false for a non-numeric node value, unless the comparison is "!=". The following statements both print None found:

    Print %doc:ValueDefault( -
        '/*/shirt[number(@size) < 40 and @sku="102"]/@size', -
        'None found')
 
    Print %doc:ValueDefault( -
        '/*/shirt[number(@size) >= 40 and @sku="102"]/@size', -
        'None found')

The following, however, prints M — the size of shirt with this SKU; its size is not less than 40 nor greater than or equal to 40, but it is not equal to 40 (nor any other number):

    Print %doc:ValueDefault( -
        '/*/shirt[number(@size) != 40 and @sku="102"]/@size', -
        'None found')

Note: Before substituting the number() function into a direct numeric comparison, you should be aware of two differences between direct numeric comparison and the use of number(). They are described in the two sections that follow.

number(LocPath) != n, LocPath result is empty node-set

When a predicate contains the number() function followed by the “!=” comparison, if the nodeSet result is empty, the result of the comparison is true; if any other comparison operator is used, the result is false.

For example, consider this document:

    <t>
       <w a="1"/>
       <x a="PI"/>
       <y a="e" b="1"/>
       <z a="e" b="2"/>
    </t>

If you are using a numeric comparison to search for Attribute a, you should use the number() function to avoid request cancellation, because a has non-numeric values. The following statement sets the result nodelist to the Element w:

    %nlis = %doc:SelectNodes('/t/*[number(@a) = 1]')

The following statement sets the result nodelist to the Elements x, y, and z:

    %nlis = %doc:SelectNodes('/t/*[number(@a) != 1]')

The b Attribute, however, does not have any non-numeric values, so it can be used without number(). Each of the following two statements sets the result nodelist to the Element y:

    %nlis = %doc:SelectNodes('/t/*[@b = 1]')
    %nlis = %doc:SelectNodes('/t/*[number(@b) = 1]')

However, the following two statements differ in their result:

    %nlis = %doc:SelectNodes('/t/*[@b != 1]')
    %nlis = %doc:SelectNodes('/t/*[number(@b) != 1]')

The first sets the result nodelist to the Element z, while the second includes Elements w and x as well as the Element z. Since they do not contain the b Attribute, the result of number(@b)!=1 at elements w and x is true.

If you want to make number() similar to a direct comparison in this respect, you “and” the locationPath argument with the number() factor in the predicate. So, for example, the following sets the result nodelist to the Element z, just like the direct comparison approach:

    %nlis = %doc:SelectNodes('/t/*[@b and number(@b) != 1]')
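
The three ways of testing the b Attribute can be compared side by side in a Python model of the same document. The sketch is illustrative only: select and number are hypothetical helpers, None stands for an empty nodeSet, and the direct comparison is modeled with a plain float() call because every b value in this document is numeric.

```python
import math
import xml.etree.ElementTree as ET

DOC = """<t>
   <w a="1"/>
   <x a="PI"/>
   <y a="e" b="1"/>
   <z a="e" b="2"/>
</t>"""

elements = list(ET.fromstring(DOC))


def number(value):
    """Model of number(@attr): None (empty nodeSet) or non-numeric gives NaN."""
    if value is None:
        return math.nan
    try:
        return float(value.strip())
    except ValueError:
        return math.nan


def select(predicate):
    """Return the names of the child elements that the predicate selects."""
    return [e.tag for e in elements if predicate(e)]


# [@b != 1]: an empty nodeSet makes the direct comparison false.
direct = select(lambda e: e.get("b") is not None and float(e.get("b")) != 1)

# [number(@b) != 1]: an empty nodeSet becomes NaN, and NaN != 1 is true.
with_number = select(lambda e: number(e.get("b")) != 1)

# [@b and number(@b) != 1]: the existence test removes the empty cases.
guarded = select(lambda e: e.get("b") is not None
                 and number(e.get("b")) != 1)

print(direct)       # ['z']
print(with_number)  # ['w', 'x', 'z']
print(guarded)      # ['z']
```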

Note: The other way in which number() differs from direct comparison is described in number() comparisons that cause request cancellation.

number() comparisons that cause request cancellation

When a predicate contains the number() function, the request is cancelled if the nodeSet argument to the number() function contains more than one node.

For example, consider this document:

    <t>
       <x a="1" b="2"/>
       <y b="pi" a="3.14159265"/>
    </t>

If you are searching for all Elements that have any Attribute greater than 1, you can use the locationPath @* as a wildcard comparison for any Attribute. However, you cannot use direct comparison, because some of the attributes are non-numeric.

So, you might try to use number(@*), as in the following example:

    %nlis = %doc:SelectNodes('/t/*[number(@*) > 1]')

However, this will cause a request cancellation, because the value of @* contains more than one node. In such situations, you must decide which node is to be converted to a number for the comparison. In this case, you probably want to use:

    %nlis = %doc:SelectNodes( -
       '/t/*[number(@a) > 1 or number(@b) > 1]')

This will set the result nodelist to Elements x and y.