Stefan Bodewig edited this page Jan 23, 2016 · 14 revisions

Selecting Nodes

Sometimes you are not interested in the whole of a document but only want to compare parts of it, for example when your document contains a lot of boilerplate XML and you are just filling in a small part of it. You can use a NodeFilter to tell XMLUnit which parts it should ignore and which to compare.

Once XMLUnit is focused on the interesting parts, it may need help to pick the correct pairs of XML nodes to compare. The strictest scenario is one where the trees must be completely identical and the order of nodes is significant at every level - but there is a surprisingly large number of use cases where order is completely irrelevant. NodeMatcher is responsible for telling XMLUnit which nodes of the two documents it compares need to be matched with each other.

In order to properly use NodeFilter and NodeMatcher it is crucial to understand that XMLUnit traverses the document from its root element to the leaves in a depth-first approach. Whenever it encounters an XML element, it consults NodeFilter to prune the child nodes that are not interesting and NodeMatcher to pick the branches of the two XML documents that should get compared. Once a branch has been chosen, there is no going back.

For example, assume a control document of

<table>
  <tbody>
    <tr>
      <th>some key</th>
      <td>some value</td>
    </tr>
    <tr>
      <th>another key</th>
      <td>another value</td>
    </tr>
  </tbody>
</table>

and a test document of

<table>
  <tbody>
    <tr>
      <th>another key</th>
      <td>another value</td>
    </tr>
    <tr>
      <th>some key</th>
      <td>some value</td>
    </tr>
  </tbody>
</table>

If your requirement is to ignore the order of <tr>s but identify matching rows based on the textual content of the <th> nodes, then NodeMatcher must already select the "correct" <tr> elements when it gets passed in the children of <tbody>. Once XMLUnit is set on the <tr> branches, there is no way to match nodes from one branch to those of another one.

This is why you can't just say "match elements based on their name and textual content": any two <tr>s have the same element name and the same textual content - none at all if ignoring element content whitespace. Therefore XMLUnit would simply match the <tr>s in document order and not select the rows the way you want them to be selected.
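The point about <tr> having no textual content of its own can be checked with plain JDK DOM calls. The sketch below is not XMLUnit code: the class DirectTextSketch and its methods are hypothetical, and it assumes that "textual content" means only the text and CDATA nodes that are direct children of an element, with trimming standing in for ignoring element content whitespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLUnit.
final class DirectTextSketch {

    /** Concatenates only the text and CDATA nodes that are direct children of e. */
    static String directText(Element e) {
        StringBuilder sb = new StringBuilder();
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Text) { // CDATASection is a subtype of Text
                sb.append(child.getNodeValue());
            }
        }
        // Assumption: trimming stands in for ignoring element content whitespace.
        return sb.toString().trim();
    }

    /** Returns the direct text of every element with the given tag name. */
    static List<String> directTextOf(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList hits = doc.getElementsByTagName(tag);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                result.add(directText((Element) hits.item(i)));
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse XML", ex);
        }
    }
}
```

Applied to the control document above, directTextOf(xml, "tr") yields two empty strings, while directTextOf(xml, "th") yields the two keys. A selector that looks only at name and direct text therefore cannot tell the rows apart.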

So when deciding what to prune in NodeFilter and in particular which parts to match in NodeMatcher you have to follow your structure towards the root of the document tree and find the common ancestor that needs to make the right decision for the order of branches you need.

NodeFilter and AttributeFilter

NodeFilter and AttributeFilter aren't interfaces in their own right but just Predicate<(Xml)Node> and Predicate<(Xml)Attr(ibute)> functional interfaces or delegates.

When XMLUnit visits an element, it will invoke the configured NodeFilter for each of the child nodes and ignore all nodes where the filter returned false.

Likewise it will invoke the configured AttributeFilter for each attribute of the element and ignore those where the filter returns false.

By default - if no NodeFilter or AttributeFilter has been configured at all - all child nodes and attributes are part of the comparison process.

As of XMLUnit 2.0.0 there is no public built-in implementation of NodeFilter or AttributeFilter.
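Since both filters are plain predicates, writing one is usually a one-liner. The sketch below uses java.util.function.Predicate as a stand-in for the predicate type XMLUnit expects and applies the filters by hand with JDK DOM calls. The class FilterSketch, the element name "metadata" and the attribute name "timestamp" are made up for illustration. With XMLUnit for Java, predicates of this shape would be handed to DiffBuilder via withNodeFilter and withAttributeFilter.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration, not part of XMLUnit.
final class FilterSketch {

    /** Keeps every child node except comments and <metadata> elements. */
    static final Predicate<Node> NODE_FILTER =
            n -> n.getNodeType() != Node.COMMENT_NODE && !"metadata".equals(n.getNodeName());

    /** Keeps every attribute except "timestamp". */
    static final Predicate<Attr> ATTRIBUTE_FILTER = a -> !"timestamp".equals(a.getName());

    /** Names of the child nodes for which the node filter returned true. */
    static List<String> keptChildren(Element parent) {
        List<String> kept = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (NODE_FILTER.test(children.item(i))) {
                kept.add(children.item(i).getNodeName());
            }
        }
        return kept;
    }

    /** Names of the attributes for which the attribute filter returned true. */
    static List<String> keptAttributes(Element e) {
        List<String> kept = new ArrayList<>();
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (ATTRIBUTE_FILTER.test(a)) {
                kept.add(a.getName());
            }
        }
        return kept;
    }

    /** Parses the XML and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse XML", ex);
        }
    }
}
```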

NodeMatcher

(I)NodeMatcher searches the nodes which should be compared from the list of test- and control-nodes. It is invoked with the children of the current elements of the control and test documents and returns the matching pairs of nodes. Any node not returned as part of a matching pair is considered "unmatched" and will result in a failed CHILD_LOOKUP comparison.

Usually you won't implement (I)NodeMatcher itself but rather use the default implementation DefaultNodeMatcher and configure it to your needs.

DefaultNodeMatcher, ElementSelector and NodeTypeMatcher

The DefaultNodeMatcher implementation delegates the decision for each node to the ElementSelector and NodeTypeMatcher implementations passed in as arguments to its constructor.

  • ElementSelector: is used for all nodes of type (Xml)Element. The default implementation always returns true which makes XMLUnit compare all elements in document order.
  • NodeTypeMatcher: is used for any other nodes that are not (Xml)Elements. The default implementation matches nodes by their node type with one exception, CDATA and Text-nodes are considered the same kind of node.

ElementSelector receives a single element node from the control document and one from the test document and decides whether those two elements should be compared with each other by XMLUnit. DefaultNodeMatcher will try to match each control element with each test element that hasn't been matched already, trying to stay in document order.

For example, when comparing

<root>
  <a/>
  <b/>
  <c/>
  <d/>
</root>

with

<root>
  <d/>
  <a/>
  <e/>
  <b/>
</root>

Assuming the configured ElementSelector returns true if the element names match, DefaultNodeMatcher would invoke ElementSelector with the following pairs (the first one from the control, the second from the test document):

First argument | Second argument | Comment
a              | d               |
a              | a               | => matching pair found
b              | e               | tries to keep element order, so doesn't start over again
b              | b               | => matching pair found
c              | d               | hit end of list, start from the front
c              | e               | list exhausted, no match for c at all
d              | d               | hit end of list, start from the front => match
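The sequence of selector calls listed above can be reproduced with a small simulation. This is a sketch of the search order as described here, not XMLUnit's actual source: MatchTraceSketch is a hypothetical class, elements are reduced to their names, and names are assumed to be unique within each list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical simulation, not part of XMLUnit.
final class MatchTraceSketch {
    /** Every pair handed to the selector, formatted as "control test". */
    final List<String> trace = new ArrayList<>();
    /** Control name mapped to the test name it was paired with. */
    final Map<String, String> matches = new LinkedHashMap<>();

    static MatchTraceSketch run(List<String> control, List<String> test,
                                BiPredicate<String, String> selector) {
        MatchTraceSketch result = new MatchTraceSketch();
        boolean[] used = new boolean[test.size()];
        int lastMatch = -1;
        for (String c : control) {
            // First look behind the previous match to stay in document order.
            int found = result.search(c, test, used, lastMatch + 1, test.size(), selector);
            if (found < 0) {
                // Nothing there, so start over from the front of the list.
                found = result.search(c, test, used, 0, lastMatch, selector);
            }
            if (found >= 0) {
                used[found] = true;
                lastMatch = found;
                result.matches.put(c, test.get(found));
            }
        }
        return result;
    }

    /** Asks the selector about each unmatched test entry in [from, to). */
    private int search(String c, List<String> test, boolean[] used,
                       int from, int to, BiPredicate<String, String> selector) {
        for (int i = from; i < to; i++) {
            if (used[i]) {
                continue; // already paired with an earlier control element
            }
            trace.add(c + " " + test.get(i));
            if (selector.test(c, test.get(i))) {
                return i;
            }
        }
        return -1;
    }
}
```

Running it with the control list a, b, c, d, the test list d, a, e, b and String::equals as selector records the seven pairs from the table in the same order and leaves c unmatched.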

It is possible to configure DefaultNodeMatcher to use more than one ElementSelector when matching elements. If you do so, DefaultNodeMatcher will first try to find a matching test node for a given control node by consulting the first ElementSelector. If it didn't find any match it uses the second ElementSelector and so on.

ElementSelector is most likely the part that needs to get customized most often since the exact logic of matching branches with each other is very specific to each single use case.

Note that when you make XMLUnit visit elements in an order other than document order, XMLUnit will report differences of type CHILD_NODELIST_SEQUENCE, which in turn results in a SIMILAR outcome by DifferenceEvaluators.Default. If you want to suppress this difference completely you'll have to provide a custom DifferenceEvaluator as well.

XMLUnit comes with several ElementSelector implementations, most of which are available as static members of the ElementSelectors class.

ElementSelectors.Default

This is the ElementSelector used by DefaultNodeMatcher if no ElementSelector has been configured explicitly. It simply matches elements in document order, i.e. the first child element of any given control element is compared to the first child element of any given test element, the second to the second and so on.

Strictly speaking, document order is ensured by DefaultNodeMatcher itself; this ElementSelector simply always returns true.

This implementation doesn't care about element names at all.

ElementSelectors.byName

Two elements are matched if their qualified names - i.e. the local name and the namespace URI (if any) - are the same.

It doesn't care about namespace prefixes at all, nor do any of the other built-in ElementSelectors.

ElementSelectors.byNameAndText

Two elements are matched if their qualified names - i.e. the local name and the namespace URI (if any) - are the same and their textual content matches.

Example:

Control XML:

<flowers>
	<flower>Roses</flower>
	<flower>Daisy</flower>
	<flower>Crocus</flower>
</flowers>

Test XML:

<flowers>
	<flower>Daisy</flower>
	<flower>Roses</flower>
	<flower>Crocus</flower>
</flowers>

Without a custom ElementSelector you will get a difference "Expected text value 'Roses' but was 'Daisy' ... ".

With ElementSelectors.byNameAndText you can ensure the "right" nodes are compared with each other:

import org.junit.Assert;
import org.xmlunit.builder.DiffBuilder;
import org.xmlunit.diff.DefaultNodeMatcher;
import org.xmlunit.diff.Diff;
import org.xmlunit.diff.ElementSelectors;

String controlXml = "<flowers><flower>Roses</flower><flower>Daisy</flower><flower>Crocus</flower></flowers>";
String testXml = "<flowers><flower>Daisy</flower><flower>Roses</flower><flower>Crocus</flower></flowers>";

Diff myDiff = DiffBuilder.compare(controlXml).withTest(testXml)
        .checkForSimilar() // a different order is always 'similar', never 'identical'
        .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byNameAndText))
        .build();

// the message is only shown if the assertion fails
Assert.assertFalse("XML similar " + myDiff.toString(), myDiff.hasDifferences());

for Java, or for .NET:

string controlXml = "<flowers><flower>Roses</flower><flower>Daisy</flower><flower>Crocus</flower></flowers>";
string testXml = "<flowers><flower>Daisy</flower><flower>Roses</flower><flower>Crocus</flower></flowers>";

var myDiff = DiffBuilder.Compare(controlXml).WithTest(testXml)
        .CheckForSimilar() // a different order is always 'similar', never 'identical'
        .WithNodeMatcher(new DefaultNodeMatcher(ElementSelectors.ByNameAndText))
        .Build();

Assert.IsFalse(myDiff.HasDifferences(), "XML similar " + myDiff.ToString());

ElementSelectors.byNameAndAllAttributes

Two elements are matched if their qualified names - i.e. the local name and the namespace URI (if any) - are the same and all attributes (as identified by their local name and namespace URI) have the same value.

ElementSelectors.byNameAndAttributes

Two elements are matched if their qualified names - i.e. the local name and the namespace URI (if any) - are the same and all attributes whose names have been given as parameters have the same value.

There are two overloads of ElementSelectors.byNameAndAttributes: one accepts Strings and one accepts QNames or XmlQualifiedNames. The string-arg version only considers attributes in the null-namespace (i.e. those with only a local name and no associated namespace URI).

ElementSelectors.byNameAndAttributesControlNS

This is a variant of ElementSelectors.byNameAndAttributes where attribute local names are given as strings and the namespace URI is expected to be the one defined for the attribute on the control element. It only works properly if the local names of the attributes are unique for the given elements.

ElementSelectors.byXPath

Expects an XPath expression yielding elements (where the XPath context "." is the current control or test element) and another ElementSelector as arguments. An additional overload allows you to provide the namespace context for the XPath expression.

When comparing two elements, the XPath expression is applied to the test and control elements and the resulting node lists are compared to each other using the given ElementSelector. The control and test elements match if the given ElementSelector finds matching pairs for all nodes in the node lists returned by the XPath expression.

This is a (partial) option for a case like the <table> example from the beginning of this chapter.

ElementSelectors.byXPath(".//th", ElementSelectors.byNameAndText) would match the "correct" <tr>s to each other. It is only a partial solution since it also works for <th> and <td> only by accident (the node lists are empty, so they match trivially) and blindly using byXPath in more complex scenarios is likely to fail.
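Why <th> and <td> match "by accident" becomes visible when the same relative expression is evaluated against different context elements. The sketch below uses only the JDK's javax.xml.xpath API; RelativeXPathSketch and its methods are hypothetical helpers and do not show how byXPath is implemented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XMLUnit.
final class RelativeXPathSketch {

    /** Number of nodes the expression selects when "." is the given element. */
    static int countMatches(Element context, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(expression, context, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception ex) {
            throw new IllegalStateException("could not evaluate " + expression, ex);
        }
    }

    /** Parses the XML and returns the first element with the given tag name. */
    static Element firstByTag(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName(tag).item(0);
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse XML", ex);
        }
    }
}
```

For a single table row, ".//th" selects one node when the context is the <tr> and none when the context is the <th> or <td> itself, so for those elements there is nothing left for the nested selector to disagree about.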

ElementSelectors.not, and, or and xor

These are combiners for other ElementSelectors: not negates an ElementSelector; or returns true if any of the given selectors does; and returns true if all of the given selectors do; xor returns true if one of the two given selectors returns true and the other one returns false. To be honest, xor is only there for completeness; so far we haven't seen a use case for it.

There is an important difference between ElementSelectors.or and passing several ElementSelectors to the constructor of DefaultNodeMatcher. or will apply all ElementSelectors to each pair of elements immediately, while DefaultNodeMatcher tries all control elements for the first ElementSelector before consulting the second.

Example

Assuming

<root>
  <a>x</a>
  <b/>
  <a>y</a>
</root>

and

<root>
  <a>y</a>
  <b>some text</b>
  <a>x</a>
</root>

and you want to match by element name and nested textual content - but fall back to just the element's name if there is no match including the textual content.

Using DefaultNodeMatcher(ElementSelectors.byNameAndText, ElementSelectors.byName) will match the <a>s with matching textual content, just as required. Using ElementSelectors.or(ElementSelectors.byNameAndText, ElementSelectors.byName) the byNameAndText will return false for the first <a> elements, but byName will return true and so the "wrong" <a>s get compared to each other.
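The difference can be simulated without XMLUnit. The sketch below assumes the behaviour described earlier: for a given control element the first selector is tried against all unmatched test elements, and the next selector is consulted only if that finds nothing. SelectorFallbackSketch, its El type and the two selector constants are hypothetical stand-ins, and the preference for document order is left out to keep the code short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical simulation, not part of XMLUnit.
final class SelectorFallbackSketch {

    /** Stand-in for a DOM element: a name plus its direct text. */
    static final class El {
        final String name;
        final String text;

        El(String name, String text) {
            this.name = name;
            this.text = text;
        }

        @Override
        public String toString() {
            return name + ":" + text;
        }
    }

    static final BiPredicate<El, El> BY_NAME = (c, t) -> c.name.equals(t.name);
    static final BiPredicate<El, El> BY_NAME_AND_TEXT =
            (c, t) -> c.name.equals(t.name) && c.text.equals(t.text);

    /**
     * For each control element, tries the selectors one after the other and
     * moves on to the next selector only if the current one found no partner.
     */
    static List<String> match(List<El> control, List<El> test,
                              List<BiPredicate<El, El>> selectors) {
        boolean[] used = new boolean[test.size()];
        List<String> pairs = new ArrayList<>();
        for (El c : control) {
            int found = -1;
            for (int s = 0; s < selectors.size() && found < 0; s++) {
                for (int i = 0; i < test.size() && found < 0; i++) {
                    if (!used[i] && selectors.get(s).test(c, test.get(i))) {
                        found = i;
                    }
                }
            }
            if (found >= 0) {
                used[found] = true;
                pairs.add(c + " -> " + test.get(found));
            }
        }
        return pairs;
    }
}
```

Passing the list [BY_NAME_AND_TEXT, BY_NAME] pairs a:x with a:x and a:y with a:y, and <b> still finds its partner through the fallback. Passing the single combined selector BY_NAME_AND_TEXT.or(BY_NAME) pairs a:x with a:y, because the very first test element already satisfies the name check.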

Conditional ElementSelectors

ElementSelectors.conditionalSelector expects a Predicate that is applied to the control element when the selector is invoked and another ElementSelector. It returns true if and only if both the Predicate and the wrapped ElementSelector return true. This can be used together with the boolean combiners to build more complex ElementSelectors.
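The "predicate on the control element AND wrapped selector" rule is small enough to sketch generically. The class ConditionalSelectorSketch below is a hypothetical stand-in built on java.util.function types, not XMLUnit's own implementation, and it works on any element type E.

```java
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Hypothetical illustration, not part of XMLUnit.
final class ConditionalSelectorSketch {

    /**
     * Applies the predicate to the control value only; the wrapped selector
     * is asked only if the predicate accepted the control value.
     */
    static <E> BiPredicate<E, E> conditional(Predicate<E> onControl,
                                             BiPredicate<E, E> selector) {
        return (control, test) -> onControl.test(control) && selector.test(control, test);
    }
}
```

With element names modelled as plain strings, conditional(name -> name.equals("tr"), String::equals) accepts the pair ("tr", "tr") and rejects both ("td", "td") and ("tr", "td").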

ElementSelectors.selectorForElementNamed is a convenience shortcut for conditionalSelector for the pretty common case of applying a given ElementSelector only to elements of a certain name. It has two overloads that use either only the element's local name (when given a string argument) or the local name and namespace URI (when given a QName or XmlQualifiedName argument).

Using ElementSelectors.conditionalBuilder allows several ElementSelectors to be combined based on Predicates. It can be used to set up specific selectors for special nodes and combine them with a default to use for all elements that didn't match any of the predicates.

As explained above, byXPath is only a partial solution to the problem from the beginning of this document. A more robust solution would be

ElementSelectors.conditionalBuilder()
    .whenElementIsNamed("tr").thenUse(ElementSelectors.byXPath("./th", ElementSelectors.byNameAndText))
    .elseUse(ElementSelectors.byName)
    .build();

The C# version would look almost the same, just with capitalized member names.

MultiLevelByNameAndTextSelector

MultiLevelByNameAndTextSelector is one of two built-in ElementSelectors that are not accessible via a member of ElementSelectors but rather implemented in a class of their own - for .NET MultiLevelByNameAndTextSelector.CanBeCompared is the actual ElementSelector delegate.

It extends the idea of ElementSelectors.byNameAndText: two elements match if their names match, the names of their only child elements match for as many levels as configured via the constructor argument, and the text nested at the deepest configured level matches as well. This means new MultiLevelByNameAndTextSelector(1) and ElementSelectors.byNameAndText check the same properties.

This ElementSelector is only useful in very specific scenarios and has mostly been added to provide a replacement for XMLUnit for Java 1.x's MultiLevelElementNameAndTextQualifier.

ByNameAndTextRecSelector

ByNameAndTextRecSelector - or ByNameAndTextRecSelector.CanBeCompared in the .NET case - is the heir of XMLUnit for Java 1.x's RecursiveElementNameAndTextQualifier, which has become the infamous default answer on Stack Overflow for user questions about XMLUnit not matching the proper elements - even, and in particular, when it would be the wrong choice.

ByNameAndTextRecSelector matches two elements if their local names and namespace URIs (if any) and their nested text are the same and this condition also holds true for all nested child elements.

Many times it works in complex scenarios but often only by accident - for example if there is no nested text at all and a byName element selector would work as well.

Rather than using ByNameAndTextRecSelector blindly it is recommended to apply it only sparingly combined with more specific ElementSelectors inside a conditional construct. It would solve the problem of the example at the beginning of this document and probably is a better choice than byXPath as XPath expressions come at a higher price than the DOM tree traversal ByNameAndTextRecSelector has to perform. So for the sake of completeness

ElementSelectors.ConditionalBuilder()
    .WhenElementIsNamed("tr").ThenUse(ByNameAndTextRecSelector.CanBeCompared)
    .ElseUse(ElementSelectors.ByName)
    .Build();

would select the proper <tr>s - the Java version would be

ElementSelectors.conditionalBuilder()
    .whenElementIsNamed("tr").thenUse(new ByNameAndTextRecSelector())
    .elseUse(ElementSelectors.byName)
    .build();