
Providing Input to XMLUnit

Stefan Bodewig edited this page May 30, 2015 · 16 revisions

Source and ISource

All core parts of XMLUnit use a single abstraction for the "pieces of XML" they are supposed to work on. For Java this is javax.xml.transform.Source; for .NET we've created Org.XmlUnit.ISource, which is basically a wrapper around an XmlReader.

For Java, many implementations of this interface are part of the Java class library; for .NET we've added the corresponding implementations:

  • ReaderSource - just wraps an existing XmlReader
  • DOMSource - creates a Source from an XmlNode
  • StreamSource - creates a Source from a TextReader, a Stream or a string holding a URI
  • LinqSource - creates a Source from an XNode

At the time of this writing there is no XML-Serialization based equivalent of JAXBSource for .NET.
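On the Java side, the implementations that ship with the class library can be created directly. The following is a minimal sketch using only JDK classes; the class name SourceExamples and the helper methods fromString and fromDom are made up for illustration and are not part of XMLUnit.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SourceExamples {

    // Hypothetical helper: a Source that reads XML text from a String.
    static Source fromString(String xml) {
        return new StreamSource(new StringReader(xml));
    }

    // Hypothetical helper: a Source backed by an in-memory DOM tree.
    static Source fromDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return new DOMSource(doc);
    }

    public static void main(String[] args) throws Exception {
        Source fromText = fromString("<element>foo</element>");
        Source fromTree = fromDom("<element>foo</element>");
        // Both objects satisfy the same Source abstraction.
        System.out.println(fromText.getClass().getName());
        System.out.println(fromTree.getClass().getName());
    }
}
```

Either object can be handed to any API that accepts a javax.xml.transform.Source.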

In order to make it easier to create instances of Source or ISource there is a builder that provides a fluent API.

CommentLessSource

CommentLessSource is a decorator of a different source and provides XML that consists of the original source's content with all comments removed.

Use this wrapper if you want XMLUnit to ignore comments.

This class is used under the covers if you tell DiffBuilder to ignore comments.
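To illustrate the effect of removing comments, here is a rough sketch that strips comment nodes from a DOM tree using only JDK classes. It is not how CommentLessSource is implemented; the class CommentStripSketch and its methods are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CommentStripSketch {

    // Recursively removes every comment node found below the given node.
    static void removeComments(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling first, removal detaches the child.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.COMMENT_NODE) {
                node.removeChild(child);
            } else {
                removeComments(child);
            }
            child = next;
        }
    }

    // Hypothetical helper: parses XML text into a DOM Document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<element><!-- note -->foo<child><!-- inner --></child></element>");
        removeComments(doc);
        // Only the text node and <child> remain below <element>.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength());
    }
}
```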

WhitespaceStrippedSource

WhitespaceStrippedSource is a decorator of a different source that removes all empty text nodes and trims the remaining text nodes.

The main use of this decorator is to remove all "element content whitespace", i.e. text content between XML elements that is just an artifact of "pretty printing" XML.

Examples

Empty text nodes are removed:

<element>
</element>

becomes

<element></element>

Text nodes are trimmed:

<element>
  foo
</element>

becomes

<element>foo</element>

If the XML content has been created in memory rather than being deserialized from an external source, it could contain adjacent Text nodes, so that

<element>
  foo
  bar
</element>

could become

<element>foobar</element>

or

<element>
foo
bar
</element>

depending on how the document has been structured. To get more control, the input has to be normalized before wrapping it in a WhitespaceStrippedSource, either by calling Document.normalize() or XmlDocument.Normalize() or by using an additional NormalizedSource wrapper.
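The following sketch shows how adjacent Text nodes can arise in a DOM tree built in memory and how normalizing merges them. It uses only JDK classes; the class AdjacentTextNodes and its buildElement helper are hypothetical names for this example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AdjacentTextNodes {

    // Builds <element> with two adjacent Text children, "foo" and "bar".
    static Element buildElement() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .newDocument();
        Element element = doc.createElement("element");
        doc.appendChild(element);
        element.appendChild(doc.createTextNode("foo"));
        element.appendChild(doc.createTextNode("bar"));
        return element;
    }

    public static void main(String[] args) throws Exception {
        Element element = buildElement();
        // Two separate Text nodes before normalization.
        System.out.println(element.getChildNodes().getLength());

        // normalize() merges adjacent Text nodes in the whole subtree.
        element.getOwnerDocument().normalize();
        System.out.println(element.getChildNodes().getLength());
        System.out.println(element.getFirstChild().getNodeValue());
    }
}
```

A parser would never produce the two-node form, which is why this only matters for trees assembled programmatically.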

WhitespaceNormalizedSource

WhitespaceNormalizedSource is a decorator of a different source that replaces all whitespace characters found in Text nodes with Space characters and collapses consecutive whitespace characters into a single Space.

Examples

<element>a

    b
</element>

becomes

<element>a b </element>
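The transformation applied to each Text node's content can be approximated with a single regular expression. This is only a sketch of the behavior described above, not XMLUnit's implementation; the class WhitespaceNormalizeSketch and the method normalizeWhitespace are hypothetical, and the sketch assumes "whitespace" means the four XML whitespace characters (space, tab, carriage return, line feed).

```java
public class WhitespaceNormalizeSketch {

    // Replaces each run of XML whitespace characters with a single space.
    static String normalizeWhitespace(String text) {
        return text.replaceAll("[ \\t\\r\\n]+", " ");
    }

    public static void main(String[] args) {
        // The text content of <element> from the example above.
        String original = "a\n\n    b\n";
        // Brackets make the trailing space visible in the output.
        System.out.println("[" + normalizeWhitespace(original) + "]");
    }
}
```

Note that the text is not trimmed: the trailing line break turns into a trailing space, as in the example above.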

NormalizedSource

NormalizedSource performs XML normalization on the wrapped document. This means adjacent Text nodes are merged into single nodes and empty Text nodes are removed (recursively). For Java, additional normalizations may be performed when wrapping a Document rather than a Node - see XmlNode.Normalize for .NET and Node#normalize as well as Document#normalizeDocument for Java.

When reading documents, a parser usually puts the document into normalized form anyway. You will only need to perform XML normalization on DOM trees you have created programmatically.

InputBuilder
