public static class XOMUtil.Normalizer
extends java.lang.Object
XML whitespace is any of the characters ' ', '\t', '\r', '\n'.
This class is rarely needed by applications, but when it is needed it's pretty useful.
Modifier and Type | Field and Description |
---|---|
static XOMUtil.Normalizer | COLLAPSE: Whitespace normalization replaces each sequence of whitespace in the string with a single ' ' space character. Leading and trailing whitespace is also removed, if present, à la String.trim(). |
static XOMUtil.Normalizer | PRESERVE: Whitespace normalization returns the string unchanged, i.e. no whitespace normalization is performed at all. This is typically the default for applications. |
static XOMUtil.Normalizer | REPLACE: Whitespace normalization replaces each whitespace character in the string with a ' ' space character. |
static XOMUtil.Normalizer | STRIP: Whitespace normalization removes strings that consist only of whitespace (boundary whitespace), leaving all other strings unchanged. |
static XOMUtil.Normalizer | TRIM: Whitespace normalization removes leading and trailing whitespace, if present, à la String.trim(). |
Modifier and Type | Method and Description |
---|---|
void | normalize(ParentNode node): Recursively walks the given node subtree and merges runs of consecutive (adjacent) Text nodes (if present) into a single Text node containing their string concatenation. Empty Text nodes are removed. |
public static final XOMUtil.Normalizer PRESERVE
public static final XOMUtil.Normalizer REPLACE
public static final XOMUtil.Normalizer COLLAPSE
public static final XOMUtil.Normalizer TRIM
public static final XOMUtil.Normalizer STRIP
public final void normalize(ParentNode node)
Recursively walks the given node subtree and merges runs of consecutive (adjacent) Text nodes (if present) into a single Text node containing their string concatenation. Empty Text nodes are removed. If present, CDATA nodes are treated as Text nodes.
After merging consecutive Text nodes into a single Text node, this normalizer's whitespace normalization algorithm is applied to each resulting Text node. The semantics of the PRESERVE algorithm are the same as those of the DOM method org.w3c.dom.Node.normalize().
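The merge-then-normalize walk described above can be sketched over a deliberately simplified, hypothetical tree model. MergeSketch and Elem are illustration names, not XOM or Nux types: a child is either a String standing in for a Text node or a nested Elem, and the normalization algorithm is passed in as a UnaryOperator<String>. The sketch also drops runs that are empty after normalization; that XOMUtil does the same (for example, for STRIP) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public final class MergeSketch {

    // Hypothetical element node: a name plus an ordered list of children,
    // where each child is either a String (a text node) or another Elem.
    static final class Elem {
        final String name;
        final List<Object> children = new ArrayList<>();

        Elem(String name) { this.name = name; }

        Elem add(Object child) {
            children.add(child);
            return this;
        }
    }

    // Recursively merges adjacent text children and applies the algorithm
    // to each merged run.
    static void normalize(Elem node, UnaryOperator<String> algorithm) {
        List<Object> result = new ArrayList<>();
        StringBuilder run = null; // concatenation of the current run of adjacent text
        for (Object child : node.children) {
            if (child instanceof String) {
                if (run == null) run = new StringBuilder();
                run.append((String) child);
            } else {
                flush(run, algorithm, result); // a non-text child ends the run
                run = null;
                normalize((Elem) child, algorithm); // recurse into the subtree
                result.add(child);
            }
        }
        flush(run, algorithm, result);
        node.children.clear();
        node.children.addAll(result);
    }

    // Appends the normalized run unless it is empty.
    // Assumption: text that is empty after normalization is dropped.
    private static void flush(StringBuilder run, UnaryOperator<String> algorithm, List<Object> out) {
        if (run == null) return;
        String text = algorithm.apply(run.toString());
        if (!text.isEmpty()) out.add(text);
    }

    public static void main(String[] args) {
        Elem elem = new Elem("elem")
                .add("")
                .add(new Elem("foo").add("").add("bar").add(""))
                .add("hello ").add("world").add(" \n")
                .add(new Elem("foo").add("").add("bar").add(""))
                .add("");
        normalize(elem, UnaryOperator.identity()); // identity behaves like PRESERVE
        // elem now has three children: foo, "hello world \n", foo
        System.out.println(elem.children.size()); // prints 3
    }
}
```

Passing the identity function mirrors PRESERVE; any of the other algorithms can be plugged in as a different UnaryOperator<String> without changing the walk itself.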
Note that documents built by a nu.xom.Builder with the default nu.xom.NodeFactory are guaranteed never to have adjacent or empty Text nodes. However, subsequently removing or inserting nodes manually can cause Text nodes to become adjacent, and updates can cause Text nodes to become empty.
Text normalization with the whitespace PRESERVE algorithm is necessary to achieve strictly standards-compliant XPath and XQuery semantics if a query compares or extracts the value of individual Text nodes that (unfortunately) happen to be adjacent to other Text nodes. Luckily, such use cases are rare in practice, so a user hardly ever needs to call this method before passing a XOM tree to XQuery or XPath.
Example Usage:
```java
import nu.xom.Element;
import nux.xom.pool.XOMUtil;

public class NormalizerExample {
    public static void main(String[] args) {
        // <foo> holds an empty Text node, "bar", and another empty Text node
        Element foo = new Element("foo");
        foo.appendChild("");
        foo.appendChild("bar");
        foo.appendChild("");

        // <elem> mixes empty, adjacent, and whitespace-bearing Text nodes
        Element elem = new Element("elem");
        elem.appendChild("");
        elem.appendChild(foo);
        elem.appendChild("hello ");
        elem.appendChild("world");
        elem.appendChild(" \n");
        elem.appendChild(foo.copy()); // deep copy without a parent
        elem.appendChild("");

        XOMUtil.Normalizer.PRESERVE.normalize(elem);
        System.out.println(XOMUtil.toDebugString(elem));
    }
}
```
PRESERVE yields the following normalized output:
[nu.xom.Element: elem] [nu.xom.Element: foo] [nu.xom.Text: bar] [nu.xom.Text: hello world \n] [nu.xom.Element: foo] [nu.xom.Text: bar]
In contrast, REPLACE turns each whitespace character into a space, leaving two trailing spaces after "hello world":
[nu.xom.Element: elem] [nu.xom.Element: foo] [nu.xom.Text: bar] [nu.xom.Text: hello world  ] [nu.xom.Element: foo] [nu.xom.Text: bar]
COLLAPSE yields:
[nu.xom.Element: elem] [nu.xom.Element: foo] [nu.xom.Text: bar] [nu.xom.Text: hello world] [nu.xom.Element: foo] [nu.xom.Text: bar]
TRIM yields the same result here, because the only internal whitespace is already a single space:
[nu.xom.Element: elem] [nu.xom.Element: foo] [nu.xom.Text: bar] [nu.xom.Text: hello world] [nu.xom.Element: foo] [nu.xom.Text: bar]
Finally, STRIP yields the same result as PRESERVE, because the example contains no whitespace-only text:
[nu.xom.Element: elem] [nu.xom.Element: foo] [nu.xom.Text: bar] [nu.xom.Text: hello world \n] [nu.xom.Element: foo] [nu.xom.Text: bar]
Parameters:
node - the subtree to normalize