Class XOMUtil.Normalizer

java.lang.Object
net.datenwerke.eximport.nuxlets.XOMUtil.Normalizer
Enclosing class:
XOMUtil

public static class XOMUtil.Normalizer extends Object
Standard XML algorithms for text and whitespace normalization (but not for Unicode normalization), implemented as a type-safe enum. XML whitespace is ' ', '\t', '\r', '\n'.

Applications rarely need this class, but it is useful when adjacent or empty Text nodes must be cleaned up.

  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    static final XOMUtil.Normalizer
    COLLAPSE
    Whitespace normalization replaces each sequence of whitespace in the string with a single ' ' space character; in addition, leading and trailing whitespace is removed, if present, à la String.trim().
    static final XOMUtil.Normalizer
    PRESERVE
    Whitespace normalization returns the string unchanged, indicating that no whitespace normalization should be performed at all. This is typically the default for applications.
    static final XOMUtil.Normalizer
    REPLACE
    Whitespace normalization replaces each whitespace character in the string with a ' ' space character.
    static final XOMUtil.Normalizer
    STRIP
    Whitespace normalization removes strings that consist only of whitespace (boundary whitespace), retaining other strings unchanged.
    static final XOMUtil.Normalizer
    TRIM
    Whitespace normalization removes leading and trailing whitespace, if present, à la String.trim().
  • Method Summary

    Modifier and Type
    Method
    Description
    final void
    normalize(ParentNode node)
    Recursively walks the given node subtree and merges runs of consecutive (adjacent) Text nodes (if present) into a single Text node containing their string concatenation; empty Text nodes are removed.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • PRESERVE

      public static final XOMUtil.Normalizer PRESERVE
      Whitespace normalization returns the string unchanged, indicating that no whitespace normalization should be performed at all. This is typically the default for applications.
    • REPLACE

      public static final XOMUtil.Normalizer REPLACE
      Whitespace normalization replaces each whitespace character in the string with a ' ' space character.
    • COLLAPSE

      public static final XOMUtil.Normalizer COLLAPSE
      Whitespace normalization replaces each sequence of whitespace in the string with a single ' ' space character; in addition, leading and trailing whitespace is removed, if present, à la String.trim().
    • TRIM

      public static final XOMUtil.Normalizer TRIM
      Whitespace normalization removes leading and trailing whitespace, if present, à la String.trim().
    • STRIP

      public static final XOMUtil.Normalizer STRIP
      Whitespace normalization removes strings that consist only of whitespace (boundary whitespace), retaining other strings unchanged.
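The five algorithms above can be illustrated on plain strings. The following is a minimal sketch in standalone Java, not the library's implementation: the class name `WhitespaceSketch` and all of its methods are hypothetical, and it assumes that "whitespace" means exactly the four XML whitespace characters listed in the class description (which is narrower than what `String.trim()` removes).

```java
public class WhitespaceSketch {

    // XML whitespace is exactly ' ', '\t', '\r', '\n' (assumption taken from the class description).
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // PRESERVE: return the string unchanged.
    static String preserve(String s) {
        return s;
    }

    // REPLACE: turn every whitespace character into a single ' '.
    static String replace(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(isXmlWhitespace(c) ? ' ' : c);
        }
        return sb.toString();
    }

    // TRIM: drop leading and trailing whitespace only.
    static String trim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isXmlWhitespace(s.charAt(start))) start++;
        while (end > start && isXmlWhitespace(s.charAt(end - 1))) end--;
        return s.substring(start, end);
    }

    // COLLAPSE: each whitespace run becomes one ' ', and the result is trimmed.
    static String collapse(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isXmlWhitespace(c)) {
                pendingSpace = sb.length() > 0; // ignore leading whitespace
            } else {
                if (pendingSpace) sb.append(' ');
                pendingSpace = false;
                sb.append(c);
            }
        }
        return sb.toString(); // a trailing run is never flushed, so it is dropped
    }

    // STRIP: whitespace-only strings become empty; everything else is untouched.
    static String strip(String s) {
        return trim(s).isEmpty() ? "" : s;
    }

    public static void main(String[] args) {
        String in = "hello   world \n";
        System.out.println("[" + replace(in) + "]");  // [hello   world  ]
        System.out.println("[" + trim(in) + "]");     // [hello   world]
        System.out.println("[" + collapse(in) + "]"); // [hello world]
        System.out.println("[" + strip(" \n") + "]"); // []
    }
}
```

In this sketch STRIP returns an empty string for whitespace-only input; in the context of `normalize`, such an empty Text node would then be removed from the tree.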
  • Method Details

    • normalize

      public final void normalize(ParentNode node)
      Recursively walks the given node subtree and merges runs of consecutive (adjacent) Text nodes (if present) into a single Text node containing their string concatenation; empty Text nodes are removed. If present, CDATA nodes are treated as Text nodes.

      After merging consecutive Text nodes into a single Text node, the given whitespace normalization algorithm is applied to each resulting Text node. The semantics of the PRESERVE algorithm are the same as with the DOM method org.w3c.dom.Node.normalize().
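The merge step described above can be sketched without XOM at all. The following is a simplified, hypothetical model, not the method's actual implementation: a child list is a `List<Object>` in which every `String` stands in for a Text node and any other object stands in for an element. It handles a single level only, whereas `normalize` recurses into the subtree, and it corresponds to PRESERVE because no whitespace algorithm is applied afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeSketch {

    // Hypothetical stand-in for a child element such as <foo>.
    static final Object FOO = new Object() {
        @Override
        public String toString() {
            return "<foo/>";
        }
    };

    // Merges runs of adjacent Strings into one String and drops empty results.
    static List<Object> mergeAdjacentText(List<Object> children) {
        List<Object> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (Object child : children) {
            if (child instanceof String) {
                run.append((String) child); // extend the current text run
            } else {
                flush(run, out);            // a non-text child ends the run
                out.add(child);
            }
        }
        flush(run, out);
        return out;
    }

    // Emits the accumulated run if it is non-empty, then resets it.
    private static void flush(StringBuilder run, List<Object> out) {
        if (run.length() > 0) {
            out.add(run.toString());
        }
        run.setLength(0);
    }

    public static void main(String[] args) {
        List<Object> children =
            Arrays.asList("", FOO, "hello   ", "world", " \n", FOO, "");
        // Three children remain: <foo/>, the merged text, <foo/>.
        System.out.println(mergeAdjacentText(children).size()); // 3
    }
}
```

A whitespace algorithm other than PRESERVE would then be applied to each merged string before it is added to the output.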

      Note that documents built by a nu.xom.Builder with the default nu.xom.NodeFactory are guaranteed to never have adjacent or empty Text nodes. However, subsequent manual removal or insertion of nodes in the tree can cause Text nodes to become adjacent, and updates can cause Text nodes to become empty.

      Text normalization with the whitespace PRESERVE algorithm is necessary to achieve strictly standards-compliant XPath and XQuery semantics if a query compares or extracts the value of individual Text nodes that (unfortunately) happen to be adjacent to other Text nodes. Luckily, such use cases are rare in practical real-world scenarios and thus a user hardly ever needs to call this method before passing a XOM tree into XQuery or XPath.

      Example Usage:

       Element foo = new Element("foo");
       foo.appendChild("");
       foo.appendChild("bar");
       foo.appendChild("");
       
       Element elem = new Element("elem");
       elem.appendChild("");
       elem.appendChild(foo);
       elem.appendChild("hello   ");
       elem.appendChild("world");
       elem.appendChild(" \n");
       elem.appendChild(foo.copy());
       elem.appendChild("");
       
       XOMUtil.Normalizer.PRESERVE.normalize(elem);
       System.out.println(XOMUtil.toDebugString(elem));
       
      PRESERVE yields the following normalized output:
       [nu.xom.Element: elem]
           [nu.xom.Element: foo]
               [nu.xom.Text: bar]
           [nu.xom.Text: hello   world \n]
           [nu.xom.Element: foo]
               [nu.xom.Text: bar]
       
      In contrast, REPLACE converts the trailing newline to a space and yields:
       [nu.xom.Element: elem]
           [nu.xom.Element: foo]
               [nu.xom.Text: bar]
           [nu.xom.Text: hello   world  ]
           [nu.xom.Element: foo]
               [nu.xom.Text: bar]
       
      COLLAPSE yields:
       [nu.xom.Element: elem]
           [nu.xom.Element: foo]
               [nu.xom.Text: bar]
           [nu.xom.Text: hello world]
           [nu.xom.Element: foo]
               [nu.xom.Text: bar]
       
      TRIM yields:
       [nu.xom.Element: elem]
           [nu.xom.Element: foo]
               [nu.xom.Text: bar]
           [nu.xom.Text: hello   world]
           [nu.xom.Element: foo]
               [nu.xom.Text: bar]
       
      Finally, STRIP yields the same as PRESERVE because the example produces no whitespace-only Text nodes:
       [nu.xom.Element: elem]
           [nu.xom.Element: foo]
               [nu.xom.Text: bar]
           [nu.xom.Text: hello   world \n]
           [nu.xom.Element: foo]
               [nu.xom.Text: bar]
       
      Parameters:
      node - the subtree to normalize