
Class org.xmlmiddleware.xmlutils.DOMNormalizer

java.lang.Object
   |
   +----org.xmlmiddleware.xmlutils.DOMNormalizer

public class DOMNormalizer
extends Object
Utility methods that treat a DOM tree as if it consisted only of element, attribute, and text nodes.

The methods in this class behave as if the DOM tree consisted only of element, attribute, and fully-normalized text nodes. Except for serialize(), they modify the DOM tree as necessary, replacing entity references with their children, discarding comment and processing instruction nodes, merging adjacent text and CDATA nodes, and so on.

For example, suppose we have the following DOM tree:

                           ELEMENT(A)
                               |
             -------------------------------------
             |           |           |           |
         ELEMENT(B)  TEXT("foo")  ENTITYREF  TEXT("bar")
                                     |
                         -----------------------
                         |           |         |
                     CDATA("asdf")   PI     ELEMENT(C)
 

This class behaves as if the first child of element node A is element node B, the next sibling of element node B is the text node "fooasdf", the next sibling of the text node "fooasdf" is the element node C, and the next sibling of element node C is the text node "bar". That is, it normalizes the tree to the following:

                           ELEMENT(A)
                               |
             ----------------------------------------
             |              |           |           |
         ELEMENT(B)  TEXT("fooasdf")  ELEMENT(C)  TEXT("bar")

The serialized form of this is:

    <A><B/>fooasdf<C/>bar</A>

The code assumes that the tree is traversed depth-first (a node, then its children, then its next sibling), with the methods in this class, such as getFirstChild and getNextSibling, replacing the corresponding methods in DOM's Node interface. This allows the tree to be processed in a single pass rather than in a normalization pass followed by a reading pass. The result of using the methods in this class when traversing the tree in any other order is undefined.
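
The traversal order described above can be sketched as follows. This is only an illustration: the class SinglePassWalkSketch and its walk and demo methods are hypothetical names, not part of this package. The child and sibling accessors are passed in as functions, and the demo walks an already-normalized tree with the plain DOM accessors so that it runs without this library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration class; not part of org.xmlmiddleware.xmlutils.
final class SinglePassWalkSketch {

    // Depth-first walk: visit a child, then its subtree, then its next sibling.
    // firstChild and nextSibling are pluggable so that normalizing accessors
    // can stand in for Node.getFirstChild() and Node.getNextSibling().
    static void walk(Node parent, UnaryOperator<Node> firstChild,
                     UnaryOperator<Node> nextSibling, List<String> out) {
        for (Node n = firstChild.apply(parent); n != null; n = nextSibling.apply(n)) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                out.add("ELEMENT(" + n.getNodeName() + ")");
                walk(n, firstChild, nextSibling, out);
            } else {
                out.add("TEXT(" + n.getNodeValue() + ")");
            }
        }
    }

    // Builds the already-normalized tree <A><B/>fooasdf<C/>bar</A> and walks it
    // with the plain DOM accessors, which is enough to show the visiting order.
    static List<String> demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element a = doc.createElement("A");
        doc.appendChild(a);
        a.appendChild(doc.createElement("B"));
        a.appendChild(doc.createTextNode("fooasdf"));
        a.appendChild(doc.createElement("C"));
        a.appendChild(doc.createTextNode("bar"));

        List<String> visited = new ArrayList<>();
        walk(a, Node::getFirstChild, Node::getNextSibling, visited);
        return visited;
    }
}
```

With this class on the classpath, the same walk could presumably be run over an unnormalized tree by passing DOMNormalizer::getFirstChild and DOMNormalizer::getNextSibling instead of the Node accessors, since both are static methods that take and return a Node.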

Version:
2.0
Author:
Ronald Bourret

Constructor Index

 o DOMNormalizer()

Method Index

 o expandEntityRef(Node)
Replace an entity reference node by its children.
 o getFirstChild(Node)
Get the first normalized child node.
 o getNextSibling(Node)
Get the next normalized sibling node.
 o normalizeNode(Node)
Get the first normalized node at or sibling-wise after the input node.
 o normalizeText(Node)
Normalize text nodes, starting with the current node.
 o serialize(Node, boolean, boolean)
Serialize the normalized version of a node.

Constructors

 o DOMNormalizer
 public DOMNormalizer()

Methods

 o getFirstChild
 public static Node getFirstChild(Node node)
Get the first normalized child node.

Parameters:
node - Parent node.
Returns:
First normalized child node or null if there is no normalized child node.
 o getNextSibling
 public static Node getNextSibling(Node node)
Get the next normalized sibling node.

Parameters:
node - Starting node.
Returns:
Next normalized sibling node or null if there is no normalized sibling node.
 o normalizeNode
 public static Node normalizeNode(Node node)
Get the first normalized node at or sibling-wise after the input node.

This method expands entity references in place, discards processing instruction and comment nodes, and concatenates adjacent text and CDATA nodes.

Parameters:
node - Starting node.
Returns:
The first normalized node at or sibling-wise after the input node, or null if no such node exists or if the type of the input node is not a type that can legally occur beneath an element node.
 o normalizeText
 public static Node normalizeText(Node node)
Normalize text nodes, starting with the current node.

If the input node is a text or CDATA node, concatenate its value with the values of all immediately following text or CDATA nodes. A following text or CDATA node is considered to be immediately following if the only nodes between it and the input node are text, CDATA, comment, or processing instruction nodes. (Comment and processing instruction nodes are discarded.)

If the input node is not a text or CDATA node, or if the input node has no parent, then no normalization takes place and the input node is returned.

If the input node is a CDATA node, it is converted to a text node.

Parameters:
node - Text or CDATA node to normalize.
Returns:
The normalized node.
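
The merging rules described for this method can be sketched as follows. This is a minimal illustration written against the standard org.w3c.dom interfaces, not the actual implementation; the class NormalizeTextSketch and its helper methods are hypothetical. As a simplifying assumption, the sketch also discards comment and processing instruction nodes that trail the last merged text node.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration class; not part of org.xmlmiddleware.xmlutils.
final class NormalizeTextSketch {

    static boolean isText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE
            || n.getNodeType() == Node.CDATA_SECTION_NODE;
    }

    static boolean isDiscardable(Node n) {
        return n.getNodeType() == Node.COMMENT_NODE
            || n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE;
    }

    static Node normalizeText(Node node) {
        Node parent = node.getParentNode();
        if (!isText(node) || parent == null) {
            return node; // nothing to normalize
        }
        // Collect the values of all immediately following text and CDATA nodes,
        // removing them (and any intervening comments and PIs) from the parent.
        StringBuilder value = new StringBuilder(node.getNodeValue());
        Node next = node.getNextSibling();
        while (next != null && (isText(next) || isDiscardable(next))) {
            Node following = next.getNextSibling();
            if (isText(next)) {
                value.append(next.getNodeValue());
            }
            parent.removeChild(next);
            next = following;
        }
        // A CDATA input node is replaced by a text node with the merged value.
        if (node.getNodeType() == Node.CDATA_SECTION_NODE) {
            Node text = node.getOwnerDocument().createTextNode(value.toString());
            parent.replaceChild(text, node);
            return text;
        }
        node.setNodeValue(value.toString());
        return node;
    }

    // Builds A[CDATA("foo"), comment, TEXT("asdf"), PI, ELEMENT(C), TEXT("bar")],
    // normalizes starting at the first child, and describes the result.
    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element a = doc.createElement("A");
        doc.appendChild(a);
        Node first = a.appendChild(doc.createCDATASection("foo"));
        a.appendChild(doc.createComment("note"));
        a.appendChild(doc.createTextNode("asdf"));
        a.appendChild(doc.createProcessingInstruction("pi", "data"));
        a.appendChild(doc.createElement("C"));
        a.appendChild(doc.createTextNode("bar"));

        Node merged = normalizeText(first);
        String kind = merged.getNodeType() == Node.TEXT_NODE ? "TEXT" : "OTHER";
        return kind + "(" + merged.getNodeValue() + ") next="
            + merged.getNextSibling().getNodeName()
            + " children=" + a.getChildNodes().getLength();
    }
}
```

In the demo, merging stops at element C, so the trailing text node "bar" is left untouched and the parent ends up with three children.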
 o expandEntityRef
 public static Node expandEntityRef(Node node)
Replace an entity reference node by its children.

This method returns the first child of the entity reference. This child is now at the same level that the entity reference node was at. If the input node is not an entity reference node, or if the input node does not have a parent, null is returned.

This method does not attempt to normalize text, nor does it expand entity reference nodes that are children of the input node. It is normally called only by other methods, which do normalize text and expand nested entity reference nodes.

Parameters:
node - Entity reference node to expand.
Returns:
The first child of the entity reference node, or null if the input node is not an entity reference node or has no parent.
 o serialize
 public static String serialize(Node node,
                                boolean childrenOnly,
                                boolean escapeMarkup)
Serialize the normalized version of a node.

This method behaves as if entity references were expanded, CDATA sections were replaced with text, and comments and processing instructions did not exist. It does not actually modify the DOM tree.

Parameters:
node - The node to serialize.
childrenOnly - If true, only the children of the node are serialized; if false, the node itself is serialized as well. For example, for an element node, this dictates whether a tag for the element itself is included in the returned value.
escapeMarkup - Whether to replace '<' and '&' with entity references (&lt;, &amp;) or serialize them literally (<, &).
Returns:
The serialized node. If the node is a comment, document type, entity, notation, or processing instruction node, or if the node is a text or CDATA node and childrenOnly is true, null is returned.
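
The behavior described for this method can be approximated with a read-only walk over the standard org.w3c.dom interfaces. The following is a sketch under stated assumptions, not the actual implementation: the class NormalizedSerializeSketch and its helpers are hypothetical, attribute values are always escaped, and attribute nodes passed in directly yield null.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical illustration class; not part of org.xmlmiddleware.xmlutils.
final class NormalizedSerializeSketch {

    // Serializes the normalized view of a node without modifying the tree.
    static String serialize(Node node, boolean childrenOnly, boolean escapeMarkup) {
        StringBuilder out = new StringBuilder();
        switch (node.getNodeType()) {
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                if (childrenOnly) return null; // text nodes have no children
                write(node, out, escapeMarkup);
                return out.toString();
            case Node.ELEMENT_NODE:
                if (childrenOnly) writeChildren(node, out, escapeMarkup);
                else write(node, out, escapeMarkup);
                return out.toString();
            case Node.DOCUMENT_NODE:
            case Node.DOCUMENT_FRAGMENT_NODE:
            case Node.ENTITY_REFERENCE_NODE:
                writeChildren(node, out, escapeMarkup);
                return out.toString();
            default:
                return null; // comment, PI, document type, entity, notation
        }
    }

    private static void writeChildren(Node parent, StringBuilder out, boolean escape) {
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            write(c, out, escape);
        }
    }

    private static void write(Node node, StringBuilder out, boolean escape) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            // Adjacent text and CDATA values simply run together in the output.
            out.append(escape ? escape(node.getNodeValue()) : node.getNodeValue());
        } else if (type == Node.ENTITY_REFERENCE_NODE) {
            writeChildren(node, out, escape); // behave as if expanded in place
        } else if (type == Node.ELEMENT_NODE) {
            out.append('<').append(node.getNodeName());
            NamedNodeMap attrs = node.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                out.append(' ').append(attr.getNodeName()).append("=\"")
                   .append(escape(attr.getNodeValue()).replace("\"", "&quot;"))
                   .append('"');
            }
            StringBuilder content = new StringBuilder();
            writeChildren(node, content, escape);
            if (content.length() == 0) {
                out.append("/>");
            } else {
                out.append('>').append(content)
                   .append("</").append(node.getNodeName()).append('>');
            }
        }
        // Comments, processing instructions, and other node types are skipped.
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Builds A[B, TEXT("foo"), comment, CDATA(cdata), PI, C, TEXT("bar")] and serializes A.
    static String demo(String cdata, boolean childrenOnly, boolean escapeMarkup) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element a = doc.createElement("A");
        doc.appendChild(a);
        a.appendChild(doc.createElement("B"));
        a.appendChild(doc.createTextNode("foo"));
        a.appendChild(doc.createComment("note"));
        a.appendChild(doc.createCDATASection(cdata));
        a.appendChild(doc.createProcessingInstruction("pi", "data"));
        a.appendChild(doc.createElement("C"));
        a.appendChild(doc.createTextNode("bar"));
        return serialize(a, childrenOnly, escapeMarkup);
    }
}
```

Because the sketch only reads the tree, it can descend into entity reference nodes without having to replace them, which mirrors the statement that serialize() leaves the DOM tree unmodified.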
