org.aitools.programd.util
Class XMLKit

java.lang.Object
  extended by org.aitools.programd.util.XMLKit

public class XMLKit
extends java.lang.Object

A collection of XML utilities.


Field Summary
private static java.lang.String AMPERSAND
          XML chars prohibited in some contexts and their escaped equivalents.
private static java.lang.String BEGIN_PRE
          The beginning of an HTML preformatted element, including namespace attribute.
private static int BEGIN_PRE_LEN
          The length of BEGIN_PRE.
private static java.lang.String BR
          An HTML <br/> element, including namespace attribute.
private static int BR_LEN
          The length of BR.
static java.lang.String CDATA_END
          CDATA end marker.
static java.lang.String CDATA_START
          CDATA start marker.
static java.lang.String COMMENT_END
          Comment end marker.
static java.lang.String COMMENT_START
          Comment start marker.
private static java.lang.String EMPTY_ELEMENT_TAG_END
          An empty element tag end marker.
private static java.lang.String EMPTY_STRING
          An empty string.
private static java.lang.String[] EMPTY_STRING_ARRAY
          A one-element array containing the empty string.
private static java.lang.String ENCODING_EQUALS_QUOTE
          The string 'encoding="'.
private static int ENCODING_EQUALS_QUOTE_LENGTH
          The length of ENCODING_EQUALS_QUOTE.
private static java.lang.String END_P
          An HTML </p> end tag.
private static int END_P_LEN
          The length of END_P.
private static java.lang.String END_PRE
          The end of an HTML preformatted element.
private static int END_PRE_LEN
          The length of END_PRE.
private static java.lang.String END_TAG_START
          The beginning of an end tag.
protected static java.lang.String EQUAL_QUOTE
          A common string we search for when parsing attributes in tags.
private static java.lang.String GREATER_THAN
           
private static java.lang.String LESS_THAN
           
private static java.lang.String LINE_SEPARATOR
          The system line separator.
private static char MARKER_END
          A tag end marker.
private static char MARKER_START
          A tag start marker.
protected static char QUOTE_MARK
          A quote mark, for convenience.
private static java.lang.String SPACE
          A space, for convenience.
private static java.lang.String SPACE_XMLNS_EQUALS_QUOTE
          The string '" xmlns=\""'.
private static java.lang.String SYSTEM_ENCODING
          The system default file encoding; defaults to UTF-8.
protected static javax.xml.parsers.DocumentBuilder utilBuilder
          A DocumentBuilder for producing new documents.
protected static org.w3c.dom.Document utilDoc
          A document for producing new elements.
protected static java.lang.String WHITESPACE_REGEX
          The regex for whitespace.
private static java.lang.String XML_AMPERSAND
           
private static java.lang.String XML_GREATER_THAN
           
private static java.lang.String XML_LESS_THAN
           
private static java.lang.String XML_PI_START
          The start of an XML processing instruction.
private static java.lang.String XMLNS
          The string '"xmlns"'.
 
Constructor Summary
XMLKit()
           
 
Method Summary
static java.lang.String convertXMLUnicodeEntities(java.lang.String input)
           Converts XML Unicode character entities into their character equivalents within a given string.
static int elementCount(org.w3c.dom.NodeList list)
          Returns the number of elements in the nodelist and its descendants.
static java.lang.String escapeXMLChars(char[] ch, int start, int length)
          Like escapeXMLChars(String), but takes an array of chars instead of a String.
static java.lang.String escapeXMLChars(java.lang.String input)
          Replaces the following characters with their "escaped" equivalents: & with &amp;, < with &lt;, > with &gt;, ' with &apos;, and " with &quot;.
static java.lang.String[] filterViaHTMLTags(java.lang.String input)
          Breaks a message into multiple lines at each HTML <br/> (unless it comes at the beginning of the message) and at each HTML </p> end tag.
static java.lang.String filterWhitespace(java.lang.String input)
           Filters all whitespace: line separators and multiple consecutive spaces are replaced with a single space, and any leading or trailing whitespace characters are removed.
static java.lang.String filterXML(java.lang.String input)
          Removes all characters that are not considered XML characters from the input.
static java.util.List<org.w3c.dom.Element> getAllElementsNamed(org.w3c.dom.Element element, java.lang.String name)
          Returns all elements with the given name that are children of the given element, or null if there is no such element.
static java.lang.String getDeclaredXMLEncoding(java.io.InputStream in)
          Returns the encoding declared by the XML resource read from the given InputStream, or the system default if none is found.
static javax.xml.parsers.DocumentBuilder getDocumentBuilder(java.net.URL schemaLocation, java.lang.String schemaDescription)
          Sets up a DocumentBuilder that is schema-aware, processes XIncludes, and is set to use the schema at the given location.
static java.util.List<org.w3c.dom.Element> getElementChildrenOf(org.w3c.dom.Element element)
          Returns the element children of the given element.
static org.w3c.dom.Element getFirstElementChildOf(org.w3c.dom.Element element)
          Returns the first element child of the given element.
static org.w3c.dom.Element getFirstElementIn(org.w3c.dom.NodeList list)
          Returns the first element member (if there is one) of the given nodelist.
static org.w3c.dom.Element getFirstElementNamed(org.w3c.dom.Element element, java.lang.String name)
          Returns the first element with the given name that is a child of the given element, or null if there is no such element.
static java.lang.String getChildText(org.w3c.dom.Element element, java.lang.String childName)
          Gets the text of the named child of the given element.
static javax.xml.parsers.SAXParser getSAXParser(java.net.URL schemaLocation, java.lang.String schemaDescription)
          Sets up a SAX parser that is schema-aware, processes XIncludes, and is set to use the schema at the given location.
static javax.xml.validation.Schema getSchema(java.net.URL schemaLocation, java.lang.String schemaDescription)
          Attempts to get the schema at the given location.
static java.lang.String getSpaces(int count)
          Returns a string consisting of the given number of spaces.
static org.w3c.dom.Document parseAsDocumentFragment(java.lang.String text)
          Parses the given text as a document fragment and returns the resulting Document.
static java.lang.String removeMarkup(java.lang.String input)
          Removes all tags from a string, retaining the character content between them.
private static java.lang.String renderAttributes(org.xml.sax.Attributes attributes)
          Renders a set of attributes.
private static java.lang.String renderAttributes(org.w3c.dom.NamedNodeMap attributes)
          Renders a set of attributes.
static java.lang.String renderEmptyElement(org.w3c.dom.Element element, boolean includeNamespaceAttribute)
          Renders a given element as an empty element, including a namespace declaration, if requested.
static java.lang.String renderEndTag(org.w3c.dom.Element element)
          Renders a given element as an end tag.
static java.lang.String renderStartTag(org.w3c.dom.Element element, boolean includeNamespaceAttribute)
          Renders a given element as a start tag, including a namespace declaration, if requested.
static java.lang.String renderStartTag(java.lang.String elementName, org.xml.sax.Attributes attributes, boolean includeNamespaceAttribute, java.lang.String namespaceURI)
          Renders a given element name and set of attributes as a start tag, including a namespace declaration, if requested.
private static void renderXML(org.w3c.dom.Node node, int level, boolean atStart, java.lang.StringBuilder result, boolean includeNamespaceAttribute, boolean indent)
          Formats an XML node, putting the result into the buffer passed as an argument.
static java.lang.String renderXML(org.w3c.dom.NodeList list, boolean indent)
          Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).
static java.lang.String renderXML(org.w3c.dom.NodeList list, boolean includeNamespaceAttribute, boolean indent)
          Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).
static java.lang.String renderXML(org.w3c.dom.NodeList list, int level, boolean atStart, boolean includeNamespaceAttribute, boolean indent)
          Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).
static java.lang.String renderXML(java.lang.String content, boolean includeNamespaceAttribute, boolean indent)
          Formats XML from a single long string into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).
static java.lang.String unescapeXMLChars(java.lang.String input)
          Replaces the following "escape" strings with their character equivalents: &amp; with &, &lt; with <, &gt; with >, &apos; with ', and &quot; with ".
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EMPTY_STRING

private static final java.lang.String EMPTY_STRING
An empty string.

See Also:
Constant Field Values

EMPTY_STRING_ARRAY

private static final java.lang.String[] EMPTY_STRING_ARRAY
A one-element array containing the empty string.


SPACE

private static final java.lang.String SPACE
A space, for convenience.

See Also:
Constant Field Values

LINE_SEPARATOR

private static final java.lang.String LINE_SEPARATOR
The system line separator.


MARKER_START

private static final char MARKER_START
A tag start marker.

See Also:
Constant Field Values

MARKER_END

private static final char MARKER_END
A tag end marker.

See Also:
Constant Field Values

END_TAG_START

private static final java.lang.String END_TAG_START
The beginning of an end tag.

See Also:
Constant Field Values

EMPTY_ELEMENT_TAG_END

private static final java.lang.String EMPTY_ELEMENT_TAG_END
An empty element tag end marker.

See Also:
Constant Field Values

CDATA_START

public static final java.lang.String CDATA_START
CDATA start marker.

See Also:
Constant Field Values

CDATA_END

public static final java.lang.String CDATA_END
CDATA end marker.

See Also:
Constant Field Values

COMMENT_START

public static final java.lang.String COMMENT_START
Comment start marker.

See Also:
Constant Field Values

COMMENT_END

public static final java.lang.String COMMENT_END
Comment end marker.

See Also:
Constant Field Values

EQUAL_QUOTE

protected static final java.lang.String EQUAL_QUOTE
A common string we search for when parsing attributes in tags.

See Also:
Constant Field Values

QUOTE_MARK

protected static final char QUOTE_MARK
A quote mark, for convenience.

See Also:
Constant Field Values

WHITESPACE_REGEX

protected static final java.lang.String WHITESPACE_REGEX
The regex for whitespace.

See Also:
Constant Field Values

AMPERSAND

private static final java.lang.String AMPERSAND
XML chars prohibited in some contexts and their escaped equivalents.

See Also:
Constant Field Values

XML_AMPERSAND

private static final java.lang.String XML_AMPERSAND
See Also:
Constant Field Values

LESS_THAN

private static final java.lang.String LESS_THAN
See Also:
Constant Field Values

XML_LESS_THAN

private static final java.lang.String XML_LESS_THAN
See Also:
Constant Field Values

GREATER_THAN

private static final java.lang.String GREATER_THAN
See Also:
Constant Field Values

XML_GREATER_THAN

private static final java.lang.String XML_GREATER_THAN
See Also:
Constant Field Values

XML_PI_START

private static final java.lang.String XML_PI_START
The start of an XML processing instruction.

See Also:
Constant Field Values

ENCODING_EQUALS_QUOTE

private static final java.lang.String ENCODING_EQUALS_QUOTE
The string 'encoding="'.

See Also:
Constant Field Values

ENCODING_EQUALS_QUOTE_LENGTH

private static final int ENCODING_EQUALS_QUOTE_LENGTH
The length of ENCODING_EQUALS_QUOTE.


SYSTEM_ENCODING

private static final java.lang.String SYSTEM_ENCODING
The system default file encoding; defaults to UTF-8.


SPACE_XMLNS_EQUALS_QUOTE

private static final java.lang.String SPACE_XMLNS_EQUALS_QUOTE
The string '" xmlns=\""'.

See Also:
Constant Field Values

XMLNS

private static final java.lang.String XMLNS
The string '"xmlns"'.

See Also:
Constant Field Values

BR

private static final java.lang.String BR
An HTML <br/> element, including namespace attribute. ('"<br xmlns=\"http://www.w3.org/1999/xhtml\"/>"')

See Also:
Constant Field Values

BR_LEN

private static final int BR_LEN
The length of BR.


END_P

private static final java.lang.String END_P
An HTML </p> end tag. ('"</p>"')

See Also:
Constant Field Values

END_P_LEN

private static final int END_P_LEN
The length of END_P.


BEGIN_PRE

private static final java.lang.String BEGIN_PRE
The beginning of an HTML preformatted element, including namespace attribute.

See Also:
Constant Field Values

BEGIN_PRE_LEN

private static final int BEGIN_PRE_LEN
The length of BEGIN_PRE.


END_PRE

private static final java.lang.String END_PRE
The end of an HTML preformatted element.

See Also:
Constant Field Values

END_PRE_LEN

private static final int END_PRE_LEN
The length of END_PRE.


utilBuilder

protected static javax.xml.parsers.DocumentBuilder utilBuilder
A DocumentBuilder for producing new documents.


utilDoc

protected static org.w3c.dom.Document utilDoc
A document for producing new elements.

Constructor Detail

XMLKit

public XMLKit()

Method Detail

unescapeXMLChars

public static java.lang.String unescapeXMLChars(java.lang.String input)

Replaces the following "escape" strings with their character equivalents:

  • &amp; with &
  • &lt; with <
  • &gt; with >
  • &apos; with '
  • &quot; with "

Parameters:
input - the string on which to perform the replacement
Returns:
the string with entities replaced
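A minimal sketch of the replacement described above, using only java.lang.String. It is not the Program D source, and the class name UnescapeSketch is hypothetical. The one point of care is ordering: &amp; is replaced last, so that an escaped entity such as &amp;lt; becomes the literal text &lt; rather than <.

```java
final class UnescapeSketch {
    /** Replaces the five predefined XML entities with the characters they stand for. */
    static String unescapeXMLChars(String input) {
        // "&amp;" must be handled last; otherwise "&amp;lt;" would turn into "<".
        return input.replace("&lt;", "<")
                    .replace("&gt;", ">")
                    .replace("&apos;", "'")
                    .replace("&quot;", "\"")
                    .replace("&amp;", "&");
    }
}
```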

escapeXMLChars

public static java.lang.String escapeXMLChars(java.lang.String input)

Replaces the following characters with their "escaped" equivalents:

  • & with &amp;
  • < with &lt;
  • > with &gt;
  • ' with &apos;
  • " with &quot;

Parameters:
input - the string on which to perform the replacement
Returns:
the string with entities replaced
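The escaping direction can be sketched as a single pass over the input, which avoids re-escaping the & introduced by an earlier replacement. This is an illustration under that design choice, not the library's code; EscapeSketch is a hypothetical name.

```java
final class EscapeSketch {
    /** Escapes &, <, >, ' and " using the predefined XML entities, in one pass. */
    static String escapeXMLChars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);        // all other characters pass through
            }
        }
        return out.toString();
    }
}
```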

escapeXMLChars

public static java.lang.String escapeXMLChars(char[] ch,
                                              int start,
                                              int length)
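Like escapeXMLChars(String), but takes an array of chars instead of a String. This may be faster than the String version, but that has not been verified.

Parameters:
ch - the array of characters
start - the index of the first character to escape
length - the number of characters to escape
Returns:
the string with entities replaced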


filterXML

public static java.lang.String filterXML(java.lang.String input)
Removes all characters that are not considered XML characters from the input.

Parameters:
input - the input to filter
Returns:
the input with all non-XML characters removed
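A sketch of the filter, assuming "XML characters" means the Char production of the XML 1.0 specification (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF). Working on code points keeps valid surrogate pairs together while dropping unpaired surrogates. FilterXmlSketch is a hypothetical name.

```java
final class FilterXmlSketch {
    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Removes every code point that is not a legal XML 1.0 character. */
    static String filterXML(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // codePoints() yields a lone surrogate as its own value, which isXmlChar rejects.
        input.codePoints()
             .filter(FilterXmlSketch::isXmlChar)
             .forEach(out::appendCodePoint);
        return out.toString();
    }
}
```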

convertXMLUnicodeEntities

public static java.lang.String convertXMLUnicodeEntities(java.lang.String input)

Converts XML Unicode character entities into their character equivalents within a given string.

This handles entities of the form &#nnnn; (a decimal character code) or &#xhhhh; (a hexadecimal character code), where the number is a valid character code.

Parameters:
input - the string to process
Returns:
the input with all XML Unicode character entity codes replaced
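A regex-based sketch of this conversion. It follows the XML CharRef syntax (&# plus decimal digits, or &#x plus hexadecimal digits, then ;). Leaving out-of-range or overflowing references untouched is an assumption about the desired behavior, and CharRefSketch is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharRefSketch {
    // Group 1: hex digits after "&#x"; group 2: decimal digits after "&#".
    private static final Pattern CHAR_REF =
            Pattern.compile("&#(?:x([0-9a-fA-F]+)|([0-9]+));");

    static String convertXMLUnicodeEntities(String input) {
        Matcher m = CHAR_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // default: keep the reference unchanged
            try {
                int cp = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(cp)) {
                    // toChars produces a surrogate pair for code points above U+FFFF.
                    replacement = new String(Character.toChars(cp));
                }
            } catch (NumberFormatException tooLarge) {
                // The digits overflow an int; leave the reference as it was.
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```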

getDeclaredXMLEncoding

public static java.lang.String getDeclaredXMLEncoding(java.io.InputStream in)
                                               throws java.io.IOException
Returns the encoding declared by the XML resource read from the given InputStream, or the system default if none is found.

Parameters:
in - the input stream
Returns:
the declared encoding
Throws:
java.io.IOException - if there was a problem reading the input stream
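A minimal sketch of how such a lookup can work. It assumes the encoding name is enclosed in double quotes (as the ENCODING_EQUALS_QUOTE field suggests) and falls back to Charset.defaultCharset(), which stands in here for SYSTEM_ENCODING. The sketch consumes bytes from the stream; a caller that still needs them could wrap the stream in a BufferedInputStream and use mark/reset. DeclaredEncodingSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DeclaredEncodingSketch {
    private static final String ENCODING_EQUALS_QUOTE = "encoding=\"";

    static String getDeclaredXMLEncoding(InputStream in) throws IOException {
        // An XML declaration, if present, must come first; 512 bytes is ample for it.
        byte[] head = new byte[512];
        int total = 0;
        int n;
        while (total < head.length && (n = in.read(head, total, head.length - total)) != -1) {
            total += n;
        }
        // The declaration is ASCII, so ISO-8859-1 decodes it byte for byte.
        String text = new String(head, 0, total, StandardCharsets.ISO_8859_1);
        if (text.startsWith("<?xml")) {
            int declEnd = text.indexOf("?>");
            int start = text.indexOf(ENCODING_EQUALS_QUOTE);
            if (declEnd > 0 && start > 0 && start < declEnd) {
                start += ENCODING_EQUALS_QUOTE.length();
                int end = text.indexOf('"', start);
                if (end > start && end < declEnd) {
                    return text.substring(start, end);
                }
            }
        }
        return Charset.defaultCharset().name(); // no declared encoding found
    }
}
```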

parseAsDocumentFragment

public static org.w3c.dom.Document parseAsDocumentFragment(java.lang.String text)
Parameters:
text - a document fragment
Returns:
a Document created by parsing the given text as a document fragment
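One common way to parse a fragment, which may have several top-level nodes, is to wrap it in a single root element first. The sketch below assumes that approach; the wrapper name "fragment" and the class name FragmentSketch are hypothetical, and the real method may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class FragmentSketch {
    // Hypothetical name for the synthetic root element.
    private static final String WRAPPER = "fragment";

    static Document parseAsDocumentFragment(String text)
            throws ParserConfigurationException, SAXException, IOException {
        // A well-formed document needs exactly one root, so supply one.
        String wrapped = "<" + WRAPPER + ">" + text + "</" + WRAPPER + ">";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(wrapped)));
    }
}
```

The fragment's nodes are then the child nodes of the returned document's root element.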

renderXML

public static java.lang.String renderXML(java.lang.String content,
                                         boolean includeNamespaceAttribute,
                                         boolean indent)
Formats XML from a single long string into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).

Parameters:
content - the XML content to format
includeNamespaceAttribute - whether to include the namespace attribute
indent - whether to render the string in an indented, multiline fashion
Returns:
the formatted XML

renderXML

public static java.lang.String renderXML(org.w3c.dom.NodeList list,
                                         boolean indent)
Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false). This is a convenience method that always includes namespace attributes.

Parameters:
list - the list of XML nodes
indent - whether to render the string in an indented, multiline fashion
Returns:
the formatted XML

renderXML

public static java.lang.String renderXML(org.w3c.dom.NodeList list,
                                         boolean includeNamespaceAttribute,
                                         boolean indent)
Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).

Parameters:
list - the list of XML nodes
includeNamespaceAttribute - whether to include the namespace attribute
indent - whether to render the string in an indented, multiline fashion
Returns:
the formatted XML

renderXML

public static java.lang.String renderXML(org.w3c.dom.NodeList list,
                                         int level,
                                         boolean atStart,
                                         boolean includeNamespaceAttribute,
                                         boolean indent)
Formats XML from a node list into a nicely indented multi-line string (if indent is true), or just a long string (if indent is false).

Parameters:
list - the list of XML nodes
level - the level (for indenting; no meaning if indenting is off)
atStart - whether the whole XML string is at its beginning
includeNamespaceAttribute - whether to include the namespace attribute
indent - whether to render the string in an indented, multiline fashion
Returns:
the formatted XML

renderXML

private static void renderXML(org.w3c.dom.Node node,
                              int level,
                              boolean atStart,
                              java.lang.StringBuilder result,
                              boolean includeNamespaceAttribute,
                              boolean indent)
Formats an XML node, putting the result into the buffer passed as an argument.

Parameters:
node - the node to format
level - the level (for indenting; no meaning if indenting is off)
atStart - whether the whole XML string is at its beginning
result - the buffer into which to place the result
includeNamespaceAttribute - whether to include the namespace attribute
indent - whether to render the string in an indented, multiline fashion

filterWhitespace

public static java.lang.String filterWhitespace(java.lang.String input)
                                         throws java.lang.StringIndexOutOfBoundsException

Filters all whitespace: line separators and multiple consecutive spaces are replaced with a single space, and any leading or trailing whitespace characters are removed. Any data enclosed in <![CDATA[ ]]> sections, however, is left as-is (including the CDATA markers).

Parameters:
input - the input to filter
Returns:
the input with white space filtered.
Throws:
java.lang.StringIndexOutOfBoundsException - if there is malformed text in the input.
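A sketch of the filtering rule described above: text outside CDATA sections has every run of whitespace collapsed to one space, CDATA sections (markers included) are copied verbatim, and the result is trimmed. It assumes Java's \s as the whitespace class (the actual WHITESPACE_REGEX value is not shown here) and treats an unterminated CDATA section as the "malformed text" case. WhitespaceSketch is a hypothetical name.

```java
final class WhitespaceSketch {
    private static final String CDATA_START = "<![CDATA[";
    private static final String CDATA_END = "]]>";

    static String filterWhitespace(String input) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < input.length()) {
            int cdataStart = input.indexOf(CDATA_START, pos);
            if (cdataStart < 0) {
                out.append(collapse(input.substring(pos)));
                break;
            }
            int cdataEnd = input.indexOf(CDATA_END, cdataStart);
            if (cdataEnd < 0) {
                throw new StringIndexOutOfBoundsException("Unterminated CDATA section at " + cdataStart);
            }
            cdataEnd += CDATA_END.length();
            out.append(collapse(input.substring(pos, cdataStart)));
            out.append(input, cdataStart, cdataEnd); // CDATA copied as-is, markers included
            pos = cdataEnd;
        }
        // Leading/trailing whitespace lies outside any CDATA markers, so trim() is safe.
        return out.toString().trim();
    }

    /** Replaces each run of whitespace, including line separators, with a single space. */
    private static String collapse(String s) {
        return s.replaceAll("\\s+", " ");
    }
}
```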

elementCount

public static int elementCount(org.w3c.dom.NodeList list)
Returns the number of elements in the nodelist and its descendants. Useful for checking whether the list contains only text and no elements.

Parameters:
list - a list of nodes
Returns:
the number of elements in the nodelist and its descendants
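A recursive sketch over the standard org.w3c.dom interfaces: each element node in the list counts once, and its child nodes are counted the same way. Text, comment, and other non-element nodes add nothing themselves. ElementCountSketch is a hypothetical name.

```java
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ElementCountSketch {
    static int elementCount(NodeList list) {
        int count = 0;
        for (int i = 0; i < list.getLength(); i++) {
            Node node = list.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
            // Descend into every node; leaf nodes return an empty child list.
            count += elementCount(node.getChildNodes());
        }
        return count;
    }
}
```

A result of 0 means the list holds only text (and possibly comments or processing instructions).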

filterViaHTMLTags

public static java.lang.String[] filterViaHTMLTags(java.lang.String input)

Breaks a message into multiple lines at each HTML <br/> (unless it comes at the beginning of the message) and at each HTML </p> end tag. All other tags are removed.

Generally used to format output nicely for a console.

Parameters:
input - the string to break
Returns:
one line per array item
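A regex sketch of the splitting rule above. It assumes that line separators already present in the input should be treated as spaces, that a <br> may carry attributes (such as the xmlns in BR), and that tags never contain a literal '>'. HtmlLineSketch is a hypothetical name.

```java
final class HtmlLineSketch {
    static String[] filterViaHTMLTags(String input) {
        // Existing line breaks are not significant in HTML; treat them as spaces.
        String text = input.replaceAll("\\r?\\n", " ");
        // Each <br/> (with any attributes) and each </p> becomes a line break.
        text = text.replaceAll("(?i)<br\\b[^>]*>|</p\\s*>", "\n");
        // Every other tag is removed; its character content is kept.
        text = text.replaceAll("<[^>]*>", "");
        // A break at the very beginning of the message does not start a new line.
        text = text.replaceFirst("^\n+", "");
        // split() drops trailing empty strings, so a final </p> adds no empty line.
        return text.split("\n");
    }
}
```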

removeMarkup

public static java.lang.String removeMarkup(java.lang.String input)
Removes all tags from a string, retaining the character content between them.

Parameters:
input - the string from which to remove markup
Returns:
the input without tags
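The simplest form of this operation deletes everything from a '<' to the next '>'. The sketch below shows that approach. It does not understand comments, CDATA sections, or attribute values that contain '>', nor a bare '<' in ordinary text, so the real method may behave differently on such input. RemoveMarkupSketch is a hypothetical name.

```java
final class RemoveMarkupSketch {
    static String removeMarkup(String input) {
        // Delete each "<...>" run; the text between tags is kept unchanged.
        return input.replaceAll("<[^>]*>", "");
    }
}
```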

renderStartTag

public static java.lang.String renderStartTag(org.w3c.dom.Element element,
                                              boolean includeNamespaceAttribute)
Renders a given element as a start tag, including a namespace declaration, if requested.

Parameters:
element - the element to render
includeNamespaceAttribute - whether to include the namespace attribute
Returns:
the rendering of the element
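A sketch of start-tag rendering with the standard DOM API. It assumes the namespace declaration takes the form of a default xmlns="..." attribute built from Element.getNamespaceURI() and that attribute values are escaped. Attribute order follows the NamedNodeMap, which is implementation-defined. StartTagSketch is a hypothetical name.

```java
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

final class StartTagSketch {
    static String renderStartTag(Element element, boolean includeNamespaceAttribute) {
        StringBuilder out = new StringBuilder("<").append(element.getTagName());
        String namespaceURI = element.getNamespaceURI();
        boolean declared = includeNamespaceAttribute && namespaceURI != null;
        if (declared) {
            out.append(" xmlns=\"").append(escapeAttribute(namespaceURI)).append('"');
        }
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attribute = (Attr) attributes.item(i);
            // Avoid a duplicate default declaration if one was just written.
            if (declared && "xmlns".equals(attribute.getName())) {
                continue;
            }
            out.append(' ').append(attribute.getName())
               .append("=\"").append(escapeAttribute(attribute.getValue())).append('"');
        }
        return out.append('>').toString();
    }

    /** Escapes the characters that cannot appear raw in a double-quoted attribute value. */
    private static String escapeAttribute(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}
```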

renderStartTag

public static java.lang.String renderStartTag(java.lang.String elementName,
                                              org.xml.sax.Attributes attributes,
                                              boolean includeNamespaceAttribute,
                                              java.lang.String namespaceURI)
Renders a given element name and set of attributes as a start tag, including a namespace declaration, if requested.

Parameters:
elementName - the name of the element to render
attributes - the attributes to include
includeNamespaceAttribute - whether or not to include the namespace attribute
namespaceURI - the namespace URI
Returns:
the rendering result

renderEmptyElement

public static java.lang.String renderEmptyElement(org.w3c.dom.Element element,
                                                  boolean includeNamespaceAttribute)
Renders a given element as an empty element, including a namespace declaration, if requested.

Parameters:
element - the element to render
includeNamespaceAttribute - whether to include the namespace attribute
Returns:
the result of the rendering

renderAttributes

private static java.lang.String renderAttributes(org.xml.sax.Attributes attributes)
Renders a set of attributes.

Parameters:
attributes - the attributes to render
Returns:
the rendered attributes

renderAttributes

private static java.lang.String renderAttributes(org.w3c.dom.NamedNodeMap attributes)
Renders a set of attributes.

Parameters:
attributes - the attributes to render
Returns:
the rendered attributes

renderEndTag

public static java.lang.String renderEndTag(org.w3c.dom.Element element)
Renders a given element as an end tag.

Parameters:
element - the element to render
Returns:
the result of the rendering

getSpaces

public static java.lang.String getSpaces(int count)
Parameters:
count - the number of spaces to return
Returns:
the given number of spaces.

getSAXParser

public static javax.xml.parsers.SAXParser getSAXParser(java.net.URL schemaLocation,
                                                       java.lang.String schemaDescription)
Sets up a SAX parser that is schema-aware, processes XIncludes, and is set to use the schema at the given location.

Parameters:
schemaLocation - location of the schema to use
schemaDescription - short (one word or so) description of the schema
Returns:
the parser
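A sketch of the setup described above using only standard JAXP calls (SchemaFactory, SAXParserFactory.setXIncludeAware, SAXParserFactory.setSchema). The schemaDescription parameter is omitted here, and SaxParserSketch is a hypothetical name.

```java
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SaxParserSketch {
    static SAXParser getSAXParser(URL schemaLocation)
            throws SAXException, ParserConfigurationException {
        // Compile the W3C XML Schema found at the given URL.
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                     .newSchema(schemaLocation);
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);  // XInclude processing needs namespace awareness
        factory.setXIncludeAware(true);   // expand xi:include elements while parsing
        factory.setSchema(schema);        // validate parsed documents against the schema
        return factory.newSAXParser();
    }
}
```

Validation errors are reported to the handler's error() method, which DefaultHandler ignores, so a handler that should reject invalid documents needs to override error() and rethrow.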

getDocumentBuilder

public static javax.xml.parsers.DocumentBuilder getDocumentBuilder(java.net.URL schemaLocation,
                                                                   java.lang.String schemaDescription)
Sets up a DocumentBuilder that is schema-aware, processes XIncludes, and is set to use the schema at the given location.

Parameters:
schemaLocation - location of the schema to use
schemaDescription - short (one word or so) description of the schema
Returns:
the document builder
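The DOM counterpart can be sketched with DocumentBuilderFactory, which offers the same setXIncludeAware and setSchema options. Because schema violations are only reported to the ErrorHandler, this sketch installs one that turns them into parse failures; that choice, the omitted schemaDescription parameter, and the class name DocumentBuilderSketch are assumptions.

```java
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class DocumentBuilderSketch {
    static DocumentBuilder getDocumentBuilder(URL schemaLocation)
            throws SAXException, ParserConfigurationException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                     .newSchema(schemaLocation);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setXIncludeAware(true);
        factory.setSchema(schema);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Make validation errors fail the parse instead of only being reported.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored */ }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        return builder;
    }
}
```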

getSchema

public static javax.xml.validation.Schema getSchema(java.net.URL schemaLocation,
                                                    java.lang.String schemaDescription)
Attempts to get the schema at the given location.

Parameters:
schemaLocation - location of the schema to use
schemaDescription - short (one word or so) description of the schema
Returns:
the schema

getElementChildrenOf

public static java.util.List<org.w3c.dom.Element> getElementChildrenOf(org.w3c.dom.Element element)
Returns the element children of the given element.

Parameters:
element - the element whose children are wanted
Returns:
the element children of the given element
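A sketch using the standard DOM interfaces: walk the child nodes and keep only those of type ELEMENT_NODE, so text, comments, and processing instructions are skipped. ElementChildrenSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ElementChildrenSketch {
    static List<Element> getElementChildrenOf(Element element) {
        List<Element> result = new ArrayList<>();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child); // direct children only, in document order
            }
        }
        return result;
    }
}
```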

getFirstElementChildOf

public static org.w3c.dom.Element getFirstElementChildOf(org.w3c.dom.Element element)
Returns the first element child of the given element.

Parameters:
element - the element whose child is wanted
Returns:
the first element child of the given element

getFirstElementIn

public static org.w3c.dom.Element getFirstElementIn(org.w3c.dom.NodeList list)
Returns the first element member (if there is one) of the given nodelist.

Parameters:
list - the nodes to scan
Returns:
the first element member of the given list

getAllElementsNamed

public static java.util.List<org.w3c.dom.Element> getAllElementsNamed(org.w3c.dom.Element element,
                                                                      java.lang.String name)
Returns all elements with the given name that are children of the given element, or null if there is no such element.

Parameters:
element - the element whose children should be examined
name - the name of the element desired
Returns:
the desired elements, or null
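A sketch that follows the contract above, including its null return when nothing matches (callers must check for null rather than for an empty list). It assumes "children" means direct children and compares names with Node.getNodeName(), which includes any namespace prefix. ElementsNamedSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ElementsNamedSketch {
    static List<Element> getAllElementsNamed(Element element, String name) {
        List<Element> matches = new ArrayList<>();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Only direct element children whose node name equals the requested name.
            if (child.getNodeType() == Node.ELEMENT_NODE && name.equals(child.getNodeName())) {
                matches.add((Element) child);
            }
        }
        // As documented, null (not an empty list) signals "no such element".
        return matches.isEmpty() ? null : matches;
    }
}
```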

getFirstElementNamed

public static org.w3c.dom.Element getFirstElementNamed(org.w3c.dom.Element element,
                                                       java.lang.String name)
Returns the first element with the given name that is a child of the given element, or null if there is no such element.

Parameters:
element - the element whose children should be examined
name - the name of the element desired
Returns:
the desired element, or null

getChildText

public static java.lang.String getChildText(org.w3c.dom.Element element,
                                            java.lang.String childName)
Gets the text of the named child of the given element.

Parameters:
element - the element whose children should be examined
childName - the name of the child whose text is wanted
Returns:
the text of the named child of the given element
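A sketch of this lookup with the standard DOM API: find the first direct element child with the given name and return its getTextContent(), which concatenates all of its descendant text. Returning null when no such child exists is an assumption (the behavior for a missing child is not documented above), and ChildTextSketch is a hypothetical name.

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ChildTextSketch {
    static String getChildText(Element element, String childName) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && childName.equals(child.getNodeName())) {
                // getTextContent() joins the text of all descendants of the child.
                return child.getTextContent();
            }
        }
        return null; // assumption: a missing child yields null
    }
}
```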