JDOM and XML Parsing, Part 1 (bookmarked from www.oracle.com)

JDOM makes processing XML in Java much simpler!

Documents are represented by the org.jdom.Document class. You can construct a document from scratch:

// This builds: <root/>
Document doc = new Document(new Element("root"));

Or you can build a document from a file, stream, system ID, or URL:

// This builds a document of whatever's in the given resource
SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(url);

Putting together a few calls makes it easy to create a simple document in JDOM:

// This builds: <root>This is the root</root>
Document doc = new Document();
Element e = new Element("root");
e.setText("This is the root");
doc.addContent(e);

If you're a power user, you may prefer to use "method chaining," in which multiple methods are called in sequence. This works because the set methods return the object on which they acted. Here's how that looks:

Document doc = new Document(
    new Element("root").setText("This is the root"));

For a little comparison, here's how you'd create the same document, using JAXP/DOM:

// JAXP/DOM
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.newDocument();
Element root = doc.createElement("root");
Text text = doc.createTextNode("This is the root");
root.appendChild(text);
doc.appendChild(root);
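For readers who want to run the DOM side of this comparison, here is a minimal self-contained sketch that uses only the JDK's JAXP classes. The class name DomRootExample and the helper methods buildDocument and serialize are illustrative names, not part of the original article. Note that the DOM method for creating a text node is createTextNode.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomRootExample {

    // Builds <root>This is the root</root> with plain JAXP/DOM.
    static Document buildDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.newDocument();
            Element root = doc.createElement("root");
            root.appendChild(doc.createTextNode("This is the root"));
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes with the JDK's identity transformer, omitting the XML declaration.
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(buildDocument()));
    }
}
```

The factory, builder, and transformer plumbing is what the JDOM one-liner avoids.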

Writing with XMLOutputter

A document can be output to many different formats, but the most common is a stream of bytes. In JDOM, the XMLOutputter class provides this capability. Its default no-argument constructor attempts to faithfully output a document exactly as stored in memory. The following code produces a raw representation of a document to a file.

// Raw output
XMLOutputter outp = new XMLOutputter();
outp.output(doc, fileStream);

If you don't care about whitespace, you can enable trimming of text blocks and save a little bandwidth:

// Compressed output
outp.setTextTrim(true);
outp.output(doc, socketStream);

If you'd like the document pretty-printed for human display, you can add some indent whitespace and turn on new lines:

outp.setTextTrim(true);
outp.setIndent(" ");
outp.setNewlines(true);
outp.output(doc, System.out);

When pretty-printing a document that already has formatting whitespace, be sure to enable trimming. Otherwise, you'll add formatting on top of formatting and make something ugly.

Navigating the Element Tree

JDOM makes navigating the element tree quite easy. To get the root element, call:

Element root = doc.getRootElement();

To get a list of all its child elements:

List allChildren = root.getChildren();

To get just the elements with a given name:

List namedChildren = root.getChildren("name");

And to get just the first element with a given name:

Element child = root.getChild("name");

The List returned by the getChildren() call is a java.util.List, an implementation of the List interface all Java programmers know. What's interesting about the List is that it's live. Any changes to the List are immediately reflected in the backing document.

// Remove the fourth child
allChildren.remove(3);
// Remove children named "jack"
allChildren.removeAll(root.getChildren("jack"));
// Add a new child, at the tail or at the head
allChildren.add(new Element("jane"));
allChildren.add(0, new Element("jill"));
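To make the idea of a "live" collection concrete without requiring the JDOM jar, the sketch below uses the JDK's DOM NodeList, which is also live. It is only an analogy: NodeList is read-only, so the tree is changed through the parent node rather than through the list as in JDOM. The class name LiveListExample and its method are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class LiveListExample {

    // Returns the child count seen through ONE NodeList, before and after an append.
    static int[] countsBeforeAndAfterAppend() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            root.appendChild(doc.createElement("jack"));
            root.appendChild(doc.createElement("jill"));

            NodeList children = root.getChildNodes();     // obtained once
            int before = children.getLength();
            root.appendChild(doc.createElement("jane"));  // change the tree afterwards
            int after = children.getLength();             // same list object, new length
            return new int[] { before, after };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = countsBeforeAndAfterAppend();
        System.out.println(counts[0] + " -> " + counts[1]);
    }
}
```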

Using the List metaphor makes possible many element manipulations without adding a plethora of methods. For convenience, however, the common tasks of adding elements at the end or removing named elements have methods on Element itself and don't require obtaining the List first:

root.removeChildren("jill");
root.addContent(new Element("jenny"));

One nice perk with JDOM is how easy it can be to move elements within a document or between documents. It's the same code in both cases:

Element movable = new Element("movable");
parent1.addContent(movable);    // place
parent1.removeContent(movable); // remove
parent2.addContent(movable);    // add

With DOM, moving elements is not as easy, because in DOM elements are tied to their build tool. Thus a DOM element must be "imported" when moving between documents.
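As a rough illustration of the DOM side, the following sketch moves an element between two documents using only JDK classes. importNode creates a copy owned by the target document. DOM Level 3 also offers adoptNode, which moves the node instead of copying it. The class and method names are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomImportExample {

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Moves <movable/> from one document to another and returns the imported copy.
    static Element moveBetweenDocuments() {
        Document doc1 = newDocument();
        Element parent1 = doc1.createElement("parent1");
        doc1.appendChild(parent1);
        Element movable = doc1.createElement("movable");
        parent1.appendChild(movable);

        Document doc2 = newDocument();
        Element parent2 = doc2.createElement("parent2");
        doc2.appendChild(parent2);

        parent1.removeChild(movable);                                // remove from the old tree
        Element imported = (Element) doc2.importNode(movable, true); // deep copy owned by doc2
        parent2.appendChild(imported);                               // add to the new tree
        return imported;
    }

    public static void main(String[] args) {
        Element imported = moveBetweenDocuments();
        System.out.println(imported.getParentNode().getNodeName());
    }
}
```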

With JDOM the only thing you need to remember is to remove an element before adding it somewhere else, so that you don't create loops in the tree. There's a detach() method that makes the detach/add a one-liner:

parent3.addContent(movable.detach());

If you forget to detach an element before adding it to another parent, the library will throw an exception (with a truly precise and helpful error message). The library also checks Element names and content to make sure they don't include inappropriate characters such as spaces. It also verifies other rules, such as having only one root element, consistent namespace declarations, lack of forbidden character sequences in comments and CDATA sections, and so on. This feature pushes "well-formedness" error checking as early in the process as possible.
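The parent check and detach() behavior described above can be modeled in a few lines of plain Java. The TinyElement class below is a hypothetical stand-in written for this illustration. It is not JDOM's implementation, and its exception types and messages are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature element type; NOT the JDOM Element class.
class TinyElement {
    private final String name;
    private TinyElement parent;
    private final List<TinyElement> children = new ArrayList<>();

    TinyElement(String name) {
        // Early check in the spirit of the text: reject names containing spaces.
        if (name == null || name.isEmpty() || name.contains(" ")) {
            throw new IllegalArgumentException("Illegal element name: \"" + name + "\"");
        }
        this.name = name;
    }

    TinyElement addContent(TinyElement child) {
        if (child.parent != null) {
            throw new IllegalStateException("Element \"" + child.name
                    + "\" already has parent \"" + child.parent.name + "\"; detach it first");
        }
        // Refuse to add this element or one of its ancestors, which would create a loop.
        for (TinyElement e = this; e != null; e = e.parent) {
            if (e == child) {
                throw new IllegalStateException("Adding \"" + child.name + "\" would create a loop");
            }
        }
        child.parent = this;
        children.add(child);
        return this; // returning this allows method chaining
    }

    // Removes this element from its parent (if any) and returns it for reuse.
    TinyElement detach() {
        if (parent != null) {
            parent.children.remove(this);
            parent = null;
        }
        return this;
    }

    TinyElement getParent() { return parent; }

    int childCount() { return children.size(); }

    public static void main(String[] args) {
        TinyElement parent1 = new TinyElement("parent1");
        TinyElement parent3 = new TinyElement("parent3");
        TinyElement movable = new TinyElement("movable");
        parent1.addContent(movable);
        parent3.addContent(movable.detach()); // the one-liner from the text
        System.out.println(parent1.childCount() + " " + parent3.childCount());
    }
}
```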

Handling Element Attributes

Element attributes look like this:

<table width="100%" border="0"> ... </table>

With a reference to an element, you can ask the element for any named attribute value:

String val = table.getAttributeValue("width");

You can also get the attribute as an object, for performing special manipulations such as type conversions:

Attribute border = table.getAttribute("border");
int size = border.getIntValue();

To set or change an attribute, use setAttribute():

table.setAttribute("vspace", "0");

To remove an attribute, use removeAttribute():

table.removeAttribute("vspace");
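For comparison, here is a sketch of the same attribute operations using the JDK's built-in DOM API. In DOM, attribute values come back as strings, so a type conversion such as the integer border value is done by hand. The class name DomAttributeExample and its helper are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomAttributeExample {

    // Parses the sample element from the text and returns it.
    static Element parseTable() {
        String xml = "<table width=\"100%\" border=\"0\"></table>";
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element table = parseTable();
        String width = table.getAttribute("width");                   // string value
        int border = Integer.parseInt(table.getAttribute("border"));  // manual conversion
        table.setAttribute("vspace", "0");                            // set or change
        table.removeAttribute("vspace");                              // remove
        System.out.println(width + " " + border + " " + table.hasAttribute("vspace"));
    }
}
```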

Working with Element Text Content

An element with text content looks like this:

<description>
 A cool demo
</description>

In JDOM, the text is directly available by calling:

String desc = description.getText();

Just remember, because the XML 1.0 specification requires whitespace to be preserved, this returns "\n A cool demo\n". Of course, as a practical programmer you often don't want to be so literal about formatting whitespace, so there's a convenient method for retrieving the text while ignoring surrounding whitespace:

String betterDesc = description.getTextTrim();

If you really want whitespace out of the picture, there's even a getTextNormalize() method that trims the surrounding whitespace and collapses each run of internal whitespace to a single space. It's handy for text content like this:

<description>
  Sometimes you have text content with formatting
  space within the string.
</description>
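The difference between raw, trimmed, and normalized text can be approximated with plain string operations. The helper names below are hypothetical and the regular expressions are an assumption about what counts as whitespace (space, tab, carriage return, line feed). This is a sketch of the idea, not JDOM's code.

```java
class TextWhitespaceExample {

    // XML whitespace characters: space, tab, carriage return, line feed.
    private static final String XML_WS = "[ \\t\\r\\n]+";

    // Approximates getTextTrim(): drop leading and trailing whitespace only.
    static String trim(String text) {
        return text.replaceAll("\\A" + XML_WS + "|" + XML_WS + "\\z", "");
    }

    // Approximates getTextNormalize(): trim, then collapse internal runs to one space.
    static String normalize(String text) {
        return trim(text).replaceAll(XML_WS, " ");
    }

    public static void main(String[] args) {
        String raw = "\n  Sometimes you have text content with formatting\n"
                   + "  space within the string.\n";
        System.out.println("[" + trim(raw) + "]");
        System.out.println("[" + normalize(raw) + "]");
    }
}
```

The trimmed form still contains the internal line break and indent, while the normalized form is a single line.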

To change text content, there's a setText() method:

description.setText("A new description");

Any special characters within the text will be interpreted correctly as a character and escaped on output as needed to maintain the appropriate semantics. Let's say you make this call:

element.setText("<xml/> content");

The internal store will keep that literal string as characters. There will be no implicit parsing of the content. On output, you'll see this:

<elt>&lt;xml/&gt; content</elt>

This behavior preserves the semantic meaning of the earlier setText() call. If you want XML content held within an element, you must add the appropriate JDOM child element objects.
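The same store-literally, escape-on-output behavior can be observed with the JDK's own DOM and serializer, which is used here only because it needs no extra library. The element name elt and the class name EscapeOnOutputExample are illustrative.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class EscapeOnOutputExample {

    // Creates <elt> whose text content is the literal string "<xml/> content".
    static Element newElementWithMarkupLikeText() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element elt = doc.createElement("elt");
            doc.appendChild(elt);
            elt.setTextContent("<xml/> content"); // stored as characters, never parsed
            return elt;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the element; the serializer escapes markup characters in text.
    static String serialize(Element elt) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(elt), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element elt = newElementWithMarkupLikeText();
        System.out.println(elt.getTextContent()); // the literal string, unescaped
        System.out.println(serialize(elt));       // escaped form
    }
}
```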

Handling CDATA sections is also possible within JDOM. A CDATA section marks a block of text that shouldn't be parsed as markup. It is essentially syntactic sugar that allows the easy inclusion of HTML or XML content without so many &lt; and &gt; escapes. To build a CDATA section, just wrap the string with a CDATA object:

element.addContent(new CDATA("<xml/> content"));

What's terrific about JDOM is that a getText() call returns the string of characters without bothering the caller with whether or not it's represented by a CDATA section.
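The point that a CDATA section is only a different spelling of the same characters can be checked with the JDK's DOM API, used here as a stand-in that needs no extra library. Two elements hold the same string, one as plain text and one as a CDATA section, and both report the same text content. The names are illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CdataTextExample {

    // Returns { text read from a plain text node, text read from a CDATA section }.
    static String[] readBothWays(String content) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);

            Element plain = doc.createElement("plain");
            plain.appendChild(doc.createTextNode(content));
            root.appendChild(plain);

            Element cdata = doc.createElement("cdata");
            cdata.appendChild(doc.createCDATASection(content));
            root.appendChild(cdata);

            return new String[] { plain.getTextContent(), cdata.getTextContent() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] texts = readBothWays("<xml/> content");
        System.out.println(texts[0].equals(texts[1]));
    }
}
```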

Dealing with Mixed Content

Some elements contain many things such as whitespace, comments, text, child elements, and more:

<table>
  <!-- Some comment -->
  Some text
  <tr>Some child element</tr>
</table>

When an element contains both text and child elements, it's said to contain "mixed content." Handling mixed content can be difficult, but JDOM makes it easy. The standard use cases (retrieving text content and navigating child elements) are kept simple:

String text = table.getTextTrim();    // "Some text"
Element tr = table.getChild("tr");    // A straight reference

For more advanced uses needing the comment, whitespace blocks, processing instructions, and entity references, the raw mixed content is available as a List:

List mixedCo = table.getContent();
Iterator itr = mixedCo.iterator();
while (itr.hasNext()) {
    Object o = itr.next();
    if (o instanceof Comment) {
        ...
    }
    // Types include Comment, Element, CDATA, DocType,
    // ProcessingInstruction, EntityRef, and Text
}

As with child element lists, changes to the raw content list affect the backing document:

// Remove the Comment. It's "1" because "0" is a whitespace block.
mixedCo.remove(1);
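To see why the comment sits at index 1, it can help to list the raw content of the sample element. The sketch below does this with the JDK's DOM parser rather than JDOM, so it runs without extra libraries. The class name and the labels it returns are illustrative, and the sample is assumed to be laid out with line breaks and indentation between its parts.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MixedContentExample {

    static final String XML =
            "<table>\n"
          + "  <!-- Some comment -->\n"
          + "  Some text\n"
          + "  <tr>Some child element</tr>\n"
          + "</table>";

    // Lists the kind of each raw child node of <table>, in document order.
    static List<String> describeContent() {
        try {
            Node table = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
            NodeList content = table.getChildNodes();
            List<String> kinds = new ArrayList<>();
            for (int i = 0; i < content.getLength(); i++) {
                switch (content.item(i).getNodeType()) {
                    case Node.COMMENT_NODE: kinds.add("Comment"); break;
                    case Node.ELEMENT_NODE: kinds.add("Element"); break;
                    case Node.TEXT_NODE:    kinds.add("Text");    break;
                    default:                kinds.add("Other");   break;
                }
            }
            return kinds;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Index 0 is the whitespace before the comment, so the comment is at index 1.
        System.out.println(describeContent());
    }
}
```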

If you have sharp eyes, you'll notice that there's a Text class here. Internally, JDOM uses a Text class to store string content in order to allow the string to have parentage and more easily support XPath access. As a programmer, you don't need to worry about the class when retrieving or setting text, only when accessing the raw content list.

For details on the DocType, ProcessingInstruction, and EntityRef classes, see the API documentation at jdom.org.
