Sunday, July 31, 2011

Keep Your Hands Off of My Whitespace!

We Can Put a Man on the Moon...

Groovy has some awesome XML reading and parsing features that make it a breeze for developers to create new XML strings or to parse existing ones.  The XmlSlurper and associated GPathResult classes make it easy to traverse and manipulate the tree of an XML document/string.  On top of that, the builder support in Groovy (MarkupBuilder, StreamingMarkupBuilder) makes it much easier for developers to create structured documents and get built-in commenting for free (since the builder syntax essentially describes the hierarchical document by itself).  With all of these improvements and modern conveniences provided by Groovy regarding XML, you would think that it would be easy to perform the following task:
  1. Read in a file containing XML
  2. Parse the file and find a particular element
  3. Edit the value of said element
  4. Update the file with the changes, preserving the original formatting and namespace(s) of the file.
Good luck.  The builders are great for creating new documents.  While you can use the StreamingMarkupBuilder to write out data read from a file, it does NOT preserve the white-space (and you have to know which additional calls are needed to preserve any namespaces from the original XML document).  This was a choice made by the implementer, and it certainly makes sense for the normal use case of the StreamingMarkupBuilder (creating XML on the fly as a response to a request), where white-space is irrelevant (and takes up precious bytes ;) ).  So, are we just doomed to lose our pretty, human-readable formatting when editing XML?  The answer is no.  Luckily, Groovy provides some other classes that let you work in a way similar to the normal Groovy XML manipulation approach (slurper, markup builders and GPath).
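
To see the problem for yourself, here is a minimal sketch of the slurper-plus-builder round trip.  The sample document and its element names are made up for illustration, and the sketch assumes Groovy 3 or later, where XmlSlurper lives in the groovy.xml package (on older versions it is groovy.util.XmlSlurper):

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper

// Hypothetical sample document, nicely indented
def pretty = '''<config>
    <server>
        <host>localhost</host>
    </server>
</config>'''

// Parse with the slurper; by default it discards whitespace-only text
def parsed = new XmlSlurper().parseText(pretty)

// Write the parsed tree straight back out through the builder
def roundTripped = new StreamingMarkupBuilder().bind { mkp.yield(parsed) }.toString()

// The indentation and line breaks of the input do not survive the trip
println roundTripped
```

Nothing was edited here, yet the output should come back on a single line, which is exactly the behavior that makes this combination a poor fit for editing hand-formatted files.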

DOMination

The solution to the problem above is to use the groovy.xml.DOMBuilder and groovy.xml.dom.DOMCategory classes to manipulate XML, while still preserving the formatting/white-space.  Assume that you already have a java.io.File object pointing to an XML file.  You can do the following to manipulate the contents of that file:

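    // Read the whole file into a String
    def xml = file.text
    // Parse into a W3C DOM document; whitespace-only text nodes are kept
    def document = groovy.xml.DOMBuilder.parse(new StringReader(xml))
    def root = document.documentElement
    use(groovy.xml.dom.DOMCategory) {
        // manipulate the XML here, e.g. root.someElement?.each { it.value = 'new value' }
    }

    // Serialize the (modified) DOM back to a String
    def result = groovy.xml.dom.DOMUtil.serialize(root)

    // Overwrite the original file with the result
    file.withWriter { w ->
        w.write(result)
    }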

With 10-15 lines of Groovy code, we have just loaded XML from a file, manipulated its contents, and written it back out to the file, while preserving all formatting from the original file.  I wasted about 4 hours trying to figure this out before I stumbled upon the DOMCategory class.  For more information on editing XML using DOMCategory, see the Groovy tutorial on it here.
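
For a version you can paste into a Groovy console, here is a self-contained sketch of the same approach working on a String instead of a File.  The sample document and its element names are hypothetical.  The serialization step is also an assumption on my part: rather than DOMUtil, it uses the JDK's identity Transformer (javax.xml.transform) with no indentation requested, which as far as I can tell writes the DOM's text nodes, including the whitespace-only ones, back out as they were parsed:

```groovy
import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory

import javax.xml.transform.OutputKeys
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult

// Hypothetical sample document; in the real use case this would be file.text
def xml = '''<config>
    <server>
        <host>localhost</host>
        <port>8080</port>
    </server>
</config>'''

def document = DOMBuilder.parse(new StringReader(xml))
def root = document.documentElement

use(DOMCategory) {
    // GPath-style navigation over the DOM; setting 'value' replaces the element's text
    root.server.host.each { it.value = 'example.org' }
}

// Assumption: JDK identity transform as a stand-in for DOMUtil.serialize
def transformer = TransformerFactory.newInstance().newTransformer()
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, 'yes')
def writer = new StringWriter()
transformer.transform(new DOMSource(root), new StreamResult(writer))
def result = writer.toString()

println result
```

Only the text of the host element should change; the indentation around it is carried along as ordinary text nodes.  Line endings may be normalized to the platform's separator by the serializer, so compare the output against your original file before trusting it on anything important.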

5 comments:

  1. Wow, this is old and no comments. This is great and almost exactly what i need to do. Except i can't figure out from your example exactly how to add an element or "manipulate the xml"
    i keep getting errors like 'DOMCategory$NodesList' has no method appendNode() or variations on this that

    Replies
    1. Forgot to add that i've found the docs, but they're not quite laid out in a manner that seems to work. that appendNode in the examples always breaks for me. i probably should go through it again.

    2. Hi Jeff...the example in this post was made using Groovy 1.8, so it's possible that some of the methods have been tweaked in Groovy 2.x and might explain why you are seeing the no method found exceptions.

  2. This comment has been removed by the author.

  3. When using Groovy 2.4.x with XmlUtil instead of DOMUtil (DOMUtil exists no more), the original formatting is gone :( The XML file is completely re-formatted.
