ProjectPythonSourceForge
Welcome
Introduction
Suitability
Future Plans
Setup
Downloading XMLObject
Installing XMLObject
XMLObjApp
The XMLObjApp Application
Classes
Special Attributes
XML Attributes
Child Tags
File Menu
Miscellaneous Operation Notes
Unicode and ASCII Strings
Manually Editing Your Parser
Outputting XML
XMLObject
XMLObject -- XML to Object Conversion
Stack -- Tracks the Document Hierarchy
All Docs on One Page

Introduction

XMLObject is a Python module that simplifies the handling of XML streams by converting the data into objects.

There are several ways to manipulate XML in Python.

  • xml.sax is an industry-standard interface, but as such, it does not take advantage of any of the things that really make Python worth using as a programming language. xml.sax lets you define handler functions which it will call while parsing an XML file, but you have to create your own code to actually turn the incoming events into a coherent Python structure.

  • xml.dom is another industry-standard interface. xml.dom takes care of parsing the entire XML file into a single hierarchy of objects, but the accessor methods are again pretty clumsy. Parsing a file with xml.dom is pretty simple, but the cost is in code overhead of actually using the parsed data.

  • pyRXP is a blazingly-fast, validating, XML parser. pyRXP parses an XML file into a tuple of tuples, which is easier to traverse than the output of xml.dom, but the resulting code to handle a tuple of tuples isn't particularly readible or easy to maintain.

XMLObject easily parses an XML file and allows you to manipulate it in memory in an intuitive and easy-to-read manner. To compare, let's parse the following XML with XMLObject, xml.dom, pyRXP, and xml.sax:

Example 1.1. Sample Addressbook XML

<AddressBook>
    <User ID="1">
        <Nickname>Gre7g</Nickname>
        <Email>gre7g@wolfhome.com</Email>
        <Address>2235 West Sunnyside Dr.</Address>
        <City>Cedar City</City>
        <State>MT</State>
        <Zip>74720</Zip>
    </User>
    <User ID="2">
        <Nickname>Jimbo</Nickname>
        <BestFriend ID="1" />
        <Email>jimbo@hotmail.com</Email>
        <Address>115 North Main</Address>
        <Address>Apartment #309</Address>
        <City>Columbia</City>
        <State>MO</State>
        <Zip>65201</Zip>
    </User>
</AddressBook>

In this simple address book, each user is indexed by a unique identifier ID. Let's parse the XML with XMLObject, and print out the e-mail address for the user who has ID=2.

Example 1.2. Decoding XML with XMLObject

>>> import XMLObject, AddressBook
>>> Book = XMLObject.Parse("AddressBook.xml", "AddressBook",
...     AddressBook.AddressBook)
>>> Book.User["2"].Email
u'jimbo@hotmail.com'

Not only is the e-mail address easy to find, the Python code to find it is simple to read – and will therefore be easy to maintain. Let's compare this with xml.dom code that accomplishes the same task.

Example 1.3. Decoding XML with xml.dom

>>> from xml.dom import minidom
>>> Book = minidom.parse("AddressBook.xml")
>>> Users = Book.getElementsByTagName("User")
>>> User = filter(lambda Node:
...     Node.attributes["ID"].value == "2", Users)[0]
>>> User.getElementsByTagName("Email")[0].firstChild.data
u'jimbo@hotmail.com'

It takes a little less code to parse the XML file with xml.dom, but look at how many hoops we have to jump through to find some text data that is only a couple layers down from the top of the structure. The built-in function filter does a great job of finding the correct user, but it does hurt the code's legibility. A for loop would be more readible, but it would take another few lines of code.

Now let's find the e-mail address with pyRXP.

Example 1.4. Decoding XML with pyRXP

>>> from types import *
>>> import pyRXP
>>> AddressBook = open("AddressBook.xml").read()
>>> Book = pyRXP.Parser()(AddressBook)
>>> Users = filter(lambda Node: type(Node) == TupleType,
...     Book[2])
>>> User = filter(lambda Node: Node[1]["ID"] == "2",
...     Users)[0]
>>> Children = filter(lambda Node: type(Node) == TupleType,
...     User[2])
>>> filter(lambda Node: Node[0] == "Email",
...     Children)[0][2][0]
'jimbo@hotmail.com'

I love pyRXP's power and speed, but yuck! Three lambda functions and eight references to a constant index do not make for readible code!

Decoding XML with xml.sax is no small undertaking. I'll save you all the editorial comments and allow the code to speak for itself:

Example 1.5. Decoding XML with xml.sax

from types import *
from xml import sax

class cAddressBook(DictType):
    pass

class cUser:
    pass

class cMyParser(sax.ContentHandler):
    def __init__(self):
        self.Stack, DOM = [], None

    def startElement(self, Name, Attrs):
        if Name == "AddressBook":
            self.Stack.append(cAddressBook())
        elif Name == "User":
            User = cUser()
            self.Stack[-1][Attrs["ID"]] = User
            self.Stack.append(User)
        else:
            self.Stack.append(Name)

    def characters(self, Content):
        if Content.strip(): setattr(self.Stack[-2], self.Stack[-1], Content)

    def endElement(self, Name):
        X = self.Stack.pop()
        if not self.Stack: self.DOM = X

def Main():
    Parser = cMyParser()
    File = open("AddressBook.xml")
    sax.parse(File, Parser)
    print Parser.DOM["2"].Email

if __name__ == "__main__": Main()