XMLObject is a Python module that simplifies the handling of XML streams by converting the data into objects.
There are several ways to manipulate XML in Python.
XMLObject easily parses an XML file and allows you to manipulate it in memory in an intuitive and easy-to-read manner. To compare, let's parse the following XML with XMLObject, xml.dom, pyRXP, and xml.sax:
Example 1.1. Sample Addressbook XML
<AddressBook>
  <User ID="1">
    <Nickname>Gre7g</Nickname>
    <Email>firstname.lastname@example.org</Email>
    <Address>2235 West Sunnyside Dr.</Address>
    <City>Cedar City</City>
    <State>MT</State>
    <Zip>74720</Zip>
  </User>
  <User ID="2">
    <Nickname>Jimbo</Nickname>
    <BestFriend ID="1" />
    <Email>email@example.com</Email>
    <Address>115 North Main</Address>
    <Address>Apartment #309</Address>
    <City>Columbia</City>
    <State>MO</State>
    <Zip>65201</Zip>
  </User>
</AddressBook>
In this simple address book, each user is indexed by a unique identifier ID. Let's parse the XML with XMLObject, and print out the e-mail address for the user who has ID=2.
Example 1.2. Decoding XML with XMLObject
>>> import XMLObject, AddressBook
>>> Book = XMLObject.Parse("AddressBook.xml", "AddressBook",
...     AddressBook.AddressBook)
>>> Book.User["2"].Email
'email@example.com'
Not only is the e-mail address easy to find, the Python code to find it is simple to read – and will therefore be easy to maintain. Let's compare this with xml.dom code that accomplishes the same task.
Example 1.3. Decoding XML with xml.dom
>>> from xml.dom import minidom
>>> Book = minidom.parse("AddressBook.xml")
>>> Users = Book.getElementsByTagName("User")
>>> User = filter(lambda Node:
...     Node.attributes["ID"].value == "2", Users)[0]
>>> User.getElementsByTagName("Email")[0].firstChild.data
u'email@example.com'
It takes a little less code to parse the XML file with xml.dom, but look at how many hoops we have to jump through to find some text data that is only a couple of layers down from the top of the structure. The built-in function filter does a great job of finding the correct user, but it does hurt the code's legibility. A for loop would be more readable, but it would take another few lines of code.
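For comparison, here is roughly what that for loop might look like. This is only a sketch written for current Python 3 with the standard library's minidom; the helper name `find_email` and the trimmed, inlined copy of the address book are my own choices for illustration, not part of any of the libraries discussed here.

```python
from xml.dom import minidom

# Trimmed copy of the sample address book, inlined so the sketch is self-contained.
SAMPLE = """<AddressBook>
  <User ID="1"><Nickname>Gre7g</Nickname><Email>firstname.lastname@example.org</Email></User>
  <User ID="2"><Nickname>Jimbo</Nickname><Email>email@example.com</Email></User>
</AddressBook>"""


def find_email(document, user_id):
    """Return the e-mail text of the User whose ID equals user_id, or None."""
    for user in document.getElementsByTagName("User"):
        if user.getAttribute("ID") == user_id:
            # Take the first <Email> element under this user and read its text node.
            email = user.getElementsByTagName("Email")[0]
            return email.firstChild.data
    return None


book = minidom.parseString(SAMPLE)
print(find_email(book, "2"))
```

The loop is easier to follow than the filter call, but the search logic still has to be written by hand for every lookup.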
Now let's find the e-mail address with pyRXP.
Example 1.4. Decoding XML with pyRXP
>>> from types import *
>>> import pyRXP
>>> AddressBook = open("AddressBook.xml").read()
>>> Book = pyRXP.Parser()(AddressBook)
>>> Users = filter(lambda Node: type(Node) == TupleType,
...     Book[2])
>>> User = filter(lambda Node: Node[1]["ID"] == "2",
...     Users)[0]
>>> Children = filter(lambda Node: type(Node) == TupleType,
...     User[2])
>>> filter(lambda Node: Node[0] == "Email",
...     Children)[0][2][0]
'email@example.com'
I love pyRXP's power and speed, but yuck! Four lambda functions and eight references to a constant index do not make for readable code!
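One way to tame those bare indices is to name them once and walk the tree with plain loops. The sketch below assumes pyRXP's tuple layout of (tag name, attributes, children, spare) and uses a hand-built tuple in place of a real parse result, so it runs without pyRXP installed. The names `NAME`, `ATTRS`, `CHILDREN`, `elements`, and `find_email` are illustrative only and are not part of pyRXP.

```python
# Hand-built stand-in for a pyRXP parse result (an assumption for illustration):
# each element is a (tag name, attribute dict or None, child list or None, spare) tuple.
BOOK = (
    "AddressBook", None, [
        "\n",
        ("User", {"ID": "1"},
         [("Email", None, ["firstname.lastname@example.org"], None)], None),
        "\n",
        ("User", {"ID": "2"},
         [("Email", None, ["email@example.com"], None)], None),
        "\n",
    ], None)

# Named positions in place of the bare 0, 1, and 2.
NAME, ATTRS, CHILDREN = 0, 1, 2


def elements(node):
    """Yield the child elements of node, skipping plain text strings."""
    for child in node[CHILDREN] or []:
        if isinstance(child, tuple):
            yield child


def find_email(book, user_id):
    """Return the Email text for the User with the given ID, or None."""
    for user in elements(book):
        if user[NAME] == "User" and (user[ATTRS] or {}).get("ID") == user_id:
            for child in elements(user):
                if child[NAME] == "Email":
                    return "".join(child[CHILDREN] or [])
    return None


print(find_email(BOOK, "2"))
```

It reads better, but it is a good deal longer, and all of the tree-walking is still our job.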
Decoding XML with xml.sax is no small undertaking. I'll spare you the editorial comments and let the code speak for itself:
Example 1.5. Decoding XML with xml.sax
from types import *
from xml import sax

class cAddressBook(DictType): pass
class cUser: pass

class cMyParser(sax.ContentHandler):
    def __init__(self):
        sax.ContentHandler.__init__(self)
        # Stack of objects (or tag names) currently open; DOM holds the result
        self.Stack, self.DOM = [], None

    def startElement(self, Name, Attrs):
        if Name == "AddressBook":
            self.Stack.append(cAddressBook())
        elif Name == "User":
            # Index each user in the address book by its ID attribute
            User = cUser()
            self.Stack[-1][Attrs["ID"]] = User
            self.Stack.append(User)
        else:
            self.Stack.append(Name)

    def characters(self, Content):
        # Store non-blank text as an attribute of the enclosing user
        if Content.strip():
            setattr(self.Stack[-2], self.Stack[-1], Content)

    def endElement(self, Name):
        X = self.Stack.pop()
        if not self.Stack: self.DOM = X

def Main():
    Parser = cMyParser()
    File = open("AddressBook.xml")
    sax.parse(File, Parser)
    print Parser.DOM["2"].Email

if __name__ == "__main__": Main()