Child Tags

VCD Strings

Each class needs a valid children definition (VCD) string that defines which tags are allowed as its children; a blank VCD indicates that the class may not have any children. This string looks a lot like a regular expression: it allows parentheses, braces, the modifiers *, +, and ?, and the | operator, just as regular expressions do.

Note

Put each tag or set of tags in parentheses before applying a modifier or the | operator to it.

Instead of giving a huge lesson on regular expressions, I'm just going to describe a few sets of allowed children and show you the VCD you could enter to specify such a set. I'm confident that you'll be able to extrapolate from there.

Note

Feel free to put whitespace in your VCDs to make them easier to read.

One or more <User> tags.

(<User>)+

One or more <Foo>, <Bar>, or <Snafu> tags, in any order.

((<Foo>) | (<Bar>) | (<Snafu>))+

A <Nickname>, possibly a <BestFriend>, an <Email>, one to two <Address> tags, a <City>, a <State>, and a <Zip> – in that order.

<Nickname> (<BestFriend>)? <Email> (<Address>){1,2} <City><State><Zip>
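A VCD reads like a regular expression over child tags, so you can check what one accepts mechanically. The sketch below is not XMLObject's own code. It uses a hypothetical helper, `vcd_matches`, which removes the whitespace from a VCD and writes the actual sequence of children as a string of `<Name>` tokens. It then asks Python's `re` module for a full match.

```python
import re
from typing import Sequence


def vcd_matches(vcd: str, child_tags: Sequence[str]) -> bool:
    """Return True if the sequence of child tag names is allowed by the VCD.

    Hypothetical helper for illustration only; XMLObject's own
    validation may work differently.
    """
    # Whitespace in a VCD is only there for readability.
    pattern = re.sub(r"\s+", "", vcd)
    # Write the children the way tags appear in the VCD: <Name><Name>...
    sequence = "".join(f"<{tag}>" for tag in child_tags)
    # A blank VCD becomes an empty pattern, which only matches "no children".
    return re.fullmatch(pattern, sequence) is not None


contact = "<Nickname> (<BestFriend>)? <Email> (<Address>){1,2} <City><State><Zip>"
print(vcd_matches(contact, ["Nickname", "Email", "Address", "City", "State", "Zip"]))  # True
print(vcd_matches(contact, ["Email", "Nickname", "Address", "City", "State", "Zip"]))  # False
```

The `<` and `>` characters have no special meaning in Python regular expressions, which is why the VCD can be used almost verbatim. A fuller version would also escape tag names that contain metacharacters such as `.`.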

Once you've entered your VCD, hit Tab or click on any other control. As soon as the VCD control loses focus, XMLObjApp will recalculate the list of possible children and add or remove fields to allow you to configure how these children should be processed.
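The list of possible children is essentially the set of distinct tags named in the VCD. Here is a rough equivalent, assuming a hypothetical `possible_children` function that is not part of XMLObjApp. It pulls every `<Name>` token, plus the #PCDATA pseudo-tag described below, out of the string in order of first appearance.

```python
import re
from typing import List


def possible_children(vcd: str) -> List[str]:
    """List the distinct tags a VCD mentions, in order of first appearance.

    Hypothetical helper; it only scans for tag tokens and ignores the
    grouping, modifiers, and | operators around them.
    """
    tokens = re.findall(r"#PCDATA|<[^<>\s]+>", vcd)
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(tokens))


print(possible_children("<Nickname> (<BestFriend>)? <Email> (<Address>){1,2} <City><State><Zip>"))
# ['<Nickname>', '<BestFriend>', '<Email>', '<Address>', '<City>', '<State>', '<Zip>']
```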

#PCDATA Pseudo-tag

To indicate that a class can have raw text as a child, enter the pseudo-tag #PCDATA into the VCD. The #PCDATA pseudo-tag acts like any other tag.

Note

The #PCDATA pseudo-tag must be entered exactly. Case is significant.

<XML> Pseudo-tag

On occasion, you may not want XMLObject to parse some of the XML in your stream; instead, you might want all of an element's subordinate XML stored raw for later processing. The <XML> pseudo-tag accomplishes this.

Notes

The <XML> pseudo-tag must be entered exactly. Case is significant.

The <XML> pseudo-tag "consumes" all child tags and/or text between the given element's beginning and ending tags. If present, it should be used exclusively as a class' VCD string, and not mixed in with any other tags, modifiers, or operators.

The <XML> pseudo-tag places very few requirements on the nature of the subordinate XML. The XML must be well balanced (i.e. it must have an appropriately placed closing tag for every opening tag), but that's about it. The XML captured may be empty.
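One way to picture the "well balanced, possibly empty" requirement is to wrap the captured fragment in a throwaway root element and see whether it parses. The sketch below does this with the standard library's `xml.etree.ElementTree`. The `is_well_balanced` helper is hypothetical and is not how XMLObject itself checks the XML.

```python
import xml.etree.ElementTree as ET


def is_well_balanced(raw_xml: str) -> bool:
    """Check whether captured <XML> content is balanced markup.

    Hypothetical helper: wrapping the fragment in a dummy root lets plain
    text, several sibling elements, or nothing at all parse as one document.
    """
    try:
        ET.fromstring(f"<wrapper>{raw_xml}</wrapper>")
    except ET.ParseError:
        return False
    return True


print(is_well_balanced("File could <b>not</b> be opened."))  # True
print(is_well_balanced("<b><i>crossed</b></i>"))             # False
```

This check is somewhat stricter than simply matching opening and closing tags. For example, undeclared entities such as `&nbsp;` also make it fail.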

Configuring a Typical Child Type

To complete our configuration of a class, we must specify how the children should be attached to the class. There are four fundamental things we need to define for each child:

  • What class should be instantiated?

  • What object member name should be used?

  • How should the data be saved?

  • Should all of the data be saved, or just a portion?

To illustrate these decisions, let's look at some really contrived XML that describes an engineer:

Example 3.5. Engineer XML

<Engineer Name="Duane">
    <Boss Name="Cecil"> (lots of "boss" data) </Boss>
    <Spouse>Elizabeth</Spouse>
    <Child>Herman</Child>
    <Child>Olaf</Child>
    <Project ProjCode="210"> (lots of data on project #210) </Project>
    <Project ProjCode="229"> (lots of data on project #229) </Project>
    <Project ProjCode="766"> (lots of data on project #766) </Project>
</Engineer>

To parse data such as this, I would configure my Engineer class in the following way:

Figure 3.6. Engineer class Configuration

As you can see from the VCD string, we expect a single boss, the possibility of a single spouse (must not be living in Utah!), any number of children (including none), and any number of projects.

The boss will be an instance of class Manager and stored in the Engineer attribute Boss. The boss instance is liable to contain several pieces of information, so we won't reduce it down any.

There can only be one spouse, just as there is only one boss, so we will save this information as attribute Spouse. However, there's only one thing of interest in the Spouse attribute – the Spouse's name.

Since the spouse has only one piece of real data, it's more convenient to access this data as Engineer.Spouse instead of as Engineer.Spouse.Name. To do this reduction, I've checked the appropriate Reduce checkbox and entered the Name member into the appropriate field.

Note

When reducing data, the child class is still instantiated. Data reduction takes place after the child has been instantiated and its XML element has been processed.

The engineer may have any number of children. Each <Child> element is very similar in structure to the <Spouse> element. I've taken advantage of this and used the same class for both.

I've configured the Child element capture in much the same way as the <Spouse> element. Each child is reduced to the value of the Name member. The difference is that there can be multiple <Child> elements, but only one <Spouse> element. To handle these multiple elements, I've configured Child to store the data values in a list called Children.

Parsing the Engineer XML above with the model shown gives Engineer.Children a value of ['Herman', 'Olaf'].
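To make the order of operations concrete, here is a sketch that imitates this configuration with the standard library's `xml.etree.ElementTree` instead of XMLObject. The `Person` class and the `reduce_to` helper are hypothetical. Each child element is first turned into an object and then reduced to its `Name` member. The `<Child>` results are collected in a list, while the single `<Spouse>` is stored as one value.

```python
import xml.etree.ElementTree as ET

ENGINEER_XML = """
<Engineer Name="Duane">
    <Spouse>Elizabeth</Spouse>
    <Child>Herman</Child>
    <Child>Olaf</Child>
</Engineer>
"""


class Person:
    """Hypothetical class shared by <Spouse> and <Child> elements."""

    def __init__(self, element):
        # The element's stripped text is stored in the Name member.
        self.Name = (element.text or "").strip()


def reduce_to(instance, member):
    # Reduction runs after instantiation: keep only one member's value.
    return getattr(instance, member)


root = ET.fromstring(ENGINEER_XML.strip())
# At most one <Spouse>: store a single reduced value (or None if absent).
spouse_elem = root.find("Spouse")
spouse = reduce_to(Person(spouse_elem), "Name") if spouse_elem is not None else None
# Any number of <Child> elements: store the reduced values in a list.
children = [reduce_to(Person(e), "Name") for e in root.findall("Child")]

print(spouse)    # Elizabeth
print(children)  # ['Herman', 'Olaf']
```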

Like the Boss member, the Projects member will not be reduced. Each project will undoubtedly contain multiple pieces of useful data we will want to maintain. However, there can be multiple projects, just as there can be multiple children. We could store these in a list, as we did with the children, but because each <Project> has a unique ProjCode, it is probably better to put the projects in a dictionary keyed by ProjCode.

By putting the projects in a dictionary, we can get a list of project codes with Engineer.Projects.keys(), a list of the projects with Engineer.Projects.values(), or even access a specific project with Engineer.Projects["229"] (for example). This lets us access a specific project without having to search the list for the appropriate list index.

Note

Regardless of the underlying index type, XMLObject always creates string type dictionary keys.
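The following sketch shows the dictionary form and why the keys end up as strings. It again uses `xml.etree.ElementTree` rather than XMLObject. The `Project` class and its `Data` member are hypothetical stand-ins for the "lots of data" in each <Project> element.

```python
import xml.etree.ElementTree as ET

PROJECTS_XML = """<Engineer Name="Duane">
    <Project ProjCode="210">Project 210 data</Project>
    <Project ProjCode="229">Project 229 data</Project>
    <Project ProjCode="766">Project 766 data</Project>
</Engineer>"""


class Project:
    """Hypothetical class for a <Project> element; it is not reduced."""

    def __init__(self, element):
        self.ProjCode = element.get("ProjCode")
        self.Data = (element.text or "").strip()


root = ET.fromstring(PROJECTS_XML)
# Key each Project instance by its ProjCode attribute. XML attribute values
# are strings, so the keys are "210", "229", and "766", never integers.
projects = {}
for element in root.findall("Project"):
    project = Project(element)
    projects[project.ProjCode] = project

print(sorted(projects))      # ['210', '229', '766']
print(projects["229"].Data)  # Project 229 data
print(229 in projects)       # False: an integer key does not match
```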

Configuring a #PCDATA Child Type

As mentioned previously, elements can have child text. Configuring how the text should be processed and stored begins with entering the #PCDATA pseudo-tag into the VCD. #PCDATA configuration is as shown:

Figure 3.7. #PCDATA Pseudo-tag Configuration

Text may be processed in a variety of ways:

Strip

White-space is stripped from the beginning and ending of the text.

Exact

Text is captured exactly. No white-space is stripped.

Split

Text is split into a list, broken by the given delimiter.

Strip-Split

White-space is stripped from the beginning and ending of the text. Then, the remaining text is split into a list, broken by the given delimiter.

Split-Strip

Text is split into a list, broken by the given delimiter. Then, white-space is stripped from the beginning and ending of each list item.

Strip-Split-Strip

White-space is stripped from the beginning and ending of the text. Then, the remaining text is split into a list, broken by the given delimiter. Finally, white-space is stripped from the beginning and ending of each list item.

Regardless of how the text is processed, it is saved under the attribute name you entered.
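Each processing mode maps onto ordinary string operations. The sketch below is a hypothetical re-creation of these modes, not XMLObject's code. It is meant to show how the modes differ, in particular why the final strip-each-item step matters when the delimiter is a newline.

```python
from typing import List, Union


def process_text(text: str, mode: str, delimiter: str = ",") -> Union[str, List[str]]:
    """Apply one of the #PCDATA processing modes to captured text.

    Hypothetical re-creation for illustration; the mode names follow the
    list above, but XMLObject's implementation may differ in details.
    """
    if mode == "Exact":
        return text
    if mode == "Strip":
        return text.strip()
    if mode == "Split":
        return text.split(delimiter)
    if mode == "Strip-Split":
        return text.strip().split(delimiter)
    if mode == "Split-Strip":
        return [item.strip() for item in text.split(delimiter)]
    if mode == "Strip-Split-Strip":
        return [item.strip() for item in text.strip().split(delimiter)]
    raise ValueError(f"unknown processing mode: {mode!r}")


text = "\n  Herman\n  Olaf\n"
print(process_text(text, "Split-Strip", "\n"))        # ['', 'Herman', 'Olaf', '']
print(process_text(text, "Strip-Split-Strip", "\n"))  # ['Herman', 'Olaf']
```

With Split-Strip, the leading and trailing newlines produce empty items. Stripping the whole text first, as Strip-Split-Strip does, avoids them.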

Note

The pseudo-tag #PCDATA will catch multiple lines of text. You do not need to put "(#PCDATA)*" in your VCD to indicate that there may be any amount of text.

However, if the text is optional, you should indicate this by putting a "(#PCDATA)?" in your VCD instead of just "#PCDATA".

Configuring an <XML> Child Type

As mentioned previously, there are times when you just don't want to parse all the XML in a document. One common situation for this is when one program wants to communicate an HTML-formatted message to another program. Using an <XML> pseudo-tag will accomplish just that.

Example 3.6. HTML Embedded in XML

<Error>
    File could <b>not</b> be opened.
</Error>

This XML could be parsed with the following configuration:

Figure 3.8. Parsing HTML Within XML

This gives Error.XML a value of:

'File could <b>not</b> be opened.'

Alternatively, you can choose the Exact processing mode to keep the text exactly as it appears, without stripping off any extraneous white-space.
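The capture itself can be approximated with `xml.etree.ElementTree`. The sketch below takes the element's leading text and appends each serialized child. This works because `ET.tostring()` includes a child's tail text. The `inner_xml` helper is hypothetical and is not how XMLObject collects the raw XML.

```python
import xml.etree.ElementTree as ET


def inner_xml(element: ET.Element) -> str:
    """Return everything between an element's start and end tags, unparsed.

    Sketch using ElementTree: ET.tostring() includes each child's tail text,
    so concatenating the children reproduces the mixed content.
    """
    parts = [element.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)


error = ET.fromstring("<Error>\n    File could <b>not</b> be opened.\n</Error>")
raw = inner_xml(error)
print(repr(raw.strip()))  # 'File could <b>not</b> be opened.'
print(repr(raw))          # '\n    File could <b>not</b> be opened.\n'
```

The first result corresponds to the Strip mode and the second to Exact. ElementTree re-serializes the markup, so quoting and escaping may not be byte-for-byte identical to the source.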

Note

The pseudo-tag <XML> will catch any and all XML children. You do not need to put "(<XML>)?" or "(<XML>)*" in your VCD to try and indicate what sort of XML to expect.

Likewise, do not mix <XML> with any other tag in your VCD.

XMLObject enforces no restrictions on the captured XML beyond requiring it to be well-formed. If the XML must be validated, add the validation code to an _end_init function.
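As a rough illustration, the hypothetical `Error` class below keeps its captured markup in an `XML` member and checks it in an `_end_init` method. Two details are assumptions that go beyond the text above. The first is that `_end_init` takes only `self`. The second is that it runs once the element is fully processed; the sketch calls it from `__init__` to simulate that. The whitelist of allowed tags is also invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist of formatting tags allowed inside an <Error> message.
ALLOWED_TAGS = {"b", "i", "br"}


class Error:
    def __init__(self, raw_xml: str):
        self.XML = raw_xml
        # Assumption: XMLObject would invoke _end_init itself after parsing;
        # it is called here directly so the sketch runs on its own.
        self._end_init()

    def _end_init(self):
        # Wrap the fragment so mixed text and elements parse as one document.
        wrapper = ET.fromstring(f"<wrapper>{self.XML}</wrapper>")
        for element in wrapper.iter():
            if element is not wrapper and element.tag not in ALLOWED_TAGS:
                raise ValueError(f"<{element.tag}> is not allowed in Error.XML")


Error("File could <b>not</b> be opened.")  # passes validation
```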