| Each class needs a valid children definition (VCD) string that defines which tags are allowed as children (a blank VCD indicates that the class may not have any children). This string looks a lot like a regular expression: it allows parentheses, braces, modifiers (*, +, and ?), and the | operator, just as regular expressions do.

Note: Put all tags or sets of tags in parentheses before adding modifiers or the | operator to the VCD.

Instead of giving a huge lesson on regular expressions, I'm just going to describe a few sets of allowed children and show you the VCD you could enter to specify such a set. I'm confident that you'll be able to extrapolate from there.

Note: Feel free to put whitespace in your VCDs to make them easier to read.
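To make the regular-expression analogy concrete, here is a minimal sketch in plain Python. It is not XMLObject's own implementation, only an illustration of the idea. It assumes that tags are written in the VCD as <Tag>; the tag names (<Title>, <Author>, <Chapter>) and the helper children_allowed are hypothetical.

```python
import re

def children_allowed(vcd, child_tags):
    """Check a sequence of child tags against a VCD-like pattern.

    Assumption: tags appear in the VCD as '<Tag>' (or '#PCDATA'), so once
    whitespace is removed the VCD can be used directly as a regular expression.
    """
    pattern = re.sub(r"\s+", "", vcd)   # whitespace is only there for readability
    sequence = "".join(child_tags)      # e.g. "<Title><Author><Author>"
    return re.fullmatch(pattern, sequence) is not None

# Hypothetical VCD: one title, at least one author, any number of chapters.
BOOK_VCD = "<Title> (<Author>)+ (<Chapter>)*"

print(children_allowed(BOOK_VCD, ["<Title>", "<Author>", "<Chapter>"]))
print(children_allowed(BOOK_VCD, ["<Title>", "<Chapter>"]))
print(children_allowed("", []))
```

Under these assumptions the first call reports a match, the second does not (the author is missing), and the blank VCD accepts only an element with no children.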
Once you've entered your VCD, press Tab or click on any other control. As soon as the VCD control loses focus, XMLObjApp recalculates the list of possible children and adds or removes fields so that you can configure how these children should be processed.

To indicate that a class can have raw text as a child, enter the pseudo-tag #PCDATA into the VCD. The #PCDATA pseudo-tag acts like any other tag.

Note: The #PCDATA pseudo-tag must be entered exactly. Case is significant.

On occasion, you may not want XMLObject to parse some of the XML in your stream. You might want all of an element's subordinate XML to be stored raw for later processing. This is accomplished by using the <XML> pseudo-tag.

Notes:

The <XML> pseudo-tag must be entered exactly. Case is significant.

The <XML> pseudo-tag "consumes" all child tags and/or text between the given element's beginning and ending tags. If present, it should be the only content of a class's VCD string, and not mixed in with any other tags, modifiers, or operators.

The <XML> pseudo-tag places very few requirements on the nature of the subordinate XML. The XML must be well balanced (i.e. it must have an appropriately placed closing tag for every opening tag), but that's about it. The XML captured may be empty.

To complete our configuration of a class, we must specify how the children should be attached to the class. There are four fundamental things we need to define for each child:
To illustrate these decisions, let's look at some really contrived XML that describes an engineer:

Example 3.5. Engineer XML

    <Engineer Name="Duane">
      <Boss Name="Cecil">
        (lots of "boss" data)
      </Boss>
      <Spouse>Elizabeth</Spouse>
      <Child>Herman</Child>
      <Child>Olaf</Child>
      <Project ProjCode="210">
        (lots of data on project #210)
      </Project>
      <Project ProjCode="229">
        (lots of data on project #229)
      </Project>
      <Project ProjCode="766">
        (lots of data on project #766)
      </Project>
    </Engineer>

To parse data such as this, I would configure my Engineer class in the following way:

As you can see from the VCD string, we expect a single boss, the possibility of a single spouse (must not be living in Utah!), any number of children (including none), and any number of projects.

The boss will be an instance of class Manager and stored in the Engineer attribute Boss. The boss instance is likely to contain several pieces of information, so we won't reduce it.

There can only be one spouse, just as there is only one boss, so we will save this information as attribute Spouse. However, there's only one thing of interest in the Spouse attribute: the spouse's name. Since the spouse has only one piece of real data, it's more convenient to access this data as Engineer.Spouse instead of as Engineer.Spouse.Name. To do this reduction, I've checked the appropriate checkbox and entered the Name member into the appropriate field.

Note: When reducing data, the child class is still instantiated. Data reduction takes place after the instantiation and XML element processing.

The engineer may have any number of children. Each <Child> element is very similar in structure to the <Spouse> element. I've taken advantage of this and used the same class for both. I've configured the <Child> element capture in much the same way as the <Spouse> element: each child is reduced to the value of the Name member. The difference is that there can be multiple <Child> elements, but only one <Spouse> element.
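The reduction described above can be pictured with a small stand-alone sketch. It uses Python's standard xml.etree.ElementTree as a stand-in rather than XMLObject itself. The Person class, the plain Engineer class, and the trimmed copy of the Engineer XML are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Engineer XML (Boss and Project elements left out).
ENGINEER_XML = """
<Engineer Name="Duane">
  <Spouse>Elizabeth</Spouse>
  <Child>Herman</Child>
  <Child>Olaf</Child>
</Engineer>
"""

class Person:
    """Hypothetical child class shared by <Spouse> and <Child>."""
    def __init__(self, element):
        # The element's text is stored in the Name member.
        self.Name = (element.text or "").strip()

class Engineer:
    """Hypothetical parent class; attributes are attached below."""

root = ET.fromstring(ENGINEER_XML)
engineer = Engineer()

# The child class is still instantiated first ...
spouse = Person(root.find("Spouse"))
# ... and only afterwards reduced to the value of its Name member.
engineer.Spouse = spouse.Name

# Each <Child> goes through the same instantiate-then-reduce steps.
reduced_children = [Person(e).Name for e in root.findall("Child")]

print(engineer.Spouse)
print(reduced_children)
```

Without the reduction, engineer.Spouse would hold the Person instance, and the name would have to be read as engineer.Spouse.Name.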
To handle these multiple elements, I've configured Child to store the data values in a list called Children. Parsing the Engineer XML above with the model shown will give Engineer.Children a value of ['Herman', 'Olaf'].

Like the Boss member, the Projects member will not be reduced. Each project will undoubtedly contain multiple pieces of useful data that we will want to keep. However, there can be multiple projects, just as there can be multiple children. We could store these in a list, as we did with the children, but because each <Project> has a unique ProjCode, it is probably a better choice to put the projects in a dictionary (with a key of ProjCode). By putting the projects in a dictionary, we can get a list of project codes with Engineer.Projects.keys(), a list of the projects with Engineer.Projects.values(), or even access a specific project with Engineer.Projects["229"] (for example). This lets us access a specific project without having to search a list for the appropriate index.

Note: Regardless of the underlying index type, XMLObject always creates string-type dictionary keys.

As mentioned previously, elements can have child text. Configuring how the text should be processed and stored begins with entering the #PCDATA pseudo-tag into the VCD. #PCDATA configuration is as shown:

Text may be processed in a variety of ways:
Regardless of how the text is processed, it is saved under the attribute name you entered.

Note: The pseudo-tag #PCDATA will catch multiple lines of text. You do not need to put "(#PCDATA)*" in your VCD to indicate that there may be any amount of text. However, if the text is optional, you should indicate this by putting "(#PCDATA)?" in your VCD instead of just "#PCDATA".

As mentioned previously, there are times when you just don't want to parse all the XML in a document. One common situation is when one program wants to communicate an HTML-formatted message to another program. Using an <XML> pseudo-tag will accomplish just that.

Could be parsed with:

To give Error.XML a value of: 'File could <b>not</b> be opened.'

Alternatively, you can choose to extract the text Exactly and not strip off any extraneous whitespace.

Note: The pseudo-tag <XML> will catch any and all XML children. You do not need to put "(<XML>)?" or "(<XML>)*" in your VCD to try to indicate what sort of XML to expect. Likewise, do not mix <XML> with any other tag in your VCD. XMLObject does not enforce any restrictions on the captured XML, apart from requiring it to be well-formed. If the XML must be validated, do so by adding the checks to an _end_init function. |
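To picture what capturing an element's subordinate XML as raw text looks like, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is not how XMLObject does it. The <Error> input, the inner_xml helper, and its exactly flag are assumptions for illustration. ElementTree re-serializes the children, so the result is not guaranteed to match the input byte for byte.

```python
import xml.etree.ElementTree as ET

def inner_xml(element, exactly=False):
    """Return everything between an element's start and end tags as a string.

    With exactly=False, surrounding whitespace is stripped; with exactly=True
    it is kept. Both behaviours are assumptions modelled on the text above.
    """
    parts = [element.text or ""]
    for child in element:
        # tostring() serializes the child together with the text that trails it.
        parts.append(ET.tostring(child, encoding="unicode"))
    raw = "".join(parts)
    return raw if exactly else raw.strip()

# Hypothetical input: an HTML-formatted message inside an <Error> element.
error = ET.fromstring("<Error>\n  File could <b>not</b> be opened.\n</Error>")

print(repr(inner_xml(error)))
print(repr(inner_xml(error, exactly=True)))
```

ET.fromstring raises a ParseError if the input is not well-formed, which mirrors the only requirement placed on the captured XML.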