Parsing a Simple XML Document using C++

TinyXml is an excellent choice for applications that need to do just a bit of XML processing. Its source distribution is small, it’s easy to build and integrate with projects, and it has a very simple interface. It also has a very permissive license. Its main limitations are that it doesn’t understand XML Namespaces, can’t validate against a DTD or schema, and can’t parse XML documents containing an internal DTD. If you need to use any of these features, or any of the XML-related technologies such as XPath or XSLT, you should use the other libraries.
The TinyXml parser produces a representation of an XML document as a tree whose nodes represent the elements, text, comments and other components of an XML document. The root of the tree represents the XML document itself. This type of representation
of a hierarchical document as a tree is known as a Document Object Model (DOM). The TinyXml DOM is similar to the one designed by the World Wide Web Consortium (W3C), although it does not conform to the W3C specification.
In keeping with the minimalist spirit of TinyXml, the TinyXml DOM is simpler than the W3C DOM, but also less powerful. The nodes in the tree representing an XML document can be accessed through the interface TiXmlNode, which provides methods to access a node’s parent, to enumerate its child nodes, and to remove child nodes or insert additional child nodes. Each node is actually an instance of a more derived type; for example, the root of the tree is an instance of TiXmlDocument, nodes representing elements are instances TiXmlElement, and nodes representing text are instances of TiXmlText. The type of a TiXmlNode can be determined by calling its Type( ) method; once you knowthe type of a node, you can obtain a representation of the node as a more derived type by calling one of the convenience methods such as toDocument( ), toElement( ) and toText( ).

These derived types contain additional methods appropriate to the type of node they represent. It’s noweasy to understand example code. First, the function textValue( ) extracts the text content from an element that contains only text, such as name, species, or dateOfBirth. It does this by first checking that an element has only one child, and that the child is a text node. It then obtains the child’s text by calling the Value( ) method, which returns the textual content of a text node or comment node, the tag name of an element node, and the filename of a root node.
Next, the function nodeToContact( ) takes a node corresponding to a veterinarian or trainer element and constructs a Contact object from the values of its name and phone attributes, which it retrieves using the Attribute( ) method. Similarly, the function nodeToAnimal( ) takes a node corresponding to an animal element and constructs an Animal object. It does this by iterating over the node’s children using the NextSiblingElement( ) method, extracting the data contained in each element, and setting the corresponding property of the Animal object. The data is extracted using the function textValue( ) for the elements name, species, and dateOfBirth and the function nodeToContact( ) for the elements veterinarian and trainer.

In the main function, I first construct a TiXmlDocument object corresponding to the file animals.xml and parse it using the LoadFile( ) method. I then obtain a TiXmlElement corresponding to the document root by calling the RootElement( ) method. Next, I iterate over the children of the root element, constructing an Animal object from each animal element using the function nodeToAnimal( ). Finally, I iterate over the collection of Animal objects, writing them to standard output. One feature of TinyXml that is not illustrated in code expamle is the SaveFile( ) method of TiXmlDocument, which writes the document represented by a TiXmlDocument
to a file. This allows you to parse an XML document, modify it using the DOM interface, and save the modified document. You can even create a TiXmlDocument from scratch and save it to disk:
// Create a document hello.xml, consisting
// of a single “hello” element
TiXmlDocument doc;
TiXmlElement root(“hello”);
doc.InsertEndChild(root);
doc.SaveFile(“hello.xml”);

Use the TinyXml library. First, define an object of type TiXmlDocument and call its LoadFile( ) method, passing the pathname of your XML document as its argument. If LoadFile( ) returns true, your document has been successfully parsed. If parsing was successful, call the RootElement( ) method to obtain a pointer to an object of type TiXmlElement representing the document root. This object has a hierarchical structure that reflects the structure of your XML document; by traversing this structure, you can extract information about the document and use this information to create a collection of C++ objects.
For example, suppose you have an XML document animals.xml representing a collection of circus animals, as shown in first sample code. The document root is named animalList and has a number of child animal elements each representing an animal owned by the Feldman Family Circus. Suppose you also have a C++ class named Animal, and you want to construct a std::vector of Animals corresponding to the animals listed in the document.


<?xml version="1.0" encoding="UTF-8"?>
<!-- Feldman Family Circus Animals -->
<animalList>
<animal>
<name>Herby</name>
<species>elephant</species>
<dateOfBirth>1992-04-23
<veterinarian name="Dr. Hal Brown" phone="(801)595-9627"/>
<trainer name="Bob Fisk" phone="(801)881-2260"/>
</animal>
<animal>
<name>Sheldon</name>
<species>parrot</species>
<dateOfBirth>1998-09-30</dateOfBirth>
<veterinarian name="Dr. Kevin Wilson" phone="(801)466-6498"/>
<trainer name="Eli Wendel" phone="(801)929-2506"/>
</animal>
<animal>
<name>Dippy</name>
<species>penguin</species>
<dateOfBirth>2001-06-08</dateOfBirth>
<veterinarian name="Dr. Barbara Swayne" phone="(801)459-7746"/>
<trainer name="Ben Waxman" phone="(801)882-3549"/>
</animal>
<!--<span class="hiddenSpellError" pre=""-->animalList>

 

Code sample shows how the definition of the class Animal might look. Animal has five data members corresponding to an animal’s name, species, date of birth, veterinarian, and trainer. An animal’s name and species are represented as std::strings,
its date of birth is represented as a boost::gregorian::date from Boost.Date_Time, and its veterinarian and trainer are represented as instances of the class Contact, also defined in both code examples shows how to use TinyXml to parse the document animals.xml, traverse the parsed document, and populate a std::vector of Animals using data extracted from the document.

The header animal.hpp:


#ifndef ANIMALS_HPP_INCLUDED
#define ANIMALS_HPP_INCLUDED
#include <ostream>
#include <string>
#include <stdexcept> // runtime_error
#include gregorian/gregorian.hpp>
#include <boost/regex.hpp>
// Represents a veterinarian or trainer
class Contact

{
public:

Contact( ) { }
Contact(const std::string& name, const std::string& phone)
: name_(name)
{
setPhone(phone);
}
std::string name( ) const { return name_; }
std::string phone( ) const { return phone_; }
void setName(const std::string& name) { name_ = name; }
void setPhone(const std::string& phone)
{
using namespace std;
using namespace boost;
// Use Boost.Regex to verify that phone
// has the form (ddd)ddd-dddd
static regex pattern("\\([0-9]{3}\\)[0-9]{3}-[0-9]{4}");
if (!regex_match(phone, pattern)) {
throw runtime_error(string("bad phone number:") + phone);
}
phone_ = phone;
}
private:
std::string name_;
std::string phone_;
};

// (for completeness, you should also define operator!=)
bool operator==(const Contact& lhs, const Contact& rhs)
{
return lhs.name() == rhs.name() && lhs.phone() == rhs.phone();
}
// Writes a Contact to an ostream
std::ostream& operator<<(std::ostream& out, const Contact& contact)
{
out << contact.name( ) << " " << contact.phone( );
return out;
}
// Represents an animal
class Animal

 {
public:
// Default constructs an Animal; this is
// the constructor you'll use most
Animal( ) { }
// Constructs an Animal with the given properties;
Animal( const std::string& name,
const std::string& species,
const std::string& dob,
const Contact& vet,
const Contact& trainer )
: name_(name),
species_(species),
vet_(vet),
trainer_(trainer)
{
setDateOfBirth(dob);
}
// Getters
std::string name( ) const { return name_; }
std::string species( ) const { return species_; }
boost::gregorian::date dateOfBirth( ) const { return dob_; }
Contact veterinarian( ) const { return vet_; }
Contact trainer( ) const { return trainer_; }
// Setters
void setName(const std::string& name) { name_ = name; }
void setSpecies(const std::string& species) { species_ = species; }
void setDateOfBirth(const std::string& dob)
{
dob_ = boost::gregorian::from_string(dob);
}
void setVeterinarian(const Contact& vet) { vet_ = vet; }
void setTrainer(const Contact& trainer) { trainer_ = trainer; }
private:
std::string name_;
std::string species_;
boost::gregorian::date dob_;
Contact vet_;
Contact trainer_;
};

// (for completeness, you should also define operator!=)
bool operator==(const Animal& lhs, const Animal& rhs)
{
return lhs.name() == rhs.name() &&
lhs.species() == rhs.species() &&
lhs.dateOfBirth() == rhs.dateOfBirth() &&
lhs.veterinarian() == rhs.veterinarian() &&
lhs.trainer() == rhs.trainer();
}
// Writes an Animal to an ostream
std::ostream& operator<<(std::ostream& out, const Animal& animal)
{
out << "Animal {\n"
<< " name=" << animal.name( ) << ";\n"
<< " species=" << animal.species( ) << ";\n"
<< " date-of-birth=" << animal.dateOfBirth( ) << ";\n"

<< " veterinarian=" << animal.veterinarian( ) << ";\n"
<< " trainer=" << animal.trainer( ) << ";\n"
<< "}";
return out;
}
#endif // #ifndef ANIMALS_HPP_INCLUDED


 

 

Parsing animals.xml with TinyXml:

#include <exception>
#include  // cout
#include <stdexcept> // runtime_error
#include <cstdlib> // EXIT_FAILURE
#include  // strcmp
#include <vector>
#include <tinyxml.h>
#include "animal.hpp"
using namespace std;
// Extracts the content of an XML element that contains only text
const char* textValue(TiXmlElement* e)
{
TiXmlNode* first = e->FirstChild( );
if ( first != 0 &&
first == e->LastChild( ) &&
first->Type( ) == TiXmlNode::TEXT )
{
// the element e has a single child, of type TEXT;
// return the child's
return first->Value( );
} else {
throw runtime_error(string("bad ") + e->Value( ) + " element");
}
}
// Constructs a Contact from a "veterinarian" or "trainer" element
Contact nodeToContact(TiXmlElement* contact)
{
using namespace std;
const char *name, *phone;
if ( contact->FirstChild( ) == 0 &&
(name = contact->Attribute("name")) &&
(phone = contact->Attribute("phone")) )
{
// The element contact is childless and has "name"
// and "phone" attributes; use these values to
// construct a Contact
return Contact(name, phone);
} else {
throw runtime_error(string("bad ") + contact->Value( ) + " element");

}
}
// Constructs an Animal from an "animal" element
Animal nodeToAnimal(TiXmlElement* animal)
{
using namespace std;
// Verify that animal corresponds to an "animal" element
if (strcmp(animal->Value( ), "animal") != 0) {
throw runtime_error(string("bad animal: ") + animal ->Value( ));
}
Animal result; // Return value
TiXmlElement* element = animal->FirstChildElement( );
// Read name
if (element && strcmp(element->Value( ), "name") == 0) {
// The first child element of animal is a "name"
// element; use its text value to set the name of result
result.setName(textValue(element));
} else {
throw runtime_error("no name attribute");
}
// Read species
element = element->NextSiblingElement( );
if (element && strcmp(element->Value( ), "species") == 0) {
// The second child element of animal is a "species"
// element; use its text value to set the species of result
result.setSpecies(textValue(element));
} else {
throw runtime_error("no species attribute");
}
// Read date of birth
element = element->NextSiblingElement( );
if (element && strcmp(element->Value( ), "dateOfBirth") == 0) {
// The third child element of animal is a "dateOfBirth"
// element; use its text value to set the date of birth
// of result
result.setDateOfBirth(textValue(element));
} else {
throw runtime_error("no dateOfBirth attribute");
}
// Read veterinarian
element = element->NextSiblingElement( );
if (strcmp(element->Value( ), "veterinarian") == 0) {
// The fourth child element of animal is a "veterinarian"
// element; use it to construct a Contact object and
// set result's veterinarian

result.setVeterinarian(nodeToContact(element));
} else {
throw runtime_error("no veterinarian attribute");
}
// Read trainer
element = element->NextSiblingElement( );
if (strcmp(element->Value( ), "trainer") == 0) {
// The fifth child element of animal is a "trainer"
// element; use it to construct a Contact object and
// set result's trainer
result.setTrainer(nodeToContact(element));
} else {
throw runtime_error("no trainer attribute");
}
// Check that there are no more children
element = element->NextSiblingElement( );
if (element != 0) {
throw runtime_error(
string("unexpected element:") +
element->Value( )
);
}
return result;
}

Main Program:


int main( )
{
using namespace std;
try {
vector animalList;
// Parse "animals.xml"
TiXmlDocument doc("animals.xml");
if (!doc.LoadFile( ))
throw runtime_error("bad parse");
// Verify that root is an animal-list
TiXmlElement* root = doc.RootElement( );
if (strcmp(root->Value( ), "animalList") != 0) {
throw runtime_error(string("bad root: ") + root->Value( ));
}
// Traverse children of root, populating the list
// of animals
for ( TiXmlElement* animal = root->FirstChildElement( );
animal;
animal = animal->NextSiblingElement( ) )

{
animalList.push_back(nodeToAnimal(animal));
}
// Print the animals' names
for ( vector::size_type i = 0,
n = animalList.size( );
i < n;
++i )
{
cout << animalList[i] << "\n";
}
} catch (const exception& e) {
cout << e.what( ) << "\n";
return EXIT_FAILURE;
}
}

 

Posted on June 17, 2013, in C++ and tagged , , , , , , , , , . Bookmark the permalink. Leave a comment.

Leave a comment