January 6, 2009, Tuesday, 5

Cpp:XmlCppLib

From IdeA thinKING

Introduction

XMLCPP Library is an implementation of the XmlPull v1 API in C++. To support the full Unicode character set, it uses wchar_t and basic_string<wchar_t> to represent characters and strings, so every name and value returned by XMLCPP Library is of type basic_string<wchar_t>.
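As a small illustration of what working with wide strings means in practice, the sketch below checks that std::wstring and basic_string<wchar_t> are the same type and compares a wide-string name against a wide literal. The helper is_note_tag is a hypothetical name invented for this example and is not part of XMLCPP.

```cpp
#include <string>
#include <type_traits>

// std::wstring is a typedef for std::basic_string<wchar_t>, so the two
// spellings name the same type and can be used interchangeably.
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t> >::value,
              "std::wstring and std::basic_string<wchar_t> are the same type");

// Hypothetical helper (not part of XMLCPP): test a wide-string name, such as
// one returned by a parser, against a wide literal. Note the L prefix.
bool is_note_tag(const std::wstring& name)
{
    return name == L"note";
}
```

The comparison is case-sensitive, and the literal needs the L prefix because a narrow "note" literal cannot be compared with a wide string directly.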

XmlPull v1 API is a simple-to-use XML pull parsing API designed for simplicity and very good performance, including in constrained environments. XML pull parsing allows incremental (sometimes called streaming) parsing of XML in which the application is in control: parsing can be interrupted at any moment and resumed when the application is ready to consume more input.
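To make the idea of application-controlled parsing concrete, here is a minimal sketch of a toy pull tokenizer. It is not the XMLCPP implementation: ToyPullParser, Event, value() and first_text() are hypothetical names, and the toy handles only plain start tags, end tags and text (no attributes, comments, or empty-element tags).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical toy types for illustration only; this is NOT the XMLCPP API.
enum class Event { StartDocument, StartTag, Text, EndTag, EndDocument };

class ToyPullParser {
public:
    explicit ToyPullParser(std::wstring input) : in_(std::move(input)) {}

    // Advance to the next event. Nothing is parsed until the caller asks.
    Event next()
    {
        value_.clear();
        if (pos_ >= in_.size())
            return event_ = Event::EndDocument;

        if (in_[pos_] == L'<') {
            const std::size_t close = in_.find(L'>', pos_);
            if (close == std::wstring::npos)
                throw std::runtime_error("unterminated tag");
            const bool is_end = in_[pos_ + 1] == L'/';
            const std::size_t name_begin = pos_ + (is_end ? 2 : 1);
            assert(name_begin <= close);
            value_ = in_.substr(name_begin, close - name_begin);
            pos_ = close + 1;
            return event_ = is_end ? Event::EndTag : Event::StartTag;
        }

        // Everything up to the next '<' (or the end of input) is text.
        const std::size_t lt = in_.find(L'<', pos_);
        const std::size_t end = (lt == std::wstring::npos) ? in_.size() : lt;
        value_ = in_.substr(pos_, end - pos_);
        pos_ = end;
        return event_ = Event::Text;
    }

    Event event() const { return event_; }
    const std::wstring& value() const { return value_; }  // tag name or text

private:
    std::wstring in_;
    std::size_t pos_ = 0;
    Event event_ = Event::StartDocument;
    std::wstring value_;
};

// Pull events only until the first TEXT event, then stop. The parser keeps
// its position, so the caller can resume later with further next() calls.
std::wstring first_text(ToyPullParser& pp)
{
    for (Event e = pp.next(); e != Event::EndDocument; e = pp.next()) {
        if (e == Event::Text)
            return pp.value();
    }
    return std::wstring();
}
```

With the input `<note><to>Tove</to></note>`, first_text() stops as soon as it reaches the text of the `to` element. The remaining end tags are still waiting in the parser and are reported only when the caller calls next() again.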

This document shows, step by step, how to create a simple application that uses the XmlPull API to parse XML.

XMLCPP Library does not support all standard features: PROCESS_DOCDECL and VALIDATION are not supported. One optional feature, REPORT_NAMESPACE_ATTRIBUTES, is supported.

XMLCPP Library does not yet provide a Serializer API for writing XML, but it will be available soon.


Main features of API

The C++ version of the XmlPull v1 API provides:

  • simple interface - the parser consists of one interface and one exception
  • ease of use - there is only one key method, next(), which retrieves the next event, and there are only five events:
    1. START_DOCUMENT: document start - the parser has not yet read any input
    2. START_TAG: the parser is on a start tag
    3. TEXT: the parser is on element content
    4. END_TAG: the parser is on an end tag
    5. END_DOCUMENT: the document is finished and no more parsing is allowed
  • performance - the interface is designed to allow implementing very fast XML parsers
  • full support for the XML Information Set


Requirements

XMLCPP Library requires two external libraries to build:

  • libiconv
  • Boost C++ Library.
    • You do not need any compiled Boost libraries. However, to build and run the unit tests, you need the compiled Boost.Test library.


Code step-by-step

First we need to create and initialize a parser instance. This takes two steps:

  1. create an instance of the parser
  2. set any needed features, such as PROCESS_NAMESPACES (its default value is false, following the XmlPull v1 API)

The code to do this may look like this:

using namespace xmlcpp;
 
PullParser pp;
pp.set_feature(PullParser::PROCESS_NAMESPACES);

The next step is to set the parser input. There are three overloaded set_input() functions (actually four, but the last one is for advanced use only).

ifstream fs("note.xml");
pp.set_input(fs);
// or
const wchar_t* cstr = L"<?xml ...?><note>...</note>";
pp.set_input(cstr);
// or
wstring str = L"<?xml ...?><note>...</note>";
pp.set_input(str.begin(), str.end());

and now we can start parsing!

A typical XmlPull application repeatedly calls the next() function to retrieve the next event and processes each event until the event is END_DOCUMENT:

PullParser::event_type evt = pp.get_event_type();
do {
    switch (evt) {
    case PullParser::START_DOCUMENT:
        process_start_document(pp);
        break;
    case PullParser::START_TAG:
        process_start_tag(pp);
        break;
    case PullParser::END_TAG:
        process_end_tag(pp);
        break;
    case PullParser::TEXT:
        process_text(pp);
        break;
    }
    evt = pp.next();
} while (evt != PullParser::END_DOCUMENT);

In the START_DOCUMENT event, you can retrieve the version, encoding and standalone information.

void process_start_document(PullParser& pp)
{
    wcout << L"<?xml";
    wcout << L" version=\"" << pp.get_version() << L"\"";
    if (!pp.get_encoding().empty())
        wcout << L" encoding=\"" << pp.get_encoding() << L"\"";
    if (!pp.get_standalone().empty())
        wcout << L" standalone=\"" << pp.get_standalone() << L"\"";
    wcout << L" ?>";
}

Let's see how to process a start tag. Processing an end tag is very similar; the main difference is that an end tag has no attributes.

void process_start_tag(PullParser& pp)
{
    wcout << L"<" << pp.get_name();
    for (int i = 0; i < pp.get_attr_size(); ++i) {
        wcout << L" " << pp.get_attr_name(i)
              << L"=\"" << pp.get_attr_value(i) << L"\"";
    }
 
    if (pp.is_empty_elem_tag()) {
        wcout << L" />";
        pp.next(); // consume next END_TAG token.
    }
    else {
        wcout << L">";
    }
}
 
void process_end_tag(PullParser& pp)
{
    wcout << L"</" << pp.get_name() << L">";
}

If you enable the PROCESS_NAMESPACES feature, you can also get the prefix and namespace URI of tag names and attribute names.

Now let's see how element content is retrieved and printed:

void process_text(PullParser& pp)
{
    wcout << pp.get_text();
}


Complete sample

These sample code snippets come from XMLPrinter.cpp in the package.


Advanced topic

You can use next_token() instead of next(). It reports more event types, as listed here:

  • START_DOCUMENT
  • END_DOCUMENT
  • START_TAG
  • END_TAG
  • TEXT
  • IGNORABLE_WHITESPACE
  • CDSECT
  • PROCESSING_INSTRUCTION
  • COMMENT
  • ENTITY_REF
  • DOCDECL

The next() function ignores the IGNORABLE_WHITESPACE, PROCESSING_INSTRUCTION, COMMENT, and DOCDECL event types. A sequence of TEXT, CDSECT, and ENTITY_REF events is combined into one TEXT event by next(); each ENTITY_REF is expanded before it is merged into the TEXT event.
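The relationship between the two levels of events can be sketched on a plain list of tokens. The code below is an illustration under stated assumptions, not the XMLCPP implementation: Tok, Token, expand_entity() and coalesce() are hypothetical names, only the five predefined XML entities are expanded, and text on either side of a skipped token is merged into a single TEXT event (this follows the XmlPull v1 description of next(), but it is an assumption as far as XMLCPP itself is concerned).

```cpp
#include <string>
#include <vector>

// Hypothetical token model for illustration; these are not the XMLCPP types.
enum class Tok {
    StartTag, EndTag, Text, IgnorableWhitespace, CdSect,
    ProcessingInstruction, Comment, EntityRef, DocDecl
};

struct Token {
    Tok type;
    std::wstring text;  // for EntityRef this holds the entity name, e.g. L"amp"
};

// Expand the five predefined XML entities; unknown names stay as "&name;".
std::wstring expand_entity(const std::wstring& name)
{
    if (name == L"lt")   return L"<";
    if (name == L"gt")   return L">";
    if (name == L"amp")  return L"&";
    if (name == L"apos") return L"'";
    if (name == L"quot") return L"\"";
    return L"&" + name + L";";
}

// Token types that the next()-level view does not report at all.
bool is_skipped(Tok t)
{
    return t == Tok::IgnorableWhitespace || t == Tok::ProcessingInstruction ||
           t == Tok::Comment || t == Tok::DocDecl;
}

// Token types that are folded into a single TEXT event.
bool is_text_like(Tok t)
{
    return t == Tok::Text || t == Tok::CdSect || t == Tok::EntityRef;
}

// Reduce a next_token()-style stream to a next()-style stream.
std::vector<Token> coalesce(const std::vector<Token>& tokens)
{
    std::vector<Token> events;
    for (const Token& t : tokens) {
        if (is_skipped(t.type))
            continue;
        if (is_text_like(t.type)) {
            const std::wstring piece =
                (t.type == Tok::EntityRef) ? expand_entity(t.text) : t.text;
            if (!events.empty() && events.back().type == Tok::Text)
                events.back().text += piece;  // extend the current TEXT run
            else
                events.push_back(Token{Tok::Text, piece});
        } else {
            events.push_back(t);  // START_TAG and END_TAG pass through
        }
    }
    return events;
}
```

For example, the token sequence for `<a>x &amp;<!--c--><![CDATA[ y]]></a>` reduces to three events: a start tag, one TEXT event holding `x & y`, and an end tag.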

Related Sites

  1. Homepage
  2. sf.net for XML C++ Library project.

Download

You can check out the source from the sf.net CVS service.

cvs -d:pserver:anonymous@xmlcpp.cvs.sourceforge.net:/cvsroot/xmlcpp login
cvs -z3 -d:pserver:anonymous@xmlcpp.cvs.sourceforge.net:/cvsroot/xmlcpp co -P -r rel-0-9-patches xmlcpp

iwongu 16:42, 2 November 2006 (KST)