Parser Capabilities and Architecture
There are two general approaches to parsing and handling XML, each with its own style of API:
- Tree-based API: This approach maps an XML document into an internal tree structure that conforms to the logical structure described by a schema, then facilitates the navigation and manipulation of that tree. Many tree-based APIs are available, including the DOM (Document Object Model) proposed by the World Wide Web Consortium (W3C). The XML Path Language (XPath), XML Inclusions (XInclude), and XML Pointer Language (XPointer) are W3C programmatic interfaces for querying and handling the XML in DOM-style tree structures.
- Event-driven API: In this approach the parser reports parsing events (such as the start and end of each element) to an application as it encounters them. In C-based APIs, this reporting is accomplished through callbacks implemented by the application to handle the types of events. SAX is the best-known example of this style of parsing API. This type of parser is sometimes referred to as a streaming parser.
For example, say you have a simple XML file such as the following:
<?xml version="1.0" encoding="UTF-8"?>
<article author="John Doe">
<para>This is a very short article.</para>
</article>
An event-driven parser would report the following series of events to the application:
- Started parsing document
- Found start tag for element article
- Found attribute author of element article, value “John Doe”
- Found start tag for element para
- Found characters This is a very short article.
- Found end tag for element para
- Found end tag for element article
- Ended parsing document
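The event sequence can be reproduced with any SAX-style parser. The following is a minimal sketch that uses Python's standard `xml.sax` module as a stand-in for NSXMLParser; the names `EventLogger`, `log_events`, and `SAMPLE` are hypothetical and chosen only for illustration. Note that `xml.sax` delivers attributes together with the start-tag event rather than as separate events, and that it also reports the whitespace between elements as character data, which this sketch skips.

```python
import xml.sax

# The sample document from the text, as bytes so the encoding declaration applies.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<article author="John Doe">
<para>This is a very short article.</para>
</article>
"""


class EventLogger(xml.sax.ContentHandler):
    """Records each parsing event as a human-readable line."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("Started parsing document")

    def startElement(self, name, attrs):
        self.events.append(f"Found start tag for element {name}")
        # Attributes arrive with the start tag, not as separate events.
        for attr in attrs.getNames():
            self.events.append(
                f'Found attribute {attr} of element {name}, '
                f'value "{attrs.getValue(attr)}"'
            )

    def characters(self, content):
        # Skip whitespace-only runs between elements.
        if content.strip():
            self.events.append(f"Found characters {content}")

    def endElement(self, name):
        self.events.append(f"Found end tag for element {name}")

    def endDocument(self):
        self.events.append("Ended parsing document")


def log_events(data):
    """Parse `data` and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(data, handler)
    return handler.events


if __name__ == "__main__":
    for event in log_events(SAMPLE):
        print("-", event)
```

A parser is free to deliver the text of one element in several `characters` calls, so a handler that needs the complete text should accumulate the pieces rather than rely on a single callback.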
Because event-driven parsing deals with only one XML construct at a time rather than all of them at once, it consumes much less memory than tree-based parsing. It is ideal for situations where performance is a goal and modification of the parsed XML is not. One such application for event-driven parsing is searching a repository of XML documents (or even one XML document with multiple “records”) for specific elements and doing something with the element content. For example, you could use NSXMLParser to search the property-list preferences files on all machines in a Bonjour network to gather network-configuration information.
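As a rough illustration of the search use case, the sketch below scans a document containing several "records" and collects the text of one element type, keeping only the current match in memory. It uses Python's `xml.sax` rather than NSXMLParser, and the record layout (`hosts`, `host`, `name`, `port`) and the names `ElementTextCollector` and `find_element_text` are invented for the example. It assumes the target element is never nested inside itself.

```python
import xml.sax


class ElementTextCollector(xml.sax.ContentHandler):
    """Collects the text content of every element named `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.matches = []
        self._buffer = None  # None while the parser is outside a target element

    def startElement(self, name, attrs):
        if name == self.target:
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == self.target and self._buffer is not None:
            self.matches.append("".join(self._buffer))
            self._buffer = None


def find_element_text(data, target):
    """Return the text of each `target` element found in `data`."""
    handler = ElementTextCollector(target)
    xml.sax.parseString(data, handler)
    return handler.matches


# Hypothetical document with multiple records.
RECORDS = (
    b"<hosts>"
    b"<host><name>alpha</name><port>80</port></host>"
    b"<host><name>beta</name><port>8080</port></host>"
    b"</hosts>"
)

if __name__ == "__main__":
    print(find_element_text(RECORDS, "name"))
```

Everything outside the target element is discarded as soon as it is reported, which is why memory use stays flat regardless of document size.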
Event-driven parsing is less suitable for tasks that require the XML to be subjected to extended user queries or to be modified and written back to a file. Event-driven parsers such as NSXMLParser also do not offer any help with validation (that is, verifying whether the XML conforms to the structuring rules specified in a DTD or other schema). For these kinds of tasks, you need a DOM-style tree. However, you can construct your own internal tree structures using an event-driven parser such as NSXMLParser.
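One common way to build such a tree from parsing events is to keep a stack of open elements: push a node on each start tag, attach text to the node on top, and pop on each end tag. The sketch below shows the idea with Python's `xml.sax` as a stand-in for NSXMLParser; `Node`, `TreeBuilder`, and `build_tree` are hypothetical names, and the node type is deliberately minimal (it does not preserve the position of text relative to child elements).

```python
import xml.sax


class Node:
    """A minimal element node: name, attributes, text, and child elements."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = attributes
        self.children = []
        self.text = ""


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a Node tree from SAX events using a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []

    def startElement(self, name, attrs):
        node = Node(name, dict(attrs.items()))
        if self._stack:
            # The element on top of the stack is the parent.
            self._stack[-1].children.append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        if self._stack:
            self._stack[-1].text += content

    def endElement(self, name):
        self._stack.pop()


def build_tree(data):
    """Parse `data` and return the root Node of the resulting tree."""
    builder = TreeBuilder()
    xml.sax.parseString(data, builder)
    return builder.root


if __name__ == "__main__":
    doc = b'<article author="John Doe"><para>This is a very short article.</para></article>'
    root = build_tree(doc)
    print(root.name, root.attributes, root.children[0].text)
```

Once the tree exists, the application can query or modify it freely, at the cost of holding the whole document in memory.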
In addition to reporting parsing events, an NSXMLParser object verifies that the XML or DTD is well-formed. For example, it checks whether a start tag for an element has a matching end tag or whether an attribute has a value assigned. If it encounters any such syntactical error, it stops parsing and informs the delegate.
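The same stop-and-report behavior can be observed with other event-driven parsers. In the sketch below, Python's `xml.sax` raises `SAXParseException` at the first well-formedness error, which plays the role that the delegate notification plays for NSXMLParser. The helper name `check_well_formed` is hypothetical, and the exact wording of the error messages depends on the underlying parser.

```python
import xml.sax


def check_well_formed(data):
    """Return (True, None) if `data` is well-formed, else (False, description)."""
    try:
        # A bare ContentHandler ignores every event; only errors matter here.
        xml.sax.parseString(data, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as error:
        return False, f"line {error.getLineNumber()}: {error.getMessage()}"
    return True, None


if __name__ == "__main__":
    samples = [
        b"<article><para>ok</para></article>",  # well-formed
        b"<article><para>ok</article>",         # start tag without matching end tag
        b"<article author></article>",          # attribute without a value
    ]
    for sample in samples:
        print(sample, check_well_formed(sample))
```

Parsing stops at the first error, so a document with several problems reports only the earliest one.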
Although the parser “understands” only XML and DTD as markup languages, it can parse any XML-based language schema such as RELAX NG and XML Schema.