Powering Pipelines with JAXP

Track: Core Technologies, Case Studies, Integration

Audience Level: Technical View

Time: Thursday, November 18 at 09:00

Author: Nigel Whitaker, DeltaXML (Monsell EDM Ltd)

Author: Thomas Nichols, DeltaXML (Monsell EDM Ltd)

Keywords: Java, JAXP, SAX, XSLT

Abstract:

The JAXP API gives Java programmers easy access to the power and flexibility of XML parsing, filtering and XSLT transformation. However, while many programmers use JAXP for simple XML parsing or single-shot XSLT transformations, going further to construct processing pipelines often proves difficult.

Using JAXP to construct pipelines of processing elements is a good idea: it allows complex problems to be decomposed into a number of simpler steps or components and, in theory, makes it possible to benefit from concurrent processing. With careful attention to issues such as avoiding disk I/O, and by using SAX event streaming to avoid re-parsing XML data between steps, pipelines can provide good runtime performance.
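To make the streaming idea concrete, the following is a minimal sketch of a two-stage pipeline built from XMLFilters, assuming the JDK's built-in JAXP implementation. The class name `FilterPipeline` and both stylesheets (`RENAME`, `WRAP`) are hypothetical illustrations, not taken from the paper. The document is parsed once; SAX events then flow from the parser through each XSLT filter to an identity `TransformerHandler` that serializes the result, so no intermediate text is written or re-parsed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;

public class FilterPipeline {
    // Hypothetical stage 1: identity copy that renames <item> to <entry>.
    static final String RENAME =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
      + "<xsl:template match='item'><entry><xsl:apply-templates select='@*|node()'/></entry></xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical stage 2: wrap the whole document in <result>.
    static final String WRAP =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><result><xsl:copy-of select='*'/></result></xsl:template>"
      + "</xsl:stylesheet>";

    public static String run(String xml) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)
                || !tf.getFeature(SAXTransformerFactory.FEATURE_XMLFILTER)) {
            throw new IllegalStateException("TransformerFactory does not support SAX XMLFilters");
        }
        SAXTransformerFactory stf = (SAXTransformerFactory) tf;

        // The parser at the head of the chain; XSLT needs namespace-aware SAX events.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();

        // Compile each stylesheet once; Templates objects are reusable.
        Templates rename = stf.newTemplates(new StreamSource(new StringReader(RENAME)));
        Templates wrap = stf.newTemplates(new StreamSource(new StringReader(WRAP)));

        // Chain the filters: parser -> rename -> wrap.
        XMLFilter renameFilter = stf.newXMLFilter(rename);
        renameFilter.setParent(parser);
        XMLFilter wrapFilter = stf.newXMLFilter(wrap);
        wrapFilter.setParent(renameFilter);

        // An identity TransformerHandler serializes the final SAX events.
        // To be safe, output properties are set before setResult().
        TransformerHandler serializer = stf.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));
        wrapFilter.setContentHandler(serializer);

        // Parsing is started from the last filter; each filter calls parse() on its parent.
        wrapFilter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<doc><item>a</item><item>b</item></doc>"));
    }
}
```

Note the "pull" style: the caller invokes `parse()` on the last filter, and control passes back up the chain to the real parser before events are pushed forward.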

However, in practice Java programmers often find pipeline construction difficult. For example, they are unsure whether to build a pipeline from XMLFilters, TransformerHandlers or both; they start by adapting some existing example code and then run into problems.
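For comparison with an XMLFilter chain, the sketch below builds a pipeline from TransformerHandlers instead, again assuming the JDK's built-in JAXP implementation. Here the direction is "push": the parser sends SAX events into the first handler, and each handler's `SAXResult` forwards its output to the next. The class name `HandlerPipeline` and the stylesheets `DROP_DRAFTS` and `SUMMARIZE` are hypothetical. Registering the first handler as the parser's lexical handler also forwards comments and CDATA boundaries, which `TransformerHandler` accepts because it implements `LexicalHandler`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class HandlerPipeline {
    // Hypothetical stage 1: identity copy that drops <draft> elements.
    static final String DROP_DRAFTS =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='@*|node()'><xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy></xsl:template>"
      + "<xsl:template match='draft'/>"
      + "</xsl:stylesheet>";

    // Hypothetical stage 2: wrap the document in <summary> carrying an item count.
    static final String SUMMARIZE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><summary items='{count(//item)}'><xsl:copy-of select='*'/></summary></xsl:template>"
      + "</xsl:stylesheet>";

    public static String run(String xml) throws Exception {
        // Assumes the factory supports the SAX interfaces (the JDK's built-in one does).
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler drop = stf.newTransformerHandler(
            stf.newTemplates(new StreamSource(new StringReader(DROP_DRAFTS))));
        TransformerHandler summarize = stf.newTransformerHandler(
            stf.newTemplates(new StreamSource(new StringReader(SUMMARIZE))));

        // Last stage writes text; output properties are set before setResult().
        summarize.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        summarize.setResult(new StreamResult(out));

        // Stage 1's SAX output becomes stage 2's input, including lexical events.
        SAXResult toSummarize = new SAXResult(summarize);
        toSummarize.setLexicalHandler(summarize);
        drop.setResult(toSummarize);

        // The parser pushes events into the first handler.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader parser = spf.newSAXParser().getXMLReader();
        parser.setContentHandler(drop);
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", drop);
        parser.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<doc><item>a</item><draft>x</draft><item>b</item></doc>"));
    }
}
```

Both styles stream SAX events between stages. The main practical difference is who drives the pipeline: with XMLFilters the consumer calls `parse()` on the last stage, while with TransformerHandlers the producer feeds the first stage. That distinction matters when mixing pipeline stages with other components.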

Drawing on experience gained assisting and supporting programmers who construct JAXP pipelines, this paper presents classification schemes, diagrams and tables that aim to explain the pipeline construction process. Examples show the construction of both simple and advanced processing pipelines. We also describe some commonly encountered issues, such as preserving the DOCTYPE declaration or entity references, preserving whitespace and controlling output indentation, and present solutions or workarounds that overcome limitations in the current pipeline architecture.
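As one illustration of the output-control issues, the sketch below configures the final serializing stage of a pipeline. Because the XSLT data model has no DOCTYPE node, an XSLT stage does not pass the source DOCTYPE through; one possible workaround, assumed here, is to re-declare it on the last stage with `OutputKeys.DOCTYPE_SYSTEM` (and `DOCTYPE_PUBLIC` if needed). The class name `SerializerStage` and the `doctypeSystem` parameter are hypothetical, and the `{http://xml.apache.org/xslt}indent-amount` property is an implementation-specific extension understood by the JDK's built-in processor, not part of the JAXP standard. A plain SAX parser stands in for the upstream pipeline stages.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SerializerStage {
    public static String serialize(String xml, String doctypeSystem) throws Exception {
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();

        // Identity TransformerHandler: serializes whatever SAX events it receives.
        TransformerHandler serializer = stf.newTransformerHandler();
        Transformer t = serializer.getTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        // Implementation-specific: indentation width for the JDK's built-in processor.
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        // Re-declare the DOCTYPE that upstream XSLT stages cannot carry through.
        t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctypeSystem);

        // Output properties are set before setResult(), when the serializer is created.
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        // Stand-in for the upstream pipeline: a parser feeding the final stage.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(serializer);
        reader.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("<doc><item>a</item><item>b</item></doc>", "doc.dtd"));
    }
}
```

This only restores the DOCTYPE declaration itself; entity references in the source are still expanded by the parser, which is a separate problem.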

We also briefly explain some of the challenges we encountered creating new XML-centric software components designed to integrate with the existing JAXP pipeline components, and describe the rationale for our subsequent design decisions. Finally, we review future developments and proposals for XML processing pipelines from the Java Community Process (JCP) and the W3C.