Extreme Markup Languages 2003: Proceedings

STnG — a Streaming Transformations and Glue framework

K. Ari Krupnikov [Research Associate; University of Edinburgh, HCRC Language Technology Group]

Abstract

STnG (pronounced “sting”¹) is a framework for processing XML and other structured text. Our goal in developing STnG was to allow complex transformations beyond those afforded by traditional XML transformation tools such as XSLT, while keeping the framework simple to use. We claim that to meet this goal, a system must:

  1. support and encourage the use of small processing components
  2. offer a hierarchical tree-like view of its data
  3. factor out facilities for input chunking through a pattern/action model
  4. not provide processing facilities of its own, instead invoking processors written in existing languages

STnG is built around common XML tools and idioms, but can process arbitrary structured text almost as easily as XML.
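
To make the four requirements more concrete, the following is a minimal sketch of a pattern/action dispatcher built on the standard Java SAX API. It is not the STnG API: the class name `PatternActionSketch`, the method `run`, and the use of plain element names as "patterns" are all assumptions made for illustration. Each action is a small processor written in ordinary Java, and the dispatcher only chunks the input stream and hands each matching chunk to its processor.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; this is NOT the STnG API.
public class PatternActionSketch {

    // Streams through the document and passes the text of each element whose
    // name matches a pattern (the "chunk") to the small processor registered for it.
    public static List<String> run(String xml,
            Map<String, Function<String, String>> actions) throws Exception {
        List<String> results = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private Function<String, String> active; // processor for the current chunk
            private StringBuilder chunk;             // text collected for the current chunk
            private int depth;                       // nesting depth inside the chunk

            @Override
            public void startElement(String uri, String local, String qName,
                    Attributes atts) {
                if (active != null) {
                    depth++;                         // nested element inside a chunk
                } else if (actions.containsKey(qName)) {
                    active = actions.get(qName);     // pattern matched: start a chunk
                    chunk = new StringBuilder();
                    depth = 1;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (active != null) {
                    chunk.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (active != null && --depth == 0) {
                    results.add(active.apply(chunk.toString())); // run the action
                    active = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return results;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Function<String, String>> actions = new LinkedHashMap<>();
        actions.put("title", String::toUpperCase);       // one small processor
        actions.put("price", s -> "price=" + s.trim());  // another small processor
        String xml = "<book><title>Sting</title><note>skipped</note>"
                + "<price> 12 </price></book>";
        System.out.println(run(xml, actions)); // prints [STING, price=12]
    }
}
```

In this sketch, content that matches no pattern is dropped without ever being buffered, so memory use is bounded by the largest matched chunk rather than by the size of the document.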

In the first part of this paper, we show how these requirements result in powerful and flexible systems, and how they can be achieved. The remainder of the paper describes a processing framework we have developed in Java that implements these requirements. It is available for download at http://stng.sf.net.

1 Requirements
1.1 Small processors
1.2 Hierarchical structures and recursive decomposition
1.3 Small chunks of data
1.3.1 Chunking and the pattern/action model
1.3.2 Pattern/action model and non-linear filtering
1.3.3 Chunking and streaming
1.4 Use existing languages to write processors
2 Implementation
2.1 STnG processing model
2.1.1 Switches and cases
2.1.2 The ContentFilter interface
2.2 STnG syntax
2.3 Compilation
2.4 STnG and XPath
3 A real-world example using STnG
3.1 Controlling the parser
3.2 Applying several processors to one stream
3.3 Tree-oriented document fragment processing
3.4 Inline procedural code
3.5 Validation
3.6 XSLT
3.7 Conclusion
4 Appendix A: related work
4.1 SAX filters
4.2 STX
4.3 XPipe
4.4 Regular fragmentation
4.5 XML pipeline definition language