Abstract
Microsoft Office products are ubiquitous in the global enterprise. Even in organizations moving to a standards-based, extensible information management system, MS Word has remained a dominant format for inter-enterprise information exchange, and in many circumstances it is a required format. This creates a significant workflow issue if the document provider uses XML as the primary document format in any of its organizational processes. This paper details the design and implementation of a system that transforms XML into Microsoft Word documents in a completely extensible fashion.
Several commercial products are specifically designed to address the XML-to-Word problem. Depending on the specific business requirements, these may offer a sufficient return on investment. The methodology described in this paper is for those who cannot afford such commercial off-the-shelf (COTS) solutions, wish to leverage existing in-house knowledge of XML and XSLT, or need the degree of customization that this approach provides.
The fundamental problem in producing Word documents from XML is the basic one of moving from a semantically rich document format to one that is purely formatting-oriented. XSLT is designed to address this issue. The problem with using XSLT directly is that it implies using RTF as the output target. Producing RTF output that contains all of the required styling, is valid, and is robust to schema changes is notoriously hard. The system design described in this paper takes the novel approach of letting Microsoft Word do the styling for us. The basic design has the following flow: (1) an XML source document; (2) an XSLT transformation to a structurally consistent intermediate XML format; (3) DOM processing of that intermediate form, which is used to construct the Microsoft Word document programmatically.
The basic design described in this paper requires only an XSLT engine, an XML parser, and a programmatic means of communicating with the Microsoft Word COM API. For the reference implementation we use Python as the implementation language, the 4Suite XML tools for XML parsing and XSLT transformation, and the Python win32 extensions for COM communication.
Readers will be left with a clear understanding of the complexities involved in an XML-to-Word solution and the technical knowledge needed to implement one.
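As a rough illustration of the final stage of the flow described above (DOM processing of the intermediate form), the following is a minimal sketch and not the paper's reference implementation. It assumes a hypothetical intermediate format made of `para` elements carrying a `style` attribute, and it uses the standard library's `xml.dom.minidom` in place of 4Suite. The names `INTERMEDIATE`, `RecordingWriter`, `WordWriter`, `text_of`, and `build` are invented for this example. `WordWriter` shows how the same calls might be forwarded to Word through the Python win32 extensions; it runs only on Windows with Word installed, whereas `RecordingWriter` lets the DOM walk be exercised anywhere.

```python
from xml.dom import minidom

# Hypothetical intermediate form: the kind of flat, structurally consistent
# XML that the XSLT stage is assumed to emit (element names are invented).
INTERMEDIATE = """<doc>
  <para style="Heading 1">Quarterly Report</para>
  <para style="Normal">Revenue rose in the third quarter.</para>
</doc>"""


class RecordingWriter:
    """Stand-in writer that records operations instead of driving Word."""

    def __init__(self):
        self.ops = []

    def paragraph(self, style, text):
        self.ops.append((style, text))


class WordWriter:
    """Assumed COM-backed writer (Windows only; needs pywin32 and Word)."""

    def __init__(self):
        import win32com.client  # imported lazily so the sketch loads anywhere
        self.app = win32com.client.Dispatch("Word.Application")
        self.doc = self.app.Documents.Add()
        self.sel = self.app.Selection

    def paragraph(self, style, text):
        # Word applies its own named style; we only supply the name and text.
        self.sel.Style = style
        self.sel.TypeText(text)
        self.sel.TypeParagraph()


def text_of(node):
    """Concatenate the direct text children of a DOM element."""
    return "".join(c.data for c in node.childNodes
                   if c.nodeType == c.TEXT_NODE)


def build(xml_text, writer):
    """Walk the intermediate DOM and hand each paragraph to the writer."""
    dom = minidom.parseString(xml_text)
    for para in dom.getElementsByTagName("para"):
        style = para.getAttribute("style") or "Normal"
        writer.paragraph(style, text_of(para))
    return writer
```

Calling `build(INTERMEDIATE, RecordingWriter())` collects the (style, text) pairs in document order; substituting `WordWriter()` would, under the assumptions above, replay them into a new Word document. Keeping the writer behind a small interface is one way to keep the DOM walk independent of the COM layer.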