XML Europe 2002

Merging XML Files: A New Approach Providing Intelligent Merge of XML Data Sets

Abstract

As XML becomes ubiquitous, the need for powerful tools to manipulate XML data becomes more pressing. Merging XML is particularly tricky, but it is often necessary in order to consolidate data feeds from heterogeneous systems or to synchronize submissions of XML fragments that make up a larger document. An automated mechanism for defining and controlling such merges has been developed and is demonstrated to provide a consistent, adaptable and resilient solution to this problem. Integrating the mechanism into an information pipeline allows the merge to be customized extensively.

As XML tools become more powerful and better able to handle the peculiarities of real data, a genuine, intelligent merge of XML data sets becomes a realistic possibility. Users increasingly want to apply concurrent engineering to XML, i.e. to allow multiple users to add to a single data set simultaneously.

This paper proposes a systematic approach to merging based on an intermediate XML file that contains both of the files to be merged in a formal structure. This structure clearly identifies data that is common to both files and data that is unique to one of them. The advantage of the intermediate file is that many of the conflicts that typically emerge when XML data is merged can be identified and resolved. Resolving these conflicts is key to achieving a useful merge.
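The idea of an intermediate file can be illustrated with a minimal sketch. It is not the format or algorithm used by the author; the `origin` attribute, the `conflict` wrapper element, and the rule of matching siblings by tag name and `id` attribute are all assumptions made for illustration. The sketch also assumes that this key is unique among siblings and it ignores mixed-content tail text.

```python
import copy
import xml.etree.ElementTree as ET

ORIGIN = "origin"  # hypothetical marker attribute, not taken from the paper


def key(elem):
    """Assumed correspondence rule: same tag and same 'id' attribute."""
    return (elem.tag, elem.get("id"))


def text(elem):
    """Element text with surrounding whitespace ignored."""
    return (elem.text or "").strip()


def mark(elem, side):
    """Return a deep copy of elem labelled with the file it came from."""
    marked = copy.deepcopy(elem)
    marked.set(ORIGIN, side)
    return marked


def combine(a, b):
    """Build an intermediate element that holds the content of both a and b."""
    if a.attrib != b.attrib or text(a) != text(b):
        # The two versions disagree: keep both, side by side, for later resolution.
        conflict = ET.Element("conflict")
        conflict.append(mark(a, "A"))
        conflict.append(mark(b, "B"))
        return conflict

    node = ET.Element(a.tag, dict(a.attrib))
    node.set(ORIGIN, "both")
    node.text = a.text

    b_children = {key(child): child for child in b}
    matched = set()
    for child in a:
        partner = b_children.get(key(child))
        if partner is None:
            node.append(mark(child, "A"))          # unique to file A
        else:
            matched.add(key(child))
            node.append(combine(child, partner))   # present in both files
    for child in b:
        if key(child) not in matched:
            node.append(mark(child, "B"))          # unique to file B
    return node


file_a = ET.fromstring('<lib><item id="1">x</item><item id="2">y</item></lib>')
file_b = ET.fromstring('<lib><item id="2">z</item><item id="3">w</item></lib>')
print(ET.tostring(combine(file_a, file_b), encoding="unicode"))
```

Because every element in the result is labelled as common, unique to one file, or in conflict, a later pipeline stage can decide how each case is resolved without consulting the original files again.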

The paper addresses the issues raised by real data, in particular how to control the correspondence between the data in the two files. Finding this correspondence is a necessary step before a sensible merge can be executed. With these techniques, XML files containing libraries of data, e.g. XML Schemas or SVG files, can be merged intelligently and automatically. This improves quality and reduces the human effort that these essential processes require.
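As a hedged illustration of what finding the correspondence can mean for a library-style file, the sketch below pairs the top-level components of two XML Schema documents. The choice of key (component kind plus `name` attribute) and the function names are assumptions made for this example, not the mechanism described in the paper. Components without a `name`, such as `xs:import` or `xs:annotation`, would all share one key and need a different rule.

```python
import xml.etree.ElementTree as ET


def component_key(elem):
    """Assumed key: the component kind (qualified tag) and its 'name' attribute."""
    return (elem.tag, elem.get("name"))


def correspond(schema_a, schema_b):
    """Pair up top-level components of two schemas, ignoring document order.

    Returns (pairs, only_a, only_b) where pairs is a list of (a, b) tuples.
    """
    index_b = {component_key(comp): comp for comp in schema_b}
    pairs, only_a = [], []
    for comp in schema_a:
        partner = index_b.pop(component_key(comp), None)
        if partner is None:
            only_a.append(comp)
        else:
            pairs.append((comp, partner))
    only_b = list(index_b.values())  # whatever was never claimed by schema_a
    return pairs, only_a, only_b


SCHEMA_A = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="title" type="xs:string"/>
  <xs:element name="price" type="xs:decimal"/>
</xs:schema>"""

SCHEMA_B = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author" type="xs:string"/>
  <xs:element name="title" type="xs:string"/>
</xs:schema>"""

pairs, only_a, only_b = correspond(ET.fromstring(SCHEMA_A), ET.fromstring(SCHEMA_B))
print("common:", [a.get("name") for a, _ in pairs])
print("only in A:", [c.get("name") for c in only_a])
print("only in B:", [c.get("name") for c in only_b])
```

Matching by name rather than by position is what lets the two schemas list their declarations in a different order and still be recognized as sharing the `title` element.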

The paper uses XML Schema files as an example to illustrate the merge operation and to identify areas where special care is needed.

The proposed method provides XML users with a general-purpose merge operation for the amalgamation of XML data and thus gives another reason for adopting XML as the preferred format for documents and data.


The full paper was not available at the time the proceedings were created. Please check the conference web site, http://www.xmleurope.com, to find an updated version of this paper.

Biography

Robin La Fontaine has a degree in Engineering Science from the University of Oxford and a Master's degree in Computer Science. Over the last few years his company has developed a method for identifying changes in XML documents and data and representing those changes in XML; the method is implemented as DeltaXML (www.deltaxml.com). Robin is a member of the STEP XML Working Group within ISO (ISO 10303). He has been project manager of several European research projects, including the STEPWISE project and the XML/EDI European Pilot Project. His background is in CAD data exchange and Lisp programming.