Abstract
The Danish Ministry of Science, Technology and Innovation has, on behalf of the Danish XML Committee, commissioned a toolkit to support the development of standardized XML Schemas by user communities. The InfoStructureBase has four elements:
A website (in English) providing information about the standardized XML Schemas available and the interfaces for the public and private sector.
Tools to support standardization work groups (calendar, discussion groups, file share, etc.)
A repository of XML Schema fragments, interfaces and process descriptions.
A UDDI repository
The purpose of the InfoStructureBase is to provide a free tool that facilitates the development and standardization of XML-based services. The underlying XML Schema data model is exceptional in its extreme enforcement of reuse.
In the autumn of 2000 a committee on digital administration was established under the auspices of the Danish Ministry of Finance. The goal of the committee was to identify strategies for providing simple and inexpensive access to public data. The committee concluded that concrete initiatives concerning the standardization of data exchange ought to be initiated. According to the committee such initiatives would minimize the costs of and simplify access to integration and exchange of data. Further to these conclusions it was decided that the establishment of such initiatives would occur under the governance of the Danish Ministry of Science, Technology and Innovation, and would be based on XML technology.
The Ministry of Science, Technology and Innovation, in cooperation with the Coordinating Information Committee, relevant public authorities, the Association of Local Municipalities and the Association of Danish Regions, as well as private contractors, developed a strategy that essentially deals with the implementation of XML as the standard for exchange of data within the public sector and between the public and private sector.
Behind the project stands the Joint Board of Project E-government, together with the Coordinating Information Committee. Both bodies strongly back the recommendation of XML as the common public standard for data communication. In this light the Coordinating Information Committee established the Danish XML Committee, which is responsible for ensuring coherence and momentum in the standardization of XML-based interfaces.
XML has been promoted heavily as a foundation for data interchange in the public and private sector in Denmark and abroad. It is widely accepted that XML is by no means a silver bullet that can solve all the application integration problems. The XML-family of standards and tools simply lowers the entry barrier for developers by simplifying some of the associated tasks.
Past application integration efforts have revolved around the EDIFACT and X12 standards and a number of more or less proprietary methods and tools. The cost of implementing application integration has in many cases been too high, and has been out of reach especially for small and medium-sized organizations. It is characteristic that application integration efforts in the public sector have been developed on an ad hoc basis and have rarely been based on agreed-upon standards. Ad hoc application integration that is not based on standardized interchange formats makes it difficult and expensive to replace applications and components. Every application or component has to be tailor-made to support specific integration requirements.
The paradox in this situation is that XML and Web Service technology make it too easy to integrate applications in an ad hoc manner. Most software development environments are capable of exposing business logic and data as Web Services using a variety of wizards. The problem is that by using the wizards the developer leaves the XML data definition to the software development environment. Naturally, environments from different vendors will generate XML interfaces for the same business logic in different ways. Even if two systems shared the same underlying database model, exposing an employee's address would render different representations in XML. Integrating the systems then requires extra effort in transforming messages from one XML representation to another. The wizards of software development environments make it easy to continue with bad habits.
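The extra transformation effort can be illustrated with a minimal sketch. The two message layouts below are invented for illustration and do not reproduce the output of any particular development environment; they only assume that one tool serializes the address as attributes and another as child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical output of "vendor A": address fields as attributes.
VENDOR_A = (
    '<Employee><Address street="Main Street" number="12" '
    'zip="1000" city="Copenhagen"/></Employee>'
)

# Hypothetical output of "vendor B": the same data as child elements
# with different names.
VENDOR_B = (
    "<employee><addr><streetName>Main Street</streetName>"
    "<houseNo>12</houseNo><postalCode>1000</postalCode>"
    "<town>Copenhagen</town></addr></employee>"
)

# Hand-maintained mapping between the two ad hoc vocabularies.
A_TO_B = (
    ("street", "streetName"),
    ("number", "houseNo"),
    ("zip", "postalCode"),
    ("city", "town"),
)


def a_to_b(xml_a: str) -> str:
    """Rewrite a vendor-A message into the vendor-B layout."""
    source = ET.fromstring(xml_a).find("Address")
    root = ET.Element("employee")
    addr = ET.SubElement(root, "addr")
    for attribute, tag in A_TO_B:
        ET.SubElement(addr, tag).text = source.get(attribute)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(a_to_b(VENDOR_A))
```

Every pair of independently generated interfaces needs a mapping of this kind, even though the two messages carry exactly the same information.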
Developers and organizations must take control over the way XML interfaces are defined. Do not leave the definition to the wizards. To service consumers, the initial cost of integrating is the same no matter whether the interface was generated automatically or developed by hand. The interfaces must be standardized, and developers and domain experts must collaborate in the definition of the interfaces.
The goal is standardized interfaces. But how do we develop these interfaces? The W3C XML Schema standard enables us to re-use schema fragments across interfaces.
Looking at the XML.org registry reveals that many schemas are constructed with little or no re-use from other XML vocabularies. This can to some extent be explained by the relatively recent adoption of W3C XML Schema. There are of course good examples of re-use in document-oriented markup. An example is the use of the CALS table model in the DocBook standard. Standardization at the message level is by no means bad, but more advantages could be gained if the different domains would share definitions at the type and element level. The namespace concept was introduced in order to support the combination of XML vocabularies. In the new era of data-oriented markup, re-use of types and elements from existing schema vocabularies based on the use of namespaces should be widespread for the following reasons:
Using controlled vocabularies of types and elements makes transformation between similar messages easier.
Use of namespace prefixes provides extra information about the context of schema constructs. Knowledge about the origin of a namespace has semantic value and qualifies the interpretation of elements.
Well-defined schema fragments enable the development of generic handlers for the different constructs. The re-use of schema constructs also extends to software. Without shared vocabularies, software developers have to develop separate handlers for XML instances that do not differ at the semantic level but only at the syntactic level.
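The third point, generic handlers for shared constructs, can be sketched as follows. The namespace URIs, the `TaxReturn` and `Payer` elements, and the address fields are hypothetical placeholders and are not taken from any published vocabulary; the sketch only assumes that two messages from different domains embed the same namespace-qualified address element.

```python
import xml.etree.ElementTree as ET

ADDR_NS = "http://example.org/core/address"  # hypothetical shared namespace


def handle_address(elem):
    """Generic handler: turn a shared Address element into a dict."""
    return {child.tag.split("}", 1)[1]: child.text for child in elem}


# Handlers are registered once per namespace-qualified element name.
HANDLERS = {f"{{{ADDR_NS}}}Address": handle_address}


def dispatch(xml_text):
    """Run the registered handler on every known element in a message."""
    results = []
    for elem in ET.fromstring(xml_text).iter():
        handler = HANDLERS.get(elem.tag)
        if handler is not None:
            results.append(handler(elem))
    return results


ADDRESS = (
    "<a:Address><a:StreetName>Main Street</a:StreetName>"
    "<a:PostCode>1000</a:PostCode></a:Address>"
)

# Two messages from different (hypothetical) domains re-using the element.
SICK_LEAVE = (
    '<n:AbsenceNotification xmlns:n="http://example.org/absence" '
    f'xmlns:a="{ADDR_NS}">{ADDRESS}</n:AbsenceNotification>'
)
TAX = (
    f'<t:TaxReturn xmlns:t="http://example.org/tax" xmlns:a="{ADDR_NS}">'
    f"<t:Payer>{ADDRESS}</t:Payer></t:TaxReturn>"
)

if __name__ == "__main__":
    print(dispatch(SICK_LEAVE))
    print(dispatch(TAX))
```

Because both messages use the same qualified name, one handler serves both interfaces regardless of where the address appears in the message.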
Historically, re-use has not been prevalent. But the key to successful integration of applications is coordinated development and re-use of existing definitions.
Furthermore, it poses a problem that standards authorities such as the Dublin Core Metadata Initiative and UN/CEFACT have not developed authoritative XML Schemas. In order to use the standards, developers must implement their own XML Schemas from the abstract definitions. The re-use is thus limited to the semantic level. Syntactic re-use is not possible, and developers are again forced to do transformations between XML vocabularies with the same semantic content.
The standards bodies will eventually be forced to endorse de facto implementations of the different standards, and developers may have to convert their interfaces in order to comply with the standard also at the syntactical level.
Standardization and open standards are high on the agenda of IT politics in Denmark and in many other countries around the world. New standardization initiatives are formed both nationally and internationally. The initiatives are often competing with other initiatives not only internationally but also locally. In many cases the initiatives are unaware of the existence of similar initiatives. In other cases the competition is caused by rivalry among vendors. Some companies will participate in several competing standardization initiatives and in the end support the one that has the most momentum and support.
Where initiatives are simply unaware of one another, promotion and high visibility are important to prevent others from initiating similar work. But the problem is that the few global XML registries that exist seem to contain only a fraction of the developed XML vocabularies and standards. The registries only support the submission of finished standards and vocabularies, and they are not meant for announcing the formation of a standardization group.
The multitude of similar standards that compete for no particular reason is another barrier to application integration. The benefit of implementing a standard with many alternatives is low because there is no guarantee that others have implemented the same standard.
Standardization initiatives must be communicated and coordinated. A methodology for announcing standardization initiatives at a regional, national or global level is needed.
Future tenders for public sector IT systems in Denmark are expected to require vendors to support the development of standardized XML interfaces around the system. This fact will be a driving force in the formation of new national standardization initiatives. The standardization initiatives face a number of difficulties:
Time to market. Since the driving force behind the standard in many cases will be a tender for the development of one or more IT systems, the contractors will have to develop the standard in a very short time frame.
Locating existing standards. Due to the low visibility of existing standardization initiatives it may be difficult to locate existing standards and standardization initiatives in order to coordinate.
Lack of motivation to reuse. For standards developed at the national level the motivation to re-use parts of existing vocabularies may be low. The benefit from this extra work is low compared to the benefit of having a standard at all.
Critical mass. With very little time to develop the first version of a standard, it could prove difficult to identify and motivate all relevant parties which ought to participate in the development of the standard.
In the private sector, most standards and interfaces are developed in response to demand in the market. Demands from user communities and competition among companies drive the development. The public sector has an element of competition only to some extent, at the local and regional level. Most state institutions are not concerned with competition. As such, state institutions do not have to compete with other institutions on providing the best and most standardized interfaces.
Without the element of competition as a driving factor it is more difficult to find a business model for public institutions. The problem is that in many cases, the organization taking the cost is not the one harvesting the benefit. There is no bottom line incentive to provide services and data using a standardized interface.
The philosophy of the Danish XML project was developed with consideration for some of the challenges described in the previous section. An extra challenge is the fact that public institutions in Denmark benefit from a high degree of autonomy at the local and regional level. The Ministry of Science, Technology and Innovation does not have the authority to require the use of XML for data interchange or, for that matter, to demand that institutions collaborate in the development of standards. Some means of governance was needed.
Fortunately the XML project quickly gained widespread acceptance and support, and in May 2001 the XML Committee was established with representatives from all levels of the public sector. The purpose of the XML Committee is to standardize, drive and support the use of XML in application integration. The committee has no formal authority to enforce the use of specific standards and interfaces in public institutions, but this fact has not been an issue until now.
With a growing number of uncoordinated XML integration projects, it was at an early stage decided that the project should encompass both managed and anarchistic development of XML interfaces.
In the standardization activity, the XML-Project provides resources for groups of users called communities of practice (CoP). The CoPs resolve the social issues and the online infrastructure and methodology enable them to develop standards for their own domains. The CoPs shall describe the requirements for data and information interchange and interoperability. They must classify and clarify the terms and concepts of the interchange or interoperability needs of their user communities by creating standard information object definitions.
The information object definitions that CoPs produce are submitted to the standardization groups of the project and the facilitating XML-Secretariat who review, accept, and publish the results. These results form the basis for the technical work on the XML-schema standards.
The work of these CoPs is supported by a social framework that allows a well-balanced co-operation between the representatives of the User Communities and IT technicians. The framework provides an infrastructure that assists the CoPs in progressing their contributions of knowledge and requirements to the IT experts who can create the XML standards. [HT 2003]
To bridge the "gap" between the social and technical architectures, CoPs will create information object definitions, which are technical descriptions of application information created using their domain knowledge and their viewpoints. Associated with the Social Framework and the methodology are a set of tools for defining models, information object definitions, metadata, and describing the needed data to be interchanged or used in interoperability.
Information object definitions shall contain community information and concepts, terminology, and associations, meanings, as well as the ways of working with and using the particular information. These descriptive tasks belong to the social issues of standardizing and will be prepared by representatives from different communities working together in communities of practice. The information object definitions are input to the IT technicians who rarely will have an understanding of the domain or a feeling for the importance or relationships of the specifics of the application. [HT 2003]
The following is the minimum required process when developing standards with the use of the InfoStructureBase:
Identify and model the interchange requirements of the application domains, creating communities of practice (CoPs)
The CoPs will analyze the information content from their chosen domain and create standardized information object definitions. These definitions describe the optimal levels of granularity for information from the CoP point of view. The information object definitions are complete with formalized XML schemas and attributes, metadata, and naming conventions providing both context and semantics.
The standardized information object definitions will be published in the InfoStructureBase (ISB) and made available to all interested parties.
XML experts can use the published information object definitions to create XML Schemas for interchange and interoperability, which will enable re-use of the information between companies, ministries, local governments, and other organizations.
Please refer to the Handbook for Standardization [HT 2003] for further introduction to the process of standardization.
Developing standardized reusable XML Schemas and XML-based services is not an easy task. Furthermore, it must be done in a coordinated manner in order to facilitate reuse and interoperability. Achieving this also requires knowledge of areas other than XML Schema. The Danish XML Committee has developed a number of so-called 'cookbooks' that aid the process of system integration in accordance with the guidelines set forth in the Danish XML Project.
Implementation Handbook: Guidelines for project leaders - runs orthogonal to the other Handbooks.
Standardization Handbook [HT 2003]: Guidelines for creating XML-standards.
Modeling XML Schemas with UML: Provides modeling principles in UML and rules for mapping from UML models to XML Schemas.
XML Schema Handbook [MB 2002]: Provides guidelines for writing, naming, and administering XML Schemas.
Integration Handbook: Technical guidelines describing subjects such as choosing protocols, security, and versioning of services.
At the end of 2001 the Ministry of Science, Technology and Innovation, in cooperation with the Danish XML Committee, took the initiative to establish a so-called InfoStructureBase. The InfoStructureBase contains a number of tools that support schema development and standardization of XML interfaces. Furthermore, the base contains a repository of schemas, schema fragments, interface descriptions and process descriptions. The aim of the base is to provide a tool that can help to realize the goal of creating easy and cheap access to public data. The InfoStructureBase shall, in other words, support the collaboration surrounding the development of schemas and interface descriptions for the exchange of data, as well as the standardization process itself.
The suggestions and requirements for the InfoStructureBase are based on a number of assumptions. Fundamental to the vision behind it is the assumption that XML is well suited to the structuring and exchange of data. The next assumption is that the XML Schema standard is well suited to expressing structure and semantics for the XML data that is exchanged.
It is certain that the XML standard is here to stay. The XML Schema standard is a second-generation validation language, whereas a Document Type Definition (DTD) was a first generation validation language. The XML Schema standard has such clear improvements on the DTD standard that a standardization process can be based upon it. With XML Schema it is possible to re-use fragments of XML Schema across diverse schemas. For example: the declaration of an address structure can be re-used in many different schemas. This possibility has guided the vision that lies at the foundation of the InfoStructureBase. Combining XML Schemas from different domains, institutions and companies requires that the XML Schema components are developed in accordance with a basic set of rules and constraints.
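The address example can be sketched at the schema level. The schemas below are illustrative only: the namespace URIs, the file name `address.xsd`, the `TaxReturn` root element and the address fields are assumptions and are not taken from the published Danish schemas. The Python helpers merely inspect the schema text to show that two independent schemas point at the same declaration; they do not perform XML Schema validation, and the prefix lookup ignores namespace scoping for brevity.

```python
import io
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
CORE_NS = "http://example.org/core/address"  # hypothetical namespace

# The address structure is declared once, in its own namespace.
CORE_XSD = f"""<xs:schema xmlns:xs="{XS}" xmlns="{CORE_NS}"
    targetNamespace="{CORE_NS}" elementFormDefault="qualified">
  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="StreetName" type="xs:string"/>
      <xs:element name="PostCode" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="Address" type="AddressType"/>
</xs:schema>"""


def consumer_xsd(target_ns, root_name):
    """Build a schema that imports the core namespace and re-uses Address."""
    return f"""<xs:schema xmlns:xs="{XS}" xmlns:a="{CORE_NS}"
    targetNamespace="{target_ns}" elementFormDefault="qualified">
  <xs:import namespace="{CORE_NS}" schemaLocation="address.xsd"/>
  <xs:element name="{root_name}">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="a:Address"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def declared_elements(xsd_text):
    """Return (namespace, name) for every top-level element declaration."""
    root = ET.fromstring(xsd_text)
    namespace = root.get("targetNamespace")
    return {(namespace, e.get("name")) for e in root.findall(f"{{{XS}}}element")}


def referenced_elements(xsd_text):
    """Return (namespace, name) for every <xs:element ref="..."/>."""
    prefixes, refs = {}, set()
    events = ET.iterparse(io.StringIO(xsd_text), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif payload.tag == f"{{{XS}}}element" and "ref" in payload.attrib:
            prefix, _, local = payload.get("ref").rpartition(":")
            refs.add((prefixes.get(prefix), local))
    return refs


ABSENCE_XSD = consumer_xsd("http://example.org/absence", "AbsenceNotification")
TAX_XSD = consumer_xsd("http://example.org/tax", "TaxReturn")

if __name__ == "__main__":
    print(referenced_elements(ABSENCE_XSD) == referenced_elements(TAX_XSD))
    print(referenced_elements(TAX_XSD) <= declared_elements(CORE_XSD))
```

Both consuming schemas resolve to the single declaration in the core schema, which is the kind of re-use that DTDs could not express across vocabularies.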
The philosophy of the Danish standardization process is first to standardize the basic data types, elements and schemas, which will be used throughout the public and private sectors. In order to kick-start this development the Danish XML Committee established a Danish Core Components TC, which has been given the task of identifying and developing core schemas pertaining to the central public registries whose data is used everywhere in the public administration as well as in exchanges with the private sector. The Danish XML Committee thus has a standardized set of data types, elements and schema fragments, which, in line with the XML Schema Handbook [MB 2002], it will require to be used as the basis for each schema it is asked to authorize. It is now up to the individual authorities and companies to develop schemas or schema fragments based on these rules.
An important driving force behind the establishment of the InfoStructureBase is the vision of a common public-sector data model, OIOXML, where standardization and re-use are the keywords for development. The common public-sector data model builds on a set of simple principles, which can be illustrated by an analogy with LEGO® bricks.
Core types and elements are the data model's smallest bricks. The bricks can be formed by any institution, company or person and separated with the help of namespaces. Bricks from different namespaces can be combined and included in composite schemas. A description of a house built in LEGO bricks is analogous to a composite schema. The house is built on a LEGO plate and consists of basic blocks, windows, doors and roof pieces. The individual bricks can come from many different building sets (namespaces). Each building set comes with an inventory (package schema), which describes precisely which parts the building set consists of. The description of the house (the composite schema) can be included in a combined description (interface schema) of a property (description of house, garden, sewers etc.).
One of the reasons that children and adults all over the world can exercise their creativity with LEGO toys, putting together bricks from different building sets in new and exciting ways, is that the “interface” between the bricks is well defined. Each single brick must be built to conform to precise measurements and guidelines if it is to be included in an arbitrary LEGO construction. These measurements and guidelines are analogous to the XML Schema Handbook [MB 2002], whose goal is to ensure that schema fragments in the public-sector data model can be used in new contexts that the designers of the individual schema fragments could not have imagined.
Schemas in the common public data model can be grouped as follows:
Core types. Schemas with common data types and enumerated values which are re-used throughout the public and private sectors. For example “PersonGivenName”, “PersonSurnameName” and “PersonTaxIdentifier” are used again and again in countless contexts. Data types should be utilized as components in other schemas. Data types will typically be gathered together in logical packages as, for instance, a person package. Thus the schema developer can access person specific data types via this package.
Core elements. Schemas with core element and attribute declarations are employed to standardize the naming of frequently used elements and attributes. An element declaration must be based on a declaration of a data type to which it belongs. For example, the declaration of the element “CivilRegistrationNumber” could be based on the type “CivilRegistrationNumberType”.
Packages. Packages contain references to core elements and types. Packages can be organised logically by domain. A domain could be, for example, addresses, customers, businesses, forms etc. The advantage of using packages is that the schema developer can simply refer to a single package that contains the elements and types required. A package will typically contain only references to schemas defining core types and elements. These references will be by both “import” and “include”.
Composite schemas. Composite schemas build on the core types and elements. In contrast to packages, composite schemas will typically describe structure and semantics, as well as the cardinality of elements for larger structures to be reused in other contexts. A composite schema is a concrete implementation of a given domain’s understanding of e.g. an address. The address can be implemented in different domains but built on the same basic package.
Interface schemas. Interface schemas describe an interface between two systems, and are typically composed of one or more composite schemas. The root element in an interface schema mirrors the context in which the schema is to be implemented. For example, the root element in a data exchange for reporting sick leave could be named “AbsenceNotification”.
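The rule for core elements, that every element declaration must be based on a declared data type, lends itself to a simple automated check. The sketch below is an illustration under stated assumptions, not a tool from the InfoStructureBase: the namespace URI is a placeholder, and the `Nickname` element is a made-up counterexample that uses an anonymous inline type.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical core-element schema. CivilRegistrationNumber follows the
# rule; Nickname deliberately breaks it with an anonymous inline type.
CORE_ELEMENTS = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:cpr="http://example.org/core/cpr"
    targetNamespace="http://example.org/core/cpr">
  <xs:simpleType name="CivilRegistrationNumberType">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>
  <xs:element name="CivilRegistrationNumber"
              type="cpr:CivilRegistrationNumberType"/>
  <xs:element name="Nickname">
    <xs:simpleType>
      <xs:restriction base="xs:string"/>
    </xs:simpleType>
  </xs:element>
</xs:schema>"""


def untyped_elements(xsd_text):
    """List top-level elements that do not reference a named type."""
    root = ET.fromstring(xsd_text)
    return [
        element.get("name")
        for element in root.findall(XS + "element")
        if "type" not in element.attrib
    ]


if __name__ == "__main__":
    print(untyped_elements(CORE_ELEMENTS))
```

A check of this kind could be run before a schema is submitted for authorization, so that anonymous types, which cannot be re-used by other schemas, are caught early.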
The approach taken in the Danish government Digital Services Project is still in its early stages. Experiences from pilots will be collected in the summer of 2003. Early experiences from pilots working with the paradigm have shown that it is possible to develop interfaces around applications using our paradigm.
A business model cannot always be found for all institutions. It is not always the institution bearing the cost that benefits and saves. For now the approach taken is to allow institutions to charge service consumers based on the raw costs associated with exposing the service. This practice has already proven to be a barrier to application integration in the public sector. The public sector as a whole has to compete with the private sector on service levels. In order to provide good service to citizens and businesses, the public sector must act as one enterprise and not as a multitude of independent private companies that will not act unless there is a direct benefit or saving for the company itself.
XML and Web Service technology make it possible for service consumers to integrate with service providers at very low cost and effort. The cost and effort can easily be multiplied several times if consumption of the service is associated with the signing of cumbersome legal documents and payment agreements. Institutions depending on data from several public sources will have a lot of administration related to contracts and payment for the services. It is not easy to explain to taxpayers why public institutions should use the taxpayers' money to administer charging each other for services. Administration methods with a lower associated cost must be found. One method could be to charge only organizations with unusual usage patterns that draw on significant resources from the service provider. Administration costs in the public sector could also be lowered if a central agency were in charge of making contracts and billing services. In that case institutions would only have to make contracts with one agency instead of with each public supplier.
Standards authorities should take control over the definition of their vocabularies in appropriate languages. This would prevent service providers from expressing the standards themselves in various syntactically different representations.
I would like to thank my colleagues at the National IT and Telecom Agency (Michael Bang Kjeldgaard, Palle Aagaard, Erik Hannibal Terp, René Løhde) and Hugh Tucker from GIAXS for contributions to the development of ideas discussed in this paper.
[MB 2002] Brun, Mikkel Hippe and René Løhde 2002. XML Schema Handbook, The Danish XML Committee, Copenhagen, Denmark. http://purl.oclc.org/NET/oio/cookbooks/XMLschema
[HT 2003] Tucker, Hugh 2003. Standardization Handbook, The Danish XML Committee, Copenhagen, Denmark. http://purl.oclc.org/NET/oio/cookbooks/Standardization