XML Europe 2003

Information Bus

Web Services in Action at the United Nations

Abstract

In this presentation we discuss our experience of using web services and XML technologies to support interoperability and integration of data at the United Nations FAO. We evaluate the use of these technologies and discuss the benefits and drawbacks encountered during the development of our approach.

Keywords


Table of Contents

1. Introduction
2. The Problem
3. Objectives
4. Approach
5. The Information Bus
6. Country Profiles Application
7. Conclusion
Biography

1. Introduction

The Food and Agriculture Organization of the United Nations (FAO) is a specialized agency of the United Nations which leads international efforts to defeat hunger. It helps developing countries modernize and expand agriculture, forestry and fisheries and ensure good nutrition for all. One of its most important functions is to collect, analyse and disseminate information to help governments fight hunger and achieve food security. To support this effort, FAO has established the World Agricultural Information Centre (WAICENT) for agricultural information management and dissemination.

Within the WAICENT framework, a large amount of data, represented in various formats and in many different languages, are generated every day and stored in different types of data sources. In all, there are over 200 such data sources. People need to access and manipulate data distributed in the various sources from both inside and outside the organization. It is important to share data between systems quickly and easily, without requiring the systems to be tightly coupled. In simple terms, the existing systems need to "talk" to each other. Another main problem is related to the fact that within the organisation the use of two different technologies (Microsoft/ASP and J2EE/JSP/Servlets) is widespread and it is, therefore, very difficult to impose a single technology throughout the FAO.

An approach named 'Information Bus', based on web services technology, has been designed and deployed in the FAO to promote interoperability between various data sources, in a way that can be implemented on multiple vendor platforms, with minimal effort and disruption to existing systems. The approach supports the standard representation and exchange of meta data as well as the multilingual requirements of an institution like FAO, in which documents are expressed in the five official languages (English, French, Spanish, Chinese, and Arabic) as well as Russian and other local variations.

The principal objective of the approach is to create an environment where new web-based information systems can be developed quickly and easily, using any technology platform, by accessing information from any of the existing information systems at the FAO.

2. The Problem

The Food and Agriculture Organization of the United Nations has approximately 200 systems supplying information for access on the World Wide Web, deployed using two different technologies: Microsoft ASP and Java JSP/servlets. These data sources need to share and exchange data with each other easily. However, the use of the two technologies is already widespread in the organization and it is almost impossible to impose a single technology throughout the FAO. In addition, it is necessary to avoid rewriting existing applications.

The existing information infrastructure is shown in Figure 1. It consists of information sources (database systems) containing different types of data including, but not limited to, different types of documents written in five official languages - English, French, Spanish, Chinese and Arabic (and some in Russian); electronic bibliography references; statistical data; maps and graphics; news and events from different countries; and web information.

Figure 1. Existing information structure at the FAO

Different people generate documents in different formats, which are inserted in the databases using web interfaces. The data are accessed from the databases in HTML format, through applications available on the Internet. Examples of these applications are WAICENT Information Finder (online search tools), FAOBIB (online catalogue of bibliography), FAO Virtual Library (digital archive), and FAOSTAT (online database of statistics in various areas).

The FAO users are farmers, scientists, traders, government planners, and non-governmental people, both inside and outside the organization, who need to access and publish information.

Although the existing setting addresses some of the requirements of integrating disparate distributed systems, there are limitations involving budgetary or technical challenges, inflexibility, lack of standardization, and difficulty of scalability and extensibility. It is important to have a technology that is inexpensive, easy to implement, easy to maintain and based on open standards, to allow leverage of knowledge and existing resources without having to rewrite existing applications.

The technology needs to support interoperability of existing data sources and management of multilingual variants without changing the database structures. Currently, it is necessary to customize and add database structures for each different language. There is no standard way to manage language variants of documents or other data structures. This generates inconsistencies between applications in the way that they manage the different languages. In addition, the database models are not easily extensible when new data or language variants are added.

Other problems are related to the support of metadata representation and metadata exchange in a standard way, as well as use of standard ontology formats. In the FAO a document repository has been developed with the objective of storing and disseminating all publications electronically. It stores meeting notes, documents, metadata, and index data. Different ASP interfaces have been created to allow searching the document repository by type, language, and subject.

However, there is no standard way to manage language variants of documents or other data structures like specific country information and metadata. The multilingual Agricultural Thesaurus (AGROVOC) from FAO has been applied to the web as a strategy to ensure some conformity in resource description and discovery. However, it falls short of being a complete tool for this purpose, because more specific subject terminology and richer ontological relations are needed than a traditional thesaurus offers.

3. Objectives

In order to tackle the problems we have described, a lightweight 'information integration' approach was proposed, based on Web services and related XML technologies. The approach was developed in a way that can be implemented on multiple vendor platforms, with minimal effort and disruption to existing systems.

The main goal of the approach is to create an environment where new web-based information systems can be developed quickly and easily, using any technology platform, by accessing information from any of the existing 200 information systems at the FAO, and supporting the multilingual characteristics of the institution in which documents are expressed in five official languages as well as Russian and other local variations.

Other objectives included the implementation of a dynamic report generator and the development of an XML document repository to handle metadata and language variants in a generic way.

The overall objectives, set at the start of the project, were to:

  • Create an environment to develop new web-based information systems in an easy and quick way, using any technology platform

  • Create a generic XML-based information infrastructure to support multilingual information in an easy and standard way

  • Create an application integration structure based on Web Services to allow interoperability of FAO systems and information sources for delivery through web portals

  • Demonstrate standard XML representations for handling metadata and multilingual documents

  • Compare the use of different Web Services technologies

4. Approach

It was decided to create a prototype system to test the design approach and to allow the overall solution to be demonstrated throughout the FAO, in order to win the backing of the many departments and groups who would need to be involved in a full roll-out. For the prototype, a representative sample of about 10 internal information systems were wrapped with Web services interfaces and used to re-create one of the FAO's externally facing applications, called Country Profiles.

Country Profiles is an application in the FAO that allows access to country-specific information without the need to search individual databases and systems. It is an information retrieval tool that groups in a single area the vast amount of information available at FAO based on the global activities in agriculture and development, and classifies the information by country. The application uses three categories to group information:

  • FAO's areas of expertise - sustainable development, economy, agriculture, fisheries, forestry, and technical cooperation,

  • FAO's priority areas for interdisciplinary action (PAIA) - ranging from biological diversity to trade in agriculture, fisheries, and forestry, and

  • AGROVOC - a metadata ontology with over 4000 terms breaking down the first two metadata categories to a lower level (e.g. Cattle Breeding). AGROVOC is mainly used in the Library applications at FAO.

click image for full size view

Figure 2. Country Profiles using the Information Bus

The specific objectives set before developing the Country Profiles prototype were to:

  • Design and prototype an XML document repository that handles metadata and language variants in a generic way

  • Design and prototype an implementation of Web Services to provide standard access to existing systems on the FAO network

  • Prototype an implementation of Web Service wrappers around existing systems to provide Country Profiles information

  • Prototype implementation of a dynamic Country Profiles report generator in PDF

  • Prototype the Country Profiles applications combining two different technologies

    • MS .NET and J2EE

In all, the following Web Services wrappers were created for the information sources to be included in the Country Profiles application:

  • Statistics

    • FAOSTAT - an internal FAO statistics system

    • World Bank Statistics (external)

  • Documents

    • Online Catalogue - an existing internal bibliography system

    • EIMS (Electronic Information Management System) - an existing internal repository of full text documents

    • RAP - a new internal XML document and meta data repository

  • Maps

    • METART (Meteorological and Artemis System) - an internal map application

    • GeoNetwork - an internal map application

  • News and Events

    • NEMS (News and Events Management System) - an internal news application

  • Web Pages

    • BBC News Online - an external news service

5. The Information Bus

The information bus is the system created to support interoperability of various information sources at the FAO. The approach consists of wrapping the various data sources with Web service interfaces in which information inputs and outputs are passed as XML structures.

The concept of the information bus is that all data passed through it is represented in standard XML formats. These formats can be imposed in a regulated fashion by publishing the XML schemas being used and validating instances of messages. Regardless of the formats used by the existing systems, the same XML syntax is used for input and output parameters on the Web services.

For example, all data related to country, language or currency is represented in a single XML format, which uses the 3-letter ISO 3166-1 country code, the 2-letter ISO 639-1 language code and the ISO 4217 currency code, respectively. With Web services it is not necessary to re-engineer existing systems to new XML standards. However, it is necessary to enforce XML standards in the Web services interfaces. For example, the parameters for operations involving language codes always use the 2-letter ISO 639-1 code.
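The enforcement of coded parameters at a Web service interface can be illustrated with a minimal Python sketch. The function name and the parameter names are assumptions made for this illustration, not part of the FAO implementation, and the patterns check only the shape of each code (three or two letters), not whether the code is actually assigned by the relevant ISO standard. Language codes are upper-cased to match the "EN" notation used in this paper.

```python
import re

# Shape checks only (assumption): a real interface would validate against
# the published code lists or an XML schema, as the paper describes.
_PATTERNS = {
    "country": re.compile(r"[A-Z]{3}"),   # ISO 3166-1 alpha-3, e.g. SEN
    "language": re.compile(r"[A-Z]{2}"),  # ISO 639-1, written as EN in this paper
    "currency": re.compile(r"[A-Z]{3}"),  # ISO 4217, e.g. USD
}


def normalise_parameters(**params):
    """Return upper-cased codes, rejecting anything with the wrong shape."""
    clean = {}
    for name, value in params.items():
        pattern = _PATTERNS.get(name)
        if pattern is None:
            raise KeyError(f"unknown parameter: {name}")
        code = value.strip().upper()
        if not pattern.fullmatch(code):
            raise ValueError(f"{name} has the wrong shape: {value!r}")
        clean[name] = code
    return clean
```

A wrapper would call a check of this kind before mapping the codes to the native parameters of the underlying system, so that every service on the bus rejects free-text values such as "Senegal" in the same way.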

The Web services were developed for systems containing information about statistics, documents, maps, news and events. These systems can be:

  • internal to the FAO, for which the development team had access to the application source code

  • internal to the FAO, but the development team had no access to the application source code

  • external to the FAO

The management of information, including the handling of multilingual variants, is also based on XML. We propose to move structured information out of database fields and represent it in XML documents to allow a more generic model, which is easier to administer and to extend to new languages (e.g. there is a growing need to support Russian, in addition to the five existing official languages). Whereas existing systems use their own (non-standard) database structures to model multilingual data, the XML approach provides a generic way to manage structured information conforming to any schema.
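As a hedged illustration of why an XML representation extends more easily to new languages, the Python sketch below holds language variants as sibling elements distinguished by the standard xml:lang attribute. The element names and the sample titles are invented for this example and do not come from the FAO schemas. Adding a language is then a matter of adding an element, with no change to any database structure.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this namespace by the XML specification.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical document with two language variants of one field.
DOC = """<document id="doc-1">
  <title xml:lang="en">Forestry</title>
  <title xml:lang="fr">Foresterie</title>
</document>"""


def title_for(root, lang, fallback="en"):
    """Return the title in the requested language, or the fallback variant."""
    titles = {t.get(XML_LANG): t.text for t in root.findall("title")}
    return titles.get(lang, titles.get(fallback))


def add_variant(root, lang, text):
    """Add a new language variant without touching any other structure."""
    element = ET.SubElement(root, "title", {XML_LANG: lang})
    element.text = text
```

For instance, after `root = ET.fromstring(DOC)`, a request for a Russian title falls back to English until `add_variant(root, "ru", ...)` has been called.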

Figure 3. Information Bus Architecture

The way the architecture handles metadata is also based on XML, and metadata vocabularies and ontologies can be stored in the XML repository. The metadata are represented as RDF and RDF Schema, with the optional use of XML Topic Maps. RDF is used to specify metadata on resources, i.e. values of properties for the resources. RDF Schema is used to define classes of resources and the properties that instances of each class can take. In addition, RDF Schema and XML Topic Maps can be used to define ontologies, which capture the relationship between classes, resources, and properties that compose a vocabulary. XML Schemas are also used to define the range of values for a property and are stored in a vocabulary.
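To make the role of RDF concrete, the following Python sketch builds an RDF/XML description that assigns property values to a resource. The RDF namespace is the standard one; the vocabulary namespace, the property names and the resource URI are placeholders assumed for this illustration and are not the vocabularies used at FAO.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
VOCAB_NS = "urn:example:vocabulary#"  # hypothetical vocabulary namespace


def describe(resource_uri, properties):
    """Build an rdf:RDF element stating property values for one resource."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("v", VOCAB_NS)
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": resource_uri}
    )
    for name, value in properties.items():
        # Each child element is one property of the resource.
        ET.SubElement(description, f"{{{VOCAB_NS}}}{name}").text = value
    return rdf
```

Calling `ET.tostring(describe("urn:example:doc-1", {"country": "SEN"}), encoding="unicode")` serialises the description for storage or exchange. An RDF Schema would separately declare which properties a class of resources may carry.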

The assignment of constraint metadata is based on standard ontologies that can be published (i.e. publicly available) or developed in-house, and are also represented in XML. This facilitates importing and exporting of all XML metadata held in participating systems.

The XML repository stores resources (documents) in a relational database, using a Java interface based on an extended version of the XML:DB API that caters for document variants (e.g. different language variants of the same document) and metadata associated with documents. The repository is also wrapped as a Web service to allow access of documents by metadata and/or language.
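The idea of storing document variants and their metadata in a relational database can be sketched as follows. This is not the extended XML:DB API described above, which is a Java interface; it is a small Python analogue using an in-memory SQLite database, with table, column and function names assumed for the illustration.

```python
import sqlite3


def open_repository():
    """Create an in-memory repository with one table per concern."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE variant (
            doc_id   TEXT NOT NULL,
            language TEXT NOT NULL,
            content  TEXT NOT NULL,
            PRIMARY KEY (doc_id, language)
        );
        CREATE TABLE metadata (
            doc_id   TEXT NOT NULL,
            property TEXT NOT NULL,
            value    TEXT NOT NULL
        );
        """
    )
    return db


def store(db, doc_id, language, xml_text, metadata=None):
    """Store one language variant; metadata applies to the whole document."""
    db.execute(
        "INSERT OR REPLACE INTO variant VALUES (?, ?, ?)",
        (doc_id, language, xml_text),
    )
    for prop, value in (metadata or {}).items():
        db.execute("INSERT INTO metadata VALUES (?, ?, ?)", (doc_id, prop, value))


def find(db, language, prop, value):
    """Return (doc_id, content) pairs matching a language and one metadata value."""
    rows = db.execute(
        """
        SELECT v.doc_id, v.content
        FROM variant v JOIN metadata m ON m.doc_id = v.doc_id
        WHERE v.language = ? AND m.property = ? AND m.value = ?
        ORDER BY v.doc_id
        """,
        (language, prop, value),
    )
    return rows.fetchall()
```

Because metadata is attached to the document rather than to a variant, a new language variant becomes retrievable by the existing metadata as soon as it is stored.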

The architecture can contain two Universal Description, Discovery, and Integration (UDDI) registries to support discovery of information. One UDDI registry is internal to the FAO and assists with the sharing and exchange of information between the data sources internal to the organization. The other UDDI registry is used to support the sharing and exchange of data with data sources external to FAO. In the initial deployment of the architecture, only the internal registry was active.

An example of the XML structure passed in the information bus is shown in Figure 4. It consists of a SOAP message enriched with metadata from ontologies represented in RDF. In this example, the XML structure represents a query about documents containing information about forestry (Keyword), in Senegal (Country - SEN), written in English (Language - EN). The transformation from the standard XML representation used as the input parameters of the Web service to the native input parameters of the system is implemented in the Web service code itself. This is achieved using mapping structures from the native input parameters of the application (strings, integers) to the ISO representations outlined in the information bus.

Figure 4. SOAP Message on the Information Bus
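The query described above, and the mapping performed inside a wrapper, might look like the following Python sketch. Only the SOAP 1.1 envelope namespace is standard. The message namespace, the element names, the lookup tables and the native identifiers are invented for illustration and do not reproduce the message in Figure 4, which also carries RDF metadata.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
BUS_NS = "urn:example:information-bus"                 # hypothetical namespace

# Hypothetical mapping structures for one wrapped system, which is assumed
# to identify countries by integer and languages by name.
NATIVE_COUNTRY_ID = {"SEN": 101, "AFG": 102}
NATIVE_LANGUAGE = {"EN": "English", "FR": "French"}


def build_query(keyword, country, language):
    """Build a SOAP envelope carrying the standard coded parameters."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    query = ET.SubElement(body, f"{{{BUS_NS}}}FindDocuments")
    for name, value in (("Keyword", keyword), ("Country", country), ("Language", language)):
        ET.SubElement(query, f"{{{BUS_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")


def to_native(message):
    """Translate the standard parameters into the wrapped system's own."""
    root = ET.fromstring(message)
    query = root.find(f"{{{SOAP_NS}}}Body/{{{BUS_NS}}}FindDocuments")

    def value_of(name):
        return query.findtext(f"{{{BUS_NS}}}{name}")

    return {
        "subject": value_of("Keyword"),
        "country_id": NATIVE_COUNTRY_ID[value_of("Country")],
        "lang": NATIVE_LANGUAGE[value_of("Language")],
    }
```

Keeping the lookup tables inside the wrapper means that callers only ever see the ISO codes, while the wrapped application continues to receive the strings and integers it already expects.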

The information bus supports three different types of Web services that can be used to create applications which access the data sources in the FAO. These three types of Web service are called support, relevance, and content.

  • Support services are utilities to return standard representations of countries, metadata categories, and language translations.

  • Relevance services are used to identify the Web services that are relevant to a particular application context and the setting of parameters necessary to call the identified Web service, as illustrated in Figure 5. In this example the Web service with ID 900 contains a description of general maps and should be accessed by using parameters such as Country, Language, and Category.

  • Content services are invoked to return XML content from existing information sources, through Web services interfaces with parameters for language, country, subject, and others.

Figure 5. Details Returned by the Content Service
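A relevance service of the kind described above can be approximated by a lookup over a registry of service descriptions. In the Python sketch below, service 900 and its three parameters come from the example in the text; the second entry, the coverage sets and all names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    service_id: int
    description: str
    categories: frozenset  # metadata categories the service can answer for
    languages: frozenset   # languages in which content is available
    parameters: tuple      # names of the parameters the service expects


# Hypothetical registry; only ID 900 and its parameter names are from the text.
REGISTRY = [
    ServiceEntry(900, "General maps", frozenset({"General Information"}),
                 frozenset({"EN", "FR"}), ("Country", "Language", "Category")),
    ServiceEntry(901, "News and events", frozenset({"Economy"}),
                 frozenset({"EN"}), ("Country", "Language")),
]


def relevant_services(country, language, category):
    """List the services matching the current state, with ready-made parameters."""
    state = {"Country": country, "Language": language, "Category": category}
    matches = []
    for entry in REGISTRY:
        if language in entry.languages and category in entry.categories:
            matches.append({
                "id": entry.service_id,
                "description": entry.description,
                "parameters": {name: state[name] for name in entry.parameters},
            })
    return matches
```

The client needs no knowledge of individual content services: it passes its current state to the relevance service and receives both the list of applicable services and the exact parameters with which to call each one.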

6. Country Profiles Application

The Country Profiles application has been developed using the information bus technology. Figure 6 presents the web page used as the interface to the application. In the figure it is possible to see the three different types of Web service used in the application.

Figure 6. Country Profiles Application

Firstly, the three dropdown lists under the banner at the top of the page are populated from the support Web services described above. These set the state of the application and are currently set to English (EN), Afghanistan (AFG) and FAO's Fields of Expertise for the metadata. The categories to the left of the page (General Information) are also populated from the same metadata support Web service.

Slightly below is the fourth dropdown list. This is populated using the relevance service, which takes inputs from the above three services and generates a list of available Web services which match the current state of the application. It also contains the exact parameters to be sent to each content service when the user chooses an application (see Figure 5 for an example).

Finally, the main body of the screen shows an example of an invoked content Web service. In this example the service returns News information about the selected country from a system named EIMS.

The Country Profiles application also supports dynamic report generation based on data extracted from the various information sources. The reports are assembled as XML and rendered as PDF, using Extensible Stylesheet Language Formatting Objects (XSL-FO) and the open source Formatting Objects Processor (FOP) from the Apache XML Project. The reports are generated based on information content selected by the user.

When the user chooses a country and language from the support Web services, this sets the state of the client, and the relevance Web service is used to define the information available to the user in that context. When the user then chooses to generate a dynamic report, they are presented with the option to invoke different Web services, depending on the context. These Web services create the different sections of the report, according to the preference of the user.

Once the user has chosen the services to invoke in the creation of the report, the report generator calls all the Web services simultaneously using multi-threading. The report is built in memory in an order that depends on which Web service returns results first; the final report, in the correct order, is compiled and generated once the last Web service returns results. The whole process takes approximately 60 seconds from invoking the services to report generation; a normal report will involve between 30 and 50 different Web services.

Figure 7. Dynamic Generation of Country Profile Report
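The concurrent assembly of a report can be sketched in Python with a thread pool. This is an illustration of the general technique rather than the report generator itself; the function names are assumptions, and `call_service` stands in for whatever code invokes a content Web service and returns its XML.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def generate_report(sections, call_service, max_workers=8):
    """Call one service per section concurrently, then restore section order.

    sections     -- section names in the order they should appear
    call_service -- callable taking a section name and returning its content
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(call_service, name): name for name in sections}
        # Results arrive in completion order, which may differ on every run.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    # Compile the final report only once every service has returned.
    return [(name, results[name]) for name in sections]
```

Collecting results in a dictionary keyed by section name separates the arrival order from the presentation order, so a slow service delays only the final compilation step and never changes the layout of the report.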

7. Conclusion

The Information Bus has been proven and deployed in a system which provides:

  • an information architecture to support multilingual information in an extensible and standard way

  • standard XML representations of meta data related to topics, languages and countries

  • an application integration structure based on web services to allow interoperability of FAO systems and information sources for delivery through web portals

  • a 'toolkit' approach that can be used to make any information system within the FAO available on the Information Bus with minimal impact on the existing systems

  • the facility to integrate external data sources and feeds with information generated by systems internal to the FAO

  • the ability to create customised views of information, accessed through web services, that are configured dynamically as the information sources change

  • an environment which supports different web services technologies: Microsoft .NET and J2EE.

The Information Bus and the prototype Country Profiles application are being used to solve problems of information integration at the UN FAO, and have shown that Web services technology is able to address both technical and organisational challenges.

Biography

Dr John Chelsom is Managing Director of the CSW Group, a company dedicated to providing object-level information management solutions using XML, SGML and database technology. Originally trained as an electrical engineer, John worked first as an X-Ray engineer and later gained a PhD for work on the application of knowledge based systems in medicine. Since founding CSW he has been responsible for the design and development of XML and SGML information management systems for some of the world's most prestigious healthcare, engineering and publishing organisations.

John is a regular speaker at XML conferences and seminars, was a contributing author for the SGML Buyer's Guide and is a presenter of the Technology Appraisals seminar series on XML.

Chief, Information Dissemination Management Branch

Stephen Katz graduated from the University of Chicago in 1980 with a Bachelor's Degree in Social Science. He has over 20 years of experience in information technology and knowledge management, which has culminated in his current responsibilities as head of the information dissemination program of the Food and Agriculture Organization of the United Nations (FAO).

At the present time, he is actively engaged in an international initiative to develop a consensus on common metadata standards aiming to increase access to agricultural information and to facilitate information exchange between partners. Mr. Katz is a regular speaker at international meetings and the author of a number of technical papers on the subject of information and knowledge management.

Dr Zisman is currently a lecturer in the Department of Computing at City University, and a member of the Requirements Engineering group in this department. She is a Member of the ACM, an Associated Member of the IEE, and a Member of XML UK. Before joining City University she was a Post Doctoral Research Fellow in the area of Software Systems Engineering, in the Department of Computer Science at University College London (UCL).

Professor Summers is Head of the Department of Information Science at Loughborough University. His main teaching areas are Informatics, Systems and Information Management. His research interests include Health Informatics, Biosignal Interpretation, Digital Libraries in Medicine, Systems Tools (Methods and Methodologies) Knowledge Management, Enterprise Modeling, Behaviour of Complex Adaptive Systems and Development Informatics.