XML Enables a High-Volume, Near Real-Time Information Analyst Support System (IASS)

Track: Case Studies, Core Technologies, Knowledge Management

Audience Level: High Level/Technical View

Time: Thursday, November 18 at 11:00

Author: William Wolf, Assistant VP/Division Manager, Science Applications International Corporation (SAIC)

Keywords: XSLT, Scalable, Performance, SGML, XML, Text, Distributed, Information Analysis, Document, Search, Query, Fuzzy Match, Boolean, Unstructured Data, Terrorism, Flexible, Fault-Tolerant, Terabyte, Unicode, Foreign Language, International, Security, API, Standards-Based, ODBC, DOC, PDF, RTF, Java, C, C++, WebDAV, SOAP, Z39.50

Abstract:

In the War on Terrorism, the people are represented by two quite intertwined and critically important groups: the Information Analysts who draw conclusions and provide those to decision-makers, and the Information Management Developers, who use XML to assist the analysts with correlation, transformation, assimilation and delivery of that information.

The key challenge is managing and monitoring the flow of information that might alert an information analyst to a high-threat event. The information that must be indexed and stored for immediate and long-term analysis comes in a multitude of formats. It may include, for example, eye-witness accounts, transportation and shipping records, records of purchases of controlled chemicals, public announcements and even blogs. Success demands the ability to fuse data, including meaning and context, from disparate sources into a coherent whole. New records arrive at the rate of thousands per second, and overall data storage is in the terabytes. Fast load-to-index times are required, as are full-text search and retrieval capabilities. Scalability and storage efficiency are a must.

We have developed and deployed multiple systems to meet this challenge. The IASS described here implements an architecture that satisfies all these requirements, and is extremely scalable, flexible, and fault-tolerant. The IASS fuses structured and unstructured information from across the enterprise and provides analysts with full search capabilities across billions of records. XML is the enabling technology for IASS, and in conjunction with XSLT provides a common language for configuration, data interchange, data access and presentation.

IASS’s data sources include relational databases, text and XML repositories, and analytic applications. XML and text data records make up about half of the more than 4 billion records, which are stored in a variety of languages and structures. The IASS strategy for managing large volumes of diverse data is to handle each data type with the DBMS best suited to it. Using a text-centric system for XML data overcomes the performance and efficiency issues associated with using an RDBMS with text or XML extensions. The text DBMS also allows creation of customized text parsers and indexing algorithms, providing unique search features. Full support of XML, including XPath, makes it easy to load multi-language and hierarchical XML documents. The text database copes with high data ingest volumes: the millions of new records that are added to IASS every day dictate that approximately 1000 new XML records per second are indexed.
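As a rough illustration of how XPath expressions can pull indexable fields out of a hierarchical, multi-language XML record, the following minimal Python sketch uses the limited XPath subset in the standard library's xml.etree.ElementTree. The record layout, the element and attribute names (report, source, para, lang) and the extract_fields helper are all hypothetical; the abstract does not describe the actual IASS schema or its text database API.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: element and attribute names are illustrative only.
RECORD = """<report id="r-001">
  <source type="shipping">Port manifest</source>
  <body>
    <para lang="en">Container held controlled chemicals.</para>
    <para lang="fr">Le conteneur contenait des produits chimiques.</para>
  </body>
</report>"""


def extract_fields(xml_text):
    """Return a flat dict of fields selected with XPath-style paths."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        # Child-step path from the document root.
        "source_type": root.find("./source").get("type"),
        # Attribute-existence predicate: every paragraph that declares a language.
        "text_by_lang": {
            p.get("lang"): p.text for p in root.findall("./body/para[@lang]")
        },
        # Descendant search with an attribute-value predicate.
        "english": [p.text for p in root.findall(".//para[@lang='en']")],
    }


fields = extract_fields(RECORD)
```

A production loader would presumably stream records rather than parse each one fully into memory, but the path expressions would play the same role.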

The IASS application uses a collection of distributed, loosely-coupled components to find, collect, analyze, and synthesize information. A commercial web services messaging system binds the components together; XML-based messaging allows the components to interoperate in virtually any language and to fulfill virtually any function. The IASS components, which serve as database adapters, user interfaces, or implementations of business logic, all connect to the web services messaging system in a hub-and-spoke architecture. The loosely-coupled design provides the added benefit of fault-tolerance. In fact, this feature has been exploited to migrate components from machine to machine, during business hours, with no downtime.
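The hub-and-spoke idea can be sketched in a few lines of Python. This is an in-process toy, not the commercial messaging system the abstract refers to: the Hub class, the message element with its from/to/kind attributes, and the adapter_on helper are assumptions made for illustration. It shows components exchanging XML messages only through the hub, and a component being replaced (as if moved to another machine) while callers keep sending the same message.

```python
import xml.etree.ElementTree as ET


def make_message(sender, recipient, kind, text):
    """Serialize a hypothetical <message> envelope to an XML string."""
    msg = ET.Element("message", {"from": sender, "to": recipient, "kind": kind})
    msg.text = text
    return ET.tostring(msg, encoding="unicode")


class Hub:
    """In-process stand-in for the messaging hub (illustrative only)."""

    def __init__(self):
        self._spokes = {}

    def register(self, name, handler):
        # Re-registering a name swaps the handler; callers are unaffected,
        # which models migrating a component without downtime.
        self._spokes[name] = handler

    def send(self, message_xml):
        msg = ET.fromstring(message_xml)
        handler = self._spokes.get(msg.get("to"))
        if handler is None:
            # Unknown recipient: reply with an error instead of raising.
            return make_message("hub", msg.get("from", "unknown"),
                                "error", "no such component")
        return handler(msg)


def adapter_on(host):
    """Build a hypothetical search adapter that reports where it runs."""
    def handle(msg):
        return make_message("search", msg.get("from", "unknown"),
                            "result", f"{host} handled: {msg.text}")
    return handle


hub = Hub()
hub.register("search", adapter_on("host-a"))
query = make_message("ui", "search", "query", "controlled chemicals")
first = ET.fromstring(hub.send(query))

hub.register("search", adapter_on("host-b"))  # component "migrated"
second = ET.fromstring(hub.send(query))

missing = ET.fromstring(hub.send(make_message("ui", "archive", "query", "x")))
```

Because the user-interface side addresses the component by name and sees only XML, it needs no change when the adapter behind that name moves.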

XML is used ubiquitously as markup to facilitate data fusion. XSLT engines (software and hardware) are used to perform just-in-time transformations of XML information into the format requested by the client application. XSLT re-purposes data for a variety of applications and audiences, such as management and the news media. XSLT also transforms XML into intermediate forms optimized for automated analysis.

XML and XML-related standards provide the underpinnings on which the highly successful IASS application rests. These technologies allow IASS developers to focus on the problem at hand, and apply the best tools to implement solutions, using XML for information encoding, transformation, assimilation and delivery.