Mark Logic's Open Content Architecture: Best Practices for Content Apps

Track: Product Presentations, Storing XML, Integration

Audience Level: High Level/Technical View

Time: Thursday, November 18 at 09:45

Author: Max Schireson, Vice President, Customer Solutions, Mark Logic

Keywords: Application Architecture, Conversion, Content Management, Content Database, Database, Document Creation, Electronic Publishing, Enterprise Content Management, Integration, Legacy Data Conversion, Publishing, Repository, Search, XML, XML Database, XQuery

Abstract:

Despite the many advances in relational databases over the last twenty years, these products still do not address a critical business reality: eighty percent of corporate information is content, such as reports, contracts, proposals, e-mail, compliance documents, and training manuals, produced in varying formats and covering a wide range of topics. This content cannot be normalized into tables and therefore cannot easily be managed with a traditional database. Faced with this limitation, some companies turn to enterprise content management systems or search engines to handle content challenges. However, the former lock businesses into a proprietary architecture, while the latter are not programmable and offer little more than simple document pointers.

In this presentation, Mark Logic will describe and demonstrate its Open Content Architecture, a comprehensive development model for companies seeking the optimal environment for managing and leveraging unstructured and semi-structured corporate information. OCA describes how systems that store, process, enrich, and deliver content should interoperate.

The OCA model centers on a standards-based data repository with an extensible and pluggable architecture. Customers can develop or purchase tools that integrate with the repository, enabling the use of best-of-breed solutions for loading and enriching content in the data store and publishing it to a range of output formats.

Mark Logic's Content Interaction Server is the ideal repository for the OCA. Content Interaction Server is an enterprise-class database specifically designed to function as the storage hub for the Open Content Architecture. It is fully transactional, runs in a distributed environment, and scales to terabytes of data. In addition, it offers full programmability in XQuery, the query language for XML, as well as Java and Microsoft .NET APIs.

Most importantly, Content Interaction Server is built specifically for content. It is schema independent; documents in the database can be queried without normalizing the data in advance. Content can be loaded in any format, and transformations such as format conversion and enrichment can be performed at any time.
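As a rough illustration of what querying without prior normalization means, the sketch below searches two differently structured XML documents without defining a shared schema first. It uses Python's standard library instead of Content Interaction Server or XQuery, and the sample documents and the function name `find_elements_containing` are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents with unrelated structures and no shared schema.
DOCS = [
    "<contract><party>Acme</party>"
    "<clause>Payment is due within 30 days.</clause></contract>",
    "<email><from>pat@example.com</from>"
    "<body>Please review the payment terms.</body></email>",
]


def find_elements_containing(docs, term):
    """Return (tag, text) for every element whose own text mentions term."""
    needle = term.lower()
    hits = []
    for source in docs:
        root = ET.fromstring(source)
        # Walk every element, whatever the document's structure happens to be.
        for element in root.iter():
            text = (element.text or "").strip()
            if needle in text.lower():
                hits.append((element.tag, text))
    return hits


matches = find_elements_containing(DOCS, "payment")
for tag, text in matches:
    print(f"<{tag}>: {text}")
```

The query never names a table or column; it matches on content wherever it occurs, which is the property the paragraph above describes.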

Mark Logic will demonstrate a sophisticated content application built using the Open Content Architecture model, based on a collection of 250,000 e-mail messages sent by Enron employees from 1999 to 2002. These messages are presented alongside a variety of supporting data, including mail authors' insider trades, Enron's stock price, the level of "chatter" about various topics over time, and more. The application lets users examine individuals' mail habits, showing who discussed what and who communicates most frequently. It can even make a statistical comparison of two authors to determine whether they are the same individual writing under distinct e-mail aliases.
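The abstract does not say which statistical method the demo uses to compare authors. One common approach, shown here purely as an assumed stand-in, is to build a word-frequency profile for each author and measure the cosine similarity between the two profiles. The function names and sample texts are invented for this sketch.

```python
import math
import re
from collections import Counter


def word_profile(text):
    """Map each word in text to its relative frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {}
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def cosine_similarity(profile_a, profile_b):
    """Cosine of the angle between two frequency profiles (0.0 to 1.0)."""
    shared = profile_a.keys() & profile_b.keys()
    dot = sum(profile_a[w] * profile_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented sample messages, not taken from the Enron collection.
alias_one = "please send the revised numbers before the call tomorrow"
alias_two = "please send the revised forecast before the meeting tomorrow"
unrelated = "lunch plans changed again sorry"

same_score = cosine_similarity(word_profile(alias_one), word_profile(alias_two))
diff_score = cosine_similarity(word_profile(alias_one), word_profile(unrelated))
print(f"alias pair: {same_score:.2f}, unrelated pair: {diff_score:.2f}")
```

A higher score suggests more similar vocabulary habits. A real comparison would need far more text per author and a calibrated threshold before drawing any conclusion about identity.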

The Open Content Architecture demo unites technologies from a range of vendors to show convincingly how companies can quickly assemble top-flight components into a fully integrated content-centric system. In addition to Mark Logic, the OCA demo presents technology from seven different software publishers:

* Entity and concept extraction from Inxight and ClearForest

* Content clustering and categorization from Intellisophic

* Relational database integration from Composite Software

* Information visualization from Groxis

* Content conversion from Exegenix and Olive Software