Mark Logic's Open Content Architecture: Best Practices for Content Apps

Keywords: Application architecture, Conversion, Content management, Content database, Database, Document Creation, Electronic Publishing, Enterprise Content Management, Integration, Legacy Data Conversion, Publishing, Repository, Search, XML, XML database, XQuery

Max Schireson
Vice President, Customer Solutions
Mark Logic
San Mateo
California
United States of America
max.schireson@marklogic.com

Biography

As Vice President of Customer Solutions, Max is responsible for ensuring successful customer deployment of Mark Logic products. This includes the delivery of professional services as well as the creation of value-added applications and solutions built on Mark Logic's technology. Prior to joining Mark Logic, Max held a variety of executive positions at Oracle Corporation, including Vice President of Applications Development and Chief Applications Architect. In addition to his product responsibilities, where he led a team of over 200 professionals and ran an $80 million business, Max held IT responsibilities for Oracle's web store, which accounted for over 70% of Oracle's US orders, and for Oracle's web support infrastructure, which processed over 80% of Oracle's support requests for 400,000 Oracle customers. Max also started Oracle's eCommerce consulting practice, which he grew to over 200 consultants worldwide and over $50 million in revenue.


Abstract


Despite the many advancements in relational databases over the last twenty years, these products still do not address a critical business reality: eighty percent of corporate information is content, such as reports, contracts, proposals, e-mail, compliance documents, and training manuals, produced in varying formats and covering a wide range of topics. This content cannot be normalized into tables and therefore cannot be easily managed with a traditional database. Faced with this limitation, some companies turn to enterprise content management systems or search engines to handle content challenges. However, the former lock businesses into a proprietary architecture, while the latter are not programmable and offer little more than simple document pointers.

In this presentation, Mark Logic will describe and demonstrate its Open Content Architecture, a comprehensive development model for companies seeking the optimal environment for managing and leveraging unstructured and semi-structured corporate information. OCA describes how systems that store, process, enrich, and deliver content should interoperate.

The OCA model centers on a standards-based data repository with an extensible and pluggable architecture. Customers can develop or purchase tools that integrate with the repository, enabling the use of best-of-breed solutions for loading and enriching content in the data store and publishing it to a range of output formats.

Mark Logic's Content Interaction Server is the ideal repository for the OCA. Content Interaction Server is an enterprise-class database specifically designed to function as the storage hub for the Open Content Architecture. It is fully transactional, runs in a distributed environment, and scales to terabytes of data. In addition, it offers full programmability in XQuery, the query language for XML, as well as Java and Microsoft .NET APIs.

Most importantly, Content Interaction Server is built specifically for content. It is schema independent; documents in the database can be queried without normalizing the data in advance. Content can be loaded in any format, and transformations such as format conversion and enrichment can be performed at any time.
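As a rough illustration of what schema independence means in practice, the sketch below queries two differently structured XML documents by element name alone, without declaring a schema or normalizing them first. It uses Python's standard library rather than Content Interaction Server or XQuery, and the document names, their contents, and the helper find_text are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents with unrelated structures; no shared schema is declared.
DOCS = {
    "contract.xml": (
        "<contract><party>Acme</party>"
        "<clause><title>Term</title><para>Five years.</para></clause></contract>"
    ),
    "memo.xml": "<memo><to>Staff</to><body><title>Holiday schedule</title></body></memo>",
}


def find_text(docs, tag):
    """Return (document name, text) for every element named `tag`, at any depth."""
    hits = []
    for name, xml in docs.items():
        root = ET.fromstring(xml)
        # iter() walks the whole tree, so the element's position does not matter.
        for elem in root.iter(tag):
            hits.append((name, elem.text))
    return hits


titles = find_text(DOCS, "title")
```

The same lookup finds the title in both documents even though one sits inside a clause and the other inside a body, which is the property the paragraph above describes.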

Mark Logic will demonstrate a sophisticated content application built using the Open Content Architecture model, based on a collection of 250,000 e-mail messages sent by Enron employees from 1999 to 2002. These messages are presented alongside a variety of supporting data, including mail authors' insider trades, Enron's stock price, the level of "chatter" about various topics over time, and more. The application lets users examine individuals' mail habits, showing who discussed what and who communicates most frequently. It can even make a statistical comparison of two authors to determine whether they are the same individual writing under distinct e-mail aliases.
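The abstract does not say which statistic the demo uses to compare authors. One common stand-in, sketched here purely as an assumption, is to build a word-frequency profile for each author and take the cosine similarity of the two profiles. The function names and sample messages are hypothetical.

```python
import math
import re
from collections import Counter


def word_profile(text):
    """Relative frequency of each lowercase word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse frequency vectors (0.0 to 1.0)."""
    dot = sum(value * q.get(word, 0.0) for word, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical messages attributed to two e-mail aliases.
alias_a = "Please send the numbers to me by Friday. Thanks for the help."
alias_b = "Thanks for the numbers. Please send the rest to me by Monday."

score = cosine_similarity(word_profile(alias_a), word_profile(alias_b))
```

A score near 1.0 suggests similar word usage and a score near 0.0 suggests little overlap. A real comparison would need far more text per author and a calibrated threshold before drawing any conclusion about identity.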

The Open Content Architecture demo unites technologies from a range of vendors to show convincingly how companies can quickly assemble top-flight components into a fully integrated content-centric system. In addition to Mark Logic, the OCA demo presents technology from seven different software publishers:

* Entity and concept extraction from Inxight and ClearForest

* Content clustering and categorization from Intellisophic

* Relational database integration from Composite Software

* Information visualization from Groxis

* Content conversion from Exegenix and Olive Software


1. Product Presentation Paper

Since this was a product presentation, no paper was prepared for the proceedings.
