Abstract
This case study will highlight the dramatic improvements underway within the US Defense Intelligence Community that will result in a new intelligence paradigm where information is more reliable, more findable, and more timely.
The US Intelligence Community (IC) recognized the need for information standardization, interoperability, and process automation long before the terrible events of last September. Isolated demonstrations and operational use of XML, among other useful technologies, have been underway for the last five years, primarily in publishing. However, in the last year strong IC-wide coordination around metadata, XML, and software tools has begun to make the vision of a better IC come true.
There's no question that XML is an important part of improving the IC. Intelligence analysts work with a multitude of analytical systems and data on a daily basis. Intelligence consumers also have varied needs defined by timeliness, analytical depth, and presentation styles. A majority of the analysis work is presented to consumers in the form of documents and a user's ability to find content is heavily reliant on document level metadata.
XML is being looked at for exchanging content between databases, systems, and users; as a web services protocol for enabling interoperability between systems; and as a delivery format for timely dissemination of content to consumers. This presentation will demonstrate some of the Defense Intelligence Community's unique XML software tool customizations and will highlight the need for more open data fusion and business intelligence technologies that can be integrated into an XML-based digital production and dissemination environment. The presentation will also introduce some of the problematic technology transition issues that have driven many of the requirements.
There has been a great push since September 11th to bring together the knowledge of disparate agencies and organizations across the Defense, Intelligence, and Law Enforcement Communities to assess both conventional and asymmetrical threats. It has become an "executive directive" to share content within and across areas of responsibility so information can be analyzed by the collective talents of all members of the community and we can prevent egregious acts against our country and citizens. It is believed that fighting our enemies is dependent on this type of sharing and access to information and knowledge. Knowledge is indeed power, but that power can only be efficiently utilized when that knowledge is extracted from people's heads and placed into machines that can be used to fuse, deliver, index, repurpose, and reuse that knowledge.
Today, the Intelligence Community's (IC's) private and secure web space is merely a delivery and access medium for data or content. There are no standards for intelligent search and data mining, data exchange, adaptive presentation, and personalization outside of what HyperText Markup Language (HTML) and five-year-old web technologies can offer. The IC must go beyond adopting just an information access and display standard such as HTML. It must adopt, as policy, an information understanding standard, a common way of representing data so software can better search, move, display, and manipulate information hidden in contextual obscurity. HTML can't do this because it's an unstructured format designed specifically for interpretation by a web browser to control how a Web page should look; it does not represent data. For example, HTML does not:
Provide a standard way for a warfighter to call in close air support on a particular target.
Enable a science and technology center to publish parametric information about a weapon in a format that any receiver can incorporate into an analytical tool.
Provide a standard way to search the intelligence space for all content about a certain topic.
Specify how intelligence can be transmitted in a way that allows a mission planner or warfighter to work offline, respond to immediate crisis situations, issue orders, and disseminate those orders in a standard format.
In short, while HTML provides reasonable facilities for display, it does not provide any standards-based way to manage data, something the IC must recognize as one of the most significant challenges in today's intelligence environment.
A standard for data representation will expand the IC's web space in much the same way that the HTML standard for display did a few years ago. The data standard will be the vehicle for automation, collaboration, analytical processing, sharing, and interoperability. Intelligence publications, targeting data, parametric data, weapons system profiles, and intelligence data exchanges will all be represented at some point in time in this new data standard. It will open up a wide variety of new uses, all based on a standard representation for moving structured data around the IC's web infrastructure as easily as we move HTML pages today. The data standard will be Extensible Markup Language (XML).
There is a great awakening among the IC's membership about the importance of metadata and XML and the roles they can play in today's intelligence processing systems and knowledge space. Numerous initiatives are underway, starting at the IC Chief Information Officer (CIO) level, moving down through the "big-5" agencies, and reaching the grassroots levels of the community where the value-added activities and value-decisions are being made. Look outside the IC's immediate boundary to the military services and the US Government in general and you will quickly see that large dollar development activities are well underway to improve the production, sharing, and quality of information using metadata and XML.
One IC organization attempting to provide value-added solutions to the metadata and XML debate is the IC Metadata Working Group (IC MWG). Started in 2000 with the charter of addressing metadata and its role within the future system and data architecture of the IC, the IC MWG has fostered the development of XML metadata standards for IC-wide use. These standards are intended to be the lingua franca for intelligence of the future, and they are one important part of meeting the current directive to share information and make it more findable. These standards also contribute toward more accurate and reliable content as well as cost savings and process improvements in the intelligence production cycle.
XML is a tool to enable content automation and interoperability. The XML Recommendation, like its predecessor ISO standard (SGML), is a specification for creating your own language. Custom languages offer an opportunity to tailor the way information is represented to the specific people communicating, the information being communicated, and the process of communication. The IC is unique in all these cases and therefore warrants its own standard XML language optimized for its business.
While it does require significant effort, the creation of IC XML standards ensures that content, once expressed in that standard format, can be recognized and reliably processed by all systems and on all platforms. Standards are important in all aspects of communication, and an open standard, such as XML, is the only method capable of ensuring complete information interoperability.
XML significantly enables production automation such that information can be created faster, cheaper, and with greater substance. XML's separation of styling and content is the key to being able to create a single, "highly classified" intelligence document that can be filtered to lower security levels and styled on delivery to the many consumers who have different needs and different security accesses. This process takes advantage of computing power to act on the markup and make decisions many times faster than a human. The result is tailored content presented in an optimized form for the end user and for the security domain on which that information is delivered.
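The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (`report`, `title`, `para`), the `classification` attribute, and the marking values are hypothetical and are not taken from any IC standard. The sketch removes every portion marked above the requested level and treats an unmarked portion as the most restrictive level, so that missing markup never causes a release.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical marking values, ordered from lowest to highest.
LEVELS = ["U", "C", "S", "TS"]


def rank(elem):
    # An unmarked element is treated as the most restrictive level (fail closed).
    return LEVELS.index(elem.get("classification", LEVELS[-1]))


def filter_to_level(root, max_level):
    """Return a copy of the document with portions above max_level removed."""
    limit = LEVELS.index(max_level)
    result = copy.deepcopy(root)  # the single source document is left untouched

    def prune(parent):
        for child in list(parent):
            if rank(child) > limit:
                parent.remove(child)
            else:
                prune(child)

    prune(result)
    # Re-mark the document with the highest marking that remains in it.
    remaining = [rank(e) for e in result.iter() if e is not result]
    result.set("classification", LEVELS[max(remaining, default=0)])
    return result


SOURCE = """<report classification="S">
  <title classification="U">Regional Assessment</title>
  <para classification="U">Summary releasable at the lowest level.</para>
  <para classification="C">Additional detail.</para>
  <para classification="S">Most sensitive detail.</para>
</report>"""

doc = ET.fromstring(SOURCE)
low = filter_to_level(doc, "U")
print(ET.tostring(low, encoding="unicode"))
```

Styling for a particular consumer would then be applied to the filtered copy, which keeps the filtering decision separate from presentation.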
In order to find relevant information, index and sort that information, and facilitate the discovery of new relationships, each specific piece of content in the intelligence arena must contain standard metadata. Once all intelligence content contains this metadata, the quality and validity of searching increases exponentially. Search engines and knowledge discovery tools use such metadata to find documents with precision, and to discover new relationships within documents.
XML is more than a method of adding structure and metadata to information. It is supported by a complete set of recommendations from industry consortiums and standards bodies, including standards for filtering information and applying styling to a document, finding a particular piece of content, electronic data interchange, and finding and requesting information from a web service. The groundswell behind XML is astounding as commercial organizations and government agencies flock to XML and new web-enabled business practices.
The technical and business arguments for why metadata and XML need to be a part of the IC are quite simple and understandable, but we have failed to make the arguments adequately, and attempts to use these technologies in a more aggressive way have therefore foundered over the last 30 years. It is hard to believe that the IC has used these technologies in pockets for this long and still hasn't adopted the approach wholeheartedly, although there are some "shining stars" who have and are making some wonderful progress.
For the next two to five years XML will be one of the most important aspects of the Internet and of systems development in general. This period has the potential to move metadata and standards into the mainstream of systems development as we see everyone demanding and beginning to provide solutions around these critical technologies. The question is, "Is the IC prepared to take advantage of this explosion?"
Metadata has always played a part in IC business, starting with the bibliographic document metadata managed by librarians and now reaching into dynamic digital storage and exchange standards. Numerous Standard Generalized Markup Language (SGML) and XML applications have been built over the past 30 years for specific "vertical" industries, that is, groups of users or organizations with a similar interest [e.g., the Air Transport Association (ATA) or the Department of Defense (DoD) MILSPEC 38784 for Technical Manuals]. These SGML and XML applications define the semantic tagging structure and controlled values for all types of metadata and markup within a content space. These standard applications promote interoperability within the industry or support a specific type of application, such as Mathematical Markup Language (MathML) or the Organization for the Advancement of Structured Information Standards (OASIS) Exchange Table Model. The IC has attempted to define some standards specific to its own vertical: intelligence.
A number of years ago the IC identified a need to enhance how consumers found information on Intelink, a distributed dissemination service operated on the IC's secure TCP/IP network. To support the automated indexing of content hosted on Intelink by the different agencies, organizations, and commands, it was determined that a document metadata standard was needed. The method defined for content providers to communicate that metadata was to store that information within the HTML file using the META element's name/value pairs.
Another technique used on Intelink is to produce a metadata card. The card is a separate HTML file from the actual HTML document. The card contains the crawlable HTML metadata. Among that metadata is a pointer to the actual location of the HTML document. This technique is used to reduce the amount of crawling of actual documents and to simplify access control requirements for the web crawlers.
A consortium of Intelink user-organizations defined a series of name/value pairs expected to be included within every available file on Intelink or defined with a corresponding metadata card. These META elements are crawled and indexed by the Intelink search engine. The Intelink name/value pairs are called the Intelink Metadata Guidelines. These guidelines are exactly that - guidelines. If producers don't use the guidelines, then there is no guarantee that consumers will be able to find the producer's information.
The Intelink Metadata Guidelines cannot be thought of as a markup language. They are better thought of as a controlled vocabulary of sorts implemented within HTML. The vocabulary is defined within the Intelink Metadata Guidelines document. The guidelines take advantage of the only extensible part of HTML syntax - the META element. However, the name/value pairs can contain any values (possibly erroneous), and nothing in the markup language itself checks those values when a file is parsed.
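To make the META approach concrete, the following Python sketch collects name/value pairs from a metadata card in the way a crawler might. The field names (`title`, `producer`, `date-published`) and the card content are hypothetical; the actual vocabulary is defined by the Intelink Metadata Guidelines, which are not reproduced here. The sketch uses only the standard library's `html.parser` module.

```python
from html.parser import HTMLParser


class MetaCardParser(HTMLParser):
    """Collect the name/content pairs of every META element in an HTML file."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        pairs = dict(attrs)
        name, content = pairs.get("name"), pairs.get("content")
        if name is not None and content is not None:
            # A name may legitimately repeat, so values are kept in a list.
            self.metadata.setdefault(name.lower(), []).append(content)


# Hypothetical metadata card; the field names are illustrative only.
CARD = """<html><head>
<meta name="title" content="Example Assessment">
<meta name="producer" content="Example Center">
<meta name="date-published" content="not a date">
</head><body></body></html>"""

parser = MetaCardParser()
parser.feed(CARD)
print(parser.metadata)
```

Note that the malformed date value is accepted without complaint: HTML places no constraint on what a META element contains, so any checking has to be added by whoever writes the crawler.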
A true markup language implies some level of programmatic control and validation defined in the language itself. The HTML META element cannot be programmatically controlled or validated via the markup language during creation or processing time without the development of additional programming logic. An XML application can build in the specification of required or optional named element or attribute constructs and requisite values.
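The kind of control an XML application can declare is sketched below. In practice these rules would be written in a DTD or schema and enforced by a validating parser; Python's standard library does not include one, so this sketch stands in for it with a small rule table. The element names and allowed values are hypothetical and are not drawn from any IC standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: element name -> allowed values (None means any non-empty text).
REQUIRED = {
    "title": None,
    "producer": None,
    "datePublished": None,
    "classification": {"U", "C", "S", "TS"},
}


def check_metadata(xml_text):
    """Return a list of rule violations for a <metadata> block."""
    root = ET.fromstring(xml_text)
    problems = []
    for name, allowed in REQUIRED.items():
        elem = root.find(name)
        value = (elem.text or "").strip() if elem is not None else ""
        if not value:
            problems.append(f"missing required element: {name}")
        elif allowed is not None and value not in allowed:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems


GOOD = (
    "<metadata><title>Example</title><producer>Example Center</producer>"
    "<datePublished>2001-08-15</datePublished>"
    "<classification>U</classification></metadata>"
)
BAD = "<metadata><title>Example</title><classification>PUBLIC</classification></metadata>"

print(check_metadata(GOOD))
print(check_metadata(BAD))
```

The point of the comparison is where the rules live: with an XML application they are part of the published language definition and shared by every tool, rather than being reimplemented by each consumer.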
In the last five years, many industries have come together to create standard XML Document Types that define the structure of shared or exchanged documents. Any document written, by a human or by a computer, that conforms to the agreed-upon standard can be shared across editors, databases, content management systems, and other programs. Documents can be transformed, indexed, stored, edited, and passed around various systems created by different vendors, as long as they all conform to the published document type.
Such standards exist for news articles, business-to-business transactions, and large industries. Such a standard is required for the IC so that documents written in disparate organizations can easily be passed around, indexed, and accessed by all members of the community. Rather than simply adopting some other standard, the IC needs its own suite of XML standards to meet its unique needs.
IC Metadata Standards for Publications (IC MSP) and IC Information Security Markings (IC ISM) are new standards under development by the IC MWG. The standards are being developed to promote interoperability of intelligence content across producers and consumers of information within the IC. The IC Metadata Standards are XML applications that focus on both document and structural metadata. IC MSP consists of the following parts, each of which is useful in its own right and can be used independently of the others:
Document Metadata. The standard leverages prior Intelink metadata and significantly enhances the amount and sophistication of document metadata associated with an information asset. The standard defines administrative metadata to assist in management and production process-based discovery (e.g., find all assets published by DIA since August of 2001). The standard also includes descriptive metadata used to communicate the importance or meaning of information found in the information asset (e.g., find assets dealing with counter-terrorism). These models when combined create a bibliographic metadata card that can be embedded in the asset or could reside as a reference and pointing mechanism as a part of a metadata registry system.
Structural - Document Construction. Primarily to support structural enforcement of information asset specifications, the standard provides a modular and extensible method for creating new asset models that incorporate the other parts of the standard. Additionally, the standard provides six common asset models for immediate use or to help kick-start the adoption of the standard. These six models include: report, article, correspondence, analytical packet, briefing, and basic.
Structural - Narrative Construction. Information assets across the IC are almost always constructed from the usual narrative objects used in communication; namely, paragraphs, lists, notes, tables, media, etc. The standard defines models for these objects in an attempt to support narrative sharing across the IC. Conceptually no different from HTML narrative objects (e.g., p, ol, ul, li, table), the standard's objects add IC security marking rules, source referencing, linking constructs, and limited presentational emphasis constructs.
Concept Identification. Many topics and concepts within IC narrative define what the asset is really about. The narrative helps relate topics and concepts while communicating important information about those entities. The standard defines a limited number of topic tags that can be used to identify people, places, and things. When used, knowledge discovery is enhanced as consumers can target their searches and more easily sort through the search results.
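The two discovery examples given under Document Metadata (assets published by DIA since August 2001, and assets dealing with counter-terrorism) can be expressed as simple queries once the metadata is structured. The Python sketch below assumes a hypothetical card format; the element names `card`, `producer`, `datePublished`, and `subject`, and the sample records, are invented for illustration and do not reflect the actual IC MSP models.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical metadata cards; names and values are illustrative only.
CARDS = """<cards>
  <card id="a1"><producer>DIA</producer><datePublished>2001-09-20</datePublished>
    <subject>counter-terrorism</subject></card>
  <card id="a2"><producer>DIA</producer><datePublished>2000-03-02</datePublished>
    <subject>counter-terrorism</subject></card>
  <card id="a3"><producer>Example Center</producer><datePublished>2002-01-10</datePublished>
    <subject>weapons systems</subject></card>
</cards>"""


def find_assets(root, producer=None, since=None, subject=None):
    """Return the ids of cards matching every criterion that was supplied."""
    hits = []
    for card in root.findall("card"):
        if producer is not None and card.findtext("producer") != producer:
            continue
        if since is not None:
            published = date.fromisoformat(card.findtext("datePublished"))
            if published < since:
                continue
        if subject is not None:
            subjects = [s.text for s in card.findall("subject")]
            if subject not in subjects:
                continue
        hits.append(card.get("id"))
    return hits


cards = ET.fromstring(CARDS)
print(find_assets(cards, producer="DIA", since=date(2001, 8, 1)))
print(find_assets(cards, subject="counter-terrorism"))
```

Administrative criteria (producer, date) and descriptive criteria (subject) combine freely here because both are carried as distinct, named fields rather than as free text.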
The many data-oriented benefits of the IC Metadata Standards include:
Administrative and Descriptive Summarization. To ensure that a document and its subsections are properly indexed and easily found by decision-makers and analysts, most IC documents have a related metacard, or Intelink metadata. The IC Metadata Standards allow a significant amount of this metadata to be incorporated in the document itself, describing author specifics, classification, keywords, subjects, and Intelligence Function Codes (IFCs). In addition, sourcing information, dates, and copyright information exist at the paragraph, quote, and media levels, making this information available for searching and for reuse.
Document or Product Enforcement. IC documents from different agencies all contain textual information, media objects, and database queries that adhere to carefully planned outlines. This document structure cannot be enforced by generic word processors, and yet it is too complex for simple form insertion. XML allows for the strict enforcement of document outlines, with the flexibility to allow for multiple paragraphs, lists, and other content difficult to implement with templates or form outlines.
Security Marking Business Rules Enforcement. IC documents are unique in the requirement for security classifications at the document, section, title, and text block levels. No publicly available XML specification addresses this need. The IC Metadata Standards carry security classifications at all these locations in a document and ensure that an author cannot forget this information. The standards also require authors or editors to enter foreign releasability information, expiration dates, and other aspects of security.
Single-Source, Multi-Channel Publishing. A single document can take many forms; it can be styled into different product forms, sent out as message traffic, or stored in a database. A document's summary section may appear in a daily briefing; its analysis in a warfighter's back pocket. It can have various filtered versions; by security classification, by foreign releasability information, and by detail levels desired by the audience. Only XML allows for publishing in this fashion.
Intelligence Interoperability. Throughout the IC, various offices use different tools to create multiple forms of output. Searching through the millions of documents in Microsoft Word format, WordPerfect format, message traffic, PostScript files, plain text files, and the like is a task that search engines attack through brute force and massive indexes. With community-wide standards in XML, each office can choose the authoring and publishing suites it prefers, while XML output allows for interchange, reuse, and standard indexing.
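As an illustration of the security marking business rules listed above, the following Python sketch checks two rules of the kind an authoring tool could enforce: every title, section, and text block must carry a marking, and the document-level marking must be at least as high as the highest portion it contains. The element names, the `classification` attribute, and the marking values are hypothetical; the real rules are defined by IC ISM and are not reproduced here.

```python
import xml.etree.ElementTree as ET

LEVELS = ["U", "C", "S", "TS"]            # hypothetical values, lowest to highest
MARKED_TAGS = {"title", "section", "para"}  # hypothetical portion-level elements


def marking_problems(root):
    """Return a list of marking rule violations found in the document."""
    problems = []
    highest = 0
    for elem in root.iter():
        if elem is root or elem.tag not in MARKED_TAGS:
            continue
        mark = elem.get("classification")
        if mark not in LEVELS:
            problems.append(f"<{elem.tag}> has no valid marking")
            continue
        highest = max(highest, LEVELS.index(mark))
    doc_mark = root.get("classification")
    if doc_mark not in LEVELS:
        problems.append("document has no valid marking")
    elif LEVELS.index(doc_mark) < highest:
        problems.append(
            f"document marked {doc_mark} but contains {LEVELS[highest]} content"
        )
    return problems


DRAFT = """<report classification="U">
  <title classification="U">Example</title>
  <section classification="S"><para>Unmarked text block.</para></section>
</report>"""

for problem in marking_problems(ET.fromstring(DRAFT)):
    print(problem)
```

Run inside an authoring environment, a check like this would flag the omission while the author is still writing, rather than leaving it to a reviewer downstream.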
Many other metadata standards are in the works throughout the IC, mostly focusing on different forms of structural metadata to handle different data types, such as motion video, geospatial, and intelligence-unique content. Many of these standards are being represented in XML. The XML content itself is inherently more dynamic, allowing raw XML data streams to be rendered through interactive applets that let consumers view information flexibly from different perspectives. Likewise, using XML as an interchange format for intelligence data enables producers and consumers to exchange data using a business-to-business transaction model.
Often there are industry standards that can be relied on for these purposes, such as the Geography Markup Language (GML) being developed and promoted by the leading Geographic Information System (GIS) experts in the Open GIS Consortium. The IC is also heavily involved in industry standards bodies dealing not only with GIS material, but also motion video and multimedia interchange. The IC is also using the Sharable Content Object Reference Model (SCORM) metadata standard developed in the learning management systems market as part of a push to integrate the various courseware and delivery systems available throughout the community.
The beauty of XML is that its multi-purpose design allows creative people to design standards for whatever they need. However, the IC must be cautious about developing too many standards. Fewer are probably better, and if there are metadata standards available in industry, the IC might expect those standards to be supported in out-of-the-box software. The IC does have some unique data requirements, mostly for security, and therefore the community must expect that most standards out of the box will not be an exact match, requiring some level of modification (which XML allows). In some cases, the amount of customization necessary results in something so different from the original metadata standard that there is no value in maintaining the connection. At that point the IC has to be willing to manage that metadata standard in every way necessary to support its successful implementation.
Metadata has been the promoted "silver bullet" within the knowledge management space for some time. Regardless of the underlying markup language used to tag your data or even the media format of your content, metadata is a critical component in automating the management of that data throughout its life cycle. There will always be a requirement to store or attribute additional information into or on top of the traditional content that we manage. Without metadata, the content objects that we want to create, manage and distribute are more difficult to find, assemble and link; exhibit less opportunity to process in an automated fashion; and require significantly more time for a human to interpret. Metadata helps creators and users alike embed the required knowledge about information contents into a format that is more usable and efficient.
The advancement of markup languages, database technologies, and indexed retrieval systems over the last ten years has done more than anything else to advance the successful use of metadata in an automated environment. Each of these technologies has more or less proven its ability to handle metadata independently, but now we are seeing these technologies blur together into very powerful information systems that are truly paying large dividends. While these combined systems may be the ultimate solution in content management, each of these technologies has a life of its own and will continue to serve large numbers of users in a variety of ways. Therefore, an understanding of how metadata works within each technology is important in understanding how these technologies can together fulfill information management requirements.
The intelligence environment of the future will be highly diverse and be characterized increasingly by demand-driven production and dissemination; organizational specialization and interorganizational coordination and collaboration; and production automation. Additionally, the future production environment envisions a high-degree of common use or reuse of already-produced intelligence and the need to keep any resultant product in digital format (or "soft copy") as long as possible until ultimate consumption. This production is distinguished by the creation of modular, reusable packets of information that, when aggregated, result in a coherent answer to a specific question or intelligence problem. The key components for the future intelligence production environment can be categorized in five areas:
Authoring and Capture. The capability to author in a markup language common to the IC, such as XML, that will enable analysts to produce knowledge objects or analytical packets (text, figure, table, list, image) that permit a high degree of common use or reuse by and among production organizations. These knowledge objects/analytical packets will be identified or "tagged" to enhance recognition, retrieval, classification (including security), and reuse.
Content Management. The capability to integrate the collaborative creation, capture, organization, access and use of an enterprise's information assets. This includes [knowledge] bases, documents and, most importantly, the uncaptured, tacit expertise and experience of individual workers. The envisioned knowledge base for the future will be virtual, in that it will be distributed and reside both at individual organization sites and in common areas. This virtual knowledge base must accommodate common accessibility and appropriate "ownership," i.e., responsibility for the accuracy and currency of content.
Workflow/Production Control. The capability to structure and control both process and output. The capability includes the ability to implement requirement or tasking management, resource allocation, output standardization, and editing/quality control. In the future production environment workflow and production control must consider both intra- and interorganizational production of intelligence.
Knowledge Discovery. The capability to crawl and index any data repository, create intelligent relationships between terms, and build queries that return results with the highest level of recall and precision possible. The queries will focus on the mentions of concepts or topics and correlate those mentions across the knowledge base. The query inputs and the results will be presented using logical methods and analytical wizards assisting the consumer through the discovery process.
Dissemination. The capability to provide the required or desired intelligence (data, information, knowledge) to the ultimate customer in the required or desired format, by or at the time it is needed or useful. Increasingly, dissemination is transitioning from hard copy products to tailorable, demand-based electronic products capable of near-real-time update and web-based access as well as digital products on physical media such as DVD and CD-ROM.
The IC Metadata Standards, or their predecessors, have already been demonstrated in a variety of prototype and operational applications:
Publishing Automation. One Defense intelligence production center operationally uses a predecessor to the IC Metadata Standards and has done so for over three years as part of its drive to enhance internal digital production. Analysts and editors use an XML authoring environment to generate analytical information assets that contain rich document and structural metadata. This metadata is used to automate downstream multi-domain Intelink creation and posting processes as well as to generate Intelink-crawled metacards conforming to the Intelink HTML Metadata Standard. Within the authoring environment are numerous utilities that assist authors not only in creating valid XML, but also in conforming to "writing for releasability" guidelines, adhering to security marking instructions, and capturing vital metadata. The return on investment in this implementation is tremendous: the organization was able to streamline the generation of Intelink-posted output, which helped it cope with recent command resource declines. Additionally, the information the organization produces is now better written, easier to find, and presents consumers with a more pleasant knowledge discovery experience.
Concept Tagging and Discovery. Another Defense intelligence production center has also been experimenting with a predecessor to the IC Metadata Standards and has developed not only an authoring and management environment, but also a demonstrable concept of operations for the processing and use of inline concept tags. With the assistance of a concept identification tool, the organization shows how the tagging of inline concepts can assist with downstream searching and location of information. The movement to new XML-based authoring will assist the organization's authors in the generation of new information assets while simplifying the downstream transition to paper and web products. This organization is one of the strong proponents of community-wide agreement and use of concept tagging (in both HTML and XML) to support a more relevant and meaningful knowledge discovery function.
Data and Content Exchange. A recently established partnership between three Defense intelligence production centers has yielded some exciting benefits for information exchange. One of the organizations originally created a subject-oriented taxonomy and XML model using the predecessor to the IC Metadata Standards. This taxonomy was embedded in an XML authoring tool and used for the creation and management of weapons system characteristics and performance data. Once the information is created in XML, it is stored and then exported to a community database for sharing. The three organizations are exploring the use of the XML model as the exchange format, relieving themselves of a dependence on the monolithic community database.
Conformance to the IC Metadata Standards will mean that consumers of IC information will be able to understand that content's structure, meaning, and context. It also means that IC organizations themselves can benefit from the same insights into the content they produce and exchange through periodic collaboration or research. Exchange of the information asset metadata cards alone can assist in quickly identifying assets of importance.
A case in point was the transfer of intelligence that took place between the IC and the law enforcement community shortly after 9-11. In at least one case, an organization received a hodgepodge of hardcopy and electronic media that wasn't well sorted or understood. The receiving organization had no choice but to simply scan the information and index it with a full text search engine. But, if the organization had received digital information assets encoded with XML according to the IC Metadata Standards and that organization had systems that knew how to process the assets, both organizations could have easily:
Identified the producers and contributors of the information, which would have assisted in the identification of experts or specialists with pertinent knowledge of the problem set.
Identified information less than ten years old and reviewed less than two years ago.
Surveyed the asset titles and descriptions looking to pare down the assets to the most relevant set.
Surveyed the coverage and subject codes looking to identify quick nuggets with the most information pertinent to a facility, geographic reference, web site or subject matter.
Ensured that no content of security classifications higher than need-to-know status were inappropriately given to the wrong parties or individuals.
Extracted people, places, and events clearly identified by the author or identified through an IC's controlled vocabulary for use in link analysis or knowledge discovery applications where the information asset itself becomes a linkage of information about pertinent topics.
With these improved methods and information standards, the workload and time needed to perform this selective syndication event would have been reduced, enabling all parties to make better use of, and gain greater understanding from, the information shared.
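The selections listed above can be sketched in a few lines of code once metadata cards are available in XML. The following is a minimal illustration only: the element names (`asset`, `title`, `creator`, `dateCreated`, `dateReviewed`, `classification`, `subject`), the numeric classification levels, and the sample cards are all hypothetical and are not taken from the IC Metadata Standards.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical metadata cards. Element names, numeric classification levels,
# and all values are invented for illustration; they are NOT the IC schema.
CARDS_XML = """
<assets>
  <asset id="A1">
    <title>Port facility overview</title>
    <creator>Analyst One</creator>
    <dateCreated>1998-03-01</dateCreated>
    <dateReviewed>2001-06-15</dateReviewed>
    <classification>1</classification>
    <subject>ports</subject>
  </asset>
  <asset id="A2">
    <title>Archived regional summary</title>
    <creator>Analyst Two</creator>
    <dateCreated>1985-01-01</dateCreated>
    <dateReviewed>2001-01-01</dateReviewed>
    <classification>1</classification>
    <subject>ports</subject>
  </asset>
  <asset id="A3">
    <title>Airfield assessment</title>
    <creator>Analyst Three</creator>
    <dateCreated>2000-05-05</dateCreated>
    <dateReviewed>2001-09-20</dateReviewed>
    <classification>3</classification>
    <subject>airfields</subject>
  </asset>
</assets>
"""


def parse_cards(xml_text):
    """Turn each <asset> element into a plain dictionary."""
    cards = []
    for el in ET.fromstring(xml_text.strip()).iter("asset"):
        cards.append({
            "id": el.get("id"),
            "title": el.findtext("title"),
            "creators": [c.text for c in el.findall("creator")],
            "created": date.fromisoformat(el.findtext("dateCreated")),
            "reviewed": date.fromisoformat(el.findtext("dateReviewed")),
            "level": int(el.findtext("classification")),
            "subjects": {s.text for s in el.findall("subject")},
        })
    return cards


def select(cards, today, max_level, subject=None):
    """Keep cards created under ~10 years ago, reviewed under ~2 years ago,
    at or below the recipient's level, and optionally matching a subject."""
    kept = []
    for card in cards:
        if (today - card["created"]).days >= 3650:
            continue  # older than roughly ten years
        if (today - card["reviewed"]).days >= 730:
            continue  # not reviewed within roughly two years
        if card["level"] > max_level:
            continue  # above what the recipient may receive
        if subject is not None and subject not in card["subjects"]:
            continue  # not about the requested subject
        kept.append(card)
    return kept


def producers(cards):
    """Collect the distinct creators named on the given cards."""
    return {name for card in cards for name in card["creators"]}
```

Calling `select(parse_cards(CARDS_XML), date(2002, 6, 1), max_level=2)` keeps only the card that passes the age, review, and classification checks, and `producers(...)` on that result lists the people to contact. A full-text index of scanned pages supports none of these filters directly, which is the point of the comparison above.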
The threats in our world today have changed. This requires our country, our military, and our Government to change with them in order to meet new challenges effectively. The IC is constantly reevaluating itself, at the organizational level and as a whole, to find business processes that are ineffective or lacking in speed or agility. It is also looking for technical deficiencies and for new technologies that can offer enhanced services at reduced cost. September 11th simply brought to light some of the areas that failed, instigating one of the most aggressive accelerations of process and technology change seen in our Government's history. The organizational results of these changes can be seen in the recent realignment of our military's command structure, the creation of the Northern Command focused on protecting our own borders, and the call for the new Department of Homeland Security.
The new partnerships between our country's intelligence, military, and law enforcement communities are unparalleled. These partnerships are rapidly identifying intersecting business processes and similarities in content. Where similar content is found, new XML standards are being established with the universal goal of making all intelligence content more structured, more isolated from proprietary formats, and more interoperable. As some of these standards are in the public domain (e.g., LegalXML or GML), industry will be called on to support our Government as a valued user and leader in technology adoption.
IC MSP is just one of those new XML standards. It responds to requests by numerous organizations within the IC for an IC-wide mandated XML model to support interoperability of content. Anything less than a comprehensive interchange standard that assists not only in marking up content, but also in the critical capture of metadata, proper security markings, and hooks for local extensions, would be a failure. It turns out that parts of IC MSP are also applicable to the US Military and some Justice organizations, allowing a greater level of collaborative exchange of ideas and technology across these different communities.
The IC Metadata Standards deal specifically with intelligence documents or product content generated and shared within the IC's intelligence production and dissemination systems. These systems include content authoring, management, workflow, and dissemination components. The goal is for members of the community to participate in collaborative production of intelligence content that is responsive to consumers' issues and can be reused to generate new and timely products.
The IC Metadata Standards have taken on greater importance, particularly since September 11, as one of many important foundational initiatives to support greater sharing and finding of information in the IC's shared spaces. Recent development of the IC Metadata Standards has focused on creating a standard that promotes enhanced metadata capture; content production automation; dissemination and delivery automation; content exchange and reuse; and knowledge discovery. The US Intelligence Community is poised to become one of the largest XML supporters in the world. Since there is a significant push to rapidly adopt commercial-off-the-shelf (COTS) software, industry should be listening to these organizations and participating, where possible, in helping our Government respond to threats against our country and improve our way of life.
ATA: Air Transport Association
CIO: Chief Information Officer
COTS: commercial-off-the-shelf
DoD: Department of Defense
GIS: Geospatial Information System
GML: Geospatial Markup Language
HTML: HyperText Markup Language
IC ISM: IC Information Security Markings
IC MSP: IC Metadata Standards for Publications
IC MWG: IC Metadata Working Group
IC: Intelligence Community
IFC: Intelligence Function Codes
MathML: Mathematical Markup Language
OASIS: Organization for the Advancement of Structured Information Standards
SCORM: Sharable Content Object Reference Model
SGML: Standard Generalized Markup Language
XML: Extensible Markup Language