Abstract
This paper discusses lessons learned from developing, evaluating, and integrating open standards into XML-based content management systems. The discussion focuses on common ideas and solutions that do not work, their flaws and consequences, and the realities of developing for and with such systems.
XML Content Management Systems are quickly becoming a cornerstone of real-world business solutions. These systems integrate key XML specifications and standards with database technology and distributed internet frameworks to provide advanced content management solutions. Many vendors target specific areas of content management, such as single-source multi-output publishing, sub-document-level modular storage, and web-based distribution.
The AntiPatterns identified highlight concepts, technical approaches, and implementations that have proven to be painful lessons when fitting the pieces together to make a useful XML content management solution.
Discussion includes the following issues to keep in mind when evaluating or developing XML Content Management Systems:
Common scalability issues.
Relational versus object databases for modular storage.
Common mistakes with standards in content management.
Targeting functionality specific to end user needs.
Fundamentals of system development: transactional safety and concurrency.
Making complex features usable: linking, versioning, and access controls on highly-reusable and modular content.
Want to build a better XML CMS? Want to have a successful integration of XML-based components? Need help selecting or evaluating XML Content Management Systems? Take note of these AntiPatterns and become more aware of critical problems before they become your own.
AntiPatterns have been used in the software industry to identify recurring problems with software architectures, designs, coding, and planning. The seminal work on this is 'AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis', by Brown, Malveau, McCormick, and Mowbray [AP]. AntiPatterns provide a common template for identifying and describing recurring problems within the software industry that typically lead to project failure. Software analysis and design patterns have significantly advanced software development by providing a common framework for expressing and communicating appropriate solutions to common programming tasks and frameworks [GO4]. AntiPatterns provide benefits similar to those of patterns; however, they focus on identifying problems and recommending 'refactored' solutions to them.
XML and its related standards focus on data content, structures, and related processes as opposed to purely software implementations. A recurring set of problems is beginning to emerge as these standards are combined into actual implementations. XML Content Management Systems are a perfect example of the merger between the content and software worlds. AntiPatterns can and should be developed to express the common problems not only with interactions of XML and its related standards, but also with the applications of those standards that provide end-user solutions.
Bruce Tate has identified several XML Mini-AntiPatterns in his book 'Bitter Java' [BJ], including XML Misuse and Rigid XML. XML Misuse is essentially applying XML to problems ill-suited to this technology. The solution is simple: if a better-suited solution to a specific problem exists, don't force XML onto it. Rigid XML occurs when schemas are overspecified or improperly defined, which hinders their reuse in the future. To reduce rigidity in XML schemas, Tate discusses several ways of using namespaces to keep schemas flexible for future extension, including the use of attributes to encode a version number in your schemas. Tate encountered these patterns while using Java and XML to define data interfaces and establish connections between companies. These Mini-AntiPatterns merely scratch the surface of recurrent XML problems.
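Tate's two suggestions, namespaces as extension points and a version attribute, can be illustrated with a small reader. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the namespace URIs, the `manual`/`title`/`step` element names, and the version values are hypothetical and not taken from [BJ]. It rejects unknown elements in the core namespace but tolerates elements from any other namespace, so a local extension does not break the shared content model.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: one for the shared content model, one for a local extension.
CORE_NS = "urn:example:manual"
EXT_NS = "urn:example:acme-extensions"
CORE = "{" + CORE_NS + "}"  # ElementTree represents qualified tags as "{uri}local"

SUPPORTED_VERSIONS = {"1", "2"}


def read_manual(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    if root.tag != CORE + "manual":
        raise ValueError(f"not a manual: {root.tag}")
    # The version attribute tells the reader which content model rules apply;
    # treating a missing attribute as version 1 is an assumption of this sketch.
    version = root.get("version", "1")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported content model version {version}")
    result = {"version": version, "title": None, "steps": [], "extensions": []}
    for child in root:
        if child.tag == CORE + "title":
            result["title"] = child.text
        elif child.tag == CORE + "step":
            result["steps"].append(child.text)
        elif child.tag.startswith(CORE):
            # Unknown element in the core namespace: a genuine content model error.
            raise ValueError(f"unexpected element {child.tag}")
        else:
            # Element from another namespace: an extension point, recorded but not rejected.
            result["extensions"].append(child.tag)
    return result


SAMPLE = (
    f'<manual xmlns="{CORE_NS}" xmlns:acme="{EXT_NS}" version="2">'
    "<title>Pump Maintenance</title>"
    "<acme:reviewer>J. Smith</acme:reviewer>"
    "<step>Close the inlet valve.</step>"
    "<step>Drain the housing.</step>"
    "</manual>"
)
```

Adding a new extension namespace requires no change to this reader, while adding a new core element requires a new content model version; keeping those two kinds of change separate is the flexibility the namespace and version-attribute advice aims for.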
This paper will begin by discussing existing AntiPatterns that also apply to XML Content Management Systems. It will then focus on XML CMS-specific issues that highlight:
Common scalability issues.
Relational versus object databases for modular storage.
Common mistakes with standards in content management.
Targeting functionality specific to end user needs.
Fundamentals of system development: transactional safety and concurrency.
Making complex features usable: linking, versioning, and access controls on highly-reusable and modular content.
Each discussion will include symptoms, consequences, and background information as well as a solution whenever possible. Finally, this paper will conclude by pulling together the lessons learned from these problems.
As a baseline, XML Content Management Systems must start with all of the expected features of a document management system, such as workflow (with graphical editor), web portal and distributed components, versioning, basic file storage and retrieval, indexing and searching, backup and recovery, document-level access controls, and a variety of management and user interfaces.
XML Content Management Systems also have a set of features that set them apart from document management systems. Aside from an open standards-based approach to data management, XML CMS features typically also include:
versioned link management
modular documents and reuse
user managed metadata
system metadata (necessary to rebuild and associate structured documents)
publishing framework
content-level access controls
authoring tool integrations
new wave data storage
These new features are interdependent, and they also have implications for the traditional DMS features. An XML CMS must define and manage these new complexities, exposing effective management layers. Yet the complexity should be transparent to end users, as if they were working with a traditional DMS.
The newness of the standards being integrated, and of the concepts being applied, presents additional risk. This risk, combined with the lofty goals and the interdependencies between new and traditional features, makes creating a quality XML Content Management System a monumental task. These new standards, concepts, features, and interdependencies lay out a roadmap of potential failures within existing XML content management solutions.
To provide the new features, each XML CMS uses its own new wave of data storage technology. The storage may be a custom object database, a logical layer on top of a relational database, or some other form. Problems specific to this new wave of data storage technologies are highlighted in the new AntiPattern 'Import Beast'.
Linking and versioning go hand in hand. Typical XML CMS versioning gives you two options: (1) always use the most recent version, or (2) use a specified version number. This lets links and reused content either stay up to date automatically or remain pinned to specific versions of documents. Management of sets of links and reuse is largely still a dream. The new AntiPattern 'Versioned Linking Dilemma' addresses the issue of versioned link management in XML CMSs.
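The two options can be captured in a small data model. The sketch below is a hypothetical, in-memory Python illustration (the `Link` and `Repository` names are invented for this example, not any product's API): a link either carries no version, meaning "always use the most recent", or pins a specific version number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    target: str                    # hypothetical document or module identifier
    version: Optional[int] = None  # None means "always use the most recent"


class Repository:
    """Hypothetical in-memory store: identifier -> versions (index 0 is version 1)."""

    def __init__(self) -> None:
        self._versions: dict[str, list[str]] = {}

    def check_in(self, target: str, content: str) -> int:
        history = self._versions.setdefault(target, [])
        history.append(content)
        return len(history)  # the new version number

    def resolve(self, link: Link) -> str:
        history = self._versions[link.target]
        if link.version is None:
            return history[-1]  # option 1: follow the most recent version
        if not 1 <= link.version <= len(history):
            raise LookupError(f"{link.target} has no version {link.version}")
        return history[link.version - 1]  # option 2: pinned version
```

Note what this model lacks: there is no way to pin or advance a whole set of links together, which is the set-level management the text describes as still largely a dream.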
Modularization and reuse provide the core of 'content' management, allowing structured XML (or similar) documents to be stored, retrieved, identified, and reused at the subtree level. Modularization changes everything. It affects linking and versioning, since links can now point to specific versions of subtrees within structured documents. This additional expressive power enhances many authoring solutions; however, the management costs are significant.
Modularization also creates problems with content-level access controls. Per-user or per-group access rights must now be accounted for not only on documents but also on sub-documents. Reuse allows sub-document content to be pulled into other documents, typically at authoring time. A simple check-in/check-out routine suddenly becomes much more complex, as permissions for each sub-document pulled in must be checked, and failures handled. Because reused sub-documents can themselves reuse other sub-documents, circular dependencies can arise, and these must also be handled properly.
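The check-out problem described above can be sketched as a depth-first expansion of reuse references that checks read access on every module pulled in and detects cycles. This is a minimal Python sketch under assumed data structures: a module is a list of text fragments and `("reuse", module_id)` references, and `can_read` stands in for whatever permission check the CMS provides. It is not any particular product's algorithm.

```python
from typing import Callable, Union

# Hypothetical module model: a module is a list of parts, where each part is either
# literal text (str) or a reuse reference written as ("reuse", module_id).
Part = Union[str, tuple]


class CircularReuseError(Exception):
    pass


class AccessDeniedError(Exception):
    pass


def resolve(module_id: str,
            modules: dict[str, list[Part]],
            can_read: Callable[[str], bool],
            ancestors: tuple[str, ...] = ()) -> str:
    """Expand reuse references depth-first, checking read access on every module."""
    if module_id in ancestors:
        chain = " -> ".join(ancestors + (module_id,))
        raise CircularReuseError(f"circular reuse: {chain}")
    if not can_read(module_id):
        raise AccessDeniedError(f"no read access to {module_id}")
    pieces = []
    for part in modules[module_id]:
        if isinstance(part, str):
            pieces.append(part)
        else:
            _, ref = part
            pieces.append(resolve(ref, modules, can_read, ancestors + (module_id,)))
    return "".join(pieces)
```

Cycle detection tracks only the current chain of ancestors rather than every module seen so far, so a module legitimately reused twice in one document is not mistaken for a cycle. Aborting on the first denied module is only one possible failure policy; a system might instead substitute a placeholder.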
Configurable ENTITY (e.g., graphics) handling adds another level of management complexity. At export time, along with reuse resolution and access control handling, entities may or may not be exported along with the documents. When that document is checked back in, are the entities checked back in with it? Do those entities replace existing ones, are they imported as completely new files, or do they increment the version numbers of existing ones?
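One possible answer to these check-in questions is to compare content: an unknown name is imported as a new entity, identical bytes are left alone, and changed bytes create a new version. The Python sketch below assumes entities are identified by file name and compared by SHA-256 digest; `EntityStore` is hypothetical, and other policies (for example, always creating new files) are equally defensible.

```python
import hashlib


class EntityStore:
    """Hypothetical versioned store for non-XML entities (e.g. graphics), keyed by file name."""

    def __init__(self) -> None:
        self._history: dict[str, list[str]] = {}  # name -> content digests, oldest first

    def check_in(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        history = self._history.get(name)
        if history is None:
            self._history[name] = [digest]
            return "new"  # unknown name: import as a completely new entity
        if history[-1] == digest:
            return "unchanged"  # same bytes as the latest version: store nothing
        history.append(digest)
        return f"version {len(history)}"  # changed bytes: increment the version
```

Keying by file name is the weak point of this policy: a graphic renamed by an authoring tool would be imported as a brand-new entity rather than as a new version of the existing one.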
Versioned images and content should be made available via URN/URL for simplification and synchronization between web publishing and authoring clients.
The new AntiPattern 'No Love (for ENTITYs...)' addresses current ENTITY issues within XML Content Management Systems.
Link information should be updated upon document import so that links can be switched back and forth between external system paths and internal URN/URL references.
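A minimal sketch of that link update, assuming links are stored in an `href` attribute and that the import step keeps a mapping from external paths to internal URNs (both the attribute name and the URN scheme below are hypothetical). The same function, given the inverted mapping, switches links back to external paths on export. It uses only Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def rewrite_links(xml_text: str, mapping: dict[str, str], attr: str = "href") -> str:
    """Replace link attribute values found in `mapping`; leave every other value alone."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        value = elem.get(attr)
        if value is not None and value in mapping:
            elem.set(attr, mapping[value])
    return ET.tostring(root, encoding="unicode")


# Hypothetical mapping recorded when the referenced graphic was imported.
PATH_TO_URN = {"../graphics/pump.png": "urn:example:cms:graphic:pump.png;version=3"}
# Inverted mapping used to restore external paths on export.
URN_TO_PATH = {urn: path for path, urn in PATH_TO_URN.items()}
```

Values missing from the mapping, such as external web links, are left untouched. ElementTree also drops any DOCTYPE and entity declarations when it reserializes a document, so a production implementation would need a parser that preserves them, which is one more place where ENTITY handling gets complicated.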
Simple document-level workflow is now much more complicated to define, create, and manage. It must account for content-level reuse, versioning, linking, ENTITY management, and access controls.
The publishing framework and distributed components must be transactionally safe and scale properly. Indexing and searching must handle both structured and unstructured content (e.g., plain text or PDFs), and must provide configurable document-set scoping and intuitive search navigation for transformed and published content.
Full internationalization support is expected within XML CMSs. The realities of this are addressed in the new AntiPattern 'Internationalization is Fully Supported'.
All of the additional information needed to make these new concepts a reality, along with user-managed metadata, must be stored, typically as system metadata. Archival and recovery of documents and other CMS resources must account for all of this additional information. Web Application Server components are typically installed and configured separately from the CMS repository, and a comprehensive backup scheme must account for them as well. The new AntiPatterns 'Naked Documents' and 'No CMS is an Island' address these respective issues.
All this, and an XML CMS must be cost-competitive with traditional document management systems. Quality content-level management can be achieved. To help pave the path to success, we must first identify the existing problems across standards-based CMSs. We need a common template for expressing these recurring patterns of failure so that they can be identified, remedied, and/or avoided.
The community needs AntiPatterns for XML Content Management Systems.
Many of the AntiPatterns identified in 'AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis' [AP] are applicable to XML content management. The template used to define AntiPatterns in this paper is a modified version of the Full AntiPattern template [AP].
XML CMS AntiPattern Template:
AntiPattern Name:
Anecdotal Evidence:
Encounter: (optional)
Background: (optional)
Identifying the problem:
Related Problems:
Symptoms:
Consequences:
Typical causes:
Known Exceptions: (optional)
Refactored Solutions:
Related AntiPatterns:
The following AntiPatterns come from 'AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis' [AP]. Each has a brief description of the AntiPattern, followed by a discussion of how it applies to XML CMSs. The original information on each pattern is sufficiently context-neutral; the discussion below highlights how and where these AntiPatterns apply to XML, related standards, and/or Content Management Systems.
Name: Analysis Paralysis
Desc: Too much requirements gathering and analysis can lead to problems early in a project.
App: Too much content analysis can lead to problems throughout the project. Improper use of namespaces, or developing too detailed a DTD or Schema for your content model, can result in the Rigid XML Mini-AntiPattern mentioned previously [BJ]. Many operations and features in your XML CMS, including specialized views of particular documents produced by an XSLT transformation to HTML, inevitably rely on the content model's structure, its assumptions, and the decisions made during content model development. Inflexibility can kill the usefulness of your data structure, and downstream changes to your content model will ripple through your stylesheets and many other, less obvious system components.
Role: Developer
Solution: DTD and Schema development must be thorough to minimize changes that ripple through stylesheets and other system components. At the same time, to minimize the impact of future structural changes, structures should be designed to be flexible, with explicit points of extension. See Rigid XML for good examples and related discussion [BJ].
Related: Rigid XML
Name: Architecture by Implication
Desc: System architectures that are not planned in advance evolve as the result of coding decisions. This can result in a system with serious flaws and missing functionality.
App: Document Management Systems often serve as the core component for storing and managing data in corporate enterprises. To manage data at the content level, as opposed to the document level, XML Content Management Systems add several overlapping layers of complexity beyond mere document management. Integrating and combining standards to deliver these additional features behind an intuitive user interface is a daunting task indeed. Add the final ingredient, the relative newness of the technology and the lack of long-term practical experience in combining standards, and you have a very fertile environment for 'Architecture by Implication'. Standards are constantly changing and being revised. This requires vendors to continually update their products to comply with new standards while also continuing to provide new features and interfaces. Simply keeping up with technology changes is a substantial task, let alone anticipating the implications and problem areas of integrating these standards with the already robust and flexible feature set of a Document Management System. Unfortunately, in this environment, refactoring the system's fundamental architecture as feature sets expand is easily sacrificed to meet deadlines and other technical challenges. See the 'Continuous Obsolescence' Mini-AntiPattern below for discussion of standards in a continuously changing environment.
Role: Developer, Consumer
Solution: Whether you are developing or evaluating an XML CMS, the same universal truths hold true:
Poorly architected systems will almost certainly be missing critical system-level features (and may have been developed in a way that prevents these features from ever being provided). Unfortunately, fundamental system underpinnings are not always obvious to those evaluating a system. Several XML-related AntiPatterns are discussed in subsequent sections; they provide practical examples to help you recognize this type of problem.
Related: Configuration Abomination, Continuous Obsolescence, Import Beast, Swiss Army Knife
Name: Golden Hammer
Desc: A Golden Hammer is a solution or technology applied to problems it is not suited for.
App: Bruce Tate's 'XML Misuse' is a specific instance of this AntiPattern [BJ]. XML is a powerfully expressive document format with many related standards that can be used to solve many, many problems. When it is applied in inappropriate contexts, bad solutions are developed and new problems are created. If data and requirements lend themselves to a table-based relational database, don't use XML to encode them in a hierarchical structure. If a tool already solves a problem quickly and efficiently, don't reinvent the wheel with XML unless you have a sufficient reason to.
Role: Developer
Solution: XML is not a solution to all your problems.
Related: XML Misuse
Name: Stovepipe System
Desc: Legacy systems whose components and architectures do not fit well together cannot be integrated into useful solutions. Many enterprises are stuck with a variety of these 'stovepipes', none of which meet their current needs and very few of which properly communicate or interoperate with each other. Enterprises are starting to realize the implications of stovepipe systems and are turning to standards-based solutions to help solve this problem.
App: Properly architecting and integrating international standards into systems can reduce the stovepipe system risk. However, standards only reduce the number of potential problems; they do not eliminate them. A lack of proper programming interfaces and underlying architectures can still prevent tools from interoperating well enough to provide real solutions. Fortunately, many vendors are also realizing that they can sell more of their tools if they make them interoperable with other vendors' tools that provide complementary features.
Role: Developer, Consumer
Solution: When designing and architecting a tool or system, make interoperability a priority. The problem of stovepipe systems can be greatly reduced by developing open, extensible system interfaces. Fully support standards. Make the content your system outputs fully standards-compliant and open (as opposed to proprietary and hard for others to deal with). This will make your tool more marketable, more desirable, and more useful to your customers. When evaluating a new tool or system, evaluate its Application Programming Interfaces and other points of connectivity to understand how it fits into your chosen suite of tools. Understand the extent to which it supports standards. Does it support the one feature you really need, or every other feature of the standard except that one?
Related: Stovepipe Enterprise, Vendor Lock-In
Name: Vendor Lock-In
Desc: Vendor lock-in typically occurs when proprietary solutions are used for any business-critical data problem. At some point, all software tools are replaced by better, faster ones. If a CMS does not make it easy to export your data, knowledge, configurations, and other business-critical information in an open format that a new tool can parse or quickly convert, then you are stuck in vendor lock-in. The more time, money, and effort invested in a particular tool or solution, the deeper the lock-in.
App: Regardless of what type of tool is under consideration, you should be aware of whether it will lock you into a particular solution, data format, or company. XML CMSs and tools that fully support standards can minimize vendor lock-in.
Role: Consumer
Solution: Full support of a standardized data format will keep most of your data free from vendor lock-in. However, every deviation from standards conformance locks you further into a particular tool. If a tool provides features that replace less useful standard-conformant ones, there is a trade-off to be made. Custom features also lock you further into a particular tool. Do these custom features output their information in a standard or proprietary format? Does your use of the tool require you to rely on its exclusive features or architecture to solve your problems? Some degree of vendor lock-in is inevitable in any large-scale solution. However, keep these questions in mind when evaluating a tool so that you are truly aware of the degree of lock-in and commitment. Several XML-related AntiPatterns that spawn from this one are discussed in subsequent sections; they provide practical examples to help you recognize this type of AntiPattern.
Related: Boat Anchor, Stovepipe System/Enterprise, Trusting Souls
Name: Boat Anchor
Desc: Costly technology components must often be purchased as part of a system. If these components go unused, or provide minimal value, they are boat anchors and should be avoided.
App: No single technology or tool will solve all of your problems. In environments of relatively new technologies and standard implementations, many vendors have no choice but to rely on tools and services from other vendors to complement their own feature set and meet your needs. This is a frequent occurrence with XML CMSs. A typical example is an XML CMS that requires installation and use of a particular XML database or Web Application Server (for its web portal).
Role: Consumer
Solution: When purchasing a tool or system, look at the other purchases it requires. Each of these could be a Boat Anchor: money that would be better spent making the new tool meet your specific needs. A Boat Anchor can often be identified as a required tool that is still under development (for its first release), or one that is required for a primary CMS feature that is itself still under development. A Boat Anchor may also lead you into vendor lock-in that you have not accounted for.
Related: Vendor Lock-In
Name: Continuous Obsolescence
Desc: Continuous Obsolescence occurs when technology updates happen so frequently that it isn't possible to keep up with and integrate all necessary changes.
App: XML has over one hundred related standards. The more standards your tool or CMS supports, the more effort (and the more complexity associated with that effort) is required to keep up with rapidly changing standards. Frequently, as the number of supported standards grows, the time available to keep each one up to date and fully supported shrinks.
Role: Developer, Consumer
Solution: Fortunately, the common base standards used by XML CMSs, including XML, DOM, SAX, XPath, XSLT, and DTDs, are tried and proven. Tools that use these standards can provide a tested baseline without having to track every new update to each one. Newer standards such as XML Namespaces, XML Schema, XQuery, and Topic Maps are more likely to be less tested and less understood, and to require updates to the latest releases as design flaws are found and corrected. Consider using many smaller tools that perform well-scoped, specific tasks in your process, as opposed to one comprehensive tool that tries to meet all of your needs. Each smaller-scope tool is more likely to keep up with the standards related to it, so all tools in your process are more likely to stay up to date. However, there is a trade-off: integrating many smaller tools into a comprehensive solution can be much more costly. In any case, open standards-based solutions will provide more consistent value than stovepipe systems under the same conditions of continuous obsolescence.
Related: Architecture by Implication
Name: Smoke and Mirrors
Desc: When production quality does not match demonstrated features, this is Smoke and Mirrors. What you see is not always what you get.
App: Unfortunately, whether you are buying a car, a computer, or any software system, a trusting buyer may fall prey to a Smoke and Mirrors routine.
Role: Consumer
Solution: Critical features should be examined through hands-on evaluation to detect fragility. These features should be double-checked against the APIs and other documentation provided, to make sure the product actually does what you were shown. Hands-on evaluation also helps guard against the related 'Viewgraph Engineering' AntiPattern, in which a product's documentation is comprehensive but doesn't match the end product.
Related: Trusting Souls, Viewgraph Engineering
Name: Swiss Army Knife
Desc: When a particular tool component appears to address every possible solution to every possible problem, it is likely a Swiss Army Knife.
App: The Swiss Army Knife can be a clear indicator of Architecture by Implication. A component may initially have had a well-defined architecture that, over time, has been repurposed and extended without proper rearchitecting. This AntiPattern may not be obvious when a proprietary language hides the APIs and serious design flaws.
Role: Consumer
Solution: Look at the API documentation. Confirm that there is a logical separation of functionality into classes and components. If one class appears to provide far more options and features than could ever be needed, it may be a Swiss Army Knife. This may not be a critical problem. However, the more occurrences of Swiss Army Knife in a system, the more likely it is that the system was not well designed, thought through, or fully tested, and the more likely that using it will be like Walking through a Mine Field. Swiss Army Knives may be hidden behind custom query and scripting languages.
Related: Architecture by Implication, Walking through a Mine Field
Name: Walking through a Mine Field
Desc: Software will always have bugs, and these bugs can be devastating. Good testing catches many of them before they reach users. Using poorly tested software can be like 'Walking through a Mine Field'.
App: Once again, whether you are buying a car or a piece of software, if it has not been properly tested, the results can be devastating. Whereas the auto industry mandates safety and other testing, the software industry generally does not. Many XML-related standards define tests that must be performed to verify true conformance. However, this does not extend to the tools and solutions based on them, and there are no severe consequences for claiming false conformance. There is no guarantee that the software you are buying has been rigorously tested.
Role: Developer, Consumer
Solution: Ask what testing process is used for the tools you are evaluating. Ask the engineers, if possible, and listen for keywords such as test-first development, automated or unit testing, regression testing, system testing, or standards conformance testing. If you do not get an answer you like, evaluate hands-on the features of the tool that you really need.
Related: Swiss Army Knife, Wolf Ticket
Name: Wolf Ticket
Desc: A Wolf Ticket can be thought of as a wolf in sheep's clothing: a product that is sold as conforming to standards but does not fully conform, so you may not get all of the expected benefits of that standard's features. It is not uncommon to assume that because a feature is listed in a checkbox, it works the way you need it to. It is also not uncommon to assume that support for open standards implies full conformance to those standards. This is not always the case.
App: The Wolf Ticket is a serious concern when purchasing any product based solely on its claimed standards conformance. As discussed in Walking through a Mine Field, standards conformance tests exist; however, there is no guarantee that the product has been conformance tested. A Wolf Ticket can be an authoring tool with full XML support that lacks support for linking standards. A Wolf Ticket can be an XML CMS that claims XML support yet does not allow relative system pathing, or that claims to support linking standards but actually supports only ID/IDREF linking and does not even use it as its default linking mechanism. Another frequent Wolf Ticket is a claim of full internationalization support. Refer to the new XML CMS AntiPatterns 'Versioned Linking Dilemma' and 'Internationalization is Fully Supported' for more information.
Role: Consumer
Solution: Wolf Tickets are often unintentional. It is the responsibility of the consumer to: 1. understand the extent of support for each standard in a product, 2. avoid assumptions (such as that both DTDs and Schemas are supported), and 3. confirm that conformance tests have been run for the key standards the system was purchased for.
Related: Vendor Lock-In, Versioned Linking Dilemma
The previous section discussed how previously-defined AntiPatterns can be applied to XML, related standards, related tools, and Content Management Systems. This section introduces new AntiPatterns that have been discovered through usage of current XML Content Management Systems. These new AntiPatterns identify problems that are either unique to or recurrent across many existing XML CMSs.
This is not to say that every existing XML CMS exhibits each of these problems. However, these AntiPatterns provide a roadmap of potential problem areas to watch for when evaluating these systems. They also alert developers of existing systems that these problems exist, so that current tools and architectures can be examined for them. If these problems are identified and remedied across the landscape of XML CMSs, the tools will become more competitive and more useful to consumers, and will raise expectations for this type of system.
XML Content Management Systems are composed of many layers: supported standards, the end uses that dictate how those standards are integrated and presented, and database, management, communication, presentation, and other layers. Each component of a CMS is susceptible to its own set of recurring problems.
The XML CMS AntiPatterns discussed below are functionally grouped so that you can either read through them as a whole, or jump directly to the AntiPatterns that affect your own areas of interest. The first set of new AntiPatterns address relational versus object databases for modular storage and related common scalability issues. The second set addresses targeting functionality to end user needs. The third set highlights some common mistakes with standards in content management. The fourth set addresses fundamentals of system development, such as transactional safety, in XML CMSs. The final set addresses making complex features usable. This includes AntiPatterns for linking, versioning, and access controls on highly-reusable and modular content.
One fundamental component of an XML CMS is the underlying database technology used for storing and retrieving data. An XML CMS will typically use either an object or a relational database. How well this technology is implemented will significantly affect the overall system performance.
There are a variety of potential (and common) scalability issues caused by the database technology used. They are grouped together into the new AntiPattern 'Import Beast'.
-
| AntiPattern Name: |
Import Beast |
| Anecdotal Evidence: |
The import should be done by tomorrow morning, if nothing goes wrong this time... |
| Context Type: |
Performance during import and export; conversion of XML data to/from database. |
| Background: |
When an XML CMS is importing a new document a number of factors can significantly affect the performance time. These factors include: the underlying database technology used, the conversion algorithms, the number of transformations, cross-linkings and referenced meta-information for the document, partial or sub-document object creation, and indexing. Relational and object databases are typically the database technologies used. Each has its own set of unique benefits and drawbacks. Relational databases are a stable and proven technology. Many commercial relational databases exist and have withstood heavy loads and usage. Unfortunately hierarchical data doesn't necessarily map well to table based data. It can take significant effort to map heirarchical documents, such as XML into relational tables in a way that is performant and handles under load. In order to provide the transformation and storage of data at import and export time, this can be a 'serious' performance inhibitor. It is also worth mentioning that if the data storage mechanism was designed to work optimally with a particular relation database, it may be possible to use it with other relational databases through JDBC/ODBC, however bizarre side effects may be encountered due to different limitations of each one. Many companies opt for a custom Object database technology. The downsides to this are that it is a new and often non-standard database technology. It could use simple file areas to store content or it could serialized data in any number of storage mediums. As this isn't a relational database, it likely doesn't use a widely used query language, such as the Sequel Query Langage (SQL). It could be a prototype of a new standard, such as XQuery, or a custom extension language. Each has its own benefits and performance drawbacks. It is also important that you consider actual import and export usage scenarios, as opposed to single file import and export times. Consider this scenario:
Regardless of the underlying technology, make sure it is stable, reliable, performant, and transactionally safe. Finally, keep in mind that exports also depend on the query language used to retrieve and rebuild the data. Depending on the algorithms used internally by this query language, certain operations can have heinous performance that occurs only under certain, less common circumstances. Examples include link transformation and the resolution of circular links and dependencies between sub-document content modules. |
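To make the relational mapping cost concrete, here is a minimal sketch of one generic way to "shred" an XML tree into relational rows on import and rebuild it on export. It assumes a simple node/attribute table layout of our own invention (not any vendor's schema) and uses Python's built-in `sqlite3` as a stand-in database. Note that the export issues one query per element, which is exactly the kind of rebuild cost that grows with document size.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one row per element, one row per attribute.
SCHEMA = """
CREATE TABLE node (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    position  INTEGER NOT NULL,   -- order among siblings
    tag       TEXT NOT NULL,
    text      TEXT,               -- character data before the first child
    tail      TEXT                -- character data after the closing tag
);
CREATE INDEX node_parent ON node(parent_id);
CREATE TABLE attr (
    node_id INTEGER REFERENCES node(id),
    name    TEXT NOT NULL,
    value   TEXT NOT NULL
);
"""


def shred(conn: sqlite3.Connection, xml_text: str) -> int:
    """Import: store every element as a row and return the root's id."""
    def insert(elem: ET.Element, parent_id, position: int) -> int:
        cur = conn.execute(
            "INSERT INTO node (parent_id, position, tag, text, tail) VALUES (?, ?, ?, ?, ?)",
            (parent_id, position, elem.tag, elem.text, elem.tail))
        node_id = cur.lastrowid
        conn.executemany("INSERT INTO attr VALUES (?, ?, ?)",
                         [(node_id, k, v) for k, v in elem.attrib.items()])
        for i, child in enumerate(elem):
            insert(child, node_id, i)
        return node_id

    with conn:  # one transaction per document: all rows commit or none do
        return insert(ET.fromstring(xml_text), None, 0)


def rebuild(conn: sqlite3.Connection, node_id: int) -> ET.Element:
    """Export: reassemble the tree, issuing several queries per element."""
    tag, text, tail = conn.execute(
        "SELECT tag, text, tail FROM node WHERE id = ?", (node_id,)).fetchone()
    attrs = conn.execute(
        "SELECT name, value FROM attr WHERE node_id = ? ORDER BY rowid", (node_id,)).fetchall()
    elem = ET.Element(tag, dict(attrs))
    elem.text, elem.tail = text, tail
    children = conn.execute(
        "SELECT id FROM node WHERE parent_id = ? ORDER BY position", (node_id,)).fetchall()
    for (child_id,) in children:
        elem.append(rebuild(conn, child_id))
    return elem
```

For example, after `conn = sqlite3.connect(":memory:")` and `conn.executescript(SCHEMA)`, calling `rebuild(conn, shred(conn, doc))` should reproduce `doc`. Real systems batch or cache these queries; the point here is only that every import and export pays for the conversion.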
| Identifying the problem: |
Do not assume that import performance will be acceptable. The following performance tests will help identify whether this problem exists. Compare the import and export times of XML and other structured documents to those of plain text and binary files. Compare the import and export times of 10KB, 100KB, 1MB, 10MB, and 50MB XML documents. Graph and summarize the results. Compare the import times of XML documents with a variety of characteristics, such as a large number of elements with little text versus a small number of elements with a lot of text. The goal is to identify content situations that significantly impact performance but may not be obvious; these will vary depending on the CMS chosen. Quick test: import a 10MB XML document and calculate the total time. Perform these tests with multiple concurrent clients (based on your expected number of end users) to determine how concurrency affects performance. |
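The test matrix above can be automated. The sketch below is a minimal harness, assuming the CMS import can be wrapped in a Python callable that accepts the document bytes; `make_xml`, `time_import`, and `run_matrix` are hypothetical helper names, and `ET.fromstring` is used only as a stand-in for a real import call. The `text_per_element` parameter varies the "many small elements versus few large elements" characteristic.

```python
import time
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Tuple

ImportFn = Callable[[bytes], object]  # hypothetical wrapper around the CMS import


def make_xml(target_bytes: int, text_per_element: int) -> bytes:
    """Build a synthetic document of roughly target_bytes.

    A small text_per_element gives many elements with little text;
    a large one gives few elements with a lot of text.
    """
    chunk = "<para>" + "x" * text_per_element + "</para>"
    count = max(1, target_bytes // len(chunk))
    return ("<doc>" + chunk * count + "</doc>").encode("utf-8")


def time_import(import_fn: ImportFn, doc: bytes, clients: int = 1) -> float:
    """Wall-clock seconds for `clients` simultaneous imports of the same document."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        list(pool.map(import_fn, [doc] * clients))  # list() re-raises worker errors
    return time.perf_counter() - start


def run_matrix(import_fn: ImportFn, sizes: Iterable[int],
               densities: Iterable[int], clients: int = 1) -> Dict[Tuple[int, int], float]:
    """Time every (size, text_per_element) combination."""
    densities = list(densities)
    return {(size, density): time_import(import_fn, make_xml(size, density), clients)
            for size in sizes for density in densities}
```

A full run might look like `run_matrix(cms_import, [10_000, 100_000, 1_000_000, 10_000_000, 50_000_000], [8, 4096], clients=4)`, where `cms_import` is your own (hypothetical) client call. Threads suit a client that waits on a network-attached CMS; they will not parallelize CPU-bound parsing inside a single Python process.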
| Related Problems: |
The RAM requirements for importing large structured documents can also be a significant issue. One XML CMS has been observed with a heap size growth factor of 100 per 10KB while importing a document. Import times can also increase severely under heavy concurrent user loads. |
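Memory growth can be measured alongside import time. The sketch below reports peak allocation during an import as a multiple of the document size, using Python's standard `tracemalloc` module. It assumes, as before, a hypothetical import callable, with `ET.fromstring` as a stand-in; `tracemalloc` only sees Python-level allocations, so a JVM-based CMS would need the equivalent heap monitoring on its own side.

```python
import tracemalloc
import xml.etree.ElementTree as ET
from typing import Callable


def peak_memory_ratio(import_fn: Callable[[bytes], object], doc: bytes) -> float:
    """Peak bytes allocated while running import_fn(doc), divided by len(doc)."""
    tracemalloc.start()
    try:
        import_fn(doc)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) since start()
    finally:
        tracemalloc.stop()
    return peak / len(doc)
```

Running this across the document sizes used for the timing tests shows whether memory grows roughly linearly with document size or far faster.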
| Symptoms: |
Unreasonably long import times for structured documents (XML, SGML, etc.):
Unreasonably large memory (RAM) footprint when importing or exporting documents. |
| Consequences: |
The import and export times will potentially affect editor integrations and general performance of system operations that require processing of structured documents. The cost and time of refactoring the underlying system architecture is prohibitive. The end result is a system that will almost never meet end user needs. |
| Typical causes: |
Lack of knowledge or experience with the problem domain. Poor architecture, design, and implementation of data storage and related processes. Lack of extensive document testing. |
| Known Exceptions: |
If requiring all end users to have 1GB of RAM for import, and typically waiting until the next day for a large batch of documents to finish importing, is an acceptable end user scenario for you, then ignore this AntiPattern. |
| Refactored Solutions: |
There is no one root cause for this category of problems. Likewise, there is no one solution to the causes of this category of problems. As a consumer, the solution is to evaluate the import performance using the tests above relative to your expected user situations and factor the results into which product you choose. As a developer, the solution is to identify the problem areas and refactor the architecture of your data storage system to reach an acceptable import time baseline. It 'should' be safe to assume that total import time will be acceptable regardless of the size, content, and other factors surrounding the import of an XML document. Work to make it so, and when it is, post your results with pride. |
| Related AntiPatterns: |
Architecture by Implication |
End users are often curious creatures by nature. If there is a feature or configuration option that sounds interesting, they will try it out. The promise of being able to perform or customize system features without needing a programmer sounds ideal. After all, the configuration is stored in XML!!! What could go wrong?
When targeting functionality to meet end user needs, remember that some of the most critical user interactions don't happen in fancy user interfaces. The two AntiPatterns 'Configuration Abomination' and 'Control Freak' identify recurring problems with XML CMS configuration and custom (CMS) languages.
-
| AntiPattern Name: |
Configuration Abomination |
| Anecdotal Evidence: |
There are so many configuration options that the engineers can't remember them all and disagree on what exactly each one does. These two installations are identical, why don't they both work?!?!? WARNING - DON'T EVER CHANGE THIS LINE (Excerpt from configuration file). |
| Context Type: |
Installation and configuration layers. |
| Background: |
Document Management Systems have a fairly standard set of configuration options. Based on the feature sets, the configuration options will vary for each product. However, you can expect to see the same types of configuration options for basic versioning, data storage, access controls, etc. XML Content Management Systems are a completely different breed. Each integrated standard and each new revolutionary solution type (reuse, hyperdocuments, linking) presents its own set of configurations that will radically vary depending on the user groups that were analyzed or targeted during the product's conception. When these sets of configuration options are generalized and extended for new end user requirements, the end result can be a configuration monstrosity that only its creator could love (or hope to understand). |
| Identifying the problem: |
Configuration Abomination occurs when a system either 1) has so much configuration that it is confusing, or 2) performs configuration in uncommon places. The former can be discovered by simply looking at how the system is configured (text files, user interfaces, XML documents). Do the configuration options in the installation and administrative manuals make sense? Are the configuration locations and options even detailed in the provided manuals? The configuration for each major system component should be examined. The latter can be much more difficult to detect. For each of the components, does the location where a feature is configured make sense? If the wrong features are configured on a client, as opposed to a server, simple changes can have nightmarish side effects. |
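Both forms of the problem can be checked mechanically once the system's options are written down in one place. The sketch below audits a configuration file against a registry of known options, flagging unknown options, options set on the wrong layer (client versus server), and mutually exclusive options set together. The `<config><option name=".." value=".."/></config>` format, the option names, and the `REGISTRY` contents are all hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class OptionSpec:
    layer: str                                   # where the option belongs: "server" or "client"
    conflicts: FrozenSet[str] = frozenset()      # options that must not be set alongside it


# Hypothetical registry: the single documented list of every supported option.
REGISTRY: Dict[str, OptionSpec] = {
    "storage.path": OptionSpec("server"),
    "versioning.enabled": OptionSpec("server"),
    "versioning.autoIncrement": OptionSpec("server", frozenset({"versioning.manualOnly"})),
    "versioning.manualOnly": OptionSpec("server", frozenset({"versioning.autoIncrement"})),
    "ui.language": OptionSpec("client"),
}


def audit(config_xml: str, layer: str) -> List[str]:
    """Return human-readable problems found in one layer's configuration file."""
    names = [opt.get("name") for opt in ET.fromstring(config_xml).iter("option")]
    problems = []
    for name in names:
        spec = REGISTRY.get(name)
        if spec is None:
            problems.append(f"unknown option: {name}")
        elif spec.layer != layer:
            problems.append(f"{name} belongs on the {spec.layer}, found on the {layer}")
    present = set(names)
    for name in sorted(present):
        spec = REGISTRY.get(name)
        if spec is not None:
            for other in sorted(spec.conflicts & present):
                if name < other:  # report each conflicting pair once
                    problems.append(f"conflicting options: {name} and {other}")
    return problems
```

Keeping the registry in one place also gives each release a natural checklist when re-evaluating which options still need to exist.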
| Symptoms: |
The documentation shows configuration options for every possible scenario, a veritable Swiss Army Knife of configuration. Some options prevent others from working. Some options haven't worked since the last release version. End users may have the ability to configure things that they shouldn't. |
| Consequences: |
Some degree of Configuration Abomination cannot be avoided due to the sweeping scope and nature of the problems being solved by an XML Content Management System. However, the consequences can be significant. Performing identical installations can require a lot of manual changes. Users often have access to configuration items that may allow them to break fundamental system components. Everything those users do until the problem is identified and resolved can ripple through the system, making it very difficult to recover the system, much less understand and track the effects. |
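When two "identical" installations behave differently, a first step is to diff their effective configuration rather than eyeballing files. This minimal sketch assumes the same hypothetical `<option name=".." value=".."/>` format used above; a real product may scatter its settings across several files and layers, each of which would need to be collected first.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def load_options(config_xml: str) -> Dict[str, str]:
    """Flatten a hypothetical <config><option name=".." value=".."/></config> file."""
    root = ET.fromstring(config_xml)
    return {opt.get("name"): opt.get("value") for opt in root.iter("option")}


def diff_installations(a_xml: str, b_xml: str) -> List[str]:
    """List every option that differs between two supposedly identical installations."""
    a, b = load_options(a_xml), load_options(b_xml)
    report = []
    for name in sorted(a.keys() | b.keys()):
        if name not in b:
            report.append(f"only in A: {name}={a[name]}")
        elif name not in a:
            report.append(f"only in B: {name}={b[name]}")
        elif a[name] != b[name]:
            report.append(f"differs: {name}: {a[name]!r} vs {b[name]!r}")
    return report
```

Small differences such as a trailing slash in a path are exactly what this kind of report surfaces.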
| Typical causes: |
Configuration is often developed and exposed as needed, as an afterthought to the primary system features. Configuration Abomination can grow to a frightening level before it is ever detected. Too often the (improper) quick-fix solution is a note 'WARNING - I don't know what this does, but don't ever change it!!! - John Doe'. |
| Known Exceptions: |
Configuration issues are an inevitability. Not dealing with them is unacceptable. |
| Refactored Solutions: |
The solution to this is to include configuration as a major system feature. As new product versions are released, a conscious effort to re-analyze and refactor the existing configuration options must be performed. Otherwise the number of configuration options will continue growing, exposed in an unorganized fashion, never to be removed, and eventually becoming a feared creature that is avoided at all costs, yet never actually put to rest. It is vital to re-evaluate the need for existing configuration options as well as their locations and permissions to prevent this AntiPattern. |
| Related AntiPatterns: |
Architecture by Implication |
-
| AntiPattern Name: |
Control Freak |
| Anecdotal Evidence: |
We have a custom language and XML configuration files, so every system operation can be performed without needing a programmer. The current functions (in our language) meet all of our existing customers' needs. You won't need anything else. |
| Context Type: |
Customization and use of advanced system features. |
| Identifying the problem: |
Take fair warning: if a product has a custom language for interacting with the system or its database that doesn't have equivalent APIs in a common programming language, it 'will be' costly to use, customize, and maintain. This is referred to as Control Freak, as all of the control in the system is constrained to the language provided. Too often these languages were Architected by Implication and have been expanded on an 'as needed' basis. |
| Symptoms: |
Unclear APIs and points of extension. A 'comprehensive' custom language missing features you don't know you need yet. Features that sound too good to be true often are. |
| Consequences: |
If there aren't user interfaces wrapping most of the custom scripting language, you will almost certainly need a programmer to make it do everything you need. If at any point you need to implement custom workflow or other more complex processes, you will need someone who understands architectures and concepts that hide complexity, provide the needed security, etc. When there isn't a solid underlying architecture that provides equivalent, stable programming language APIs for key system operations, you are in a big world of hurt. When this happens, what options do you have? You will have little alternative but to pay that company to extend core features (that probably should have been present in the first place). |
| Typical causes: |
Wolf Ticket mentality. Ignorance of technological advances. |
| Refactored Solutions: |
As a developer, accept the technological advances of the last 10 years and expose programming interfaces and extensions in place of your custom language. Keep the custom language to hide complexity from end users, but don't hide the fact that it will take a programmer to understand and properly use your language. Refusing to do so is a recipe for failure and disappointment. As a consumer, evaluate the custom language for complementing features. Confirm claims that sound too good to be true. If it says that a programmer is not required to use the features, then regardless of who you are, you should be able to understand and use it, right?... Look for APIs and points of extension, in case you will need a programmer to change system features. What languages are supported for those APIs and points of extension? Are they fully supported, or are they fixed to particular language versions or other constraints? |
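One way to keep a custom language without becoming a Control Freak is to make it a thin layer over a public programming API, so that every command the language offers is also available to a programmer directly, and new commands are written against that same API. The sketch below illustrates the layering only; `Repository`, `ScriptLanguage`, and the `checkin`/`show` commands are hypothetical and not taken from any product.

```python
from typing import Callable, Dict, List


class Repository:
    """Hypothetical public API: the documented, stable layer a programmer uses directly."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}

    def check_in(self, doc_id: str, content: str) -> int:
        """Store a new version and return its version number (1-based)."""
        versions = self._versions.setdefault(doc_id, [])
        versions.append(content)
        return len(versions)

    def latest(self, doc_id: str) -> str:
        return self._versions[doc_id][-1]


class ScriptLanguage:
    """A deliberately thin custom language: each command is just a call into Repository."""

    def __init__(self, repo: Repository) -> None:
        self.repo = repo
        self._commands: Dict[str, Callable[..., str]] = {
            "checkin": lambda doc_id, *words: str(repo.check_in(doc_id, " ".join(words))),
            "show": lambda doc_id: repo.latest(doc_id),
        }

    def register(self, verb: str, handler: Callable[..., str]) -> None:
        """Point of extension: programmers add commands written against the same API."""
        self._commands[verb] = handler

    def run(self, line: str) -> str:
        verb, *args = line.split()
        if verb not in self._commands:
            raise ValueError(f"unknown command: {verb}")
        return self._commands[verb](*args)
```

The design choice is that the language owns no behavior of its own: anything a script can do, code can do through `Repository`, so missing language features never force a vendor extension.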
| Related AntiPatterns: |
Wolf Ticket, Vendor Lock-in |
XML is not a silver bullet or a Golden Hammer. It is a challenge to develop a flexible and robust underlying CMS architecture that is applicable to a large group of actual end user needs and business problems. Fully supporting a set of standards, as they change over time, can be quite a challenge. Two common misconceptions about standards in Content Management Systems are detailed in the new AntiPatterns 'Trusting Souls' and 'Internationalization is Fully Supported'. The new XML CMS AntiPattern 'No Love' highlights one frequent oversight (or under-implementation) in current XML CMSs.
-
| AntiPattern Name: |
No Love (for ENTITYs)... |
| Anecdotal Evidence: |
Where are my graphics? Why don't they show up in the HTML or the Editor? There is 200GB of space on the image storage drive; why is it full? |
| Background: |
Entities get no love. XML, SGML, and related standards commonly use ENTITYs to reference graphics and images in documents. Each authoring tool has its own configuration for locating images. If that tool and its configuration are not integrated with the CMS's repository, so that the repository neither auto-imports/exports referenced ENTITYs nor exposes images via a URN/URL, then the editor cannot retrieve those images for viewing. If the editor allows you to open XML documents, yet uses its own proprietary format that encodes images into the binary of those documents, then exports back into XML can produce mixed results that turn an author's job into a nightmare. If the XML CMS has a web portal, a similar interface, or a transformation of XML content into HTML or another format, then ENTITYs cannot be overlooked. If the CMS's repository does not expose the images via a URN/URL, then the HTML (or equivalent) cannot retrieve those images for viewing. In either of these scenarios, images must be stored in a file location available to all authors and web clients. The XML CMS has a versioning repository. Why shouldn't it control and version the ENTITYs as well as control their external availability? If the repository does indeed import entities and make them available via a URN, your problems still aren't all solved. When you import new versions of a document, or other documents that reference the same graphic, how does the repository let you configure this? Will it import a new graphic file every single time? You might be surprised. Once the system has imported graphic ENTITYs associated with many documents, what facility does it provide to manage the references? This can present complex identification and referencing issues similar to those of hyperdocument management, and it is often overlooked or left to solve later. After all, customers purchase XML CMSs for their XML handling; who's going to double-check how they handle and manage images... |
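Before a repository can manage graphic ENTITYs, its import step has to find them. In XML, graphics are typically declared as unparsed (NDATA) entities in the DTD and referenced by name from attributes of type ENTITY or ENTITIES. The sketch below uses the handlers of Python's standard `xml.parsers.expat` parser to collect both the declarations and the references; the sample document, its element names, and the `ent` attribute are hypothetical.

```python
import xml.parsers.expat
from typing import Dict, List, Tuple


def graphic_entities(xml_bytes: bytes) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Return (declared, referenced).

    declared:   {entity name: (system id, notation)} for unparsed (NDATA) entities
    referenced: entity names used in attributes declared as ENTITY/ENTITIES,
                in document order (duplicates kept)
    """
    declared: Dict[str, Tuple[str, str]] = {}
    entity_attrs: Dict[Tuple[str, str], str] = {}  # (element, attribute) -> declared type
    referenced: List[str] = []

    def on_entity_decl(name, is_param, value, base, system_id, public_id, notation):
        if notation is not None:  # only unparsed entities carry a notation name
            declared[name] = (system_id, notation)

    def on_attlist_decl(elname, attname, att_type, default, required):
        if att_type in ("ENTITY", "ENTITIES"):
            entity_attrs[(elname, attname)] = att_type

    def on_start(tag, attrs):
        for attname, value in attrs.items():
            if (tag, attname) in entity_attrs:
                referenced.extend(value.split())  # ENTITIES values are whitespace-separated

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.AttlistDeclHandler = on_attlist_decl
    parser.StartElementHandler = on_start
    parser.Parse(xml_bytes, True)
    return declared, referenced


# Hypothetical sample: two graphics declared, one of them referenced twice.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE manual [
  <!NOTATION gif SYSTEM "image/gif">
  <!ELEMENT manual (graphic*)>
  <!ELEMENT graphic EMPTY>
  <!ATTLIST graphic ent ENTITY #REQUIRED>
  <!ENTITY logo SYSTEM "images/logo.gif" NDATA gif>
  <!ENTITY chart SYSTEM "images/chart.gif" NDATA gif>
]>
<manual><graphic ent="logo"/><graphic ent="logo"/></manual>"""
```

On `SAMPLE`, the declared set contains `logo` and `chart` while only `logo` is referenced (twice), which already raises the management questions above: should `logo` be stored once, and what happens to the unreferenced `chart`?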
| Identifying the problem: |
This AntiPattern represents a set of related ENTITY problems. The underlying cause is typically the fact that ENTITY management is not a key selling requirement, and can therefore be under-analyzed or overlooked. Proper use of XML standards allows for separation of content from formatting. This also provides separation of XML text content from image and graphical content such as JPG, GIF, TIF, and BMP files. While many XML CMSs do a very thorough job of managing and processing XML text content, oftentimes there is insufficient or no management for the ENTITYs referenced in these documents. The problems that have been experienced are:
|
| Related Problems: |
Authoring tools' generation of IDs for 'new' image references is not configurable. |
| Symptoms: |
Poor or missing ENTITY management interface. Editor tool not able to reference images in repository. HTML portal view not able to reference images in repository. New image imported each time you import the same XML document. Inability to reference same graphic object from multiple XML documents. |
| Consequences: |
Lack of sufficient ENTITY management, both at import/export time and while in the repository, can lead to a new image being imported for every new version of the same document and for every other document that should be referencing the same images. This can lead to very poor disk usage and thus quickly running out of storage space. Lack of URN/URL exposure for locating and retrieving images from the repository will cause additional and often significant management overhead for images in authoring tools, transformations, and web portals. This in turn can lead to missing images, or the wrong images showing up in documents. The result is a series of related problems that make authoring end users quite unhappy. |
| Typical causes: |
No image URN exposure for images from the repository. Poor or non-existent import/export-time configuration for ENTITY reuse/referencing. Export of related graphic ENTITYs at XML export time. The underlying cause is typically the fact that ENTITY management is not a key selling requirement, and can therefore be under-analyzed or overlooked. |
| Known Exceptions: |
This AntiPattern may be acceptable if it is understood that all graphics, images, and other ENTITYs will always be managed externally to the CMS. Keep in mind, if this isn't managed in and by the repository, it may lead to:
|
| Refactored Solutions: |
An XML CMS should provide quality ENTITY management. It should expose files via URN/URL or similar to allow referencing from authoring, web, and other clients. It should include configurable options for image import and export of ENTITY references for single XML documents, subsequent versions of single XML documents, and multiple different documents. These options should include: replace existing image, increment existing image versions, and import as a new file. The CMS should also provide reference configuration changes for files already in its repository. |
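The three import options listed above can be combined with content hashing so that re-importing the same graphic never creates a duplicate file. The sketch below is a minimal in-memory illustration under stated assumptions: the `ImageStore` class, the `urn:example-cms:image:` URN scheme, and the `-2`, `-3` suffixes for new files are all hypothetical, and SHA-256 (via Python's `hashlib`) is our choice of content fingerprint.

```python
import hashlib
from enum import Enum
from typing import Dict, List


class ImportPolicy(Enum):
    REPLACE = "replace existing image"
    NEW_VERSION = "increment existing image version"
    NEW_FILE = "import as a new file"


class ImageStore:
    """Hypothetical repository-side store for graphic ENTITYs.

    Identical bytes are never stored twice: re-importing the same graphic
    (from a new document version or another document) returns the URN that
    already holds it. The policy applies only when the content is new.
    """

    URN_PREFIX = "urn:example-cms:image:"  # hypothetical URN scheme

    def __init__(self) -> None:
        self.versions: Dict[str, List[bytes]] = {}  # URN -> stored versions, oldest first
        self._by_digest: Dict[str, str] = {}        # SHA-256 of content -> URN

    def import_image(self, name: str, data: bytes, policy: ImportPolicy) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._by_digest:
            return self._by_digest[digest]  # reuse the existing image, no new file

        urn = self.URN_PREFIX + name
        if urn not in self.versions:
            self.versions[urn] = [data]
        elif policy is ImportPolicy.REPLACE:
            # Forget digests of the content being overwritten so they cannot be reused.
            self._by_digest = {d: u for d, u in self._by_digest.items() if u != urn}
            self.versions[urn] = [data]
        elif policy is ImportPolicy.NEW_VERSION:
            self.versions[urn].append(data)
        else:  # ImportPolicy.NEW_FILE
            n = 2
            while f"{urn}-{n}" in self.versions:
                n += 1
            urn = f"{urn}-{n}"
            self.versions[urn] = [data]

        self._by_digest[digest] = urn
        return urn
```

Returning a URN on every import gives authoring tools and web transformations a stable reference to resolve, which addresses the URN/URL exposure requirement as well.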
-
| AntiPattern Name: |
(The) Trusting Souls |
| Anecdotal Evidence: |
Standard X is fully supported...except for feature Y. |
| Background: |
As the number of standards supported by a CMS increases, the amount of time spent keeping each one up to date and fully supported decreases. It is inaccurate to believe that every aspect of each standard 'supported' in a CMS is fully supported. Not every detail of each standard will be implemented. Some custom features will replace less useful standard components, such as replacing non-directional associations in Topic Maps (XTM). In some cases it is simply not possible to implement every detail of each standard. Certain aspects of a standard may not be relevant to the context in which it is being applied. It is advisable to know the extent of the support for each standard supported by an XML CMS...and avoid being one of the Trusting Souls who gets burned. |
| Identifying the problem: |
Identify and test the standard features that you are truly utilizing. Linking support and more advanced standards are frequently less supported than the basic ones. Beware of features that are still being completed that will utilize the standard items that you need. |
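One practical way to test the features you rely on is a set of small probe documents, one per feature, pushed through the CMS import and export and compared afterwards. The sketch below compares canonical forms produced by Python's `xml.etree.ElementTree.canonicalize` (C14N 2.0, Python 3.8+), so that insignificant differences such as attribute order do not count as failures. The probe list is illustrative only, and the `round_trip` callable is a hypothetical wrapper around your CMS's import followed by export.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Illustrative probes: replace with the standard features you actually depend on.
PROBES: Dict[str, str] = {
    "comments": "<doc><!-- reviewer note --><p>x</p></doc>",
    "processing instructions": '<doc><?render mode="print"?><p>x</p></doc>',
    "namespaces": '<doc xmlns:m="urn:example:meta"><m:author>A</m:author></doc>',
    "character references": "<doc><p>caf&#233;</p></doc>",
}


def probe_round_trip(round_trip: Callable[[str], str]) -> Dict[str, bool]:
    """Return {feature: survived?} by comparing canonical XML before and after."""
    results = {}
    for feature, doc in PROBES.items():
        before = ET.canonicalize(doc, with_comments=True)
        try:
            after = ET.canonicalize(round_trip(doc), with_comments=True)
        except ET.ParseError:
            after = None  # the export is not even well-formed
        results[feature] = before == after
    return results


def strip_comments(doc: str) -> str:
    """Stand-in for an export path that silently drops comments."""
    return re.sub(r"<!--.*?-->", "", doc, flags=re.S)
```

With `strip_comments` standing in for a lossy export, only the `comments` probe should report `False`; with a real CMS, any `False` marks a 'supported' feature that does not survive.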
| Related Problems: |
Assuming Unicode and other internationalization support is complete for the system features you will be utilizing. |
| Symptoms: |
Lack of support will become apparent when you actually try to use the features in question. |
| Consequences: |
The few key standard items that you really need may not be fully supported. |
| Typical causes: |
Lack of time. Standard item not useful in the context it is applied. Conflicts between standards and or system features. |
| Refactored Solutions: |
Scan the provided documentation for warning signs, such as a note that relative SYSTEM paths in documents will break IMPORT/EXPORT mechanisms. You may (and very likely will) be surprised at how much of each standard is truly supported by the tools you are going to use. If you don't know, find someone who knows what to look for. |
| Related AntiPatterns: |
Internationalization is Fully Supported, Vendor Lock-In, Smoke and Mirrors |
-
| AntiPattern Name: |
Internationalization is Fully Supported |
| Anecdotal Evidence: |
Internationalization is Fully Supported by this tool (insert laughter here). |
| Background: |
Support for Unicode does not imply full support for internationalization. Unicode is only one aspect of true internationalization. Internationalization also affects how documents are searched, processed, rendered for viewing, and many other XML CMS features. Each XML CMS has a separate set of features, and each feature must fully support internationalization for the system itself to make this claim. The standards that form the basis for XML Content Management Systems fully support Unicode. Many of the underlying parsers used in existing XML CMSs fully support Unicode. Unfortunately, XML CMSs themselves do not. Part of the problem is that (to my knowledge) the criteria for 'full support for Unicode' in an XML CMS are not formally defined. Another aspect of internationalization support relates to either authoring tool integrations or publication to HTML (or other formats) for a web portal. Some scripts, such as Arabic and Hebrew, are displayed right-to-left rather than left-to-right. Others have their own unique heuristics that determine display characteristics. Then there are indexing and search engine support issues, and pattern matching and document processing issues. Perhaps an XML database alone could make the claim of full internationalization support as of today, but a comprehensive XML CMS solution could not. The extent of Unicode and other internationalization support in existing XML CMSs seems random at best. |
| Identifying the problem: |
Create two sample XML documents, each populated with a wide range of Unicode characters; encode one in UTF-8 and the other in UTF-16. Import each document. Export each document. Compare the exported files with the originals for equality. Test various features of the authoring integration. Test the publication-to-web features to confirm the characters appear properly. Test the indexing and text search engine features. Do they let you enter special Unicode characters in the search strings, much less search for them? |
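The import/export part of this test can be scripted. Since UTF-8 and UTF-16 are two encodings of the same Unicode character repertoire, the sketch below puts the same sample text into both documents and compares decoded character content rather than raw bytes, because a CMS may legitimately re-encode on export (a byte-for-byte check is still worth running separately). The chosen code point ranges are an arbitrary sample, not a complete one, and the helper names are hypothetical; the sample deliberately includes right-to-left scripts and characters outside the Basic Multilingual Plane, which need surrogate pairs in UTF-16.

```python
import xml.etree.ElementTree as ET

# Arbitrary sample of code points from several scripts (inclusive ranges).
SAMPLE_RANGES = [
    (0x00C0, 0x00FF),    # Latin-1 letters
    (0x03B1, 0x03C9),    # Greek lowercase
    (0x05D0, 0x05EA),    # Hebrew (right-to-left)
    (0x0627, 0x064A),    # Arabic (right-to-left)
    (0x4E00, 0x4E2F),    # CJK ideographs
    (0xAC00, 0xAC2F),    # Hangul syllables
    (0x1D11E, 0x1D11E),  # musical G clef, outside the BMP
    (0x1F600, 0x1F60F),  # emoji, outside the BMP
]


def sample_text() -> str:
    return "".join(chr(cp) for lo, hi in SAMPLE_RANGES for cp in range(lo, hi + 1))


def make_probe_document(encoding: str) -> bytes:
    """Build the same document in the given encoding (UTF-16 output includes a BOM)."""
    body = f'<?xml version="1.0" encoding="{encoding}"?><doc><p>{sample_text()}</p></doc>'
    return body.encode(encoding)


def characters_survived(original: bytes, exported: bytes) -> bool:
    """True if both documents decode to exactly the same character content."""
    def text_of(doc: bytes) -> str:
        return "".join(ET.fromstring(doc).itertext())
    return text_of(original) == text_of(exported)
```

In practice `exported` would be the bytes your CMS returns for each imported probe document; a `False` result, or a parse error, is the kind of seemingly random Unicode failure described below.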
| Related Problems: |
User interfaces that perform dynamic conversion to other languages have been observed to miss some less-used UI screens, or to fail to convert all text on a display to the appropriate language. |
| Symptoms: |
Seemingly random side-effects and failures related to Unicode character content. |