Keywords: Data Representation, DTD, MDA, Model Driven Architecture, Metadata, Ontology, Schema, Stylesheet, TCO, UML, W3C XML Schema, XMI, XInclude, XML, XPath, XQuery, XSLT
Biography
Martin has been a successful XML pioneer at Nortel Networks and several other companies over the previous six years. At Nortel he has been involved in the conversion of five major projects to XML-based architectures in integration, adaptation, and business-to-business communication software in Java, J2EE, C++, and Python. He is a co-editor of the Nortel Networks XML Guidelines for Network Management.
Martin has been a contributor to the International Telecommunications Union Study Group on Telecommunications Network Management since 2003, especially interested in generic and XML interfaces and business-to-business communication.
Martin has worked for 10 years on web crawlers, web servers, ISAPI, DCOM, internet services, CORBA, e-commerce, J2EE and web services. Martin is the driving force behind a model driven code generation movement at Nortel Networks which uses XML and XSLT for a development automation framework. He has experience in network security architecture and has recently completed his M.Sc. in self-healing applications for network security.
Martin has a significant background in leading expert communities to produce internal standards and in the presentation and communication of new ideas and technologies within both business and technical contexts. He is also the President of a Canadian WISP specializing in outdoor access.
The high integration costs which exist today mean that we must automate interface maintenance and integration tasks or go insane, or worse, out of business. Ongoing pressure to reduce software development costs while increasing the quality and completeness of the work provides an opportunity for the use of model driven computing. MDA (Model Driven Architecture) is a technique for model based platform independent software specification based on the MOF (Meta-Object Facility) and XMI (XML Meta-data Interchange) standards from the OMG (Object Management Group). There are a number of tool vendors using XMI (especially UML (Unified Modeling Language) drawing tools) but common use and value seem to be slow to show themselves.
Although there has been significant interest in this topic in the research community for some time there has recently been a significant growth in the commercial industry. This can be evidenced mainly in the articles and reports now being released. MDA provides a mechanism where high-level architectural model descriptions of software systems are produced using UML in a standard UML tool. Most UML tooling supports the OMG XMI standard which allows the expression of the UML software system description in a standard XML representation.
The formalism of model-based descriptions of systems and their code automation will be requirements for the progression of this industry and profession. The key questions that arise with respect to MDA then become: why aren't XDoclet, Cocoon and similar code annotation tools sufficient; why has the uptake of MOF been rapid but the uptake of XMI much slower; are there alternatives to code generation such as the use of next-generation languages; what code can reasonably be generated; and what are the best tools to target these areas?
These questions are answered in this paper. Underlying this discussion is the absolute criticality of XML and specific requirements on XML tooling for this essential area of software development.
1. Introduction
2. Background
2.1 Model Driven Architecture
2.2 Model Integrated Computing
2.3 Code Generation
2.4 Programming Language Concepts
2.5 Unified Modeling Language
2.6 Pattern Frameworks
2.7 Principles of Model-Based Development
3. So what can be generated?
3.1 The benefits of model driven development
3.2 The limits of model driven development
4. What's Coming?
5. Conclusion
Bibliography
Footnotes
MDA (Model Driven Architecture) is a technique standardized by the OMG (Object Management Group) (owners of UML and CORBA as well) in the guise of the MOF (Meta-Object Facility) and the XMI (XML Meta-data Interchange). There are a number of tool vendors using XMI (especially UML drawing tools) but common use and value seem to be slow to show themselves.
MDA provides a mechanism where high-level architectural model descriptions of software systems are produced using UML (Unified Modeling Language) in a standard UML tool. Most UML tooling supports the OMG XMI standard which allows the expression of the UML software system description in a standard XML representation. Alternatively, models can be described directly in MOF or XMI which are freely translatable. The XMI DTD and then W3C Schema have been available for some time. It is interesting to note that these intermediate storage formats (MOF or XMI) are modern mechanisms while the generation of code from UML descriptions is almost as old as UML itself.
The growing cost of integration and the immense cost of errors in software imply that the automation of code from formalized model-based descriptions of systems will be a requirement for the software industry. This is especially important due to the massive cost savings of detecting errors early in the development cycle. Why aren't XDoclet, Cocoon and similar code annotation tools sufficient; why is the uptake of MOF rapid but the uptake of XMI much slower; are there alternatives to code generation such as the use of next-generation languages; what code can reasonably be generated; where are the best cost-benefit efforts targeted in MDA; and what are the best tools to target these areas?
This paper overviews the basics of model driven architecture, model integrated computing, code generation, and alternative technologies. It then considers what is feasible with model driven design (any of a number of mechanisms of basing a design method on an abstract model which is independent of implementation and where the implementation or implementation binding is generated), the role of XML in these technologies, and the requirements that this usage puts on UML and XML tools.
MDA (Model Driven Architecture) is a technique standardized by the OMG (owners of UML and CORBA) based around the MOF (Meta-Object Facility) and the XMI (XML Meta-data Interchange) specifications. The fundamental principle of MDA is to provide insulation and platform independence of program interfaces from middleware implementations, by defining them in an abstract system model[1]. MDA uses this set of specifications to loosely define business-process specific integration points which are technology independent by basing the interface specifications on a protocol-neutral model description. The 3GPP (Third Generation Partnership Project) wireless telecommunications standards organization, the TMF (Tele-Management Forum), and the ITU (International Telecommunications Union) have adopted a similar model driven description of interfaces to provide technology independent interoperability as well. The TMF provides MOF based model descriptions for its standards.
MDA involves the specification of a Platform Independent Model which then allows the generation of a Platform Specific Model through tools which are available for a number of standard platforms. Note that this is code generation from a meta-code description (MOF or XMI). A platform here is a middleware environment such as a .NET CLR or CORBA implementation. Additionally, MDA allows extension to UML to allow for modeling of some of the infinite program aspects not captured in UML[2]. This feature of MDA is, however, heavily underused except for standard extensions, which exist for UML, such as performance modeling and security.
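The step from a Platform Independent Model to Platform Specific Models can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Interface` and `Operation` classes, the `AlarmService` example, and the two type-mapping tables are hypothetical and are not taken from the MOF or from any MDA tool. The point is only that a single abstract description can feed several platform bindings.

```python
from dataclasses import dataclass, field

@dataclass
class Operation:
    name: str
    params: list      # list of (parameter name, abstract type) pairs
    returns: str      # abstract return type

@dataclass
class Interface:
    name: str
    operations: list = field(default_factory=list)

# Hypothetical mappings from abstract model types to each platform's types.
IDL_TYPES = {"string": "string", "int": "long", "bool": "boolean", "void": "void"}
CSHARP_TYPES = {"string": "string", "int": "int", "bool": "bool", "void": "void"}

def to_corba_idl(iface):
    """Platform specific binding 1: a CORBA IDL interface."""
    lines = [f"interface {iface.name} {{"]
    for op in iface.operations:
        args = ", ".join(f"in {IDL_TYPES[t]} {n}" for n, t in op.params)
        lines.append(f"  {IDL_TYPES[op.returns]} {op.name}({args});")
    lines.append("};")
    return "\n".join(lines)

def to_csharp(iface):
    """Platform specific binding 2: a C# interface for the .NET CLR."""
    lines = [f"public interface I{iface.name}", "{"]
    for op in iface.operations:
        args = ", ".join(f"{CSHARP_TYPES[t]} {n}" for n, t in op.params)
        lines.append(f"    {CSHARP_TYPES[op.returns]} {op.name}({args});")
    lines.append("}")
    return "\n".join(lines)

# The platform independent model: no middleware detail appears here.
pim = Interface("AlarmService", [
    Operation("acknowledge", [("alarmId", "string")], "bool"),
    Operation("count", [], "int"),
])

idl_text = to_corba_idl(pim)
csharp_text = to_csharp(pim)
print(idl_text)
print(csharp_text)
```

Adding a third platform in this sketch means adding one mapping table and one generator function; the model itself is untouched.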
The MOF (Meta-Object Facility) is an OMG standard which allows for the description of an interface model in a common manner. MOF is similar in textual convention to an IDL (Interface Definition Language) specification but provides more significant meta-model information. MOF is used by many MDA tools, by national and international standards organizations, and by Microsoft, and it is the basis of XMI.
XMI is basically an XML representation of MOF. XMI is not widely used but is supported for import/export by some UML tools and can be used for the specification of MDA interfaces. The problem with XMI is complexity. The principal value of XML, and the reason for its massive proliferation, is its simplicity. This is a similar story to TCP/IP, HTTP, HTML, and probably SIP[3]. Complexity is likely the reason for the lack of proliferation of XMI: it has created a custom mechanism for defining meta-information and extensibility in a language that was itself intended for the description of meta-information and extensibility. This is due to its inheritance of the MOF expression mechanism. A more sensible mechanism for the description of meta-information would be an extension to W3C XML Schema or OWL. RDF and Relax NG may be other options for detailed modeling description.
MIC (Model Integrated Computing) is a movement, led by the ISIS (Institute for Software Integrated Systems) Group at Vanderbilt University, which extends the ideas in MDA to go beyond simply specifying system interoperability based on implementation neutral interfaces and behavior to making the implementation independent model the core of a rich set of code generation and reuse techniques, allowing model specification to produce a fully operational system. MIC is an academic effort, restricted to embedded systems and is similar to the concepts in the I-Logix modeling tools for embedded systems. This use of abstract meta-modeling based on common ontologies and restriction to a specific application space has been the key to many successful code generation systems in the past. The first conference on MIC was held in October this year, and provided a catalyst for research in this area. The key aspect of the MIC paradigm is the use of an extensible modeling language, not restricted to the limited scope and expressiveness of UML[2].
UML tools have for some time supported round-tripping[4] without the use of any interoperable standards. One popular commercial tool recently added support for the addition of meta-information to generated code to allow re-import of code after modification while maintaining high-level meta-data. Tools such as I-Logix Rhapsody allow full scale modeling and state transition simulation of a UML model, and thus claim that working code can be produced directly from models. This seems possible in the I-Logix case since this tool allows custom extensions to UML, but whether this mechanism requires less effort and maintenance than traditional design is unclear. The fundamental issue with code generation which MDA attempts to address is how the gap between model detail and platform capability can be bridged. Successful code generation technology has always had a specific domain and has been restricted in its binding to an application specific platform. Take YACC for instance. MDA is no different, being targeted at middleware abstraction.
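One common way to achieve round-tripping is to have the generator emit marked regions and carry their contents forward on every regeneration. The sketch below assumes a marker convention (`# BEGIN CUSTOM` / `# END CUSTOM`) invented for this illustration; it is not the mechanism of any particular UML tool, each of which defines its own.

```python
import re

# Hypothetical marker convention; real tools each define their own.
BEGIN = "# BEGIN CUSTOM {name}"
END = "# END CUSTOM {name}"
REGION = re.compile(r"# BEGIN CUSTOM (\w+)\n(.*?)# END CUSTOM \1\n", re.DOTALL)

def extract_custom(existing_source):
    """Collect hand-written method bodies, keyed by region name."""
    return {m.group(1): m.group(2) for m in REGION.finditer(existing_source)}

def generate(class_name, methods, existing_source=""):
    """Generate a class skeleton, preserving bodies found in existing_source."""
    custom = extract_custom(existing_source)
    out = [f"class {class_name}:\n"]
    for method in methods:
        # Reuse the developer's body if one exists, otherwise emit a stub.
        body = custom.get(method, "        raise NotImplementedError\n")
        out.append(f"    def {method}(self):\n")
        out.append(BEGIN.format(name=method) + "\n")
        out.append(body)
        out.append(END.format(name=method) + "\n")
    return "".join(out)

# First generation from the model.
first = generate("Alarm", ["clear"])

# A developer fills in the generated stub by hand.
edited = first.replace("        raise NotImplementedError\n",
                       "        return 'cleared'\n")

# The model gains a method; regeneration keeps the hand-written body.
second = generate("Alarm", ["clear", "escalate"], edited)
print(second)
```

The markers are the meta-information added to the generated code: without them the generator could not tell customizations apart from its own output.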
There are some other popular code generation technologies which deserve consideration.
XDoclet is an Aspect Oriented tool which allows for the creation of meta-tags for source code in the Java language, using javadoc. See Section 2.4 for a discussion of Aspect Oriented Programming in general. The specification of custom javadoc tags is used to embed meta-data in source code during development. These tags are then processed by a templating engine provided by XDoclet which regenerates the source code or new files. Common uses of this mechanism are to generate persistency information for objects in a J2EE (Java 2 Enterprise Edition) environment (Java Data Objects), or to generate deployment descriptors or XML configuration files. Since the basis of javadoc tags requires both the use of Java and pre-existing source code, however, XDoclet is not sufficient for model-based design where the model is stored separately or is implementation agnostic.
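The tag-processing idea can be sketched in a few lines. This is not XDoclet itself: the `@model.entity` and `@model.field` tags and the `descriptor` output format are hypothetical names chosen for illustration, and a regular expression stands in for a real javadoc parser. The sketch shows meta-data embedded in source comments being extracted and turned into an XML configuration file.

```python
import re
import xml.etree.ElementTree as ET

# Java source carrying hypothetical meta-tags in its javadoc comments.
SOURCE = '''
/**
 * @model.entity name="Account" table="ACCOUNT"
 */
public class AccountBean {
    /** @model.field name="owner" column="OWNER_NAME" */
    public String getOwner() { return owner; }
}
'''

TAG = re.compile(r"@(?P<tag>[\w.]+)(?P<attrs>[^\n*]*)")
ATTR = re.compile(r'(\w+)="([^"]*)"')

def extract_tags(source):
    """Return (tag name, attribute dict) pairs in source order."""
    return [(m.group("tag"), dict(ATTR.findall(m.group("attrs"))))
            for m in TAG.finditer(source)]

def build_descriptor(tags):
    """Turn the extracted tags into a (hypothetical) XML descriptor."""
    root = ET.Element("descriptor")
    entity = None
    for name, attrs in tags:
        if name == "model.entity":
            entity = ET.SubElement(root, "entity", attrs)
        elif name == "model.field" and entity is not None:
            ET.SubElement(entity, "field", attrs)
    return ET.tostring(root, encoding="unicode")

descriptor = build_descriptor(extract_tags(SOURCE))
print(descriptor)
```

Note that the input is existing Java source, which is exactly the limitation described above: there is nothing to process until the code has been written.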
Cocoon is an XML server application framework which hides the complexity of managing the process/callflow of XSLT applications for you. By allowing your servlets, J2EE applications, or JSP[5]s to call transformations directly the application configuration and installation complexity is significantly reduced. The technologies available to the user are simply XML and XSLT technologies, but for web-based usage of transformation technology it definitely eases deployment as well as development. Since Cocoon simply provides a framework for the use of XML and XSLT it does not have significant value in the description of models or generation of technology specific bindings, unless we consider a future-state where application bindings can be generated from a central model repository and executed remotely.
CodeSmith is another code templating engine but based on the ASP (Active Server Pages) .NET language. This mechanism involves no abstract modeling, however, and simply provides a large toolset and integrated development environment for .NET code generation.
JAXB and JiBX are technologies for generating Java object bindings from XML schemas (or DTDs) and instance documents, respectively. Both hint at mechanisms for automatically translating meta-data into technology specific code bindings. JAXB provides tools to generate Java object representations of XML elements given a DTD or W3C XML Schema. The generated objects include methods to deserialize objects from an XML stream and vice versa. The JAXB mechanism provides for some specification of how elements from the XML world translate into classes in the Java world. JiBX, on the other hand, provides for real-time translation of XML instances to Java object instances which can then be manipulated through Java reflection[6]. JiBX also provides mechanisms for defining the mapping process to some degree. Neither mechanism allows complex mapping[7], but this is generally a non-issue for model driven design[8]. These tools, however, provide a translation of a model from one representation to another, not a specification of a number of models from a meta-model.
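The schema-to-binding idea can be shown in miniature. The following is a sketch only, written in Python rather than Java and not modelled on the JAXB API: it handles a single flat element declaration, assumes the `xs:` prefix is used literally in `type` attributes, and supports only two simple types. The `Alarm` schema and the `generate_binding` function are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
# Simplification: only two schema types, matched by their literal prefixed name.
PY_TYPES = {"xs:string": "str", "xs:int": "int"}

SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Alarm">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="source" type="xs:string"/>
        <xs:element name="severity" type="xs:int"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

def generate_binding(schema_text):
    """Generate Python source for a class bound to the first top-level element."""
    root = ET.fromstring(schema_text)
    top = root.find(f"{XS}element")
    name = top.get("name")
    fields = [(e.get("name"), PY_TYPES[e.get("type")])
              for e in top.iter(f"{XS}element") if e is not top]
    args = ", ".join(f"{n}: {t}" for n, t in fields)
    ctor = ", ".join(f"{t}(node.findtext('{n}'))" for n, t in fields)
    lines = ["import xml.etree.ElementTree as ET", "", f"class {name}:",
             f"    def __init__(self, {args}):"]
    lines += [f"        self.{n} = {n}" for n, _ in fields]
    lines += ["", "    @classmethod", "    def from_xml(cls, text):",
              "        node = ET.fromstring(text)",
              f"        return cls({ctor})",
              "", "    def to_xml(self):",
              f"        node = ET.Element('{name}')"]
    lines += [f"        ET.SubElement(node, '{n}').text = str(self.{n})"
              for n, _ in fields]
    lines.append("        return ET.tostring(node, encoding='unicode')")
    return "\n".join(lines) + "\n"

binding_source = generate_binding(SCHEMA)
print(binding_source)

# Load the generated class and round-trip an instance document through it.
namespace = {}
exec(binding_source, namespace)
alarm = namespace["Alarm"].from_xml(
    "<Alarm><source>node7</source><severity>3</severity></Alarm>")
```

As with the tools discussed above, this translates one representation of a model (the schema) into another (a class); it does not derive several models from a meta-model.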
The testing world has recently undergone some significant changes based on model driven design including the generation of unit test cases from code (JUnit and XUnit tools) and the use of Mock Objects to allow testing of unimplemented systems from interface specifications. These systems allow code generation from existing code, for a specific application, and are tremendously useful as an addition to a mechanism which can achieve the first stage of automation: from abstract model to code.
Then there is XSLT, a tool whose entire purpose is to perform transformations, but XSLT requires XML input and cannot process code. Some would argue that this is a good thing for model driven systems, since abstract models should not be embedded in the code itself but are more cleanly represented in a separate language intended for meta-description. Others would argue that models and meta-information should be stored with the code. Actually, both are required: there is a need for a separate model for readability and maintainability, but the code artifacts should include information which allows a reversal of this process, allows customization and extension, and aids in code readability. XInclude provides a superb mechanism for providing specialization and code insertion for corner cases in XML model descriptions. Pipeline approaches such as Cocoon provide some fantastic tooling for the use of XSLT in code generation. As well, the addition or customization of generated code can be done simply by the addition of more XSLT script. So why has there been no development of XSLTs for code generation from XMI? Probably because of the constrained problem space in which XMI is currently used: since there is not a large variety of targets, the existing tools are sufficient for those who use them. Could XSLT and XMI be used to create MIC? Probably. Are they the correct mix? Likely not, due to the complexity of XMI. Are XML and XSLT the right solution? Definitely.
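The shape of an XSLT-style generator can be sketched without an XSLT processor. The Python standard library does not include one, so the sketch below is a stand-in rather than XSLT: a table of rules keyed by element name plays the role of template matching, and `apply_templates` plays the role of xsl:apply-templates. The model vocabulary (`model`, `class`, `attribute`) is hypothetical.

```python
import xml.etree.ElementTree as ET

# A small XML model description in a hypothetical vocabulary.
MODEL = """
<model>
  <class name="Subscriber">
    <attribute name="id" type="int"/>
    <attribute name="name" type="String"/>
  </class>
</model>
"""

def apply_templates(node):
    """Dispatch each child to the rule registered for its tag."""
    return "".join(RULES[child.tag](child) for child in node if child.tag in RULES)

def model_rule(node):
    return apply_templates(node)

def class_rule(node):
    # Emit a Java class and recurse into its attributes.
    return f"public class {node.get('name')} {{\n{apply_templates(node)}}}\n"

def attribute_rule(node):
    return f"    private {node.get('type')} {node.get('name')};\n"

# One rule per element name, analogous to one xsl:template per match pattern.
RULES = {"model": model_rule, "class": class_rule, "attribute": attribute_rule}

def transform(model_text):
    root = ET.fromstring(model_text)
    return RULES[root.tag](root)

java_source = transform(MODEL)
print(java_source)
```

Customizing the generated code means adding or replacing a rule, which mirrors the observation that customization of generated code is done by adding more XSLT script.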
What about modern programming language concepts which compete with these ideas of platform neutrality? The Java byte code representation and, more so[9], the Microsoft .NET CLR (Common Language Runtime) byte code representation allow for significant platform neutrality at the language level. The MDA concept is fundamentally to do the same thing as these mechanisms but one layer up the stack, at the middleware layer instead of the operating system layer. Thus there is a close parallel but MDA and MIC provide for a layer or two of abstraction above the operating system. The meta-data annotation provided in Java byte code in the Tiger release will be of significant interest to anyone building similar systems at higher layers.
AOP (Aspect Oriented Programming) provides a mechanism for providing meta-information as annotations to a code base and later adding functionality across the code in a consistent manner through code generation. The principles of AOP do not provide a consistent meta-modeling language though; and again AOP requires source code to start from. AOP is the most promising mechanism for bringing model-driven aspects to existing code bases but is inappropriate for new development where separated model description and meta-model information are possible. Recently James Gosling[10] said that AOP doesn't solve any problems, it takes the N by M complexity problem, inherent in object oriented programs, of mixing nouns (objects) and verbs (methods) and makes it a M by N complexity problem of mixing verbs and nouns[Gosling]. Strictly speaking, AOP is not model-driven, but can make use of existing meta-model information.
The UML 2.0 specification attempts to address the needs of MDA and SOA (Service Oriented Architectures) by adding support for some key elements of modern system design including[UML2]:
So, UML 2.0 provides many required features for modeling modern programs, a tremendous improvement over UML 1.3. UML has the added advantage that it provides a visual language for the description and sharing of models. How tools generate code from these models, however, is completely unconstrained, as is the language of expression. Should UML 2.0 tooling provide mechanisms to export workflow logic in ebXML or UBL (Universal Business Language) as well as providing a more universal modeling representation for common code generation and inter-program sharing based on XML schema we would have a truly powerful MIC platform. More importantly, if the meta-modeling approach was to define meta-information in XML, UML and other tooling could be based on application specific modeling needs, which would be standardized as W3C XML Schemas rather than requiring the retooling of applications as the meta-language evolves. In other words, if the UML tools were XML meta-model driven we would be one step closer to ubiquitous model driven development by making the modeling language extensible in a common way.
Pattern frameworks provide reusable components for common design patterns which are, in fact, the platform required by a model-driven design environment. Those frameworks implemented with templates, such as the ACE (Adaptive Communication Environment) for distributed computing or the PTL (Pattern Template Library) for complex data types, actually provide a type of code generation in the application specific binding to provided platform capabilities. The typical pattern framework is exactly the thing we are aiming for in the platform component of our model-driven approaches, yet it comes without either formal modeling or true code generation. Some would argue that a well designed pattern library simplifies development enough that code generation is not required, but in all complex systems there are repetitive tasks which have not yet been resolved into patterns, and thus only in a perfect world[11] could a pattern framework solve all our automation and simplification needs.
So why has there recently been a significant resurgence of interest in Model Driven Architecture [12] ? It is clear that the rising complexity of software and increased use of third party components (partially due to the proliferation of companies providing such components and largely due to the influence and availability of open source software) is causing integration costs to continue to increase. As well, international competition and increased market competition place significant pressures on development communities to improve productivity and quality at the same time, which are typically in opposition.
As well, patterns and anti-patterns are now a part of our culture and many frameworks and platforms (a key part of MIC and code generation) are available for free, such as J2EE application servers, databases, and CORBA ORB (Object Request Broker) implementations. As is usual in all industries, the current trend in integration and development is to move up the capability stack. This means reusing not just operating system technology but everything from databases and middleware to complete workflow and EDI (Electronic Data Interchange) systems. On top of this, the proliferation of XML and the concepts of extensibility and meta-information becoming part of the industry culture make the time right for model-based development to make in-roads into standard development practices where it has failed before.
There are three principal investments in any code generation or reuse project. First there is the development of an accurate and high quality model. This investment is required to have a clean, understandable, and supportable architecture and, although there is a cost involved, this actually decreases the total cost of a software system because the gains in quality and the early detection of architectural defects far outweigh the investment. Additionally, the use of standard tools and idioms in the description of the software model decreases the cost of documentation using traditional tools such as UML plus text. Secondly, there is investment in the code generation technology which takes a model and produces working code from it. This stage must be heavily reused to make the process worthwhile but allows far more care and attention to detail to be paid to the reused code than could be in each individual use of the code. This massively increases the quality and consistency of deployed code. Third, common libraries which act as a platform for the generated code decrease the amount of code which needs to be generated and provide platform technology. So what's the rub?
First, let us consider the benefits of model driven development. There is a significant quality improvement because of the ability to concentrate effort on code which would otherwise receive little attention. Consider a piece of code which is small and used throughout a program. In a traditional development environment it will either receive very little attention to detail or cause a large number of code dependencies. In a model driven design environment, however, this piece of code can receive special attention and the benefits of this extra effort will be seen across the product with the investment spent only once. This is the same as with reuse, but with model driven design the deep thought about reuse partitioning is not required, and thus the process is far more accessible. Additionally, the ability to fix bugs in many places on a single discovery significantly increases the effectiveness of testing and bug fixing. Consistency in bugs may make them easier to find as well. This centralization of distributed effort of course brings about tooling, which further reduces the costs for practitioners. This inherent reuse of technology decreases development efforts as well.
The basic need for underlying platform technology means that effort invested in the development of MDA systems can also be reused in MIC applications, an often overlooked benefit. Consistency in the description and implementation of both the models and systems is a completely underestimated aspect of MIC. In my experience this brings about such significant reductions in training, documentation, and in the cost of mental alignment that it would single-handedly make the endeavour worthwhile. Just think about how many arguments based simply on the differing terminology of two parties can be avoided by such mechanisms (a common meta-language with consistent and ubiquitous application).
The speed of initial development is definitely massively increased, once the initial investments are compensated for, but this is far less significant than the quality or consistency improvements. The increase in both efficiency and accuracy of all maintenance and testing phases has a larger impact on the total cost of development of the software product than the initial design phase.
On top of this, now the model is well documented in a common language. Documentation of models is one of the most difficult parts of many software architecture jobs. If tools are used, documentation or help information can often then be created from this model description. In a way, model driven development is fantastic since it forces designers and architects to document the abstractions they are making in the development process, which are often lost. And of course the ability to create specifications which can be easily ported between subtending systems is achieved through the separation of layers.
There are of course limits to what can be generated and these limits are defined by the two boundary points of the model definition language and platform capability. The needs of each are obviously dependent on what is being modeled and especially on what capabilities are required by the given system. Figure 1 shows how the meta-modeling elements available to the modeller allow expression in the platform through the generator.
Whatever can be both described by the model and delivered by the platform can be generated in a common way, wherever the mapsTo relationship exists. This limits code generation to anything expressible. It is important to note that only those things which are reusable should be included within a code generation scheme, since otherwise the cost of developing the generation technology is higher than the gains afforded by the model-driven system. In my experience, all user interfaces based on data manipulation applications (business applications), client/server interfaces, and service-oriented architectures can be built using model driven design, but there is a big gap between the existing modeling languages (UML, XML Schema) and the requirements of these applications.
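The boundary described here amounts to a filter over the mapsTo relationship. A minimal sketch follows, in which the element names, capability names, and mappings are all hypothetical: an element can be generated only if the model can express it, a mapsTo entry exists for it, and the platform delivers the mapped capability.

```python
def generatable(model_elements, platform_capabilities, maps_to):
    """Return {model element: platform capability} for everything that can be generated."""
    return {element: maps_to[element]
            for element in model_elements
            if maps_to.get(element) in platform_capabilities}

# What the modeling language can express (hypothetical).
model_elements = {"entity", "association", "state_machine", "timing_constraint"}

# What the platform delivers (hypothetical).
platform_capabilities = {"table", "foreign_key", "servlet"}

# The mapsTo relationship known to the generator (hypothetical).
maps_to = {
    "entity": "table",
    "association": "foreign_key",
    "state_machine": "workflow_engine",   # target not available on this platform
}

generated = generatable(model_elements, platform_capabilities, maps_to)
hand_written = model_elements - set(generated)

print(sorted(generated.items()))
print(sorted(hand_written))
```

In this example `state_machine` falls outside the generated set because the platform lacks its target, and `timing_constraint` because no mapping exists at all; both remain hand-written work.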
What are the costs? The definition of a meta-model, or modeling language if one is not defined which meets the needs of your domain; the initial investment of developing a model; the cost of developing the code generation technology, where this cost is relative to the complexity of the meta-model and the quality requirements of the domain; development of the platform component; and definition of an extension mechanism which allows customization and specialization of the generated code without breaking the code generation. Most of these elements should be available over time for a specific domain (e.g. MDA), making the only required investment the development of a model, which is required in either case and its description in a formal manner provides better and more consistent documentation. In addition, a model tied to code generation and platform usage never deviates from the actual system, a drift which usually causes large software projects significant pain over their lifetime.
One can see by examining model-based code generation projects over the last 20-25 years that the advent of XML as a meta-language has significantly improved the capabilities of the modeller, since unlimited meta-information can now be included in model descriptions. The presentation which accompanies this paper presents more detail on real projects with some detailed examples. The general progression of model driven development has been database and GUI generation based on textual descriptions in the 80's, interface translation based on previous code description as well as state machine and interface expression based on UML in the 90's, and mediation (model-to-model interface translation) as well as persistence, logic, and workflow automation based on XML model descriptions in this decade.
The key to model driven design is the expression of ontology and meta-models, something which is inherently domain-specific. This implies that the attributes which are required in the modeling language are dependent on the behavioural needs of the domain and a modeling language for use in model driven development must be extensible, but must have a common base to have the benefit of portable knowledge and reuse. These are the principles behind MIC. One of the current problems with general XML schemas is that as they deviate and differentiate across domains, knowledge of XML is no longer common, much like the proprietary language extensions of the 1970s and 1980s. What we need is an XML based modeling language that is as easy to author and understand as UML, while having the excellent principles of extensibility and meta-information inherent in XML. We need a method such as an extension of W3C XML Schema which builds on the values and principles of XML rather than using archaic mechanisms expressed in a new language, as is the case with XMI. We need a common meta-modeling language which can be expressed and communicated in a better way, like UML. XML and the graphical nature of UML complement each other fantastically, one being machine readable, the other human readable. Arguably, model descriptions would be easier to write in a well defined XML format than by using graphical tools, due to the formalism of schema definition, while graphical representations are much better for sharing between people.
The key attributes that need to be expressed in the core meta-language are at least all the elements of UML 2.0, plus referential integrity and graphical display rules. XML is optimal for model descriptions for a number of reasons beyond its extensibility and meta-data capabilities, like the power of XPath and XQuery to manipulate model elements with simple hierarchical structures. Brains like hierarchical structures. Why isn't hierarchy an inherent component of the modeling paradigm, instead of just an aspect? The best part about an XML based modeling language is that XSLT can be used out of the box with minimal or no training for the code generation portion of a model driven architecture or model integrated computing system.
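The ease of querying a hierarchical model with XPath can be shown with the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module. The model vocabulary below (`package`, `class`, `attribute`, and the `persistent` flag) is hypothetical, and the sketch covers XPath only, not XQuery.

```python
import xml.etree.ElementTree as ET

# A small hierarchical model in a hypothetical vocabulary.
MODEL = """
<model>
  <package name="billing">
    <class name="Invoice" persistent="true">
      <attribute name="number" type="int"/>
      <attribute name="customer" type="string"/>
    </class>
    <class name="InvoicePrinter" persistent="false"/>
  </package>
  <package name="inventory">
    <class name="Item" persistent="true">
      <attribute name="sku" type="string"/>
    </class>
  </package>
</model>
"""

root = ET.fromstring(MODEL)

# Every class, in any package, that needs persistence code generated.
persistent = [c.get("name")
              for c in root.findall(".//class[@persistent='true']")]

# The attributes of one particular class.
invoice_fields = [a.get("name")
                  for a in root.findall(".//class[@name='Invoice']/attribute")]

# The classes contained in one package, following the hierarchy directly.
billing_classes = [c.get("name")
                   for c in root.findall("./package[@name='billing']/class")]

print(persistent)
print(invoice_fields)
print(billing_classes)
```

Each query is a single path expression over the containment hierarchy, which is the property that makes such a model convenient input for a generator.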
MDA and MIC are not a panacea of perfect solutions for everyone. The core meta-model is mostly common, even though some applications will leave out parts, but there are definitely application specific meta-modeling extensions. The platforms or libraries used in such systems are application specific. The XSLTs are application specific, binding to specific languages, middleware, operating systems, databases, etc. There are definitely pieces that are common across a market segment or an application space though, and even on single projects the savings can be substantial. I have seen projects where the code generation saved person-years of effort, but the modeling errors found during the meta-model analysis phase saved tens of person-years.
Model Driven Architectures and Model Integrated Computing are names for mechanisms of defining development processes based on formal modeling approaches. These modeling approaches benefit from code generation. Model driven design, including both MDA and MIC, is feasible, but a retooling of the modeling language to be based in a standard XML-style meta-language is required to bring popularity to these concepts. The efforts of the MDA and MIC movements require more popularity to provide more application space solutions useful to a larger cross-section of the computing industry. In addition, XSLT is a tremendously powerful and perfectly aligned mechanism for binding XML model descriptions to code. Code transformation systems and pattern frameworks add to the capabilities provided by basic model driven systems and fill out the picture. Model definition is the foundation of good design and model driven design puts that model at the forefront of the design process, greatly increasing the quality and longevity of software. Regardless of the terms used, model driven architecture and design are necessary in a modern development environment and XML technologies will underlie their success.
Abstract model here means the data, operations, behavior, and relationships relevant to a system description which allows inspection, manipulation, and partitioning of computer or business systems.
Standard UML does not allow expression of such program aspects as timing and security attributes, but more importantly such aspects as the referential integrity of relationships and complex conditional state spaces, for example.
Session Initiation Protocol is an IETF standard used for Voice over IP and multimedia communication.
Round-tripping is the ability to generate code and have changes or extensions made to this code be reflected to the generation technology such that subsequent code generation does not eradicate the customizations.
Java Server Pages are a server side scripting mechanism based on Java.
Complex types for containment, associations, or N to M mapping for instance.
Although a significant problem for model mapping or adaptation.
More so because of the cross-language compatibility of the CLR.
James Gosling is the original developer of the Java programming language and a Vice President and Chief Technology Officer of Sun Microsystems.
Assuming you consider a world with no problems to solve perfect!