Abstract
JAXH is the latest open source Java(tm) API that enables a Java programmer to work with XHTML and HTML. The source may be well formed or not: JAXH has been integrated with a Java version of the W3C's Tidy tool to ensure well-formedness. JAXH was designed and created as a lightweight, non-proprietary API that lets Java programmers work intelligently with the HTML source, in complete parallel with the graphic/web designer. Several studies and graphs exist to support this separation of labor. Being a pure XML and HTML API, JAXH eliminates the “third language” effect found in all web delivery systems today, including JSP and XSLT. JAXH is based on both the SAX and DOM interfaces to ensure maximum usability and portability. This paper describes how JAXH facilitates the MVC paradigm and why it is the logical progression of web and data delivery using XML.
Ten years ago the graphical user interface (GUI, or client-server) populated the landscape of computer desktops. Developers would design a GUI and then write the underlying code to support that interface. The web complicates this scenario, because the GUI and the underlying code now exist in different locations and are potentially built by two different people with completely different skill sets. This non-programmatic approach to delivering the web presentation layer is partially due to the proliferation of browsers that do not enforce structure on their input, to quickly evolving standards, and to competing standards. Ultimately, the perception is that web pages are sequential in nature. Yet HTML, like any application of SGML, is based on an object model, and this object model is generally ignored in favor of style. Viewing web pages as sequential is much like comparing procedural languages with object-oriented ones: it is possible to use an object-oriented language much as you would use a procedural one, but you will not see any real improvement from doing so. Put very simply, what is missing from current web development is code that can interpret and manipulate HTML. This is a move away from templating, scripting, and sequential access toward a page layout, object-oriented, and event model paradigm. The situation is comparable to the client-server model: the HTML is the UI definition, and code is needed to process and manipulate the widgets. The markup language should not be the end of the life cycle; you should be able to round-trip your HTML if it is well structured. The UI should exhibit loose coupling with the controller. There should be no ties; in other words, the UI should be portable, and so should the controller. Ultimately, the goal is for the application developer to have full control over both the source code and the HTML.
Through these practices you will see a higher level of reuse in a previously non-reusable area. Granularity is another important point to note. The granularity should be fine enough that very low-level manipulation is possible, and that more complex structures can be built up from those low-level operations.
The presentation layer and the code that controls it should be two distinct parts of any application's source code. One popular practice for achieving this type of separation is called Model-View-Controller, or MVC. (See Fig 1.1) MVC is an arrangement of objects such that they have certain unique relationships and contracts with each other. The model is the representation of your data, the view is any one of many ways to look at that data (e.g. a user interface), and the controller is the glue, or the strings, that tie them all together. MVC makes long-term program maintenance much easier. For instance, if there is a problem with the data, the model changes. If the model and controller were intermixed, then a model change would also mean a controller change, thus endangering and exposing more code in the process. MVC is much more than saying that what the user sees is the view and the data is the model; it is the relationships that make it continue to work. Any application's use of MVC should be a passive one. There are instances in which this discipline is not needed or desired, and this flexibility empowers the developer to make those decisions. This use and decoupling will contribute later to higher levels of efficiency and specialization within your organization. MVC is not just about arbitrary elements of the architecture that play certain roles. It is also about the proposed interaction between them. Many of the MVC models seen on the market today appear to be more of an M(VC), where there is clear separation of the model, but the dividing line between the view and the controller is not as clear. This can be seen especially in the development of JSPs. Ask yourself the question: if you are developing a web application and HTML is your view, what is the controller?
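The contracts described above can be sketched in a few lines of plain Java. This is only an illustration of the model, view, and controller roles; the names `Account`, `AccountView`, `TextAccountView`, `CsvAccountView`, and `AccountController` are hypothetical and are not part of JAXH.

```java
// Model: holds the data and knows nothing about views or controllers.
final class Account {
    private final String owner;
    private long balance;

    Account(String owner, long balance) {
        this.owner = owner;
        this.balance = balance;
    }

    String getOwner() { return owner; }
    long getBalance() { return balance; }
    void deposit(long amount) { balance += amount; }
}

// View: one of many possible ways to look at the model's data.
interface AccountView {
    String render(String owner, long balance);
}

final class TextAccountView implements AccountView {
    public String render(String owner, long balance) {
        return owner + ": " + balance;
    }
}

final class CsvAccountView implements AccountView {
    public String render(String owner, long balance) {
        return owner + "," + balance;
    }
}

// Controller: the glue. It updates the model and hands plain values to the view.
final class AccountController {
    private final Account model;
    private final AccountView view;

    AccountController(Account model, AccountView view) {
        this.model = model;
        this.view = view;
    }

    String deposit(long amount) {
        model.deposit(amount);
        return view.render(model.getOwner(), model.getBalance());
    }
}

final class MvcDemo {
    public static void main(String[] args) {
        Account account = new Account("Ada", 100);
        // Swapping the view requires no change to the model or the controller.
        System.out.println(new AccountController(account, new TextAccountView()).deposit(50));
        System.out.println(new AccountController(account, new CsvAccountView()).deposit(0));
    }
}
```

Because the controller depends only on the `AccountView` interface, a change to how the data is displayed never reaches the model, which is the maintenance benefit the paragraph above describes.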
HTML is the unavoidable choice for web applications, since the target is a small set of web browsers, and one other language is needed for processing. What is generally seen is a third language that makes this connection, usually in the form of a templating or “tag-based” system. This third language tends to serve as a metadata language or a communication conduit between the multiple parties contributing to the design of the web application. These combinations of languages create proprietary linkages and maintenance issues for years to come, not to mention configuration management issues as well. If at least two parties are trying to access the same resource at the same time for different purposes, then there is contention over that resource, which limits the ability to perform true parallel development. These third languages also pose another problem: they typically do not work within industry-standard toolkits or integrated development environments, which further dilutes the architecture through proprietary tools and development systems.
When the web was born, there was one way to publish content. HTML, plain text, data, and other media files were placed on a server and were accessible to those who wished to see them. One language was used to make these documents visible: Hypertext Markup Language. That seems simple enough. As the web “matured,” stale, static content was no longer enough for end users' data access needs, and CGI, the Common Gateway Interface, was born. This introduced a second language that was needed to deliver this data and these documents. Usually C or Perl was used to bring them to the web, and all was well. As bandwidth and user demand for more engaging content increased, users became dissatisfied with the lack of design and interface. This introduced a problem that the Perl and C developers were not prepared to handle and, for the most part, did not care to. As you try to coordinate more than one mind on just one screen of information, the simple formula for publishing web information becomes more complicated. Now different individuals with different needs have to be satisfied with our language choice.
Enter Java. Java initially has servlets to offer. This does little to solve the issue, aside from improving speed, portability, flexibility, and extensibility. Soon afterward, Sun introduces JSP, based roughly on Java and HTML, and JSP makes a bad situation much worse. JSP exemplifies the third language effect: with HTML and Java already in play, JSP becomes the third language. Many will argue that JSP is not a language, so consider two questions. If you know HTML, do you know JSP? And if you know Java, do you know JSP? The answer to both is no. This third language was born out of need and seemed to band-aid the problem only temporarily. A third language, not necessarily JSP, dilutes the original purity and true capability of both languages. Additionally, JSP offers no architectural, performance, or other technical benefits. Other examples are ASP, Cold Fusion, Struts, etc. Some may argue that if you take the time to learn these multiple languages, efficiencies can be gained. Unfortunately these efficiencies are lost for two reasons. First, as you bring on more developers, they too must learn proprietary APIs. Second, these APIs may not have the staying power necessary to be here and supported two or three years from now. Quite simply, Java works for web development, and works very well. Why introduce any more than is really needed to accomplish the end mission?
JAXH is as much a development methodology as a concrete implementation of code. While this reference implementation is written in the Java language, it could very simply be ported to other languages and platforms. Knowing and understanding the weaknesses of the other web delivery strategies, we can formulate a clear picture of what JAXH has set out to solve. Remaining true to the native languages of choice dramatically speeds time to market and ensures the procedural longevity of the solution being developed.
The most interesting problem solved here is the separation of labor. This is accomplished by letting the people involved in the process keep developing in their own unchanged environments, such as Dreamweaver or IBM VisualAge for Java. The close coupling of graphical and programmatic labor is quite possibly the largest problem plaguing web-based development and delivery of applications. This challenge continues to intensify as the demand for applications that are both more interactive and more aesthetic increases. These forces are driven by greater computing resources, higher bandwidth, and evolving browser technology.
JAXH specifically knows how to manipulate internal HTML structures and is able to identify specific instances of them via standard HTML attributes. Specific methods support, or should support, any manipulation that the Java developer could reasonably need to perform. Methods should be intuitive to Java programmers, but they are not designed to fabricate HTML purely from Java, as Kona or Jakarta ECS do. Frameworks often impose an expensive learning curve that must be overcome before they prove to be of any use. As stated before, they also often put a ceiling on the level of granularity that can be achieved with a given system. For these reasons, JAXH is not a framework, which matters in the cases where frameworks should be avoided. However, JAXH will fit very nicely into a framework as the actual content delivery portion. Frameworks have their place within any type of system development, but they should handle much more of the controller portion of the Model-View-Controller triad than the view. For example, the pre-population of a form is accomplished directly from the domain or model object through the reflection API. The resulting HTML or XHTML will pass a W3C validator. The union of JAXH with JTidy, a Java implementation of Dave Raggett's W3C Tidy, ensures support for validation. Tidy, acting as a preprocessor on an as-needed basis, ensures that the input source is valid XHTML. This gives a great degree of certainty and confidence in the end result, and it also gives designers another layer of abstraction, relieving them of the need to know and understand validation.
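The form pre-population idea above can be illustrated with nothing but the JDK's DOM and reflection APIs. The sketch below is not the JAXH API, which this paper does not list; `Customer` and `FormPopulator` are hypothetical names, and the convention that an `input` element's `name` attribute maps to a `getXxx()` method on the model is an assumption made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.lang.reflect.Method;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical model object; any class with conventional getters would do.
final class Customer {
    private final String firstName;
    private final String city;

    Customer(String firstName, String city) {
        this.firstName = firstName;
        this.city = city;
    }

    public String getFirstName() { return firstName; }
    public String getCity() { return city; }
}

final class FormPopulator {
    // Sets value="..." on every <input> whose name matches a getter on the model.
    static String populate(String xhtml, Object model) {
        try {
            Document page = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList inputs = page.getElementsByTagName("input");
            for (int i = 0; i < inputs.getLength(); i++) {
                Element input = (Element) inputs.item(i);
                String name = input.getAttribute("name"); // "" when the attribute is absent
                if (name.isEmpty()) continue;
                Object value = readProperty(model, name);
                if (value != null) input.setAttribute("value", value.toString());
            }
            // Serialize the modified DOM back to markup.
            StringWriter out = new StringWriter();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.transform(new DOMSource(page), new StreamResult(out));
            return out.toString();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("could not populate form", e);
        }
    }

    // Looks up getXxx() for a field named xxx; returns null if no such getter exists.
    static Object readProperty(Object model, String name) throws Exception {
        String getter = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            Method method = model.getClass().getMethod(getter);
            return method.invoke(model);
        } catch (NoSuchMethodException e) {
            return null; // leave the designer's markup untouched
        }
    }

    public static void main(String[] args) {
        String form = "<form><input name=\"firstName\" type=\"text\"/>"
                + "<input name=\"city\" type=\"text\"/><input type=\"submit\"/></form>";
        System.out.println(populate(form, new Customer("Ada", "London")));
    }
}
```

The designer's markup carries only standard attributes, so it stays editable in an ordinary HTML tool. The input here must already be well-formed XML, which is the job the paragraph above assigns to Tidy as a preprocessor.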
By processing a document object model (DOM) representation in an HTML-aware manner, JAXH allows for the post-processing of HTML. (Fig 1.2) However, this post-processing still occurs before the page reaches the end user's browser. Is JAXH simply an extension of the existing DOM paradigm? Yes and no, mostly no. Placing the document, for lack of a more concise term, into this form of internal representation serves two distinct purposes. First, it gives access to this model at all times, whenever custom or one-off granularity is needed. Within most frameworks, the granularity ends where the implementation of the framework ends; conversely, where JAXH ends, the DOM begins. Second, it provides a standards-based way to perform work on the target. By not implementing this portion itself, JAXH benefits from advances in the work on the DOM. Most uses completely abstract these concepts away from the end user or consumer of the document. Messages, methods, object implementations, and interfaces are available to support working in a more natural form. The messages that are implemented stress a high degree of configuration, extensibility, and reuse. Through the reorganization of smaller patterns within the HTML, JAXH aids the replication of these smaller structures, either through repetition or by combining them into more complex ones. The interleaving of business logic into the design features of the page is directly supported and is really the key nature of the API. According to Gamma, Helm, Johnson, and Vlissides in Design Patterns, this allows us to move away from hard-coding large amounts of fixed behavior toward defining a much smaller set of fundamental behaviors. Through the use of looser coupling, objects become easier to reuse and are brought out of isolation. Another advantage is that objects can lessen their dependencies, allowing for larger systems without creating a mass of tightly strung cords in great tension.
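The replication of a small HTML structure through repetition can be sketched directly against the standard DOM, the layer that begins "where JAXH ends". This is an illustration under assumptions, not JAXH code: `RowRepeater` is a hypothetical name, the prototype row is identified by a designer-supplied `id` attribute, and only the first `td` of each copy is filled in.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RowRepeater {
    // Scans for an element by its id attribute. Document.getElementById is not
    // used because, without a DTD, the parser does not know that "id" is an ID.
    static Element findById(Document doc, String id) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            if (id.equals(element.getAttribute("id"))) return element;
        }
        return null;
    }

    // Clones the prototype row once per value, then removes the prototype.
    static String repeat(String xhtml, String rowId, List<String> values) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            Element prototype = findById(doc, rowId);
            if (prototype == null) {
                throw new IllegalArgumentException("no element with id " + rowId);
            }
            Node parent = prototype.getParentNode();
            for (String value : values) {
                Element copy = (Element) prototype.cloneNode(true); // deep copy
                copy.removeAttribute("id"); // keep ids unique in the output
                copy.getElementsByTagName("td").item(0).setTextContent(value);
                parent.insertBefore(copy, prototype);
            }
            parent.removeChild(prototype);

            StringWriter out = new StringWriter();
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("could not repeat row", e);
        }
    }

    public static void main(String[] args) {
        String page = "<table><tr id=\"row\"><td>placeholder</td></tr></table>";
        System.out.println(repeat(page, "row", Arrays.asList("Ada", "Grace")));
    }
}
```

The designer lays out a single row with placeholder text, and the Java side decides how many times it appears. Neither party edits the other's file, which is the parallel development the paper argues for.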