Abstract
When the Universal Business Language (UBL) mandate was first created in October 2001, subcommittees for many areas began describing an international language of business documents in XML vocabularies. These vocabularies are sufficient for computer-to-computer interchange, but the visualization of instances of UBL vocabularies for human readers was not included in the original mandate. The Forms Presentation Subcommittee (FPSC) was founded in March 2003 as the first new UBL subcommittee, created to address the needs of those visualizing the information in UBL documents. The committee is responsible for specifying and documenting the mapping of UBL instances to visual presentations (e.g. web and paper). This lecture presents how the committee has achieved this objective in a technology-agnostic fashion such that both proprietary and non-proprietary technologies can be used to render visual presentations consistently.
Someone asks you "write me a stylesheet for this schema" ... what is your next step?
You've just been tasked with creating a transformation to express the content found in an XML document modeled by that schema. Your transformation rearranges the #PCDATA and attribute content found in instances of your vocabulary into the #PCDATA and attribute content of the target vocabulary suitable for visualization.
But visualization means different things to different people. When the stylesheet writer answers "how should I visualize this information?", what safeguards are there that the answer is indeed the kind of visualization needed to fulfill the original request? Among the many aspects of visualization are:
what information belongs in the presentation?
how does the information need to be changed when being presented?
how in the presentation does the information fit?
what does the information look like when presented?
How will you maintain your formatting specification and stylesheet in the long run? Are the stylesheets directly producing the end result in the target vocabulary, or is it necessary to divine some abstractions and make the stylesheet generation more automated or even perhaps restrict the variability in the presentation by applying constraints to the abstractions?
By December 2002 the Universal Business Language (UBL) project had sufficiently developed to begin looking at the visualization of instances of UBL document models described by schemas. A number of prerequisites that were not yet in place quickly came to light. A strategy was adopted and deployed to quickly address the needs for UBL version 0p70 delivered in March 2003. This strategy was again used, essentially unchanged, for the set of formatting specifications released with UBL 1.0 in November 2003.
This paper attempts to address two perspectives of the writing of formatting specifications:
an overview of how UBL approached the need to express presentation requirements
some observations and suggestions for how you might consider expressing and meeting your own presentation requirements
The objective of the development of the Universal Business Language [UBL] is nothing less than creating a global, vendor-independent and open set of XML business document formats with the following characteristics:
being based on UN, OASIS and W3C specifications
being closely aligned from inception with ebXML and ebXML Core Components
developed in a completely open, publicly accountable OASIS standards process with a limited life span
designed for compatibility with existing EDI systems, legal frameworks and patterns of trade
being specifically focused on B2B rather than internal application integration
intended for the exchange of legal documents, not just procedure calls
being both human-readable and machine-readable
being non-proprietary and royalty free
Web-based business-to-business (B2B) addresses the same objectives as EDI-based B2B, but uses open and non-proprietary standards for free adoption and deployment. See Figure 1 for a comparison of the design components in these two approaches.
The first three years of development in Phase 1 have culminated in a set of UBL 1.0 Schemas to be released November 1, 2003. Phase 2 begins by addressing the Context Methodology for adapting UBL documents in different vertical industries and deployments. In March 2003 in the last year of Phase 1 the Forms Presentation Subcommittee (FPSC) was formed to address the need for formatting specifications.
In December 2002 the first public release of a review set of UBL schemas was imminent. These schemas allow for machine to machine conveyance of information in a structure known and accessible by applications running at both ends of the interchange.
Although the schemas meet the objective of expressing the information in an accessible XML form, "human readable" in XML does not necessarily mean "human legible" or "human presentable". That someone can look into an XML instance and see the labels being used on elements and attributes doesn't necessarily make comprehension of the information easy. Only with effort can one piece together and relate the different components of an XML document by their element and attribute names.
The author, having had some experience in XSLT and XSL-FO, enthusiastically offered to "write a few stylesheets" to present UBL instances. The chair of UBL enthusiastically accepted the offer and awaited the opportunity to begin visualizing the information found in UBL instances on both pieces of paper and in a web browser.
But the focus to date had, necessarily, been on the semantics of the information components and the structures of the documents in which these components are conveyed. It turned out that no-one had yet taken the time to think about what information that was obviously important to a computer in the exchange (or it wouldn't have been in UBL in the first place) was going to be important to the human reader. And if important, how is it supposed to look?
But business documents have been around as long as there have been documents (some consider the very first human writings were in fact inventory records) and today's business environment already successfully runs on the back of the interchange of paper-based forms. No-one in UBL had had the time to think about how these internationally accepted documents would be populated with information found in UBL instances.
The obvious source of internationally accepted business forms is the United Nations Centre for Trade Facilitation and Electronic Business [UN/CEFACT] in which there is a standardized document layout for dozens of different business documents. See Figure 2 for an example purchase order document.
These documents based on the UN Layout Key are so very prevalent in shipping ports and import/export offices worldwide that knowledge of the language of the labels of the fields is irrelevant. Dock workers and office staff alike can all access the information in the printed form by its standardized location and presentation. The business information is being visualized in an effective and widely deployed standardized arrangement.
The two documented example scenarios for orders and fulfillment in UBL 1.0 are an office stationery order (Figure 3) and a joinery (or DIY/home repair) wood order (Figure 4).
Members of the UBL business semantics team (the Library Content Subcommittee) conceived of the scenarios and anticipated example layouts for the sample information. On the assumption that nothing more was needed, the rest of the task was left to the stylesheet writer. Unknown at the time, a missing piece prevented the stylesheets from being written.
Faced with sample instances of XML and samples of layout, the essential missing piece for writing stylesheets was the mapping of "what goes where". What was missing was an understanding of the business semantics behind the UBL instances and the business semantics behind the standardized or conceptualized layouts that were desired.
Moreover, solving the problem just for stylesheets would disenfranchise a host of other people or companies interested in visualizing UBL information using proprietary technologies or other non-stylesheet open standards such as X-Forms.
To address these shortcomings, the Forms Presentation Subcommittee (FPSC) was formed with the following charter:
To liaise with standardization organizations responsible for paper-based business commerce forms regarding evolving requirements for the presentation of information.
To rapidly develop and document formal technology-agnostic (i.e. independent of any particular presentation technology) Formatting Specifications as interpretations of internationally standardized or otherwise available paper-based forms for the presentation of UBL documents suitable for the human reader.
To foster implementations of these interpretations through coordination, guidance and responsiveness to queries, in order to test the viability of these Formatting Specifications using different technologies in real-world scenarios.
Possible deliverables for FPSC were initially seen as:
formatting specification guidelines
principles of the development and use of the library of formatting specifications
principles of the presentation of UBL information
catalogue of known implementations
Office-oriented example formatting specifications
Joinery-oriented example formatting specifications
United Nations UNECE aligned Trade Document layout key formatting specifications
Other scenarios requested by the Library Content Subcommittee for sample instances
It is specifically not in the charter to develop stylesheets. These objectives and deliverables would specify sufficient detail of the mapping of UBL constructs to the sample visualizations to direct implementers of any proprietary or open presentation technology in their development of programs and products.
The bulk of the work done by the FPSC subcommittee is from the three voting members: an expert in UBL business semantics, an expert in the UN Layout Key, and an expert in XSL technologies. Together these three pulled together the necessary information from their different respective perspectives to produce the formatting specifications found in UBL 1.0.
Are you addressing a legacy requirement where layouts have long been established and your audience is expecting information to be as it always has been? There may be restrictions in the presentation technology that prevent some legacy requirements from being met. There are some limitations of XSL-FO 1.0 that would prevent a number of legacy needs from being matched:
e.g. line numbers on pages (often used for committee editing)
e.g. synchronized marginalia (often used for special effects or for indicating security levels)
e.g. multiple flows of information (often used in journal publishing)
Has all of your legacy information been converted to XML? One problem the author has come across is that a customer would supply a mockup and their converted XML files, and a number of items in the desired presentation could not be found in the XML. What was obvious to the customer was inadvertently overlooked, and it took an outsider to point out that information in their desired reports was not being supplied.
What possible new features are available in today's publishing technologies that were not available or too difficult in legacy systems?
e.g. hyperlinking
e.g. multiple tables of content with different perspectives into the body of the work
As with UBL, you may be in a situation where you are in a cross-industry environment or perhaps you have the need to be unbiased regarding deployment. In this case it is important to specify the requirements in a technology-agnostic fashion, so as to not prejudice any particular implementation approach or product.
When deciding how to address the presentation requirement, many issues need to be considered, including in which target environments the information will be displayed, who is going to maintain the presentation, what performance requirements need to be met, etc.
The traditional approach of writing one stylesheet for each independent result may not be the ideal use of resources or the best plan for long-term maintenance of a suite of stylesheets. This replication tends to happen when there are different result vocabularies, such as using HTML for a browser and XSL-FO for print, or when there are different layouts for the same vocabulary, such as a family of related reports all based on a common collection of instances.
The author has found two opportunities to skip parallel development and to leverage developments to produce multiple results. These have become models in a number of projects based on the successes documented here.
For UBL release 0p70 Crane Softwrights Ltd. undertook to develop XSL-based visualizations of UBL instances for both paper-based and HTML presentations. Ambrosoft undertook to supply a compiled Java JAR file of the implementation of Crane's UBL stylesheets. Being made publicly available at no charge, it is hoped these implementations would be useful models for those unfamiliar with stylesheet technologies. These implementations were also able to vet the formatting specifications.
Two output formats were chosen for the library of stylesheets: PDF files (via XSL-FO) and web pages (via HTML/CSS). The initial thought was that two separate stylesheets would need to be written.
However, to ensure the web presentation had the identical content as the paper presentation, a serial process was adopted that first produces XSL-FO, and then produces HTML from the XSL-FO as a second step. A publicly available stylesheet fo2html.xsl from FO vendor RenderX is used to create an HTML instance from an XSL-FO instance.
Looking at Figure 5, each layout had a master stylesheet written for XSL-FO using A4 page dimensions. This is indicated at the center left of the diagram, and the results of producing paginated output in the lower right. The master stylesheet is utilized by an importing stylesheet, shown in the lower left, that expresses page dimensions in US-letter values, thus producing a parallel set of results for the differently sized paper. The master stylesheet is also used by an importing stylesheet, shown in the upper left that overrides some of the properties and values to tune the result for HTML transliteration, but isn't, in itself, an HTML stylesheet. The HTML result comes from running the publicly available stylesheet against the intermediate XSL-FO result that has been tweaked by the HTML reordering override. This override is quite minimal, primarily to rearrange rotated text supported by XSL-FO into un-rotated text for HTML.
This is a critically important decision at the start of a project, and may affect the writing of the formatting specifications. Consider a typical approach of writing as many stylesheets as are needed for individual document or report layouts, one stylesheet per layout. The author was approached with this request by a customer who had implemented an XML export of their legacy information into structures suitable for the production of a family of reports. A total of five to seven stylesheets were requested before the analysis began. The anticipated information flows are shown in Figure 6.
During the analysis, it was determined that one particular company representative was responsible for the layouts and was often in the position of having to tweak output requirements based on changing needs from customers. The company was prepared to enter into a long-term maintenance arrangement after the writing of the initial stylesheets, expecting and planning to accommodate the tweaking involved using the independent contractor.
A formatting information flow was conceived that abstracted each of the report component styles and sections into a structured representation of the report using semantics the customer was familiar with, rather than the semantics of XSL-FO that ultimately would be used in the production of the resulting reports. A single set of stylesheets interpreted the custom semantics into an XSL-FO stylesheet that was then applied to the legacy data in XML, as shown in Figure 7.
This atypical approach eliminated the need for long-term stylesheet maintenance. The representative responsible for the content of the reports now edits and maintains the XML abstractions, and the one stylesheet synthesizes the stylesheets needed for production. No or minimal support is needed for the synthesizing stylesheet and the representative can tailor their needs (within the constraints of the abstractions) without addressing the XSL-FO.
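The flow in Figure 7 can be illustrated with a minimal sketch. It assumes a hypothetical report abstraction (the `report` and `section` elements and their `title`, `heading` and `select` attributes are invented for illustration and are not the customer's actual vocabulary), and it uses Python in place of the synthesizing XSLT stylesheet described above. The point is only that the person maintaining the reports edits the small abstraction, and the XSL-FO-producing stylesheet is generated from it.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("xsl", XSL)
ET.register_namespace("fo", FO)

# Hypothetical abstraction maintained by the report owner (not a real vocabulary).
DESCRIPTION = """
<report title="Monthly Summary">
  <section heading="Customer" select="/data/customer/name"/>
  <section heading="Total" select="/data/invoice/total"/>
</report>
"""

def synthesize_stylesheet(report):
    """Build an XSLT stylesheet (as an element tree) that emits XSL-FO."""
    sheet = ET.Element(f"{{{XSL}}}stylesheet", {"version": "1.0"})
    template = ET.SubElement(sheet, f"{{{XSL}}}template", {"match": "/"})
    fo_root = ET.SubElement(template, f"{{{FO}}}root")
    masters = ET.SubElement(fo_root, f"{{{FO}}}layout-master-set")
    page = ET.SubElement(masters, f"{{{FO}}}simple-page-master",
                         {"master-name": "page"})
    ET.SubElement(page, f"{{{FO}}}region-body")
    sequence = ET.SubElement(fo_root, f"{{{FO}}}page-sequence",
                             {"master-reference": "page"})
    flow = ET.SubElement(sequence, f"{{{FO}}}flow",
                         {"flow-name": "xsl-region-body"})
    title = ET.SubElement(flow, f"{{{FO}}}block", {"font-weight": "bold"})
    title.text = report.get("title")
    # One output block per abstract section: fixed heading plus source content.
    for section in report.findall("section"):
        block = ET.SubElement(flow, f"{{{FO}}}block")
        block.text = section.get("heading") + ": "
        ET.SubElement(block, f"{{{XSL}}}value-of",
                      {"select": section.get("select")})
    return sheet

sheet = synthesize_stylesheet(ET.fromstring(DESCRIPTION.strip()))
stylesheet_text = ET.tostring(sheet, encoding="unicode")
print(stylesheet_text)
```

Changing a heading or adding a section then means editing the abstraction only; the constraints of the abstraction limit what can be expressed, which is the trade-off described above.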
It is important for the stylesheet writer to know the details of the target and for the final user to know the limitations of what they have requested. This should be considered as two distinct steps.
We have found that many of our customers are familiar enough with office productivity tools to use a word processor to mock up their needs indicating required margins, font sizes and appearances, text display areas etc. The stylesheet writer can, in turn, utilize the office tool to examine the nuances of the layout through the tool's user interface.
Such a mockup should include both boilerplate text and graphics as well as candidate processed content that is found in the XML source files. This step is obvious but not always considered by the customer, and it has often identified internal issues that need to be addressed by the customer before the stylesheet writer is involved. Each and every item being rendered in the result needs to be characterized either as background fixed boilerplate (perhaps algorithmically determined, in which case the algorithms will need to be spelled out unambiguously) or as information obtained from the XML source.
If using a word processing tool is not an option, then at the least the customer should be able to draw a mockup, even by hand if necessary, to convey the general placement of information. Unless the customer is paying the stylesheet writer to also do graphic arts, it should not be assumed that the stylesheet writer implicitly has the know-how or talent for laying information out on the page in an effective or pleasing fashion.
In the case of the UBL work, the three scenarios were addressed a bit differently. Example layouts for both of the office and joinery scenarios were created by a UBL team member/subject matter expert and supplied to FPSC for realization. The UN layouts were obtained from the official source, but it was quickly determined there were three document types in UBL for which there were no document layouts already blessed by the UN committees. The committee has undertaken the appropriate application process with the UN to allocate official document numbers for these three document types. By doing so, the numbers will be reserved from use in the future by other organizations and the work of UBL will not be ambiguous with other international efforts.
The stylesheet writer has an important step of creating the prototype rendition using the target vocabulary. This isn't just an academic task as it fulfills two important needs: determining that the target vocabulary can, indeed, accommodate the customer needs, and also what particular properties and facilities of the target vocabulary are needed for each presentation item.
When the target vocabulary is either HTML or XSL-FO, it is possible to hand-edit a prototype layout without having to develop any stylesheet logic. Many developers are unaware that XSL-FO can be hand-edited and then processed directly by an XSL-FO engine, or at the least processed with an identity stylesheet.
In the case of the UBL work, XSL-FO layouts were hand-edited for each scenario and the fo2html.xsl stylesheet was used to see the processed HTML result of the XSL-FO. This step revealed that certain choices among the available XSL-FO constructs, while not strictly necessary for the XSL-FO result, improved the HTML transliteration.
It is necessary to know the precise locations of the XML source file components that show up in the layout. XPath is the W3C Recommendation for constructing syntactic addresses of components of an XML document. This syntax is based on a data model for all XML documents, regardless of the vocabulary. Examples of XPath addresses used in the UBL project are:
/da:DespatchAdvice/cat:IssueDate
/da:DespatchAdvice/cat:OrderDocumentIdentification/cat:Value
/da:DespatchAdvice/cat:OrderDocumentIdentification/cat:Value/@language
Note how the XPath address utilizes namespace prefixes. In XPath 1.0 a non-prefixed name is, by specification, only a name without any namespace URI. All names from namespaces with namespace URI values must be prefixed in an XPath 1.0 address. Prefixes are not normative, but in UBL it was decided that they should be consistent within all of the documentation for the project. An effort was made late in the project, after a number of documents, instances and schema fragments had been written, to establish a project-wide convention and enumeration of namespace prefixes for all namespace URI values. The chosen prefixes are not attempts to standardize end-user use of prefixes in any way, but by adopting a specific set of prefixes, the documentation was made more self-consistent and consistent with the example instances.
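The role of prefixes can be shown with a small sketch. It uses Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath (it has no attribute step such as `/@language`, so the attribute is read with `get`). The namespace URIs and the instance content are placeholders invented for illustration; the actual UBL URIs are not given in this paper.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions, not the real UBL values).
DA_NS = "urn:example:ubl:DespatchAdvice"
CAT_NS = "urn:example:ubl:CommonAggregateTypes"

# The instance deliberately uses prefixes x: and y: rather than da: and cat:.
INSTANCE = f"""
<x:DespatchAdvice xmlns:x="{DA_NS}" xmlns:y="{CAT_NS}">
  <y:IssueDate>2003-02-14</y:IssueDate>
  <y:OrderDocumentIdentification>
    <y:Value language="en">PO-001</y:Value>
  </y:OrderDocumentIdentification>
</x:DespatchAdvice>
"""

root = ET.fromstring(INSTANCE.strip())

# Prefixes in an address are bound by whoever evaluates the address;
# only the namespace URI behind each prefix matters.
doc_prefixes = {"da": DA_NS, "cat": CAT_NS}
other_prefixes = {"p": DA_NS, "q": CAT_NS}

issue_date = root.find("cat:IssueDate", doc_prefixes)
same_issue_date = root.find("q:IssueDate", other_prefixes)
value = root.find("cat:OrderDocumentIdentification/cat:Value", doc_prefixes)

# A non-prefixed name addresses a name in no namespace, so nothing matches.
unprefixed = root.find("IssueDate")

print(issue_date.text)                # 2003-02-14
print(issue_date is same_issue_date)  # True
print(value.get("language"))          # en
print(unprefixed)                     # None
```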
XPath addresses can be very lengthy and the length of a particular component's XPath address has no relationship to the size of the field in which the information is displayed. This makes it very awkward to annotate an actual layout with the address required. An alternative way of representing the information was needed.
The UBL project employs what were termed "key references", "key files" and "key reports" allowing all elements and attributes of instances of a document to be indicated by numeric reference, which is de-referenced to a documented XPath expression. Using XSLT a number of very useful reference files were created.
"Key references" are unique numbers assigned to every element and attribute in an XML instance. Each of these numbers is expressed in between exclamation marks so as to reduce the possible ambiguity with any boilerplate text on the page. Attributes are numbered to the right of a decimal point, utilizing the number of the element to which they are attached to the left of the decimal point. Examples from the 0p70 Despatch Advice key reference file are:
3 /da:DespatchAdvice/cat:IssueDate
10 /da:DespatchAdvice/cat:ReferencedOrder/cat:SellersOrderID
10.1 /da:DespatchAdvice/cat:ReferencedOrder/cat:SellersOrderID/@language
"Key files" are files where the content of each and every element and attribute is replaced with its key reference. When key files are used as input to stylesheets, the key references are displayed, thus showing the behaviour of the stylesheet with all of the content. See Figure 8 for a depiction of a UBL stylesheet being run against a key file, and note the use of field "!10!" at the top right that corresponds to the XPath address above.
"Key reports" are in two formats: simple text and HTML. Each line in the report has two columns: the key reference on the left and the XPath address on the right. The three lines above are extracted from a text key report.
The HTML key reports appear to be the same but employ JavaScript behind the scenes. Note the status line in Figure 9 where a key number and attribute name are shown. This is the feedback of the act of clicking on a report entry's XPath address, indicating that the entry's complete XPath address has been written to the clipboard. This greatly facilitated the editing of the formatting specifications, and reduced the opportunity for human error, as the author of the specification could then just paste the complete XPath address from the clipboard into the editing tool.
One method of producing key files is to read sample instances and just replace, in situ, all of the present element and attribute content with key references, and to build the key reports from these entries.
This was initially used in the specification development, until it was noted that a number of optional fields that were not present in the data still belonged somewhere in the final rendered result. It became quite laborious to try and hand-populate the missing components just to produce a complete enough key file with which to do formatting.
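The in-situ replacement described above can be sketched as follows. This is an illustration in Python rather than the XSLT the project used, and it makes assumptions the paper does not state: key numbers are assigned in document order, attributes are numbered in alphabetical order of their names, and only elements without child elements have their text replaced. The namespace URIs and the sample instance are invented placeholders.

```python
import xml.etree.ElementTree as ET

def make_key_file(root, prefixes):
    """Replace content in place with key references; return key report rows."""
    uri_to_prefix = {uri: prefix for prefix, uri in prefixes.items()}

    def qname(tag):
        # ElementTree spells a namespaced name as "{uri}local".
        if tag.startswith("{"):
            uri, local = tag[1:].split("}", 1)
            return f"{uri_to_prefix[uri]}:{local}"
        return tag

    report = []
    counter = 0

    def visit(element, parent_path):
        nonlocal counter
        counter += 1
        number = counter
        path = f"{parent_path}/{qname(element.tag)}"
        report.append((str(number), path))
        # Attributes are numbered to the right of the decimal point.
        for index, name in enumerate(sorted(element.attrib), start=1):
            key = f"{number}.{index}"
            report.append((key, f"{path}/@{qname(name)}"))
            element.set(name, f"!{key}!")
        children = list(element)
        if not children:
            element.text = f"!{number}!"
        for child in children:
            visit(child, path)

    visit(root, "")
    return report

# Placeholder prefixes and sample instance (assumptions for illustration).
PREFIXES = {"da": "urn:example:da", "cat": "urn:example:cat"}
SAMPLE = (
    '<da:DespatchAdvice xmlns:da="urn:example:da" xmlns:cat="urn:example:cat">'
    "<cat:ID>DA-7</cat:ID>"
    "<cat:IssueDate>2003-02-14</cat:IssueDate>"
    '<cat:Note language="en">Handle with care</cat:Note>'
    "</da:DespatchAdvice>"
)

root = ET.fromstring(SAMPLE)
key_report = make_key_file(root, PREFIXES)
for key, path in key_report:
    print(key, path)
```

The sketch also shows the weakness noted above: an optional field that is absent from the sample instance never receives a key reference, so it never appears in the key file or the key report.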
It was during this process of trying to supplement the test instance with additional information that a revelation regarding validation and stylesheet processing came to light: an input instance is never validated when acted upon by XSLT, therefore it was acceptable to populate the data with as much information as possible whether or not it happened to be valid according to constraints. Care was taken to ensure the parent/child relationships were correct, but the sibling relationships were irrelevant to formatting though they were relevant to validation. This led to a different creation of key files.
Recognizing that the role of the schema expression is to perform constraint validation on a file for processing, and that a stylesheet processor such as XSLT doesn't perform validation, it is okay to send a non-valid document to a stylesheet as the source instance for transformation. Would it be possible to somehow create a pathologically complete instance of all possible elements and attributes in the correct parent/child relationships?
The method that ended up being used is the generation of a key file from the normative W3C Schema expressions of UBL. This would have been very difficult, if not untenable, using XSLT except that the schema expressions in UBL are not hand-crafted. Every schema file is synthesized from the information expressed in a collaborative Open Office spreadsheet. This collaboration through a familiar office productivity tool allowed for the enthusiastic participation of non-XML-oriented members of the committee in the expression of the semantics behind the constructs. Once the spreadsheets expressed all of the correct information about the UBL constructs, the mechanical process of expressing the relationships in the W3C Schema vocabulary was automated.
This automation resulted in a very regular and predictable expression of the schema constructs for the UBL information constructs. An XSLT stylesheet was written to read the W3C Schema expression of a UBL document type and synthesize a key file with every possible element and every possible attribute, all populated with their respective key references and an associated key report with all the information summarized. These pathologically complete instances would never validate but would also never interfere with basic stylesheet processing of the display of information components in a resulting layout.
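A much-reduced sketch of the schema-driven approach follows. It is written in Python rather than XSLT, and it leans on the same property the paper relies on: a very regular schema. The assumptions here are stronger than anything stated about the UBL schemas: every element is declared globally with a named type, content models refer to elements only by `ref`, there is no target namespace, and no type is recursive. The toy schema is invented for illustration.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Toy schema (an assumption, far simpler than a real UBL schema).
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="DespatchAdvice" type="DespatchAdviceType"/>
  <xsd:element name="IssueDate" type="xsd:date"/>
  <xsd:element name="Note" type="NoteType"/>
  <xsd:complexType name="DespatchAdviceType">
    <xsd:sequence>
      <xsd:element ref="IssueDate" minOccurs="0"/>
      <xsd:element ref="Note" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="NoteType">
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="language" type="xsd:language"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:schema>
"""

def synthesize(schema, root_name):
    """Return (key file tree, key report) with every declared child present."""
    elements = {e.get("name"): e.get("type")
                for e in schema.findall(f"{XSD}element")}
    types = {t.get("name"): t for t in schema.findall(f"{XSD}complexType")}
    report = []
    counter = 0

    def build(name, parent_path):
        nonlocal counter
        counter += 1
        number = counter
        path = f"{parent_path}/{name}"
        report.append((str(number), path))
        node = ET.Element(name)
        definition = types.get(elements[name])
        child_refs = []
        if definition is not None:
            attributes = definition.iter(f"{XSD}attribute")
            for index, attr in enumerate(attributes, start=1):
                key = f"{number}.{index}"
                node.set(attr.get("name"), f"!{key}!")
                report.append((key, f"{path}/@{attr.get('name')}"))
            child_refs = [e.get("ref")
                          for e in definition.iter(f"{XSD}element")]
        if not child_refs:
            node.text = f"!{number}!"
        # Optional children are always emitted: completeness, not validity.
        for ref in child_refs:
            node.append(build(ref, path))
        return node

    return build(root_name, ""), report

key_file, key_report = synthesize(ET.fromstring(SCHEMA.strip()), "DespatchAdvice")
print(ET.tostring(key_file, encoding="unicode"))
for key, path in key_report:
    print(key, path)
```

Because optional and alternative children are all emitted, the result is complete rather than valid, which matches the intent described above.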
Key files ended up playing two important roles in the UBL FPSC work and the work of implementers of UBL formatting specifications.
During the development of the formatting specifications the key files were used by the subject matter experts to annotate (by hand or by graphic tools) the visual mockups with succinct key references unambiguously identifying lengthy XPath addresses, and as an assist in the editing process to capture the lengthy addresses in the prose of the specification documents.
During the development of implementations of the formatting specifications, the key files were used to confirm the placement of information from a source instance to the resulting display.
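As an illustration of the second role, an implementer might compare the key references that appear in a rendered key file against those the formatting specification calls for. The sketch below assumes the rendered result is available as plain text and that the expected keys have been collected by hand; the function name and the sample values are hypothetical.

```python
import re

# A key reference is a number, optionally with an attribute part,
# written between exclamation marks, e.g. !10! or !10.1!
KEY_PATTERN = re.compile(r"!(\d+(?:\.\d+)?)!")

def check_placement(rendered_text, expected_keys):
    """Return (missing, unexpected) key references for a rendered key file."""
    found = set(KEY_PATTERN.findall(rendered_text))
    expected = set(expected_keys)
    return sorted(expected - found), sorted(found - expected)

# Hypothetical rendered text and expectations, for illustration only.
rendered = "Despatch advice  Order: !10! (!10.1!)  Issued: !3!  Ref: !12!"
expected = {"3", "10", "10.1", "11"}

missing, unexpected = check_placement(rendered, expected)
print("missing:", missing)        # missing: ['11']
print("unexpected:", unexpected)  # unexpected: ['12']
```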
All subsequent internal candidate releases of the UBL Schemas were accompanied with a complete set of key files and reports that were utilized by FPSC in the authoring of the specifications and were made available to developers.
Writing the formatting specification is the most important phase of a rendering project because all of the results hinge on the formatting specification's accuracy and clarity to the implementers.
No Universal Business Language stylesheets could have been developed until the subject matter experts who understood where the information is found in a UBL instance could document where that information belonged in a rendered result. Not all (or even very many) stylesheet and rendering tool makers can be expected to understand the semantics of the information as intimately as the developers of the specifications themselves. In your own projects be sure to not only involve subject matter experts but have them lead the development of the formatting specifications, with an implementer's knowledge of the rendering technology as a check and balance.
In open development projects, as typified by UBL where all vendors and all technologies are invited to participate, it is important that the information be expressed in a technology-agnostic fashion so as to not bias any particular technology. The use of XPath addresses is independent of any rendering technology because the XPath data model unambiguously describes the structure of any XML document, regardless of how that document is being processed downstream.
When considering the information flows for multiple targets, it is important to explore opportunities for leveraging intermediate results in a serial fashion rather than implementing totally parallel processes. Try finding points where automation can play a role to produce multiple paths for the information.
When documenting formatting specifications and measuring implementations, consider the use of key references and key files. These can be very powerful in unambiguously identifying the sources of and actions upon input XML constructs.
[Ambrosoft] Ambrosoft, Inc. http://www.ambrosoft.com
[Crane] Crane Softwrights Ltd. http://www.CraneSoftwrights.com
[RenderX] RenderX http://www.renderx.com
[UBL] Universal Business Language http://www.oasis-open.org/committees/ubl
[UN/CEFACT] United Nations Centre for Trade Facilitation and Electronic Business http://www.unece.org/cefact/