Keywords: Authoring, Case Studies, Application architecture, Document Creation, Content Management, Electronic Publishing, Integration, Metadata, Publishing, Structure, UBL, XForms, XSLT, DocBook, OASIS Open Office format, OpenOffice.org, StarOffice
Biography
Lars Oppermann studied computer science in Hamburg and has been working as a professional software developer since 1997. He has mainly worked on XML-based content management and publishing solutions for various media companies since 1999 and joined Sun Microsystems Inc. in 2001, first working on the web-based StarOffice version StarPortal/Sun ONE Webtop and later joining the development team for the XML-based infrastructure of StarOffice/OpenOffice.org.
This paper addresses the use of the W3C XForms standard in a general-purpose office application.
XForms allows for the manipulation and processing of highly structured XML content while providing means of input validation and business logic inside the form. Through the integration of XForms support into an office application, the user is enabled to work with arbitrarily structured XML data in a convenient and well-known environment.
The XForms integration into StarOffice and OpenOffice.org that the author shows here supports the user in the design phase of the form, as well as during data entry and validation in the deployed form.
Form design and data entry are integrated into the existing program modules allowing the user to work with forms in a known environment.
Due to the standardization of XForms, the forms are usable with any other XForms compliant application. This enables the use of a general purpose productivity suite to work with arbitrarily structured XML data, where otherwise specifically customized tools would be needed. The special benefit of this particular combination is the possibility of combining structured data and free form content into a single document.
Since the OASIS Open Office XML format is used for free form content, all information that was entered by the user is stored in a well-structured and standardized way which can be processed by any given application throughout a work flow. For instance, XSLT can be used to create any required XML format from the combination of XForms instance data and document content.
Examples will be given to demonstrate the outlined combination of standards into the OpenOffice.org application, including the manipulation of OASIS UBL instance data through a customized form and the preparation of an XML 2004 conference paper in OpenOffice.org through the combination of XForms, XML based word processing and XSLT.
1. Forms Support in Office Applications
2. XForms
2.1 XForms Model
2.2 Repeatable Sections
3. Implementation Architecture
3.1 Adding XForms to the OASIS OpenOffice Format
3.2 Forms Layer
3.2.1 Form Controls
3.3 XML Layer
3.4 Forms Processing
3.5 Forms Submission
4. User Interface Implementation
4.1 Design Mode
4.1.1 Instance Definition
4.1.2 Submission
4.1.3 Bindings
4.1.4 Form Design Considerations
4.1.5 Form Controls
4.1.6 Other Design Aids
4.1.7 Calculations and Constraints
4.2 Form Usage Mode
4.3 Submitting Forms
4.4 UBL (invoice)
4.5 Conference Paper
5. Conclusion
Bibliography
Footnotes
1. Forms Support in Office Applications
Embedded forms have become a major component in modern information systems. There is next to no website on the internet that doesn't use HTML forms to get input from the user. Since the advent of scripting technology, web page authors have been able to tie a limited amount of logic to their forms, and can thus perform input validation and calculations on the client side while the user is filling out the form.
Forms have also been a component of office applications for some time. These applications have likewise supported programming models for some time, and authors can use them to tie to their forms any logic that they are able to express in the supported programming languages.
However, expressing calculations and input verification together with appropriate error messages in a user-friendly manner can become quite a burden for the forms author, and requires substantial programming skills on his part as the forms get more complex. Making the form behave in the desired way can become a more demanding task than developing the application that deals with the final data.
Office suites have been offering a greater range of input controls than plain HTML documents. Forms that use such rich controls cannot be used in other applications, nor can they be exported to an HTML version of the document; this limits the distribution of the form.
Usually such a form is designed for a single task inside some organization and won’t be reusable by anybody else, due to the heavy customization that was used to integrate the form with the rest of the application.
2. XForms
The new W3C XForms recommendation remedies many of the problems that have been outlined in the previous section.
In contrast to traditional HTML forms, XForms has been designed to provide a clear separation of the data being collected from the controls which are collecting the data. Not only does this separation provide for greater transparency within the system as a whole, it also makes XForms much more independent of the actual environment it is being used in.
XForms provides a standardized way to create form-based user interfaces, but XForms doesn't stop at the form design. XForms enables the author to formulate input checking and calculations through standardized XPath expressions. Consequently, he is not required to familiarize himself with yet another programming language and application-specific object model in order to define the logic of the form.
XForms also gives the author the ability to bind every form control to a specific location in an XML model, which can later be submitted in whole or in part to the application handling the data entered through the form. Having forms produce structured data in this way helps reduce the complexity of the receiving application, and makes designing the form much more transparent.
XForms primary benefits are as follows[1]:
2.1 XForms Model
An XForms form uses a model-view-controller architecture. The model is the instance data, which will eventually be sent to a server, saved to a file or attached to an email once the user has finished filling out the form. (This is not to be confused with the xforms:model element, which contains instance data and controller logic as bindings.) The view tier is made up of the actual form controls that are visible to the user and which provide the means of interacting with the instance data. The controller tier is composed of bindings which define the relation between the form controls and the instance data.
Bindings link form controls and their content to specific positions in the instance data tree. They can also hold expressions which define constraints and verifications on the data that can be entered into a specific control.
Interaction between the three tiers is facilitated by an event-driven messaging mechanism. Individual binding elements are notified about changes to the controls that they are linked to and perform modifications on the data instance. Bindings also receive events about changes in the data instance that have occurred due to other bindings modifying it. In turn, those bindings can update the controls to provide some sort of visual feedback. Furthermore, they might perform calculations and store the result at some other location in the data model. This might trigger yet other bindings that are listening for events about that part of the data instance, and so on.
This 3-tier approach eliminates the need for any external scripting facilities almost entirely, since a vast number of relations between model and controller can be defined by the bindings. The form author no longer has to define event handlers for form elements. Instead, he can model the relationships of the elements making up the form in a much more natural way, by describing the relations of the form components through bindings.
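The interplay of the three tiers can be sketched in a few lines of Python. This is only an illustration of the idea, not the OpenOffice.org implementation: the class names `Instance`, `Control` and `Binding`, the flat path-to-value dictionary and the path `/order/quantity` are assumptions made for the example.

```python
from typing import Callable, Dict, List


class Instance:
    """Model tier: instance data as a flat path -> value mapping (a simplification)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = dict(data)
        self._listeners: Dict[str, List[Callable[[str], None]]] = {}

    def get(self, path: str) -> str:
        return self._data[path]

    def listen(self, path: str, callback: Callable[[str], None]) -> None:
        self._listeners.setdefault(path, []).append(callback)

    def set(self, path: str, value: str) -> None:
        if self._data.get(path) == value:
            return  # nothing changed, so no event is sent
        self._data[path] = value
        for callback in self._listeners.get(path, []):
            callback(value)


class Control:
    """View tier: a form control that only knows its displayed value."""

    def __init__(self) -> None:
        self.value = ""
        self.on_input: Callable[[str], None] = lambda value: None

    def user_types(self, text: str) -> None:
        self.value = text
        self.on_input(text)


class Binding:
    """Controller tier: links one control to one location in the instance."""

    def __init__(self, instance: Instance, path: str, control: Control) -> None:
        control.value = instance.get(path)
        # Control -> instance: user input is written to the bound location.
        control.on_input = lambda text: instance.set(path, text)
        # Instance -> control: changes made elsewhere update the display.
        instance.listen(path, lambda value: setattr(control, "value", value))


instance = Instance({"/order/quantity": "1"})
entry_field, display_field = Control(), Control()
Binding(instance, "/order/quantity", entry_field)
Binding(instance, "/order/quantity", display_field)

entry_field.user_types("5")
print(instance.get("/order/quantity"), display_field.value)
```

Neither control refers to the other; the second control is updated only because its binding listens to the same location in the instance.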
2.2 Repeatable Sections
XForms defines repeatable sections. Such a section is a group of controls that can be duplicated in the form, e.g. for a list of items in an order form. Repeatable sections are not supported by the current OpenOffice.org implementation, though support is planned for the near future. For that reason, they are not discussed further in this paper.
3. Implementation Architecture
This chapter covers the infrastructure that has been added to OpenOffice.org in order to allow for the integration of an XForms-compliant forms mechanism. The user interface that makes this functionality available to the end user is described in the next chapter.
3.1 Adding XForms to the OASIS OpenOffice Format
The primary consideration concerning the file format is compatibility. This holds for backwards compatibility with older versions of OpenOffice.org as well as for compatibility with other XForms-compliant applications, insofar as the XForms model is stored in a way that allows it to be easily extracted from the document. XForms also allows multiple forms to be embedded in a single document, which the inclusion of XForms into the OpenOffice format covers as well.
The OpenOffice File Format [OASIS] is controlled by the OASIS Open Office File Format Technical Committee (TC). The members of the TC maintain a detailed specification on all aspects of the file format, thus providing an open format for anyone interested in working with office productivity applications.
As already outlined, forms are a common part of office productivity work. Since it has been the practice of the committee to re-use existing XML schemas where appropriate, it was decided to propose the XML schemas defined by the W3C XForms working group for inclusion into the OASIS Open Office File Format.
The OASIS Open Office File Format was designed with extensibility in mind. This means that aggregating content with an XML representation into the format is straightforward.
The office:forms element can contain form:form and xforms:model elements. The content of the xforms:model element conforms to that specified in [XFORMS]. It contains xforms:instance elements to hold forms data. It will also include xforms:bind and xforms:submission elements that define the behavior of the form.
Form controls can carry an xforms:bind attribute that references an xforms:bind element from the XForms model.
Button controls can carry an xforms:submission attribute that references an xforms:submission element from the XForms model. When the button is clicked, the referenced submission is processed.
<office:document xmlns:office="urn:oasis:names:tc:openoffice:xmlns:office:1.0"
    xmlns:form="urn:oasis:names:tc:openoffice:xmlns:form:1.0"
    xmlns:xforms="http://www.w3.org/2002/xforms">
  <!-- ... office:document content ... -->
  <office:forms>
    <xforms:model id="model1">
      <xforms:instance id="instanceData">
        <element1 attrib1="aValue1"> text-content </element1>
      </xforms:instance>
      <xforms:submission id="submit1" ref="//instanceData"/>
      <xforms:bind id="binding1" ref="//instanceData/element1"/>
    </xforms:model>
    <form:form>
      <form:text xforms:bind="binding1" .../>
      <form:button xforms:submission="submit1" .../>
    </form:form>
  </office:forms>
  <!-- ... office:document content ... -->
</office:document>
Example 1: Use of xforms:bind and xforms:submission attributes
While the xforms:bind and xforms:submission attributes have been introduced to the file format for the purpose of XForms support, the office:forms element has been a part of the file format before. The office:forms element was only extended in order to allow for the inclusion of the xforms:model element.
The form controls are later placed and positioned in the actual document by shape elements, which reference the form control in the form description section.
3.2 Forms Layer
OpenOffice has had form support since it was first released. The form model, however, was prone to the shortcomings described in the introduction.
Since form controls are already available, it is only necessary to ensure compatibility of the available controls with those defined by XForms.
As outlined in the previous section, a form in OpenOffice is made up of form controls, which are stored in the //office:forms/form:form element. The controls are positioned in the document as shapes, which are associated with the form controls and thus provide a view onto the abstract form model.
This doesn't change for XForms-based forms at all. The form elements are just associated with an xforms:bind or xforms:submission element in an xforms:model.
This means that existing forms can easily be upgraded to XForms, just by adding a data instance and binding the existing form controls to appropriate nodes in that instance.
3.2.1 Form Controls
Input controls come in different kinds, suited for the entry of different data types, e.g. text, numbers or dates.
The various input controls also support properties such as the hiding of entered text for passwords or other sensitive data.
Input controls can be configured to be read-only for the user of the form. This still allows changes to the control through a binding expression, so that such fields can display the results of calculations or other visual user feedback.
Further controls are available for the selection of data ranges and for single or multiple selections from a set of options. There are also buttons, which can be linked to submissions.
OpenOffice.org does not use the exact form controls specified by [XFORMS]. Existing form controls are reused and augmented with the xforms:bind and xforms:submission attributes as described above. The mapping of controls defined by [XFORMS] and OpenOffice.org form controls is as follows:
| XForms Element | OpenOffice Control |
|---|---|
| input element | various input field controls for text, number, dates, etc. |
| secret element | property for input controls |
| textarea element | input field control |
| output element | read-only input fields with appropriate binding |
| upload element | not supported |
| range element | range input field control |
| trigger element | button control |
| submit element | button control with appropriate binding |
| select element | selection input field controls e.g., multi selection list-box or selection box group |
| select1 element | selection input field control e.g., single selection list-box, dropdown box or radio-button group |
| choices element | button control |
| item element | property of form controls |
| value element | property of form controls |
| filename element | not supported |
| mediatype element | not supported |
| label element | label control |
| help element | property of form controls |
| hint element | property of form controls |
| alert element | not supported |
Table 1
When an XForms model contained in an XHTML document is imported, the XForms controls from the document are converted to the internal control types. The model with its instance and bindings is imported as-is.
3.3 XML Layer
The underlying object model uses a DOM Level 2 implementation for the representation of the XForms instance data. The model itself, on the other hand, is stored in a custom object model within the OpenOffice document model. The OpenOffice model is not DOM compliant and is only serialized to XML when needed.
The instance representation supports DOM Level 2 events, and an XPath engine [XPATH] is available that can evaluate XPath expressions on the DOM instance. The event propagation and XPath mechanisms are used to connect the internal object model to the instance data.
The implementation was done on top of libxml2 by Daniel Veillard [XMLSOFT] and consists of a wrapper that makes libxml2 functionality available through a UNO API ([UDK]), which is modeled after the respective W3C specifications and recommendations concerning DOM and DOM events: [DOM2] and [DOM-E].
While the actual data-model and the XPath engine are provided by libxml2, event processing was implemented as part of the wrapper architecture since it is not part of libxml2.
The XForms processor engine was implemented on top of this wrapper architecture, and thus has no direct dependencies on libxml2's data model, since it only uses the DOM abstraction layer implemented as a UNO API.
3.4 Forms Processing
XForms processing is event driven. Bindings receive events from controls and from the data model instance when it changes. This ensures that the data instance and the form controls always reflect the correct state of the form. Hence, the user can receive instant feedback about anything that he is doing with the form. The processor can perform calculations on the fly whenever data is entered and display the results to the user through the form, e.g. by means of a form control bound to the result of the calculation.
Processing of an expression will stop, should cyclic references be encountered, in order to prevent infinite loops.
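One way to recompute dependent values in a safe order and to stop on cyclic references is a topological sort of the dependencies. The following sketch uses Python's standard `graphlib` module (Python 3.9 or later). It illustrates the general technique only; the function `recalculate`, the node names and the representation of a calculation as a Python function are assumptions, not the processor's actual design.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Callable, Dict, Tuple

# Each calculated node lists the nodes it reads and a function computing it.
Calculation = Tuple[Tuple[str, ...], Callable[..., float]]


def recalculate(values: Dict[str, float],
                calculations: Dict[str, Calculation]) -> bool:
    """Recompute all calculated nodes in dependency order.

    Returns False and leaves `values` untouched when the calculations
    reference each other cyclically.
    """
    graph = {node: set(deps) for node, (deps, _) in calculations.items()}
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        return False  # stop processing instead of looping forever
    for node in order:
        if node in calculations:
            deps, func = calculations[node]
            values[node] = func(*(values[dep] for dep in deps))
    return True


values = {"price": 10.0, "quantity": 3.0}
calculations = {
    "total": (("net", "tax"), lambda net, tax: net + tax),
    "tax": (("net",), lambda net: net * 0.25),
    "net": (("price", "quantity"), lambda price, quantity: price * quantity),
}
print(recalculate(values, calculations), values["total"])
```

The calculations are deliberately listed out of order; the sort ensures that `net` is computed before `tax` and `total`.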
3.5 Forms Submission
Submission refers to the process of sending the instance data to the next step of the workflow. This could mean sending the model as a mail attachment, submitting it to a web server or just streaming the instance data content to a file for further processing.
The form designer can also control what should happen after the data has been received by a server. The server reply could either replace the whole document that includes the submitted form, as is mostly the case in today's web browsers and HTML forms, or it could replace all or part of the data instance that is associated with the form, allowing the result to be represented within the current document.
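The two reply-handling options can be pictured as follows. This is a hedged sketch: the values `all`, `instance` and `none` are those of the `replace` attribute in XForms 1.0, while the `FormDocument` class and the `apply_reply` function are hypothetical stand-ins, and replacing only a part of the instance is not covered.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class FormDocument:
    """Hypothetical stand-in for a document with one embedded data instance."""
    content: str
    instance: ET.Element


def apply_reply(document: FormDocument, reply: str, replace: str) -> None:
    """Apply a server reply according to the submission's replace setting."""
    if replace == "all":
        document.content = reply  # the reply becomes the new document
    elif replace == "instance":
        document.instance = ET.fromstring(reply)  # only the data changes
    elif replace != "none":
        raise ValueError(f"unknown replace value: {replace!r}")


doc = FormDocument("Order form",
                   ET.fromstring("<order><status>new</status></order>"))
apply_reply(doc, "<order><status>accepted</status></order>", "instance")
print(doc.content, doc.instance.findtext("status"))
```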
The form author is able to exercise a great level of control over which parts of the data model are actually included in the submitted data set. Not only is it possible to select only a specific sub-tree of the XML instance, it is also possible to mark individual nodes in the tree for inclusion in or exclusion from the submission process by attaching specific xforms:bind elements to them.
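Selecting a sub-tree and leaving out individually marked nodes can be sketched with Python's standard `xml.etree.ElementTree` module. The function `select_for_submission` and the sample order are assumptions made for illustration; the paths use ElementTree's limited XPath subset rather than full XPath, and the list of excluded paths stands in for the marks that bindings would place on nodes.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Iterable


def select_for_submission(instance: ET.Element, ref: str,
                          excluded: Iterable[str]) -> ET.Element:
    """Return a copy of the sub-tree at `ref` without the excluded nodes.

    `ref` and the entries of `excluded` are evaluated relative to the
    root element of `instance`. The original instance is not modified.
    """
    tree = copy.deepcopy(instance)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in tree.iter() for child in parent}
    for path in excluded:
        for node in tree.findall(path):
            parents[node].remove(node)
    selected = tree.find(ref)
    if selected is None:
        raise ValueError(f"{ref!r} selects no node")
    return selected


order = ET.fromstring(
    "<order>"
    "<customer><name>Ann</name><note>internal</note></customer>"
    "<items><item>pen</item></items>"
    "</order>"
)
submitted = select_for_submission(order, "customer", ["customer/note"])
print(ET.tostring(submitted, encoding="unicode"))
```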
XForms defines a number of submission methods that specify how the collected instance data is to be submitted; the most common method being the submission of the XML representation to a server via the HTTP-POST method. XForms also supports the submission by the HTTP-PUT method used in WebDAV servers. The PUT method is also used for streaming the XML content into a local file via a file://-URL.
The supported submission methods are:
XForms furthermore defines some encodings which are suitable for the various submission methods:
OpenOffice offers a transparent, component-based network access layer called the UCB (Universal Content Broker), which already supports all the communication methods that are used by XForms [UCB]. By implementing the aforementioned encodings, it is possible to use this existing infrastructure to support the submission schemes defined by XForms.
The submission methods and encodings themselves should be well known, since they have been in existence for some time and are used throughout the internet's communication infrastructure.
While the various post methods and the put methods will use encodings yielding a textual representation of the actual XML tree that is to be submitted, the get method by definition submits a set of name-value pairs. In this case, XML element nodes will become the names and their text children will become the values of the submission. All other parts of the XML tree will be ignored by the respective encodings. For example, the following XML sub-tree
<element1 attrib="something">
  value1
  <element2>
    value2
  </element2>
</element1>
<element3>
  value3
</element3>
Example 2: Submission Data
will be encoded into the following set of name-value pairs:
element1=value1&element2=value2&element3=value3
Note the absence of the attrib attribute of element1. The application/x-www-form-urlencoded encoding is only useful for simpler forms and does not support the replacement of the data instance, as the other methods do. It is, however, an important part of the implementation, as it ensures backwards compatibility with all the CGI-like applications that are built around this widespread data submission method, including Java Servlets, PHP, ASP and others.
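The encoding of Example 2 can be reproduced with a short Python sketch based on the standard library. The function name `to_name_value_pairs` and the wrapping `<data>` element are assumptions: the wrapper is only needed because the example consists of two top-level elements, and `&` is used as the separator to match the output shown above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def to_name_value_pairs(subtree_root: ET.Element) -> str:
    """Encode element text as name=value pairs; attributes are ignored."""
    pairs = []
    for element in subtree_root.iter():  # document order
        if element is subtree_root:
            continue  # the wrapper element itself is not submitted
        text = (element.text or "").strip()
        if text:
            pairs.append((element.tag, text))
    return urlencode(pairs)


# Example 2, wrapped in a hypothetical <data> root so that it parses
# as a single document.
example = ET.fromstring("""
<data>
  <element1 attrib="something">
    value1
    <element2>
      value2
    </element2>
  </element1>
  <element3>
    value3
  </element3>
</data>
""")
print(to_name_value_pairs(example))
# prints: element1=value1&element2=value2&element3=value3
```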
4. User Interface Implementation
This chapter covers the user interface that was put in place to give the end user access to the XForms functionality implemented by the infrastructure described in the previous chapter.
The implementation is based on a two-mode concept. There is a mode for authoring or editing a form, and there is a mode for data entry into the finished form and the submission of the filled out form.
It can be specified as a document property whether a document containing a form should be opened in design-mode or not. The form author will disable this document property once he decides to deploy the forms document.
4.1 Design Mode
The central tool when working in design mode is the Data Navigator, which provides a structured view of the XML data instance that is to be used by the form. The Data Navigator allows the designer to manipulate the data model, either by importing and possibly modifying an existing XML instance or by creating his own model from scratch.
The process of creating a form layout and defining the behavior of the form and its relation to the underlying data model is called form design. XForms forms can be designed with a simple text editor by writing the XML format directly. In an office application designed for an end user, however, the details of the underlying XML should be hidden as much as possible. The user should be able to create a form with tools similar to those that he is used to from ordinary document creation.
The main interface in form design mode is the Data Navigator. It offers an interactive user interface that allows the user to edit the form models in the document. A document can contain multiple form models.
The Data Navigator breaks up the view of the XForms model into at least three sub-views, represented by a tabbed control panel.
4.1.1 Instance Definition
The Instance view of the Data Navigator offers functionality with which the form creator can define an XML instance. This can either be done by loading an existing XML instance, or by creating a new XML instance from scratch.
For every instance in the document, there will be an individual tab in the Data Navigator.
The instance can be modified by adding or removing elements, text nodes or attributes. The instance editor does not offer any schema support such as DTD, XSD or RELAX NG. It is up to the form author to ensure the validity of the instance, should adherence to a certain schema be desired.
A forms document supports any number of embedded data instances. Instances are assigned a name and are accessible by that name through the Data Navigator.
4.1.2 Submission
An instance that has been defined in the Data Navigator's Instance tab can have a number of submissions assigned to it.
Each submission can be used to submit different parts of the instance to different endpoints, as described in Section 3.5. Submissions can be bound to form controls by bindings as described in the next section.
A submission can be referenced by its name. It has an action URL, which describes the endpoint that is contacted for the submission process, e.g. a file location, an email recipient or a web server. The binding expression selects the part of the respective instance that is to be used in the submission. The binding dropdown selects an existing binding to which this submission is to be assigned. The replace dropdown selects the action to be taken after the submission has been executed (see Section 3.5 for a description of the available submission actions).
4.1.3 Bindings
Bindings represent the connections between form controls and instance nodes. An instance can be associated with an arbitrary number of bindings, which define the relationships of all the elements making up the final form.
The Bindings tab shows all bindings that have been defined for the current model. Bindings can be entered directly into this dialog, but they will also be defined by editing binding properties of form controls, submissions or instance nodes.
Through the add or edit binding dialog, the properties of a binding as described in [XFORMS] can be manipulated. The default value field contains the XPath expression which is used in the actual xforms:bind's ref attribute.
4.1.4 Form Design Considerations
Form design can be approached from two different angles. The first is the data that the form should collect: a form designer will often know what kind of data his form should collect and how it should be structured. A user approaching form design from this angle can start by importing a pre-filled set of data into his new form model. He can then add form controls to his document in the way he desires. After the layout of the form is finished, he can bind the individual form controls to the previously imported data set. The data can then be modified directly by means of the form controls.
In the second approach, the actual data representation might not be defined beforehand. The form layout, however, might already be available, since the author is implementing an electronic version of a paper-based form. In this case, the author would start by rebuilding the existing form inside a document. After he has finished laying out the form, he can define bindings of the controls into an as yet non-existent data model. The data model will be created on the fly, according to the bindings, when the controls are manipulated.
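Creating the data model on the fly can be pictured as creating every missing step of a binding path before a value is stored. The sketch below uses Python's standard `xml.etree.ElementTree` module; the helper `ensure_node`, the sample element names and the restriction to simple slash-separated element names (rather than full XPath) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def ensure_node(root: ET.Element, path: str) -> ET.Element:
    """Return the element at `path` below `root`, creating missing steps."""
    node = root
    for step in path.split("/"):
        child = node.find(step)
        if child is None:
            child = ET.SubElement(node, step)
        node = child
    return node


# The instance starts out empty; entering data into bound controls
# creates the nodes that the bindings refer to.
form_data = ET.Element("form")
ensure_node(form_data, "applicant/name").text = "Jane Doe"
ensure_node(form_data, "applicant/city").text = "Hamburg"
print(ET.tostring(form_data, encoding="unicode"))
```

The second call reuses the `applicant` element created by the first, so both values end up under the same parent.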
4.1.5 Form Controls
When laying out the form, the author can define the control-specific bindings by editing the properties of the control. The properties sheet contains a Data tab, through which the XForms-specific properties such as binding expressions, constraints, etc. are accessible.
The available properties have the following meaning:
Every time an XPath expression is to be entered, a special dialog is used, which displays a preview of the result of the entered expression. This helps to prevent erroneous expressions.
Where possible, the user should not need to be confronted with the structure of the XForms instance in general. He should however have an understanding about the structure of the model he is binding his form to. Structural considerations are of course not necessary if the form is very simple and the data is to be submitted in plain name/value pairs.
4.1.6 Other Design Aids
The central visual design aid is the tree view of the data model instance from the Data Navigator. Individual form controls can be connected to the nodes in this tree by simple mouse drags.
Analogous to the Data Navigator, the Form Navigator can be used to provide an overview of all form controls that are in the document. This is a more convenient way to gain access to individual control properties, such as binding expressions, constraints and calculations.
The user can use drag & drop in order to define bindings between controls and instance nodes. By dragging an instance node from the Data Navigator into the document, a default control for the specific data type is created and bound to the respective node. By dragging an existing control onto a node in the Data Navigator, the control is bound to the respective node.
If a control is selected in the document, a double-click (or return key) on an item in the Data Navigator binds the selected control to this item. If no control is selected, a default control for the selected item is inserted into the document.
4.1.7 Calculations and Constraints
Calculations and expressions are not limited to the control which they are bound to. They can include references to all parts of the data instance in order to perform their task.
A calculation that is to compute a sum of values entered into other parts of the form can use the sum() function to add up the values from the individual nodes to which the controls used for data entry are bound.
The same is true for constraints. Not only can these perform checks on the current control's value, they can also include references to any other part of the model.
For instance: the user creates a control, in which some maximum value is entered. This value will be stored at the bound location in the instance. Some other control may now define a constraint, which references this location in order to check for the maximum value.
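The two cases can be sketched in Python as follows. In an actual form, both would be written as XPath expressions on a binding; here plain Python functions stand in for them, and the instance structure with `limit`, `item/amount` and `total` elements is a made-up example.

```python
import xml.etree.ElementTree as ET

instance = ET.fromstring(
    "<order><limit>100</limit>"
    "<item><amount>30</amount></item>"
    "<item><amount>45</amount></item>"
    "<total/></order>"
)


def calculate_total(order: ET.Element) -> None:
    """Calculation: store the sum of all item amounts in <total>."""
    amounts = (float(node.text) for node in order.findall("item/amount"))
    order.find("total").text = str(sum(amounts))


def total_within_limit(order: ET.Element) -> bool:
    """Constraint: the total must not exceed the value entered in <limit>."""
    return float(order.find("total").text) <= float(order.find("limit").text)


calculate_total(instance)
print(instance.find("total").text, total_within_limit(instance))

# Raising one amount pushes the total above the limit entered elsewhere.
instance.find("item/amount").text = "80"
calculate_total(instance)
print(instance.find("total").text, total_within_limit(instance))
```

The constraint never looks at a control; it only compares two locations in the instance, which is why it keeps working no matter which controls are bound to them.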
4.2 Form Usage Mode
The user of an XForms based form will see the form embedded into a document. The document provides a contextual container for the form and can include explanations on how the form is to be filled out. The document also provides a visual container for the form, which is helpful in case the completed form is later to be printed or stored in some other format, for sharing with other users or archiving purposes.
Due to the event based implementation, the controls of a form are not static. They can change in appearance and content while the user is filling out the form. The form can thus provide instant feedback on anything that is entered. Invalid values can be marked. Results of calculations can be displayed. This enables the user to receive visual feedback at all times while he is working with the form.
4.3 Submitting Forms
The submission of the data collected in the form is triggered by clicking a button that has been bound to a submission rule as described in Section 4.1.2.
After the submission has been processed, the action that was defined in the submission will be triggered.
4.4 UBL (invoice)
This example describes the creation of a forms document that is used to create data sets of a predefined structure in order to exchange this data with other parties.
4.5 Conference Paper
OpenOffice.org allows XSL transformations to be used as import and export filters. Since a transformation that saves an OpenOffice.org document to a DocBook-compliant file is already available, the user can build on it to create a workflow for the generation of documents that comply with the XML 2004 conference format.
The user will create a form which stores the articleinfo metadata required by the conference format, and he will extend the transformation with templates which include the form data in the resulting DocBook file.
<!-- copy article info from xforms:instance -->
<xsl:template match="//xforms:instance/articleinfo">
  <!-- xsl:element already creates the articleinfo element, so only its
       attributes and children are copied into it -->
  <xsl:element name="articleinfo">
    <xsl:copy-of select="@*|node()"/>
  </xsl:element>
</xsl:template>
Example 3: XSL Template
5. Conclusion
It has been shown how XForms support has been added to an office application. We have seen how the concept of forms can be put to use in an application that is normally used for the authoring of documents.
Footnotes
[1] Taken from [XFORMS].