Abstract
The Register of Reporting Obligations of Enterprises (OR - Norwegian Abbr.) in Norway is by law and regulations authorised to monitor and assist all reporting obligations of enterprises. OR focuses on achieving correct and efficient reporting by identifying and reducing multiple reporting of identical information to government agencies. It has become the basis for a number of implementations of electronic reporting. The current solution has received a best practice award from the OECD (2002).
OR maintains a database-backed repository of building blocks and a methodology for creating new building blocks. Currently all data submitted by companies to the governmental agencies are described using these building blocks. The methodology combines a system for handling metadata, data modelling and forms design. The result is exported to an XML Schema and an XForms specification published on a web server.
Even though the system has won an award, there are some fundamental problems.
Overlap search identifies few identical data definitions. The reason is that besides item descriptions, information about format and form-specific information are also included in the XML Schema. Since external systems retrieve this information and generate forms based on this only, the "additional" information cannot be removed.
This "context" has produced a staggering 20 000 definitions. Reuse is limited due to the included "context" information, and each new electronic form causes the number of XML Schemas to increase.
Even if we could handle the enormous and ever-increasing number of definitions, some forms cannot be transformed to an electronic version because doing so would cause an exponential increase in XML definitions.
The current solution combines the data description and layout information in one XML Schema description. This does not necessarily pose a problem, but because the XML Schema descriptions are the only output of our design process, this combination of information causes most of our problems above.
The obvious solution is to split the information modelling and the form layout design into two different processes. This will increase the possibilities for reuse since you can choose to reuse only the information model or the pre-designed form elements.
The information modelling is done by creating an ontology that covers relevant information objects. The resulting object attributes are defined to be the information carriers. This separation makes reuse possible across different forms and formats.
Both a common governmental ontology and disjoint ontologies for different domains pose challenges. To avoid this we choose a hybrid solution where we require common use of some information carriers. This allows for both locally adapted ontologies and reuse.
The form elements and form design are described using the same methodology as the information objects. All are stored in a database and made available for the applications used by the agencies to construct an electronic form. All steps in this model and design process are supported by a combination of customised and off-the-shelf software.
One important factor identified for decreasing bureaucracy and increasing the competitive power of Norwegian companies is the principle of "Told once, never asked again". This principle basically says that if one Norwegian agency has requested some piece of information from you, no other agency will ask for the same information. If another agency needs the same information, it will have to request it from the agency that has this information.
Coordination of requested information is economically efficient from a holistic point of view, but for the individual agencies this is not necessarily the case. This is a typical example of a situation where automation and simplification are most advantageous for others than those paying for the change. To overcome this understandable lack of motivation, a new agency called the Register of Reporting Obligations (Norwegian: Oppgaveregisteret, abbreviated OR) was established. Supported by a dedicated law, this agency was given the responsibility of promoting and implementing the principle of "Told once, never asked again". To do this, OR was authorised to monitor and assist all reporting obligations of enterprises to state agencies.
To be able to reach this goal, we must be able to reason on the information to be submitted, and identify identical information requested by different agencies. Our efforts in the TOR project are aimed at making this identification possible. The main motivation for our system is the ability to detect multiple submission of the same information from the same actor. We denote such a task Multiple Submission Search (MSS). The main intention of this paper is to show how our need for MSS affects our new system.
Even though the main motivation behind the system is to allow OR to do a good MSS, the resulting information model can be used in other contexts as well. One of the big challenges in electronic communication between computers or companies is interoperability. Standards like XML make syntactic interoperability easier, but do not remedy the problem of semantic interoperability. Our information model will be easily available, so that its semantic content can also be used to achieve more semantic interoperability when different organisations exchange this kind of information.
A system was established to implement the objective or function of the Register of Reporting Obligations. This support system was established in two stages. In the first stage an internal system, ORsys, was developed to handle all reporting obligations and the associated forms and data definitions. In the second stage a new system, ORetat, was established to allow agencies to structure data definitions for better reuse. This is the current situation and the basis for a common governmental Internet portal for electronic reporting called Altinn. This solution received a best practice award from the OECD in 2002.
The current system comprises a number of subsystems, all centred on a database. The central information is:
Data definitions based on the ISO/IEC 11179 methodology (input through ORsys)
Form specifications containing and grouping sets of these data definitions (input through ORetat)
Reporting obligations referring to one or a set of form specifications and relating to types of legal entities (ORsys)
Representation formats referred by data definitions (ORsys)
Agency registers (ORsys)
The data definitions are categorised by data type and business domain. There is no internal structure in the data definitions other than their adherence to the same concept domain or business domain.
Reuse of data definitions is promoted by functionality in the internal ORsys tool. Today the number of data definitions registered in the database has passed 20 000. Finding definitions to reuse is a very complex task, and the huge number of definitions is effectively undermining any efficient reuse.
The ever-growing number of data definitions is caused by two factors. First, layout information such as the number of digits allowed is included in the data definitions. This means that besides having to map semantic content you must also map syntactic content; in practice this causes almost as many variants of a data definition as there are agencies. Second, the system is too simple: there is no functionality for describing partial equivalence. Inheritance and classes are not offered, so there are no "higher level" views that would improve the possibility for reuse of data definitions.
One can conclude that it is difficult both to reuse the existing definitions, because they are too specific, and to find data definitions to reuse, because of the way the definitions are organised. The lack of reuse of data definitions also reduces the possibility for MSS, since we do not know to what extent different definitions are semantically equivalent. The main goal of the new system will be to remedy the situation.
The main goal of the TOR project is to minimise the workload in companies due to compulsory submission of data. We have identified two sub-goals that are thought to be important for reaching the main goal: Promoting electronic submission of data and removing multiple requests for the same data. How we intend to promote electronic submission will be discussed later. This chapter focuses on how to reduce multiple submissions of the same data by the same person or organisation.
We want to reduce multiple submissions of the same data by the same person or organisation valid for the same period. In an ideal world, everyone uses the same names for equal data. Equal data is here considered to be data with the same semantic content. In the real world we cannot expect use of equal names to be the natural order of things.
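The check described above can be sketched as follows, assuming that each submission can be reduced to a record of who submitted which piece of semantic content for which period. The record layout, the field names and the sample values are hypothetical and only illustrate the grouping an MSS has to perform once semantically equal data share an identifier.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Submission(NamedTuple):
    submitter: str  # person or organisation (hypothetical identifier)
    carrier: str    # identifier of the semantic content (hypothetical)
    period: str     # period the data is valid for
    form: str       # form through which the data was requested


def find_multiple_submissions(
    submissions: List[Submission],
) -> Dict[Tuple[str, str, str], List[str]]:
    """Group by (submitter, carrier, period); keep groups spanning several forms."""
    forms_by_key = defaultdict(set)
    for s in submissions:
        forms_by_key[(s.submitter, s.carrier, s.period)].add(s.form)
    return {
        key: sorted(forms)
        for key, forms in forms_by_key.items()
        if len(forms) > 1
    }


submissions = [
    Submission("Company A", "annual_turnover", "2003", "Form 1"),
    Submission("Company A", "annual_turnover", "2003", "Form 2"),
    Submission("Company A", "annual_turnover", "2004", "Form 1"),
    Submission("Company B", "annual_turnover", "2003", "Form 1"),
]
duplicates = find_multiple_submissions(submissions)
```

Only the first two records collide here: same submitter, same content and same period, requested through two different forms. The hard part, which this sketch takes for granted, is deciding that two data elements carry the same semantic content.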
To be able to compare the submitted information, we must be able to compare the semantic content somehow. We have at least three possible solutions to this problem.
It is possible to use a controlled vocabulary. Using this approach, it will be sufficient to compare names of data. If two data elements have the same name they are said to be semantically equivalent. The drawback of this method is that it requires the users to understand how to use the controlled vocabulary. Another problem is that almost equal data will have different names, and this "almost-the-same-ness" cannot be identified.
Another possibility is to let the person defining information also describe the defined information in a natural language. This makes it possible for humans to compare the semantic content and decide whether it is multiple submissions of the same data or submission of different data. This method has the advantage that the semantic content of data potentially can be fully described.
Still, this has two rather serious limitations. First, the description of the semantic content depends on the person writing the description. This kind of semantic description tends to be very personal and can be difficult for other people to navigate. Much of the system that gives the data definitions their order might exist only inside a person's head and hence not be very computable. Second, even though the technology for interpreting natural language is improving, the current state of the technology does not allow for computerised MSS based on natural language descriptions.
In a way both of the above alternatives can be said to be ontology [2] variants. An ontology is a description of how different items do or do not relate to each other. It can also be thought of as a thesaurus describing the different items modelled by the ontology. Hence it seems that what we need is an ontology. For the computer to be able to reason based on the ontology, it must be easy for the computer to "understand", and richer in content than what is possible for the above-mentioned solutions.
When we are choosing a way to represent and model the information, three different aspects are important for us. There must be support for the modelling process. This need not be included in the language itself, but it must be possible to add it seamlessly as seen from the user's point of view. The chosen modelling language must be an open standard or directly convertible to one. This means that it should not require expensive investments or specific technical solutions for users of the information models created in the modelling process. These requirements are in addition to the needs discussed in the previous chapter.
There has been a lot of focus on open standards lately, for instance mentioned in the "European Interoperability Framework" [1]. For us this is an important issue in choosing a modelling language. This is reasonable since we are working toward interoperability.
For this system to meet its goals, semantic interoperability is important. The creation of an information model for use under MSS, will at the same time allow for semantic interoperability. Semantic interoperability is of no use if you do not have the technical interoperability as a basis. So XML as standard format at the technical interoperability level is an obvious choice for us. As an extension to XML it also seems reasonable to include Web Services in our solution. This will be an open standard access-point to our information model.
The possibility for a computer to reason on the model created is the most important factor for us. Creating a model on a sheet of paper or several sheets of paper will only be interesting as long as one person can keep the entire structure in his/her mind at the same time. The world we live in is complex, and believing that a good operational model of the world will be simple and easy to understand is overly optimistic.
The huge advantage of a computer compared to people is that it has no problem keeping track of a model containing several hundred classes at the same time. It is therefore reasonable to expect that the computer will be able to find connections and similarities in a model to a much greater extent than a person would. This is vital for our MSS to work.
When choosing a modelling language, UML [3] seems to be the one obvious candidate. UML is a huge language. Seen from our point of view this is the biggest problem with UML. Since we only need a small subset of UML, we plan to restrict the allowed functionality to make it easier for users to learn, and at the same time we can tailor more closely fitting support for the modelling process.
In an ideal world we would have required the users to express the classes, attributes and relations in first-order logic, but since this is an ability few have and even fewer master well, a method that is easier to understand and teach had to be found. UML has been identified as a good modelling language, but the semantic expressiveness of the resulting model is restricted. Since we found UML promising from most other points of view, we tried to estimate how much semantics can be found in a UML model.
Object-oriented methodology is designed to promote reuse of classes. This is also useful from the point of view of semantic processing. At the very least, the connection between super- and subclasses can be used for semantic comparison of the different classes. A special feature of this information model is that every class will have some kind of connection to the common basic data. Basic data is identification data that Norwegian agencies are required to reuse. This connection to basic data such as the Person and Company classes can be used to get additional semantic information about the modelled classes. The use of data types and sub-classing of data types will also give additional information.
Finally to give additional information to human users when they have to decide whether they can reuse a class, a description of the semantics of the class and the data type will be added to the model. This is beyond the standard UML description, but this does not concern us too much. We can reason on the information model also without this additional support.
Based on the above analysis we have decided to stick with UML, but add the required description on classes and data types, partly to be able to convert our model to ebXML [4] format and partly to aid our users. The subset of UML we intend to use is a Class Diagram "version", with inheritance and attributes with data type and object references. If needed later this subset can be directly mapped to ebXML.
In order to promote reuse of metadata it is essential that these are structured into an information model to facilitate retrieval by semantic navigation. The creation of information models requires modelling skills and semantic domain knowledge. So it is not a trivial task to establish these models. Once available there is a better prognosis of metadata reuse by those specifying forms for submission of data.
In the new solution, to be released later this year, we implement a distinction between information modelling and forms design. We will implement two separate tools, TORmodell and TORdesign. The first supports the process of information modelling, while the latter is used for assembling forms. In TORdesign there is a reference or binding from field elements to metadata elements in the information model.
Some of the issues discussed in this chapter will affect the TORdesign system, but the parts of the system supporting electronic forms will be discussed in the next chapter. The technical solution is discussed in chapter 6.
Why do we find electronic forms important? First of all it seems reasonable to establish what the alternative is. Earlier all information gathering done by agencies was done using paper forms, manually "punched" into the system. Today it is technically possible to eliminate the manual input part. We want the submitter to be able to enter the data directly into the system.
The main motivation for the TOR project is to reduce the actual and the experienced workload resulting from compulsory submission of data to agencies. Investigations have shown that the experienced workload is less for electronic forms than for paper forms. Another important issue is the fact that electronic forms can be fully adapted to the submitter, and hence be easier to understand and quicker to complete.
Electronic forms also have advantages for the agencies requesting the information. They save man-hours as no more manual operations are needed to enter the information into the system. The dialog existing between an electronic form and the submitter may involve a validation process with the user as an active part. Hence the information submitted can be of higher quality. Lacking information can be requested and the submitted information can be validated for consistency and correct format. Electronic forms also make it possible to find multiple submissions of the same data. MSS will also reduce workload and improve information quality.
Some of the advantages with electronic forms already exist today, but still agencies are hesitant to offer electronic forms to their submitters. We assume that the main reason for this is the needed technology support for using electronic forms. Creating electronic forms is beyond their earlier tasks, and they might not have the ability or resources to focus on this new need.
We therefore feel that it is important to make the transition to electronic forms as easy and cost-effective as possible for the agencies. Most agencies have some people working with form design, and we would like to make the process of creating an electronic form as similar to the process of creating paper forms as possible. Putting this into a software-driven process also makes it possible to promote some standards for getting appropriate electronic forms.
Designing forms is not a new job for the agencies, so we can rely on existing knowledge in the organisations. Making the electronic forms available for the public is quite a different story. This requires resources many agencies might not have and probably should not be forced to hire.
This basically means that when the agency has defined the electronic form we should do what is left of the job. Today we have a form portal called Altinn, where the defined forms can be put into production, without further efforts from the agency. Still the agency needs to have some way of receiving the information submitted through the form.
We assume we will have to offer an integration service to the agencies, where we create a module doing the transformation between the public information model and the agency's database. In any case we need to be able to translate from the internal format to the public one when other agencies request already submitted information. Since we have to define the transformation one way, the added work of defining the transformation the other way is assumed to be limited.
Even though we ideally imagine that all agencies will use the full set of tools defined and implemented by TOR, we imagine that some agencies will choose to utilise only some parts of the system. It is also realistic that we later might want to change the tool used in some part of the system.
Both these considerations call for the use of open standards. If you choose an obscure or specialised language for communicating between the components, it will be difficult to replace the tool later. It will also increase the amount of tailoring and costs if some agency wants to create a custom-made tool to support their needs.
Because of this we want to use XML Schema as the format of the information model required by the electronic form design tool. We will use XForms [5] combined with XML Schema to export the finished designed electronic form to Altinn.
The main function of this tool is to give users access to the information model of the TOR system. This model will be a representation of the different domain models contained in the big information model. Modelling all information submitted to the Norwegian agencies is assumed to be hundreds of classes and relations between them.
Human users cannot relate to huge models like this, hence it is important that the user can define views and restrict the portion of the model visible at any time. It must also be possible to alter or extend the model information in TORmodell.
Users of TORmodell will use a UML-like notation both when viewing the model and when editing it. As mentioned earlier, only a subset of UML will be used. Figure 2 shows the portion of UML we are using.
The class is the central element. A class may inherit from another class, but we do not allow multiple inheritance. This restriction is chosen for clarity; both variants are allowed in UML. A class contains a number of attributes. These attributes are either simple attributes of a given data type, or an association to another class. Since this is an information model and not a program, no methods are allowed. An association has two association ends, each attached to one class. A data type must have a name plus at minimum one validation rule. If none of the existing data types fit, you can create a new one as long as it inherits from one of the existing data types.
These mechanisms are considered sufficient for our information modelling.
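As an illustration of how small this subset is, the following sketch captures classes, single inheritance, simple attributes and associations in a few lines. All names (`ModelClass`, `all_attributes`, the example classes and attributes) are hypothetical, and an association is simplified here to a one-way reference to another class rather than a pair of association ends.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class ModelClass:
    name: str
    # At most one parent: multiple inheritance is deliberately not expressible.
    parent: Optional["ModelClass"] = None
    # Attribute name -> data type name (simple attribute)
    #                or another ModelClass (association, simplified).
    attributes: Dict[str, Union[str, "ModelClass"]] = field(default_factory=dict)

    def all_attributes(self) -> Dict[str, Union[str, "ModelClass"]]:
        """Own attributes plus everything inherited from the parent chain."""
        inherited = self.parent.all_attributes() if self.parent else {}
        return {**inherited, **self.attributes}


# Hypothetical example classes; no methods, since this is an information model.
company = ModelClass("Company", attributes={"org_number": "Integer"})
person = ModelClass("Person", attributes={"name": "Text", "birth_date": "Date"})
employee = ModelClass("Employee", parent=person, attributes={"employer": company})
```

Here `employee.all_attributes()` contains the two attributes inherited from `Person` plus the association to `Company`.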
The user interface will look much like a UML class diagram, with most of the known functionality. You will never see the total information model, only the sections relevant to you. These sections are called views and make it possible to adapt your view of the model to the information modelling you need to do. Views can also be transformed into a document model and exported to XML Schema and TORdesign to be used as a basis for form design.
A data type must have a name plus at minimum one validation rule. Data type inheritance implies inheriting the validation rule of the super-type and then adding one or several new validation rules.
For instance the decimal-integer-data-type has the name Integer and the following validation rules:
Characters '0' to '9' and '+' and '-' allowed
Maximum one occurrence of '+' or '-'
A '+' or '-' must precede all other characters
Based on this data type other data types can be derived. For example the decimal-number-data-type with name Number may inherit the Integer type and then add three new rules:
Maximum one occurrence of '.' or ','
Maximum one occurrence of 'E' or 'e'
A '.' or ',' must precede the 'E' or 'e'
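A minimal sketch of the inheritance mechanism, assuming a validation rule can be represented as a predicate over the entered text. The `DataType` class and the derived `FourDigits` type are hypothetical illustrations; only the three Integer rules are taken from the text above. The derived type inherits every rule of its super-type and adds new ones, so it can only narrow what the super-type accepts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Rule = Callable[[str], bool]


@dataclass
class DataType:
    name: str
    rules: List[Rule]
    parent: Optional["DataType"] = None

    def __post_init__(self) -> None:
        if not self.rules:
            raise ValueError("a data type needs at least one validation rule")

    def all_rules(self) -> List[Rule]:
        """Rules of the super-type chain first, then the type's own rules."""
        inherited = self.parent.all_rules() if self.parent else []
        return inherited + self.rules

    def validate(self, text: str) -> bool:
        return all(rule(text) for rule in self.all_rules())


# The three Integer rules listed above.
INTEGER = DataType("Integer", [
    lambda s: all(c in "0123456789+-" for c in s),   # allowed characters
    lambda s: s.count("+") + s.count("-") <= 1,      # at most one sign
    lambda s: all(c not in "+-" for c in s[1:]),     # a sign may only come first
])

# Hypothetical derived type: inherits the Integer rules and adds two more.
FOUR_DIGITS = DataType("FourDigits", [
    lambda s: len(s) == 4,
    lambda s: s.isdigit(),
], parent=INTEGER)
```

With these definitions `INTEGER.validate("-42")` holds while `INTEGER.validate("4-2")` does not, and `FOUR_DIGITS` evaluates five rules in total.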
To do a good MSS you need as much semantic information as possible. Since we do not force a naming scheme or some kind of logical description of classes and attributes, any help in adding semantic information is of interest.
Forms normally have fields where you have to choose one alternative from a code list. An example is a request for a person's status. Here the code list might be: married, divorced, single or widow. An attribute in another form that uses a code list containing some or all of the choices in the first code list is probably semantically related to the first attribute. Hence controlling the vocabulary of the code lists can improve the quality of the reuse search.
After a new class has been created, and before it can be committed to the information model, a reuse search will be done. This search will produce a list of classes that resemble the candidate class. A manual inspection and assessment will then decide whether an existing class can be reused instead, either through inheritance or direct reuse. If the candidate class cannot be replaced by an existing class, it is entered into the information model.
The products from TORmodell are XML Schema documents. There are two different types: You can export the whole model to an XML Schema document, or you can choose to export only a part of the model.
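One way to quantify how much two code lists share is the fraction of common choices among all choices in either list (the Jaccard index). This measure is an assumption made for illustration, not something the text prescribes, and the second code list below is hypothetical.

```python
from typing import Iterable


def code_list_overlap(first: Iterable[str], second: Iterable[str]) -> float:
    """Shared choices divided by all distinct choices; 0.0 if either list is empty."""
    a, b = set(first), set(second)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


marital_status = ["married", "divorced", "single", "widow"]
civil_status = ["married", "single", "cohabitant"]  # hypothetical second list

score = code_list_overlap(marital_status, civil_status)  # 2 shared of 5 distinct
```

A high score only suggests that two attributes are related; the final judgement is still left to a person.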
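The text does not specify how resemblance is measured. The sketch below assumes, purely for illustration, that classes are compared by the overlap of their attribute names and that candidates above a threshold are listed for manual inspection. The function name, the threshold and the example model are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def reuse_search(
    candidate: Set[str],
    model: Dict[str, Set[str]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (class name, score) pairs at or above the threshold, best first."""
    hits = []
    for name, attributes in model.items():
        union = candidate | attributes
        score = len(candidate & attributes) / len(union) if union else 0.0
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical existing classes, reduced to their attribute names.
model = {
    "Person": {"name", "birth_date", "address"},
    "Company": {"name", "org_number", "address"},
    "Vehicle": {"registration_number", "owner"},
}
candidate = {"name", "birth_date", "address", "phone"}

hits = reuse_search(candidate, model)
```

In this example only `Person` is proposed, which would point towards letting the candidate class inherit from `Person` rather than entering it as an unrelated class. A real search would presumably also use the superclass relations, data types and code lists discussed earlier.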
In the same way as you can restrict views to only show one part of the information model, you can also export parts of the information model. This especially makes sense since it reduces the complexity for the users of TORdesign who are basing their forms on a received XML Schema.
This is technically done by defining a document model. This model contains a number of classes and relations. The document model is a subset of the information model; none of the classes and relations is changed. They are either present or not, but if they are present they are equal to their counterparts in the information model.
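The subset property can be sketched as follows, assuming the information model is held as a mapping from class names to immutable class definitions plus a set of relations between class names. These data structures and the example content are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Classes = Dict[str, FrozenSet[str]]   # class name -> attribute names
Relations = Set[Tuple[str, str]]      # (class name, class name)


def extract_document_model(
    classes: Classes, relations: Relations, selected: Set[str]
) -> Tuple[Classes, Relations]:
    """Pick classes unchanged; keep only relations whose two ends were both picked."""
    doc_classes = {name: classes[name] for name in selected}
    doc_relations = {
        (a, b) for (a, b) in relations if a in selected and b in selected
    }
    return doc_classes, doc_relations


classes = {
    "Person": frozenset({"name", "birth_date"}),
    "Company": frozenset({"org_number"}),
    "Vehicle": frozenset({"registration_number"}),
}
relations = {("Person", "Company"), ("Person", "Vehicle")}

doc_classes, doc_relations = extract_document_model(
    classes, relations, {"Person", "Company"}
)
```

Because the selected definitions are taken over as they are and never copied or edited, every class in the document model is by construction equal to its counterpart in the information model.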
Models will change over time as new requirements are added and ill-stated details are revealed. However each time a specific model is made available for use in TORdesign or for export the content of the class must be frozen.
One way to handle this would be to allow only one version of each class. This would probably force frequent updates of all forms, which would be counterproductive and cause much extra work every time a class was updated.
To avoid this problem we plan to handle the version challenge this way: All classes have a version. If you change a class, the previous version of the class remains unchanged, and a new version is created and becomes the current version. All past versions of classes remain in the system, but they are normally not visible in the modelling tool. The only exception to this rule is when you open an old view or document model in which an old version of a class is used. The class will still be available and usable, but through colour coding users are made aware of the fact that their class version is an old one.
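A minimal sketch of this versioning scheme, assuming a class definition can be reduced to a dictionary of attributes. The `ClassRegistry` name and its methods are hypothetical; `is_outdated` stands in for the condition that would trigger the colour coding.

```python
from typing import Dict, List, Optional


class ClassRegistry:
    """Keeps every committed version of every class; nothing is overwritten."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Dict[str, str]]] = {}

    def commit(self, name: str, attributes: Dict[str, str]) -> int:
        """Store a new version and return its version number (starting at 1)."""
        versions = self._versions.setdefault(name, [])
        versions.append(dict(attributes))  # copy, so later edits cannot leak in
        return len(versions)

    def get(self, name: str, version: Optional[int] = None) -> Dict[str, str]:
        """Return the given version, or the current one if none is given."""
        versions = self._versions[name]
        return versions[-1] if version is None else versions[version - 1]

    def is_outdated(self, name: str, version: int) -> bool:
        """True if a view pinned to this version should be flagged as old."""
        return version < len(self._versions[name])


registry = ClassRegistry()
v1 = registry.commit("Person", {"name": "Text"})
v2 = registry.commit("Person", {"name": "Text", "birth_date": "Date"})
```

A form built against version 1 of `Person` keeps working with exactly the definition it was designed for, while new views pick up version 2 by default.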
The TORdesign system supports the construction of form and message specifications. These specifications are ready to be used by an Internet portal or a web service respectively. The tool is designed to work with information or document models that are generated by TORmodell.
Form designers should focus on the form and its ability to communicate with the submitter. The designers must be able to start with the visual elements of the form without having to relate to the complete information model. It should be possible to finish the design as such before doing any binding to the attributes in the document model. When the designer proceeds to information model binding, they should not need to know about all details of the model, but be able to navigate this model to find the relevant attributes or alternatively do a search for a class or attribute based on name or type.
The design is ideally a top-down process, starting with pages, continuing with panels in one or more levels, and ending with putting fields inside (on top of) panels. A reversed or bottom-up process should still be possible creating lower-level elements and moving these into higher-level elements at a later stage.
The output from TORdesign is an XForms document. We aim at supporting a majority of elements and mechanisms of the XForms specification. All form controls (input, output, textarea etc.) will be available to be put on the form. The group element will be used to organise the individual fields into pages and sub-page panels. We will probably use the XForms vocabulary to label the element selection items. Adding illustrative graphics will help the user in grasping the function of each item.
Each form control will be bound to nodes or attributes of an information model (through XPath expressions). The bindings may refer to the entire information model or a subset. Such subsets will either be business-related information model packages or document models. The latter is a collection of subsets of the information model containing a set of undivided classes that just about covers what is needed for a specific form design task. The objective is that the information modeller facilitates the model binding for the forms designer.
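The binding idea can be illustrated with Python's standard `xml.etree.ElementTree` module, which supports a limited subset of XPath. The instance document, the control names and the paths below are hypothetical, and the sketch only shows how a path expression ties a form control to a node of the model instance; it is not an XForms processor.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical instance data conforming to a document model.
INSTANCE = """
<Report>
  <Company>
    <OrgNumber>123456789</OrgNumber>
    <Name>Example AS</Name>
  </Company>
</Report>
""".strip()

# Hypothetical bindings: form control id -> path relative to the root element.
BINDINGS = {
    "org_number_field": "Company/OrgNumber",
    "name_field": "Company/Name",
    "phone_field": "Company/Phone",  # not present in the instance
}


def bound_values(
    root: ET.Element, bindings: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Resolve each binding; a control whose node is missing maps to None."""
    return {control: root.findtext(path) for control, path in bindings.items()}


values = bound_values(ET.fromstring(INSTANCE), BINDINGS)
```

A binding that resolves to nothing, like `phone_field` here, is the kind of mismatch between form and document model that a design tool could report to the forms designer.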
As an alternative to multi-page forms, there is a need to be able to specify dialog-type forms where elements of the form are enabled/disabled or made visible/invisible depending on answers in specific other fields. Such forms are not directly supported by XForms. We will have to build a superstructure on top of a number of XForms specifications to accomplish this.
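Independently of how such a superstructure is eventually realised, its core logic can be sketched as a set of conditions over the answers given so far. The field names and the condition table below are hypothetical.

```python
from typing import Callable, Dict, List

Answers = Dict[str, str]
Condition = Callable[[Answers], bool]


def visible_fields(
    fields: List[str], conditions: Dict[str, Condition], answers: Answers
) -> List[str]:
    """Fields without a condition are always shown; others only if it holds."""
    return [f for f in fields if conditions.get(f, lambda _: True)(answers)]


fields = ["has_employees", "number_of_employees", "payroll_total"]
conditions = {
    "number_of_employees": lambda a: a.get("has_employees") == "yes",
    "payroll_total": lambda a: a.get("has_employees") == "yes",
}

before = visible_fields(fields, conditions, {})
after = visible_fields(fields, conditions, {"has_employees": "yes"})
```

Before any answer is given only the first question is shown; answering "yes" reveals the two dependent fields.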
We have described how we intend to solve the problem of multiple submissions of the same data (MSS). Two principles have been promoted as important: Open standards and interoperability.
Our system covers the complete process, from creating a domain model to making the finished form available for the public.
On the modelling end of the challenge we have chosen to use object-oriented modelling and UML. UML is represented in many different versions, but the resulting model is saved as an XML Schema, and is therefore available to anyone independent of the UML tools they may or may not have access to.
At the other end of our production process the output is an XForms document plus an XML Schema document. This pair of documents completely represents the electronic form, and can be used as input to a form engine like the one used in our Altinn solution.
The interoperability is maintained at two levels. At the technical level we use XML as a file format. This allows any system to read the file. To understand the file the computer has to know how to interpret the information. Our information model is expected to cover this need.
Hence one can say that the TOR system improves the issues around MSS, taking into account both the need for open standards and interoperability.
[1] EU IDAbc Programme: "European Interoperability Framework for Pan-European eGovernment Services", IDA Working document ‐ Version 4.2 ‐ January 2004
[2] Wache, H. et al: "Ontology-Based Integration of Information - A Survey of Existing Approaches", International Joint Conference on Artificial Intelligence (IJCAI-01), Seattle USA, 2001
[3] Object Management Group Inc.: "Unified Modeling Language Specification", v. 1.5, March 2003, 736p. (http://www.omg.org/docs/formal/03-03-01.pdf)
[4] UN/CEFACT: "ebXML Technical Architecture Specification v1.0.4", 16 February 2001 (http://www.unece.org/cefact/ebxml/Documents/ebTA.pdf)
[5] W3C: "XForms 1.0 ‐ W3C Recommendation 14 October 2003" (http://www.w3.org/TR/xforms/)