Abstract
This presentation shows how Topic Map based solutions are used to build, organize and maintain the Kodak digital camera accessories web site. The chosen approach did not require software investment. Excel, an available and familiar spreadsheet application, was used as an affordable and easy to use Topic Map GUI editor and repository.
Excel worksheets represent different planes of relationships that are completely independent until the processing begins. Transformation flow is controlled by special instructions set within the Excel workbook. First, individual topic map documents are generated by transforming applicable worksheets and by spidering a corpus of external XML content files. Later in the flow these topic maps are merged based on the topic naming constraint or the subject merging rule. All processing is done with XSLT scripts. Finally, the XSLT based framework presented earlier by one of the authors at the Extreme 2000 conference [Building Dynamic Web Sites with Topic Maps and XSLT] was adopted to generate JHTML server pages from topic map source code.
It was very easy to implement new requirements and add new planes of relationships. Because the framework is Topic Map based, it was also easy to create XML extracts for loading some of the relationships and data into an internal RDBMS system. We also show how a Topic Map Ontology schema was used to calculate the difference between two sequential Topic Map documents. An XSLT style sheet creates a difference topic map given an 'old' topic map and a 'new' topic map. This difference topic map controls how updates are posted to the web site and the internal RDBMS system.
The project started with a simple requirement to provide typical selling information relationships about Kodak's digital camera accessories [Kodak Digital Cameras and Accessories web site]. Excel was already used internally to keep lookup information. That information was used to update pages manually, and as the number of objects and relationships between them grew, this was turning into a very expensive process. As relationships between products, accessories, languages and selling regions increased, maintaining them in-line within content became time consuming and cumbersome. Hence came the requirement to keep relationships, classification and other metadata separate from the content. This called for a Topic Maps based architecture.
In particular, the following requirements needed to be met with no increase in labor while still providing the best product information on the web for all customers:
The number of available accessories was increasing, and manual processing was becoming expensive.
The number of languages the product information was presented in was to be increased.
The information content varied across several selling regions around the world.
The relationships among various products were getting more sophisticated. For example, we needed to represent:
N-ary relationships between cameras, lenses, lens adapters and step rings.
Relationships between chargers and plug styles.
Non-hierarchical classification of accessories.
Relationships between cameras and their compatible and recommended accessories.
Relationships between all digital products and geographical regions of the world.
Dependency of the maximum number of images that could be stored on a memory card as a function of the card, the camera, and the quality setting on the camera.
Etc.
Topic Maps were chosen to describe relationships between products. Excel was chosen as the UI front end, which was a natural choice because it was a standard on all desktops and did not require any learning curve. It was easy to explain to administrative personnel how to manage simple relationships with tools they were already familiar with and willing to use. The combination of the Topic Maps paradigm and Excel led to the architecture that we now call Tabular Topic Maps. XSLT was chosen for Topic Map generation and for producing final output. A special batch processing language was created to control information processing. Excel macros were used to extract XML from spreadsheets, spider external content and the file system, and execute shell commands, XSLT transformations, mirroring and FTP uploads.
This became the basis of a successful project. We started with the workbook design that was in current use by administrative personnel. Additional sheets were added to the Excel workbook as we moved to Topic Maps concepts and new requirements were introduced. Throughout, the framework that we had developed provided a user friendly, well structured and self explanatory user interface. Gradually the project grew into a large content management undertaking.
The project has served as a great learning tool for adopting a Topic Map style of thinking and for putting diverse object relationships together to create a web site that is rich in navigation but low in maintenance. Currently, multilingual information and relationships between digital cameras, accessories and selling regions yield over 200,000 unique combinations on the Kodak.com web site.
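The idea of a worksheet as a plane of relationships can be illustrated with a minimal sketch. It is written in Python rather than the Excel macros and XSLT the project used, and it is not the authors' code. The row data, the column names, the `compatible-with` association type and the `rows_to_topic_map` helper are all hypothetical. The output is loosely modeled on XTM 1.0, with a plain `href` attribute standing in for `xlink:href` for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical worksheet rows: each row states one relationship between a
# camera and an accessory. Column names and ids are illustrative only.
ROWS = [
    {"camera": "camera-a", "accessory": "lens-adapter-1"},
    {"camera": "camera-a", "accessory": "charger-1"},
]


def rows_to_topic_map(rows, assoc_type, roles):
    """Turn worksheet rows into an XTM-like <topicMap> element.

    Each row becomes one <association>; each column named in `roles`
    becomes one <member> whose role is the column name.
    """
    topic_map = ET.Element("topicMap")
    for row in rows:
        assoc = ET.SubElement(topic_map, "association")
        inst = ET.SubElement(assoc, "instanceOf")
        ET.SubElement(inst, "topicRef", {"href": "#" + assoc_type})
        for role in roles:
            member = ET.SubElement(assoc, "member")
            role_spec = ET.SubElement(member, "roleSpec")
            ET.SubElement(role_spec, "topicRef", {"href": "#" + role})
            ET.SubElement(member, "topicRef", {"href": "#" + row[role]})
    return topic_map


tm = rows_to_topic_map(ROWS, "compatible-with", ["camera", "accessory"])
print(ET.tostring(tm, encoding="unicode"))
```

Because every column is treated as a role, adding a third column (for example a selling region) would turn the same sheet into a ternary relationship without changing the function.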
In addition the following considerations were taken into account when making design decisions:
Spreadsheets should be human readable.
Spreadsheets should be sufficiently encapsulated and defined so that different departments could work on their own tasks independently.
There should be an easy way to add new types of relationships without need to change existing data and processing flow.
Information should be mergeable.
There should be a well defined process that turns merged knowledge into a web site.
Other tools used were:
Microsoft Excel
Microstar Near & Far to design a DTD
Softquad XMetal for content creation
Documentum for content management
Altova XMLSpy for XSLT editing
Saxon and MSXML4 for XSLT
Dynamo5 Java Application Server
Web Site visitors would usually start at http://www.kodak.com/eknec/PageQuerier.jhtml?pq-path=9/35&pq-locale=en_GB
Visitors may select their country and language. They may also select a camera, docking station or category of accessory and proceed to the display of recommended and compatible accessories for the choices that are available for sale in their country:
For example, choosing a camera brings them to the Table Of Contents (TOC) for the selected camera.
Selecting an accessory brings the visitor to a page that describes the accessory's properties applicable in the scope of the visitor's country and digital camera.
Different departments manage different planes of relationships completely independently. Worksheets are used to manage relationships between topics, and to store related information objects such as text strings, images, and links to external resources. As new text occurrences are added, a special texts worksheet is sent out for localization in many languages.
When the processing starts the system goes through the following steps:
Planes of relationships are turned into topic maps.
Content expressed in XML is indexed into a topic map following the same ontology rules.
Spidering instructions are turned into <mergeMap> constructs in the main topic map document.
All Topic Maps are merged and normalized.
The Cogitative Topic Map Websites (CTW) [Cogitative Topic Maps Websites] framework approach is applied to the generated Topic Map source code to create the desired output.
One output process generates a collection of Dynamo JHTML pages constituting the Kodak Digital Cameras and Accessories web site.
Another process generates a set of XML documents as input for a new Oracle based web portal system.
Finally, one more process generates a trimmed XML dataset for dynamic Scalable Vector Graphics (SVG) [Scalable Vector Graphics] visualization and navigation of Topic Map relationships.
This provides a powerful process to assist content creators with a visual view of the relationships between knowledge objects. These relationships may be used in various output presentations.
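The merge step in the flow above can be sketched as follows. This is a simplified illustration in Python, not the XSLT used in the project. It covers only subject based merging (topics that share a subject identifier become one topic) and leaves out the topic naming constraint. The dictionary layout and the `merge_topics` name are assumptions made for the example.

```python
def merge_topics(topics):
    """Merge topics that share at least one subject identifier.

    Each topic is a dict with 'ids' (a set of subject identifiers) and
    'names' (a set of base name strings). The result is a list of topics
    whose 'ids' sets are pairwise disjoint.
    """
    merged = []
    for topic in topics:
        current = {"ids": set(topic["ids"]), "names": set(topic["names"])}
        remaining = []
        for other in merged:
            if current["ids"] & other["ids"]:
                # Same subject: fold the earlier topic into the current one.
                current["ids"] |= other["ids"]
                current["names"] |= other["names"]
            else:
                remaining.append(other)
        remaining.append(current)
        merged = remaining
    return merged


topics = [
    {"ids": {"psi:camera-a"}, "names": {"Camera A"}},
    {"ids": {"psi:charger-1"}, "names": {"Charger"}},
    {"ids": {"psi:camera-a", "psi:cam-a-alias"}, "names": {"Kamera A"}},
]
for topic in merge_topics(topics):
    print(sorted(topic["ids"]), sorted(topic["names"]))
```

Since the topics already collected never share identifiers with each other, a new topic only has to be compared against each of them once, even when it links several of them together.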
Because the Topic Map Constraint Language is not yet fully formulated and, accordingly, no tools are available, we created an Ontology Language inspired by the Web Ontology Language (OWL) [Web Ontology Language (OWL)]. XSLT was used to turn the Ontology into a validating style sheet. The validating style sheet, applied to a topic map, generates a report displaying warnings and inconsistency errors.
This allowed us to track such inconsistencies as missing occurrences of required types, labels, text translations or invalid relationships between topics of certain types.
The generated report is used for error correction. Some of the errors that we are able to catch are very tedious to find and normally show up only upon reviewing a corpus of generated web pages. Automated error tracking provides a higher level of content quality for the real time page creation process.
To minimize the number of generated files to be pushed to the production system, we applied a topic map diff calculation.
Here we encountered some very interesting challenges. Below we review the approach chosen:
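One kind of check described above, missing occurrences of required types, might look like the following sketch. It is written in Python for illustration only; the project generated an XSLT style sheet from its ontology. The rule table, the topic layout and the `validate` function are hypothetical.

```python
# Hypothetical ontology fragment: for each topic type, the occurrence types
# that every topic of that type must carry.
REQUIRED_OCCURRENCES = {
    "accessory": {"label", "image"},
    "camera": {"label"},
}


def validate(topics, required=REQUIRED_OCCURRENCES):
    """Return one report line per missing required occurrence."""
    report = []
    for topic in topics:
        needed = required.get(topic["type"], set())
        missing = needed - set(topic["occurrences"])
        for occ_type in sorted(missing):
            report.append(
                f"{topic['id']}: missing occurrence of type '{occ_type}'"
            )
    return report


topics = [
    {"id": "charger-1", "type": "accessory", "occurrences": ["label"]},
    {"id": "camera-a", "type": "camera", "occurrences": ["label"]},
]
for line in validate(topics):
    print(line)
```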
In the Topic Map construction we rely on the notion of subject identifiers. We decided to use subject identifiers as primary keys when comparing topics. Note that all topics in our topic map have at least one subject identifier or a subject address.
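Using subject identifiers as primary keys reduces topic comparison to set operations on the keys. The sketch below shows this in Python under the simplifying assumption that each topic is keyed by a single identifier; the `diff_topics` name and the sample identifiers are hypothetical.

```python
def diff_topics(old, new):
    """Compare two topic maps keyed by subject identifier.

    `old` and `new` map a subject identifier to the topic's data.
    Returns (inserted, deleted, common) lists of identifiers.
    """
    inserted = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    common = sorted(old.keys() & new.keys())
    return inserted, deleted, common


old = {"psi:camera-a": {}, "psi:charger-1": {}}
new = {"psi:camera-a": {}, "psi:card-1": {}}
inserted, deleted, common = diff_topics(old, new)
print(inserted, deleted, common)
```

Only the topics in `common` need a closer look at their names, occurrences and associations.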
After all inserted and deleted topics are determined and marked we move on to compare topic base names and occurrences. Here we rely on composite primary keys consisting of:
One or more scoping themes in the case of base names.
Occurrence type and one or more scoping themes in the case of occurrences.
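The composite key for occurrences can be sketched as a tuple of the occurrence type and a frozen set of scoping themes. This is an illustrative Python sketch with hypothetical field names. Reporting "changed" values separately is an assumption added here; the text above speaks only of inserted and deleted items.

```python
def occurrence_key(occ):
    """Composite primary key: occurrence type plus its scoping themes."""
    return (occ["type"], frozenset(occ["scope"]))


def diff_occurrences(old_occs, new_occs):
    """Compare the occurrences of one topic in the old and new topic maps."""
    old = {occurrence_key(o): o["value"] for o in old_occs}
    new = {occurrence_key(o): o["value"] for o in new_occs}
    inserted = [k for k in new if k not in old]
    deleted = [k for k in old if k not in new]
    # Same key in both maps but a different value (an assumption of this sketch).
    changed = [k for k in new if k in old and new[k] != old[k]]
    return inserted, deleted, changed


old_occs = [
    {"type": "label", "scope": {"en"}, "value": "Charger"},
    {"type": "label", "scope": {"de"}, "value": "Ladegeraet"},
]
new_occs = [
    {"type": "label", "scope": {"en"}, "value": "Battery charger"},
    {"type": "label", "scope": {"fr"}, "value": "Chargeur"},
]
print(diff_occurrences(old_occs, new_occs))
```

Base names can be handled the same way with a key made of the scoping themes alone.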
Next we move on to comparing associations. We had to identify "master" and "slave" role players for each association class. Then, for each association class, the set of scoping themes and the topics playing "master" roles were used as a composite primary key.
After all inserted and deleted associations are determined and marked based on the above criteria we move on to compare "slave" roles and identify inserted and deleted association members and individual role players.
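The two-pass association comparison described above might be sketched like this. It is a Python illustration with a hypothetical data layout, not the XSLT style sheet used in the project. Associations are first matched on the composite key (class, scope, "master" players), and matched associations then have their "slave" players compared.

```python
def association_key(assoc):
    """Composite key: association class, scoping themes, "master" players."""
    return (
        assoc["class"],
        frozenset(assoc["scope"]),
        frozenset(assoc["masters"]),
    )


def diff_associations(old_assocs, new_assocs):
    old = {association_key(a): set(a["slaves"]) for a in old_assocs}
    new = {association_key(a): set(a["slaves"]) for a in new_assocs}
    result = {
        "inserted": [k for k in new if k not in old],
        "deleted": [k for k in old if k not in new],
        "slaves_inserted": {},
        "slaves_deleted": {},
    }
    # Second pass: compare "slave" players of associations present in both.
    for key in new.keys() & old.keys():
        added = new[key] - old[key]
        removed = old[key] - new[key]
        if added:
            result["slaves_inserted"][key] = added
        if removed:
            result["slaves_deleted"][key] = removed
    return result


old_assocs = [
    {"class": "compatible", "scope": {"us"}, "masters": {"camera-a"},
     "slaves": {"lens-1", "charger-1"}},
]
new_assocs = [
    {"class": "compatible", "scope": {"us"}, "masters": {"camera-a"},
     "slaves": {"lens-1", "card-1"}},
    {"class": "compatible", "scope": {"us"}, "masters": {"camera-b"},
     "slaves": {"card-1"}},
]
print(diff_associations(old_assocs, new_assocs))
```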
The output of this process is a quasi topic map: a temporary construct with "inserted" and "deleted" modifier attributes inserted in the appropriate places. This intermediate document may then be used to drive the creation of output files.
To determine that content files have changed we use a checksum procedure. Content files in the topic map are represented as content occurrences. During the spidering step the checksum value of each file is stored as an occurrence of the topic constituted by the content file.
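The checksum comparison can be sketched as follows. The text does not say which checksum algorithm was used, so SHA-256 from Python's standard `hashlib` module is an assumption here, as are the function names and the sample paths.

```python
import hashlib


def file_checksum(data: bytes) -> str:
    """Checksum of a content file's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def changed_files(old_checksums, current_contents):
    """List paths whose content differs from the stored checksum.

    `old_checksums` maps path -> checksum stored as an occurrence in the
    previous topic map; `current_contents` maps path -> current bytes.
    New files (no stored checksum) are reported as changed.
    """
    changed = []
    for path, data in current_contents.items():
        if old_checksums.get(path) != file_checksum(data):
            changed.append(path)
    return sorted(changed)


stored = {"charger.xml": file_checksum(b"<accessory>old</accessory>")}
current = {
    "charger.xml": b"<accessory>new</accessory>",
    "card.xml": b"<accessory>card</accessory>",
}
print(changed_files(stored, current))
```

In practice the bytes would be read from the spidered files; they are passed in directly here to keep the sketch self-contained.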
Here we show our framework's conventions for representing vocabularies, topic characteristics (occurrences, associations and base names), characteristic scopes and other Topic Map constructs.
Language Scopes
Additional Scopes
As the project progressed, various vocabularies and Published Subjects were developed to codify the objects and relationships managed. Some of these are managed and updated by existing internal systems and are made available as XML files. This provides the opportunity for scaling from a small project to projects that integrate easily within a large corporation.
Neutral syntax, hospitable to new types of relationships.
Tabular UI familiar to most of the administrative personnel.
Sheets are independent and yet provide a clear interface for managing common sets of interconnected information objects.
Requires one to think and organize content in terms of topics (products, accessories, regions, languages), vocabularies, associations, roles and contexts.
Allows automated validation of relationships based on a given ontology.
Using a diff algorithm can minimize sequential production uploads.