XML Europe 2002

Leveraging Intellectual Property Using XML in a Small Training Business

Abstract

XML can leverage the efforts of a small training company into the appearance of a large one. This case study gives an overview of the application of XML technology to training development at Crane Softwrights Ltd., a husband-and-wife team working from home in a small farming village near Ottawa, Canada. It also includes sections on the XML training marketplace as seen by Crane, and many "lessons learned" while venturing out into an international industry as an independent small company.

Table of Contents

1. Leveraging intellectual property using XML
1.1. Leverage
1.2. Training material model
1.2.1. Training material model
1.3. Authoring
1.3.1. Authoring/Generating
1.3.2. Generating content
1.3.3. Sharing content
1.3.4. Applicability
1.3.5. Images
1.3.6. Review
1.3.7. Source code control
1.4. Production
1.4.1. Production
1.4.2. Configuration
1.4.3. Assembly
1.4.4. Effectivity
1.4.5. Optimization
1.4.6. Orchestration
1.5. Publishing
1.5.1. Publishing
1.5.2. Manifests
1.5.3. Handouts
1.5.4. Projection
1.5.5. Books (electronic and paper)
1.5.6. Accessible content
1.5.7. Excerpts
1.6. Licensing
1.6.1. Licensing
1.6.2. Branding
1.6.3. Delivery
1.6.4. Aggregation
2. The XML training opportunity
2.1. The XML training opportunity
2.1.1. Sticking to standards
2.1.2. Electronic publishing model
3. Lessons learned running a small training business
3.1. A small business
3.1.1. History
3.1.2. Independence
3.1.3. Family
3.2. Legal
3.2.1. Legal Issues
3.2.2. Taxes
3.2.3. Immigration
3.2.4. Intellectual property
3.3. Money
3.3.1. Money
3.3.2. Rates
3.3.3. Invoicing
3.4. Marketing
3.4.1. Web site
3.4.2. Volunteerism
3.5. Miscellany
3.5.1. Travel
3.5.2. Timesheets
3.5.3. Events
Biography

1. Leveraging intellectual property using XML

1.1. Leverage

Crane Softwrights Ltd. has many opportunities to exploit a single investment in IP in training material

  • instructor-led training

    • projection materials

    • handouts

  • electronically published books

    • PDF files available for sale

    • more content than just the instructor-led material (though same format)

  • real-time audio-over-IP lecturing

    • accommodates manifests in custom format for virtual classroom delivery software

  • web- and CD-based self-paced training

    • accommodates assessment features of delivery software

  • branded licensing to third parties

    • many people have training needs but no time to develop materials

    • commercial training organizations

    • internal corporate training needs

All training material is authored in XML

  • currently using XML Document Type Definitions (DTD) for modeling

    • moving to RELAX NG (Regular Language for XML, Next Generation)

  • XSLT/XPath for transformation to HTML/CSS for projection

  • XSLT/XPath for transformation to delivery software manifest files

  • XSLT/XPath for transformation to accessible format

  • XSLT/XPath for transformation to XSLFO for all print images

  • historically based on SGML and DSSSL

Entire process can be separated into three distinct phases

  • authoring the content

  • producing the configuration desired from the content

  • publishing the configuration as required for delivery

Figure 1. Overview of entire process

Leverage achieved through many means

  • content sharing between separately authored courses

    • write once, use many times

  • content sharing between configurations of the same course

    • parallel content intermixed with shared content

  • publishing content to different targets

    • differing needs for the same content

      • e.g. projection, print dimensions, etc.

  • branding content for different markets and customers

    • different licensees use same content with different appearances

1.2. Training material model

1.2.1. Training material model

Simple hierarchical structure

  • overview

  • introduction

    • frame(s)

      • pane(s)

  • module(s)

    • introduction

      • frame(s)

        • pane(s)

    • lesson(s)

      • frame(s)

        • pane(s)

      • assessment

    • conclusion

      • frame(s)

        • pane(s)

    • assessment

  • conclusion

    • frame(s)

      • pane(s)

  • assessment

Decided not to use an existing model like DocBook

  • need the identifying labels for the semantic concepts of training

Cannot consider models like EML (Education Markup Language)

  • http://eml.ou.nl

  • pseudo-proprietary document model requiring non-disclosure and non-compete agreements

Our own evolved model is considered proprietary

  • five years of investment in features and functionality

  • if not released to others, no need to support it when one is too busy using it!

1.3. Authoring

1.3.1. Authoring/Generating

Consider the need for granularity in authoring

Figure 2. Overview of authoring process

Numerous XML parsed entities linked through general entity references

  • approximately separated at the lesson level for authored content

    • no need to bring the entire course into an editor just to change a single paragraph

    • XSLT/XPath course has 53 separately authored XML files of content

  • arbitrary use for generated content

    • XSLFO course has hundreds of generated XML files of content in addition to authored content

    • synthesized from XSLT/XPath processing of XSLFO Recommendation XML

1.3.2. Generating content

A wealth of content already lives in other XML files

  • content can be extracted into a suitable form for training purposes

  • be careful of IPR of the source of information

Can generate suitable content for referencing through general entities

  • may require that a parameter entity pointing to the general entities be synthesized as well

Can post-process extracted content

1.3.3. Sharing content

XML parsed entities inappropriate for document fragment sharing

  • intuitive approach that works just fine for small, simple situations

    • catastrophic breakdown when used with larger fragments in many different contexts

  • the parsing context of the fragment is defined by the including document

    • in particular, the set of general entity declarations

  • very susceptible to changes in the parsing context dictated by other presentations

XSLT/XPath implements content sharing through use of the document() function

  • each document is self-contained with its own parsing context

  • not susceptible to changes in parsing context of other documents

  • every frame has unique XML ID

    • frame is shared by indicating the identifier of the frame in another presentation

      • assembled into the configuration
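The sharing-by-identifier idea can be sketched outside XSLT. The following Python is only an illustration under assumed markup: the `frame-ref` element and its `presentation` and `idref` attributes are hypothetical names, not Crane's model. Each presentation is parsed on its own, standing in for what document() does in a stylesheet.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical markup: <frame id="..."> holds content and <frame-ref> points
# at a frame in another presentation. These names are assumptions.
SHARED = """<presentation>
  <frame id="f-axes"><pane>XPath axes</pane></frame>
  <frame id="f-steps"><pane>Location steps</pane></frame>
</presentation>"""

COURSE = """<presentation>
  <frame id="f-intro"><pane>Welcome</pane></frame>
  <frame-ref presentation="shared" idref="f-axes"/>
</presentation>"""


def assemble(course_xml, library):
    """Replace each top-level frame-ref with a copy of the frame it names.

    `library` maps a presentation name to its XML text. Every document is
    parsed separately, so each keeps its own parsing context.
    """
    course = ET.fromstring(course_xml)
    parsed = {name: ET.fromstring(text) for name, text in library.items()}
    for index, child in enumerate(list(course)):
        if child.tag != "frame-ref":
            continue
        source = parsed[child.get("presentation")]
        wanted = child.get("idref")
        matches = [f for f in source.iter("frame") if f.get("id") == wanted]
        if len(matches) != 1:
            raise LookupError(f"frame {wanted!r} not found exactly once")
        frame = copy.deepcopy(matches[0])
        frame.tail = child.tail
        course[index] = frame
    return course


assembled = assemble(COURSE, {"shared": SHARED})
```

Because the shared presentation is parsed independently, nothing declared in the including course can change how the shared frame is read.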

1.3.4. Applicability

Important lessons learned from aircraft maintenance manuals

  • every aircraft is different yet authored content must accommodate all

  • many training course configurations are different, and leverage is achieved by accommodating all of them in a single set of sources

Semantics of applicability are entirely arbitrary

  • the applicability functionality is implemented indirectly through an authored collection of distinctions

  • no single concept is built in to the production environment

  • common applicability across training courses achieved through a shared parsed entity

    • entity has no reliance on the parsing context by having no external general entities

    • parameter entities in a configuration entity file trigger common combinations of applicability specifications

Able to mark content as applicable to a particular configuration

  • unmarked content applicable to all configurations

Able to mark content as applicable to a particular audience

  • a "compressed" applicability is used to elide extraneous detail in a presentation

Simple logical operators for combining applicability specifications

  • "and" of all of a collection of specifications

  • "or" of all of a collection of specifications

Implemented using ID/IDREF/IDREFS

  • simple interpretation using XSLT/XPath

  • simple declaration through ID attributes

  • simple specification through IDREF/IDREFS attributes
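A minimal sketch of the "and"/"or" interpretation, written in Python instead of XSLT/XPath. The attribute names `applic-all` and `applic-any` and the distinction identifiers are invented for illustration; the real model's names are not given in this paper.

```python
import xml.etree.ElementTree as ET


def is_applicable(element, active):
    """Decide whether an element applies, given the active distinction IDs.

    Hypothetical attributes: `applic-all` is an IDREFS list whose members
    must all be active ("and"); `applic-any` needs at least one active
    member ("or"). An element with neither applies to every configuration.
    """
    all_of = element.get("applic-all", "").split()
    any_of = element.get("applic-any", "").split()
    if all_of and not all(ref in active for ref in all_of):
        return False
    if any_of and not any(ref in active for ref in any_of):
        return False
    return True


frames = ET.fromstring("""<lesson>
  <frame id="f1"/>
  <frame id="f2" applic-all="two-day advanced"/>
  <frame id="f3" applic-any="two-day five-day"/>
</lesson>""")

active = {"two-day", "compressed"}
applicable = [f.get("id") for f in frames if is_applicable(f, active)]
```

With only `two-day` and `compressed` active, the unmarked frame and the "or" frame apply, while the "and" frame is left out because `advanced` is not active.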

Not extended to distinction between prose/bulleted/accessible content

  • parallel content based on presentation or target publication

  • configuration applicability brings in parallel content for distinction at publishing time

1.3.5. Images

Many needs for images in technical training

  • vector-based line drawings

  • pixel-based screen shots

  • photographs

Biggest publishing problem

  • commercial XSLFO engines do not share any vector format

    • using WMF for Windows-based tools

    • using EPS for Java-based tools

  • commercial XSLFO engines do not share any lossless pixel format

    • using BMP for Windows-based tools

    • using GIF for Java-based tools

  • JPEG format only appropriate for photographs

Authoring requires manual maintenance of multiple formats

  • haven't found reliable vector conversion tool

Anxiously awaiting SVG support in commercial drawing tools and publishing tools

  • the legacy of vector-based images will need to be converted

1.3.6. Review

Particular challenge for content authored with multiple publishing formats

  • looking at any one final rendition only shows results for one output format

  • difficult to ensure that parallel content for all published formats of a given configuration is properly synchronized

A "review" rendition interleaves all publishing formats for a given configuration into a single HTML result

  • projection content

  • handout content

  • prose content

  • accessible content

1.3.7. Source code control

Maintaining many source files in a source code control system important

  • even if only one author creating all of the content

  • maintenance made easy by reviewing progression in changes

1.4. Production

1.4.1. Production

Objective is to create publishable content from authored content:

  • configure the needs for a given use

  • assemble all sources of content

  • reduce content to only what is applicable for the given configuration

  • optimize content for downstream publishing purposes

Figure 3. Overview of production process

1.4.2. Configuration

Appropriate environment modules are brought in through parameter and general entities

  • course material components

  • graphic image URIs

  • common graphics for user interface

  • language boilerplate

  • applicability framework and applicability triggers

  • licensee or customer branding information

Note that configuration information added to assembly from parameter and general entities

  • downstream processes do not need to reference configuration information through entities

1.4.3. Assembly

Appropriate shared content from other presentations brought in through assembly

  • presentations identified through unparsed entity references

  • frames identified through unique ID attribute values

  • content incorporated using XSLT document() function

Assembly stylesheet references configuration information through entities

  • structure accommodates copies of all configuration information

  • downstream processes act only on assembled information

Completed assembly has more information than is required for particular configuration

  • parallel prose and bulleted information

  • parallel applicable information for all configurations

1.4.4. Effectivity

The assembled content is reduced to the effective content for the given configuration

  • inapplicable content is not preserved

  • container structures for applying applicability are flattened

The effective instance has no memory of applicability determination

  • applicability boundaries and labels are removed from content
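The reduction step can be pictured with a short Python sketch. It assumes a hypothetical `<applic refs="...">` container element wrapping conditional content; the production process described here is XSLT-based and its actual markup is not shown in this paper.

```python
import xml.etree.ElementTree as ET


def effective(element, active):
    """Return the effective children of `element` for the active distinctions."""
    result = []
    for child in element:
        if child.tag == "applic":
            refs = child.get("refs", "").split()
            if all(ref in active for ref in refs):
                # Applicable: keep the content but flatten the container away.
                result.extend(effective(child, active))
            # Inapplicable content is simply not preserved.
        else:
            kept = ET.Element(child.tag, child.attrib)
            kept.text = child.text
            kept.extend(effective(child, active))
            result.append(kept)
    return result


def reduce_to_effective(root, active):
    """Build a new tree with no trace of the applicability containers."""
    reduced = ET.Element(root.tag, root.attrib)
    reduced.text = root.text
    reduced.extend(effective(root, active))
    return reduced


assembled = ET.fromstring(
    '<lesson>'
    '<frame id="a"/>'
    '<applic refs="two-day"><frame id="b"/>'
    '<applic refs="advanced"><frame id="c"/></applic></applic>'
    '<applic refs="five-day"><frame id="d"/></applic>'
    '</lesson>'
)
result = reduce_to_effective(assembled, {"two-day"})
```

The result holds frames `a` and `b` as direct children of the lesson, with no container elements left to reveal how they were selected.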

1.4.5. Optimization

Performance of XSLT-based publishing tasks can suffer depending on what the stylesheets need to address

  • certain "looking backwards" addressing in XPath can be slow

  • a well-hyperlinked corpus has a lot of visible and background content

Optimization duplicates ancestral information in an element's attributes

  • ancestral information is passed in parameters during <xsl:apply-templates>

  • all elements are preserved and then supplemented with information in passed parameters

  • the DTD is parameterized using a #FIXED attribute to trigger optimization

    • saves changing optimizing code for knowledge of which elements are to be optimized

Net publishing performance benefit approaching 50% with only about a half-dozen cues

  • publishing tasks can find ancestral information without "leaving" the element

  • investment in writing optimization code paid off quickly

Optimized file is a lot larger than un-optimized file

  • duplicated information in most elements

  • does not affect net processing time but does occupy more system resources
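As a rough illustration of duplicating ancestral information, the Python sketch below passes inherited values down the tree in a parameter, much as a stylesheet would pass parameters during <xsl:apply-templates>. The attribute names `in-module` and `in-lesson` are invented, and the set of elements to record is hard-coded here, where the approach described above drives it from the DTD.

```python
import xml.etree.ElementTree as ET


def optimize(element, inherited=None):
    """Copy selected ancestral identifiers into each element's attributes.

    The id of the nearest enclosing <module> and <lesson> is duplicated as
    `in-module` / `in-lesson` (hypothetical names) so that later processing
    can read it without walking back up the tree.
    """
    inherited = dict(inherited or {})
    for key, value in inherited.items():
        element.set(key, value)
    if element.tag in ("module", "lesson") and element.get("id"):
        inherited["in-" + element.tag] = element.get("id")
    for child in element:
        optimize(child, inherited)
    return element


course = ET.fromstring(
    '<course><module id="m1"><lesson id="l1">'
    '<frame id="f1"><pane/></frame></lesson></module></course>'
)
optimize(course)
pane = course.find(".//pane")
```

After the pass, the pane carries both identifiers itself, which is also why the optimized file grows: nearly every element gains the duplicated attributes.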

Optimized file is checked for validity against the document model

  • assembled components are checked for well-formedness by XSLT process

  • raw assembled content may violate validity checks

    • duplicate ID attribute values violating XML validity

    • parallel content violating document model validity

  • final result of production is the first file validated against the course document model
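One of the validity problems named above, duplicate ID values in raw assembled content, can be detected with a few lines of Python. This is only a sketch: it assumes every ID-typed attribute is literally named `id`, whereas a validating parser relies on the attribute declarations in the document model.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def duplicate_ids(root):
    """Return the sorted `id` values that occur on more than one element."""
    counts = Counter(
        e.get("id") for e in root.iter() if e.get("id") is not None
    )
    return sorted(value for value, count in counts.items() if count > 1)


raw = ET.fromstring(
    '<lesson><frame id="f1"/><frame id="f2"/><frame id="f1"/></lesson>'
)
clashes = duplicate_ids(raw)
```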

1.4.6. Orchestration

Early requirements were modest

  • very few invocation parameters

  • easy to duplicate common tool invocation scripts

  • early implementations in MS-DOS batch language

Flexibility in orchestration was needed to meet developing needs

  • licensee requests for new delivery package contents

    • easier parameterization of tool invocation options

  • support for introduced phases for optimization

All orchestration converted to Python

  • very powerful and expressive

  • easy to implement convenience features unanticipated in original design

  • powerful scripting language allows for easy parameterization of tool invocation command lines
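A minimal sketch of parameterized tool invocation in Python follows. Everything named in it is hypothetical: the `xslt-processor` executable, the stylesheet and file names, and the `name=value` parameter convention would all need to be adapted to the processor actually in use.

```python
import shlex

# Hypothetical production phases: (stylesheet, input file, output file).
STEPS = [
    ("assemble.xsl", "course.xml", "assembled.xml"),
    ("effective.xsl", "assembled.xml", "effective.xml"),
    ("optimize.xsl", "effective.xml", "optimized.xml"),
]


def build_command(tool, stylesheet, source, output, params=None):
    """Build the argument list for one transformation step.

    `tool` is the argument prefix that starts the local XSLT processor.
    Parameters are appended as `name=value`, which is an assumed convention.
    """
    command = list(tool) + [source, stylesheet, output]
    for name, value in sorted((params or {}).items()):
        command.append(f"{name}={value}")
    return command


def plan(tool, params):
    """Return the command for every phase, sharing one set of parameters."""
    return [
        build_command(tool, style, src, out, params)
        for style, src, out in STEPS
    ]


commands = plan(["xslt-processor"], {"paper": "A4", "licensee": "crane"})
printable = [shlex.join(command) for command in commands]
```

Each argument list could then be handed to `subprocess.run(command, check=True)`. Keeping the commands as data makes it cheap to add a phase or a new invocation option later.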

1.5. Publishing

1.5.1. Publishing

Archived optimized configured effective instance is the source of all published results

  • contains everything necessary for all publishing tasks

  • can re-run any or all publishing tasks with archived file without needing original sources

Many kinds of published results necessary for a presentation

  • manifest files for software applications

  • handouts for students

  • projection materials for instructor

  • accessible rendition designed for screen reader software

Figure 4. Overview of publishing process

1.5.2. Manifests

Computer-based Training (CBT) software typically has its own manifest requirements

  • lists of frames and relationships

  • software-orchestrated packaging or delivery logic

  • most are text-based, some are XML-based

New XML-based standards being developed in this area

Two uses by Crane Softwrights Ltd.

  • web-based and CD-based self-paced learning

  • real-time audio-over-IP lecturing

XSLT can produce simple text without escaping sensitive markup characters

  • a small investment in a stylesheet allowed entire content to be leveraged in a new platform

  • automated processes ensure content easily kept up-to-date with masters
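To show what a text-based manifest might look like, here is a Python sketch rather than a stylesheet. The tab-separated layout and the `title` attribute are made-up stand-ins for whatever a given delivery package expects. As with text output from XSLT, the titles are written literally, without escaping markup characters.

```python
import xml.etree.ElementTree as ET


def text_manifest(course):
    """List every frame as lesson-id, frame-id and title, one per line."""
    lines = []
    for lesson in course.iter("lesson"):
        for frame in lesson.iter("frame"):
            fields = [lesson.get("id"), frame.get("id"), frame.get("title", "")]
            lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


course = ET.fromstring(
    '<course><module id="m1"><lesson id="l1">'
    '<frame id="f1" title="Patterns &amp; expressions"/>'
    '<frame id="f2" title="a &lt; b"/>'
    '</lesson></module></course>'
)
manifest = text_manifest(course)
```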

1.5.3. Handouts

Different dimensions needed for different audiences

  • US-letter in North America

  • A4 in the rest of the world

Different layouts for different users

  • single-sided or double-sided with differing footers

    • parameter trick with page master names allows one stylesheet for both uses:

      • create a page sequence alternating odd/even between two master names

      • make both master names the same for a single-sided presentation

  • instructor content presented differently for handouts

    • no distinction in content for book form
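The page master trick can be sketched by generating the relevant XSL-FO fragment. The Python below is an illustration only; the master names are placeholders, and the stylesheet described here would produce the equivalent fragment from XSLT parameters.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"


def page_sequence_master(odd_master, even_master, name="body"):
    """Build an fo:page-sequence-master alternating between two page masters.

    Passing the same master name twice gives a single-sided layout;
    passing two different names gives a double-sided one.
    """
    ET.register_namespace("fo", FO)
    master = ET.Element(f"{{{FO}}}page-sequence-master", {"master-name": name})
    alternatives = ET.SubElement(
        master, f"{{{FO}}}repeatable-page-master-alternatives"
    )
    for parity, reference in (("odd", odd_master), ("even", even_master)):
        ET.SubElement(
            alternatives,
            f"{{{FO}}}conditional-page-master-reference",
            {"odd-or-even": parity, "master-reference": reference},
        )
    return master


single_sided = page_sequence_master("page", "page")
double_sided = page_sequence_master("page-odd", "page-even")
```

The structure of the page sequence never changes; only the two names do, which is what lets one stylesheet serve both uses.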

One-up page images through parameterized XSLFO stylesheet

  • PDF manipulation for 2-up and cut/stacked arrangements using Quite Imposing

  • XSLFO cannot produce 2-up results with differing page numbers in each page image on a single page result

1.5.4. Projection

All pages hyperlinked at three levels

  • previous/next frame

  • previous/current/next lesson

  • previous/current/next module

Random access at two levels

  • all modules in course

  • all lessons in current module
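The frame-level links amount to pairing each frame with its neighbours in presentation order. A small Python sketch, with invented frame identifiers:

```python
def frame_links(frames):
    """Map each frame id to its previous and next frame ids (None at the ends)."""
    links = {}
    for index, frame_id in enumerate(frames):
        links[frame_id] = {
            "previous": frames[index - 1] if index > 0 else None,
            "next": frames[index + 1] if index + 1 < len(frames) else None,
        }
    return links


links = frame_links(["f1", "f2", "f3"])
```

The same pairing applied to the list of lessons or modules would give the other two levels of links.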

Two parameterized passes to produce two renditions from one XSLT stylesheet

  • bulleted content for speaker to teach from

  • handout content for reference if required

1.5.5. Books (electronic and paper)

Content development grows beyond what can be taught in an instructor-led scenario

  • often more detail than can be covered in the time allotted for instruction

  • techniques and practices evolve and tips are shared through public forums

  • technology changes and old content is quickly out of date and not useful

Can sell a book form and offer no-charge updates when there are no distribution costs

  • announcements pushed out to customers by email

  • customers pull content by web-based delivery

  • password protection changes every week

  • one-time fee for each of three different uses by the customer

    • single-user license

    • single geographical site staff license

    • world-wide staff license

  • relies on the honor system and the honesty of the customer to not proliferate copies

Commercial paper rendition created by sending XML of configured content to editor

  • the editor fixed grammar by editing his copy of the configured XML

  • used XSLFO to produce PDF page masters sent to Prentice Hall for reproduction

Should be easy to obtain an ISBN publisher's prefix from your country's representative

  • need an estimate of how many publications you anticipate producing

  • some outstanding questions regarding uniqueness of ISBNs and renditions in an electronic world

    • Crane has decided to label each edition's configured instance with an ISBN and reuse that ISBN on all 11 renditions

  • National Library of Canada is the custodian for Canadian publishers

1.5.6. Accessible content

Typical electronic presentations are not suited to aural screen readers

  • differing font sizes and faces

  • differing indents require the user to hunt down information at undetermined locations on each line

Graphics are not useful unless described

  • content model provides for a narrative description of each graphic

  • rendered only

A monospaced presentation can be predictably navigated

  • removing indentation makes information easy to find

  • necessary to still indicate the indentation depth in order to follow the flow of the information

Figure 5. The nesting of list items in an accessible presentation
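One way to keep nesting visible without indentation is to state the depth on every line. The Python sketch below does this with an invented "level N:" prefix; it is not necessarily the convention shown in the figure.

```python
def flatten_list(items, depth=1):
    """Render nested list items flush left, each prefixed with its depth.

    `items` is a list of (text, children) pairs, where children has the
    same shape. The "level N:" prefix is an assumed convention.
    """
    lines = []
    for text, children in items:
        lines.append(f"level {depth}: {text}")
        lines.extend(flatten_list(children, depth + 1))
    return lines


outline = [
    ("XSLT", [("templates", []), ("patterns", [("axes", [])])]),
    ("XSLFO", []),
]
rendered = flatten_list(outline)
```

Every line starts in the same column, so a screen reader user always finds the information in the same place and hears the depth first.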

1.5.7. Excerpts

Not everyone is comfortable with an electronic presentation of information

  • paper presentation still the easiest for most customers

  • some customers take electronic presentation to copy house to make a paper presentation

    • periodic updates in content make this expensive to repeat very often

Making an excerpt available for free helps to sell the entire book

  • nature of the presentation

    • bulleted presentation

    • utility of the hyperlinking

  • overview of the content

    • introduction of every module is included in the excerpt

The excerpt can be leveraged in other venues at no charge

  • the content is already freely available, may as well use it anywhere you can

1.6. Licensing

1.6.1. Licensing

Many organizations need to teach content but cannot develop content themselves

  • difficult to keep up with changing technologies and techniques

  • configurability of content meets differing needs of licensees

Public and private training needs

  • public courses for anyone to attend

  • private corporate courses for other companies

  • in-house corporate use

Should offer "train-the-trainer" opportunities for licensees' instructors

  • easy to have them attend a public class while already being equipped with all the materials

  • proper to charge full attendance fees to prevent using licensing as an excuse for free training

1.6.2. Branding

Treat your own company as your own licensee

  • all of the Crane branding is done as "the host company"

  • change the host company graphics for a licensee

A licensee's customer may feel more ownership when branded specifically for them

  • customer name and logo

  • date of delivery

An XML-based environment supports just-in-time publishing

  • the licensee does not need to keep an inventory of a single stock presentation

  • the latest rendition ensures the most up-to-date content

Protects a given configuration from being used for another customer

1.6.3. Delivery

Configurations delivered to customer via password-protected web address

  • pull model ensures customer obtains content at their convenience

    • not just pushed at them by email

  • leaving a copy on the web site provides disaster recovery

    • instructor can obtain content again if machine crashes at or on the way to the customer's site

    • delivery done from any web browser

  • copy house can obtain content without using media

1.6.4. Aggregation

Customer information on public deliveries aggregated automatically using XSLT

  • customer maintains a private web page of XML according to Crane's document model

Recreating master schedule pulls in content from all licensees

  • schedule page regenerated every Monday or manually on request from the licensee
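The aggregation step can be sketched as a merge of several small XML documents. In this Python illustration the `schedule` and `delivery` elements, their attributes and the licensee names are all hypothetical, and the documents are supplied as strings instead of being fetched from each licensee's web page.

```python
import xml.etree.ElementTree as ET


def aggregate(documents):
    """Merge licensee schedule documents into one schedule sorted by date.

    `documents` maps a licensee name to the XML text of its schedule.
    Dates are assumed to be ISO 8601 strings, which sort correctly as text.
    """
    master = ET.Element("schedule")
    deliveries = []
    for licensee, text in documents.items():
        for delivery in ET.fromstring(text).iter("delivery"):
            delivery.set("licensee", licensee)
            deliveries.append(delivery)
    deliveries.sort(key=lambda d: d.get("date"))
    master.extend(deliveries)
    return master


documents = {
    "acme": '<schedule><delivery date="2002-06-10" city="Ottawa"/></schedule>',
    "globex": (
        '<schedule><delivery date="2002-05-20" city="Barcelona"/>'
        '<delivery date="2002-07-01" city="London"/></schedule>'
    ),
}
master = aggregate(documents)
```

Rerunning the merge on a schedule, or on request, picks up whatever each licensee has published since the last run.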

2. The XML training opportunity

2.1. The XML training opportunity

2.1.1. Sticking to standards

Standards are "black and white"

  • definitive Recommendation documents

    • not always perfect, but sufficient for training purposes

  • no interpretation of vendor differences

    • safe to keep to Recommendation limitations

    • important to discuss how extensions operate

    • not important to discuss particular extensions from any vendor

Students can transfer Recommendation knowledge to different vendors

  • presumably the reason they are embracing standards

Success of training programme not dependent on a vendor

  • no non-disclosure problems

  • no hidden release schedules with unknown feature lists

  • no defense for non-conformance or buggy code

2.1.2. Electronic publishing model

An electronic publishing model has many benefits to the training organization

  • no inventory of product

    • sales are made by the customer downloading

    • protection of downloading password through weekly changes

  • easy resale through licensees or any interested party

    • reseller provides collection service for discount in cost

      • may choose to pass on discount to customers

  • revised content can be made available to customers

    • technology always changing or new techniques and practices being developed

Some value in "practicing what you preach"

  • some may consider it heretical to teach XML using proprietary presentation technologies

  • maintaining HTML masters makes leverage far more difficult than maintaining XML masters

An electronic publishing model has many benefits to the customer

  • perpetually available no-charge updates to new editions of the material

    • XML technologies evolve

    • techniques and common understanding improve content of material

  • site license and world license sales model meets needs of larger organization

    • all of the customer's staff have access to a copy of the material

    • not applicable to the customer's customers

  • content is electronically searchable

  • paper copies can be made if desired

    • tools available to create stackable and bindable renditions

    • binding services available at copy centers

  • content is hyperlinked both internally and externally

    • jumping elsewhere in the document

    • bringing up a browser with a web document

Caveat: not many people trust the electronic publishing model

  • we can't give up our "day jobs" because of low sales

Caveat: relies on the honor system

  • nothing preventing customer from proliferating copies

  • trust companies to pay for site and world licenses rather than buying and posting a single-user license

3. Lessons learned running a small training business

3.1. A small business

3.1.1. History

Crane Softwrights Ltd. formed April 1997

  • federally incorporated

    • important for some companies to ensure no semblance of employment relationship

  • husband and wife team