Abstract
XML can leverage the efforts of a small training company into the appearance of being a large training company. This case study presentation gives an overview of the application of XML technology in training development at Crane Softwrights Ltd., a husband-and-wife team working from home in a small farming village near Ottawa, Canada. Also included are sections on the XML training marketplace as seen by Crane, and many "lessons learned" while venturing out into an international industry as an independent small company.
Crane Softwrights Ltd. has many opportunities to exploit a single investment in IP in training material
instructor-led training
projection materials
handouts
electronically published books
PDF files available for sale
more content than just the instructor-led material (though same format)
real-time audio-over-IP lecturing
accommodates manifests in custom format for virtual classroom delivery software
web- and CD-based self-paced training
accommodates assessment features of delivery software
branded licensing to third parties
many people have training needs but no time to develop materials
commercial training organizations
internal corporate training needs
All training material is authored in XML
currently using XML Document Type Definitions (DTD) for modeling
moving to RELAX NG (Regular Language for XML, Next Generation)
XSLT/XPath for transformation to HTML/CSS for projection
XSLT/XPath for transformation to delivery software manifest files
XSLT/XPath for transformation to accessible format
XSLT/XPath for transformation to XSLFO for all print images
historically based on SGML and DSSSL
Entire process can be separated into three distinct phases
authoring the content
producing the configuration desired from the content
publishing the configuration as required for delivery
Leverage achieved through many means
content sharing between separately authored courses
write once, use many times
content sharing between configurations of the same course
parallel content intermixed with shared content
publishing content to different targets
differing needs for the same content
e.g. projection, print dimensions, etc.
branding content for different markets and customers
different licensees use same content with different appearances
Simple hierarchical structure
overview
introduction
frame(s)
pane(s)
module(s)
introduction
frame(s)
pane(s)
lesson(s)
frame(s)
pane(s)
assessment
conclusion
frame(s)
pane(s)
assessment
conclusion
frame(s)
pane(s)
assessment
Decided not to use an existing model like DocBook
need the identifying labels for the semantic concepts of training
Cannot consider models like EML (Education Markup Language)
http://eml.ou.nl
pseudo-proprietary document model requiring non-disclosure and non-compete agreements
Our own evolved model is considered proprietary
five years of investment in features and functionality
if not released to others, no need to support it when one is too busy using it!
Consider the need for granularity in authoring
Numerous XML parsed entities linked through general entity references
approximately separated at the lesson level for authored content
no need to bring entire course in to editor just to change a single paragraph
XSLT/XPath course has 53 separately authored XML files of content
arbitrary use for generated content
XSLFO course has hundreds of generated XML files of content in addition to authored content
synthesized from XSLT/XPath processing of XSLFO Recommendation XML
A wealth of content already lives in other XML files
content can be extracted into a suitable form for training purposes
be careful of IPR of the source of information
Can generate suitable content for referencing through general entities
may require that a parameter entity pointing to the general entities be synthesized as well
Can post-process extracted content
XML parsed entities inappropriate for document fragment sharing
intuitive approach that works just fine for small, simple situations
catastrophic breakdown when used with larger fragments in many different contexts
the parsing context of the fragment is defined by the including document
in particular, the set of general entity declarations
very susceptible to changes in the parsing context dictated by other presentations
XSLT/XPath implements content sharing through use of the document() function
each document self contained with own parsing context
not susceptible to changes in parsing context of other documents
every frame has unique XML ID
frame is shared by indicating the identifier of the frame in another presentation
assembled into the configuration
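The frame-sharing idea above can be sketched outside XSLT. The following Python is only an illustration of the principle, not Crane's stylesheet: the element names `presentation`, `frame`, `pane` and `frame-ref`, and the `source` and `frame` attributes, are assumptions made for this example. Each source presentation is parsed on its own (much as `document()` loads a separate document), so no parsing context leaks between documents, and a frame is pulled in by its unique ID.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical markup: none of these element or attribute names come from the real model.
SHARED = """<presentation><frame id="f-intro"><pane>Shared intro</pane></frame>
<frame id="f-other"><pane>Not used here</pane></frame></presentation>"""

COURSE = """<presentation><frame id="f-local"><pane>Local</pane></frame>
<frame-ref source="shared" frame="f-intro"/></presentation>"""

def assemble(course_xml, sources):
    """Replace each top-level frame-ref with a copy of the frame it identifies."""
    course = ET.fromstring(course_xml)
    # Every source is parsed separately, so each keeps its own parsing context.
    parsed = {name: ET.fromstring(text) for name, text in sources.items()}
    for index, child in enumerate(list(course)):
        if child.tag != "frame-ref":
            continue
        document = parsed[child.get("source")]
        wanted = child.get("frame")
        frame = next(f for f in document.iter("frame") if f.get("id") == wanted)
        course[index] = copy.deepcopy(frame)  # one-for-one swap keeps indexes stable
    return course

assembled = assemble(COURSE, {"shared": SHARED})
print([f.get("id") for f in assembled.iter("frame")])
```

Only the referenced frame is copied; the other frames of the shared presentation never enter the assembled configuration.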
Important lessons learned from aircraft maintenance manuals
every aircraft is different yet authored content must accommodate all
many training course configurations are different and leverage is achieved when accommodating all in a single set of sources
Semantics of applicability are entirely arbitrary
the applicability functionality is implemented indirectly through authored collection of distinctions
no single concept is built in to the production environment
common applicability across training courses achieved through a shared parsed entity
entity has no reliance on the parsing context by having no external general entities
parameter entities in a configuration entity file trigger common combinations of applicability specifications
Able to mark content as applicable to a particular configuration
unmarked content applicable to all configurations
Able to mark content as applicable to a particular audience
a "compressed" applicability is used to elide extraneous detail in a presentation
Simple logical operators for combining applicability specifications
"and" of all of a collection of specifications
"or" of all of a collection of specifications
Implemented using ID/IDREF/IDREFS
simple interpretation using XSLT/XPath
simple declaration through ID attributes
simple specification through IDREF/IDREFS attributes
Not extended to distinction between prose/bulleted/accessible content
parallel content based on presentation or target publication
configuration applicability brings in parallel content for distinction at publishing time
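The applicability scheme above (distinctions declared with IDs, combined with "and" and "or", and pointed to from content) can be sketched as follows. This is an illustration under stated assumptions rather than the actual framework: the element names `distinction`, `and` and `or`, the attributes `of` and `applies`, and the rule that several references on one piece of content are combined with "or" are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical applicability framework; the real distinctions are authored, not built in.
FRAMEWORK = ET.fromstring("""<applicability>
  <distinction id="cfg-two-day"/>
  <distinction id="cfg-five-day"/>
  <distinction id="aud-compressed"/>
  <and id="short-compressed" of="cfg-two-day aud-compressed"/>
  <or id="any-course" of="cfg-two-day cfg-five-day"/>
</applicability>""")

def load_specs(framework):
    """Map each declared ID to its kind and the IDs it refers to (IDREFS)."""
    return {e.get("id"): (e.tag, e.get("of", "").split()) for e in framework}

def is_active(spec_id, specs, triggered):
    """Evaluate one specification against the set of triggered distinctions."""
    kind, refs = specs[spec_id]
    if kind == "distinction":
        return spec_id in triggered
    results = [is_active(ref, specs, triggered) for ref in refs]
    return all(results) if kind == "and" else any(results)

def applies(element, specs, triggered):
    """Unmarked content applies everywhere; marked content needs an active reference."""
    marked = element.get("applies")
    if marked is None:
        return True
    return any(is_active(ref, specs, triggered) for ref in marked.split())

specs = load_specs(FRAMEWORK)
print(applies(ET.fromstring('<frame applies="any-course"/>'), specs, {"cfg-two-day"}))
```

Because nothing in the code knows what a "configuration" or an "audience" is, the semantics stay entirely in the authored collection of distinctions.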
Many needs for images in technical training
vector-based line drawings
pixel-based screen shots
photographs
Biggest publishing problem
commercial XSLFO engines do not share any vector format
using WMF for Windows-based tools
using EPS for Java-based tools
commercial XSLFO engines do not share any lossless pixel format
using BMP for Windows-based tools
using GIF for Java-based tools
JPEG format only appropriate for photographs
Authoring requires manual maintenance of multiple formats
haven't found a reliable vector conversion tool
Anxiously awaiting SVG support in commercial drawing tools and publishing tools
legacy of vector-based images needs to be converted
Particular challenge for content authored with multiple publishing formats
looking at any one final rendition only shows results for one output format
difficult to ensure parallel content for all published formats of a given configuration is properly synchronized
A "review" rendition interleaves all publishing formats for a given configuration into a single HTML result
projection content
handout content
prose content
accessible content
Objective is to create publishable content from authored content:
configure the needs for a given use
assemble all sources of content
reduce content to only what is applicable for the given configuration
optimize content for downstream publishing purposes
Appropriate environment modules are brought in through parameter and general entities
course material components
graphic image URIs
common graphics for user interface
language boilerplate
applicability framework and applicability triggers
licensee or customer branding information
Note that configuration information is added to assembly from parameter and general entities
downstream processes do not need to reference configuration information through entities
Appropriate shared content from other presentations brought in through assembly
presentations identified through unparsed entity references
frames identified through unique ID attribute values
content incorporated using XSLT document() function
Assembly stylesheet references configuration information through entities
structure accommodates copies of all configuration information
downstream processes act only on assembled information
Completed assembly has more information than is required for particular configuration
parallel prose and bulleted information
parallel applicable information for all configurations
The assembled content is reduced to the effective content for the given configuration
inapplicable content is not preserved
container structures for applying applicability are flattened
The effective instance has no memory of applicability determination
applicability boundaries and labels are removed from content
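A minimal sketch of the reduction step follows, assuming a hypothetical `applicable` container element and a hypothetical `applies` attribute holding space-separated labels (neither name is taken from the real document model). Inapplicable content is dropped, containers are flattened into their parent, and the labels are removed so the result has no memory of how it was determined. Mixed-content tail text is ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET

def reduce_content(element, active):
    """Return the effective copies of an element: [] if inapplicable,
    the surviving children if it is a container, else one cleaned copy."""
    marked = element.get("applies")
    if marked is not None and not (set(marked.split()) & active):
        return []  # inapplicable content is not preserved
    children = []
    for child in element:
        children.extend(reduce_content(child, active))
    if element.tag == "applicable":
        return children  # container is flattened away
    attributes = {k: v for k, v in element.attrib.items() if k != "applies"}
    clone = ET.Element(element.tag, attributes)
    clone.text = element.text
    clone.extend(children)
    return [clone]

assembled = ET.fromstring(
    '<lesson><frame id="a"/>'
    '<applicable applies="book"><frame id="b"/><frame id="c" applies="slides"/></applicable>'
    '<frame id="d" applies="slides"/></lesson>')
effective = reduce_content(assembled, {"book"})[0]
print([f.get("id") for f in effective.iter("frame")])
```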
Performance of publishing tasks based on XSLT can be impacted by needs
certain "looking backwards" addressing in XPath can be slow
a well-hyperlinked corpus has a lot of visible and background content
Optimization duplicates ancestral information in an element's attributes
ancestral information is passed in parameters during <xsl:apply-templates>
all elements are preserved and then supplemented with information in passed parameters
the DTD is parameterized using a #FIXED attribute to trigger optimization
saves changing optimizing code for knowledge of which elements are to be optimized
Net publishing performance benefit approaching 50% with only about a half-dozen cues
publishing tasks can find ancestral information without "leaving" the element
investment in writing optimization code paid off quickly
Optimized file is a lot larger than un-optimized file
duplicated information in most elements
does not affect net time but does occupy more system resources
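The optimization described above can be illustrated with a short Python sketch. It is not the production stylesheet: the element names, the `module-id` and `lesson-id` attribute names, and the `optimize="yes"` cue (standing in here for the value a #FIXED attribute declaration would supply) are assumptions. Ancestral information travels down the recursion as a parameter, as it would with parameters passed on `<xsl:apply-templates>`, and is copied onto every element that carries the cue.

```python
import xml.etree.ElementTree as ET

def optimize(element, inherited=None):
    """Copy selected ancestor information into attributes of cued elements."""
    inherited = dict(inherited or {})
    if element.get("optimize") == "yes":  # hypothetical cue for the optimizer
        for name, value in inherited.items():
            element.set(name, value)  # duplicate the ancestral information
    if element.tag in ("module", "lesson"):
        inherited[element.tag + "-id"] = element.get("id")
    for child in element:
        optimize(child, inherited)  # pass the information down as a parameter

course = ET.fromstring(
    '<course><module id="m1"><lesson id="l1">'
    '<frame id="f1" optimize="yes"><pane/></frame>'
    '</lesson></module></course>')
optimize(course)
frame = course.find(".//frame")
print(frame.get("module-id"), frame.get("lesson-id"))
```

A downstream task can now read `module-id` directly from the frame instead of walking back up the ancestor axis, at the cost of a larger file.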
Optimized file is checked for validity against the document model
assembled components are checked for well-formedness by XSLT process
raw assembled content may violate validity checks
duplicate ID attribute values violating XML validity
parallel content violating document model validity
final result of production is the first file validated against the course document model
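One of the validity problems named above, duplicate ID values in raw assembled content, is easy to detect before the final validation. The sketch below assumes the ID-typed attribute is called `id`; a real check would take the attribute names from the document model.

```python
from collections import Counter
import xml.etree.ElementTree as ET

def duplicate_ids(root):
    """List every 'id' value that occurs more than once in the tree."""
    counts = Counter(e.get("id") for e in root.iter() if e.get("id") is not None)
    return sorted(value for value, count in counts.items() if count > 1)

raw = ET.fromstring('<course><frame id="f1"/><frame id="f2"/><frame id="f1"/></course>')
print(duplicate_ids(raw))
```

This is only a well-formedness-level convenience; it does not replace validating the final file against the course document model.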
Early requirements were modest
very few invocation parameters
easy to duplicate common tool invocation scripts
early implementations in MS-DOS batch language
Flexibility in orchestration was needed to meet developing needs
licensee requests for new delivery package contents
easier parameterization of tool invocation options
support for introduced phases for optimization
All orchestration converted to Python
very powerful and expressive
easy to implement convenience features unanticipated in original design
powerful scripting language allows for easy parameterization of tool invocation command lines
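As an illustration of parameterized tool invocation, the sketch below builds a command line from a dictionary of stylesheet parameters. The tool name `example-xslt` and its `--in`, `--xsl` and `--out` options are hypothetical and do not describe any real processor's command line; the file names and parameter names are likewise invented.

```python
def build_command(tool, stylesheet, source, output, params):
    """Build an argument list for a hypothetical XSLT command-line tool."""
    command = list(tool) + ["--in", source, "--xsl", stylesheet, "--out", output]
    for name in sorted(params):  # sorted so the command line is reproducible
        command.append(f"{name}={params[name]}")
    return command

# One entry per rendition; adding a rendition means adding a dictionary entry.
RENDITIONS = {
    "handout-a4": {"page-size": "A4", "sides": "2"},
    "handout-letter": {"page-size": "US-letter", "sides": "2"},
}

for name, params in RENDITIONS.items():
    print(build_command(["example-xslt"], "handout.xsl", "effective.xml",
                        name + ".fo", params))
```

With a real tool, each list could be handed to `subprocess.run(command, check=True)`; keeping the command as a list avoids shell quoting problems.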
Archived optimized configured effective instance is the source of all published results
contains everything necessary for all publishing tasks
can re-run any or all publishing tasks with archived file without needing original sources
Many kinds of published results necessary for a presentation
manifest files for software applications
handouts for students
projection materials for instructor
accessible rendition designed for screen reader software
Computer-based Training (CBT) software typically has its own manifest requirements
lists of frames and relationships
software-orchestrated packaging or delivery logic
most are text-based, some are XML-based
New XML-based standards being developed in this area
Instructional Management Systems (IMS)
http://www.imsproject.org/
Sharable Content Object Reference Model (SCORM)
http://www.adlnet.org/Scorm/scorm_index.cfm
Two uses by Crane Softwrights Ltd.
web-based and CD-based self-paced learning
real-time audio-over-IP lecturing
XSLT can produce simple text without escaping sensitive markup characters
a small investment in a stylesheet allowed entire content to be leveraged in a new platform
automated processes ensure content easily kept up-to-date with masters
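The text-manifest idea can be sketched as follows. The tab-separated manifest layout and the `lesson`, `frame`, `id` and `title` names are assumptions; real delivery software defines its own format. The point illustrated is that the output is plain text, so characters such as `&` and `<` appear literally rather than escaped, much as they would with XSLT's text output method.

```python
import xml.etree.ElementTree as ET

def manifest_lines(course):
    """Emit one plain-text line per frame: lesson ID, frame ID, title."""
    lines = []
    for lesson in course.iter("lesson"):
        for frame in lesson.iter("frame"):
            lines.append("\t".join(
                [lesson.get("id"), frame.get("id"), frame.get("title")]))
    return "\n".join(lines)

course = ET.fromstring(
    '<course><lesson id="l1">'
    '<frame id="f1" title="Q&amp;A"/><frame id="f2" title="a &lt; b"/>'
    '</lesson></course>')
print(manifest_lines(course))
```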
Different dimensions needed for different audiences
US-letter in North America
A4 in the rest of the world
Different layouts for different users
single-sided or double-sided with differing footers
parameter trick with page master names allows one stylesheet for both uses:
create a page sequence alternating odd/even between two master names
make both master names the same for a single-sided presentation
instructor content presented differently for handouts
no distinction in content for book form
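The page-master trick can be sketched by generating the XSL-FO page sequence master. `fo:page-sequence-master`, `fo:repeatable-page-master-alternatives` and `fo:conditional-page-master-reference` with its `odd-or-even` property are part of XSL-FO; the master names `frames`, `page`, `page-odd` and `page-even` are invented for this example, and the Python stands in for what a stylesheet parameter would do.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO)

def masters_for(double_sided):
    """Pick the two master names; identical names give a single-sided result."""
    return ("page-odd", "page-even") if double_sided else ("page", "page")

def page_sequence_master(odd_master, even_master):
    """Build a sequence that alternates odd/even between two master names."""
    sequence = ET.Element(f"{{{FO}}}page-sequence-master", {"master-name": "frames"})
    alternatives = ET.SubElement(sequence, f"{{{FO}}}repeatable-page-master-alternatives")
    for parity, name in (("odd", odd_master), ("even", even_master)):
        ET.SubElement(alternatives, f"{{{FO}}}conditional-page-master-reference",
                      {"master-reference": name, "odd-or-even": parity})
    return sequence

print(ET.tostring(page_sequence_master(*masters_for(False)), encoding="unicode"))
```

The structure is the same in both cases; only the two names differ, which is why one stylesheet can serve both layouts.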
One-up page images through parameterized XSLFO stylesheet
PDF manipulation for 2-up and cut/stacked arrangements using Quite Imposing
http://www.quite.com
XSLFO cannot produce 2-up results with differing page numbers in each page image on a single page result
All pages hyperlinked at three levels
previous/next frame
previous/current/next lesson
previous/current/next module
Random access at two levels
all modules in course
all lessons in current module
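The previous/next hyperlinking can be computed from nothing more than document order. The sketch below shows the frame level only; the same function could be applied to lists of lesson or module IDs. The ID values are invented.

```python
def neighbour_links(ids):
    """Map each ID to its previous and next neighbour (None at either end)."""
    links = {}
    for index, current in enumerate(ids):
        links[current] = {
            "previous": ids[index - 1] if index > 0 else None,
            "next": ids[index + 1] if index + 1 < len(ids) else None,
        }
    return links

frames = ["f1", "f2", "f3"]
print(neighbour_links(frames)["f2"])
```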
Two parameterized passes to produce two renditions from one XSLT stylesheet
bulleted content for speaker to teach from
handout content for reference if required
Content development grows beyond what can be taught in an instructor-led scenario
often have more detail than can be covered in the time allotted for instruction
techniques and practices evolve and tips are shared through public forums
technology changes and old content is quickly out of date and not useful
Can sell a book form and offer no-charge updates when there are no distribution costs
announcements pushed out to customers by email
customers pull content by web-based delivery
password protection changes every week
one-time fee for each of three different uses by the customer
single-user license
single geographical site staff license
world-wide staff license
relies on the honor system and the honesty of the customer to not proliferate copies
Commercial paper rendition created by sending XML of configured content to editor
the editor fixed grammar by editing his copy of the configured XML
used XSLFO to produce PDF page masters sent to Prentice Hall for reproduction
Should be easy to obtain an ISBN publisher's prefix from your country's representative
need an estimate of how many publications you anticipate producing
some outstanding questions regarding uniqueness of ISBNs and renditions in an electronic world
Crane has decided to label each edition's configured instance with an ISBN and reuse that ISBN on all 11 renditions
National Library of Canada is the custodian for Canadian publishers
Typical electronic presentations are not suited to aural screen readers
differing font sizes and faces
differing indents require the user to hunt down information at undetermined locations on each line
Graphics are not useful unless described
content model provides for a narrative description of each graphic
rendered only
A monospaced presentation can be predictably navigated
removing indentation makes information easy to find
necessary to still indicate the indentation depth in order to follow the flow of the information
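A sketch of the flattening idea for the accessible rendition: nested list items are emitted flush left, each prefixed with its depth so a listener can still follow the structure. The `list` and `item` element names and the `[n]` depth marker are assumptions; the actual rendition may signal depth differently.

```python
import xml.etree.ElementTree as ET

def accessible_lines(list_element, depth=1):
    """Emit each item at the left margin, prefixed with its nesting depth."""
    lines = []
    for item in list_element.findall("item"):
        lines.append(f"[{depth}] {(item.text or '').strip()}")
        for nested in item.findall("list"):
            lines.extend(accessible_lines(nested, depth + 1))
    return lines

outline = ET.fromstring(
    "<list><item>Authoring<list><item>DTD</item><item>RELAX NG</item></list></item>"
    "<item>Publishing</item></list>")
print("\n".join(accessible_lines(outline)))
```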
Not everyone is comfortable with an electronic presentation of information
paper presentation still the easiest for most customers
some customers take electronic presentation to copy house to make a paper presentation
periodic updates in content make this expensive to repeat very often
Making an excerpt available for free helps to sell the entire book
nature of the presentation
bulleted presentation
utility of the hyperlinking
overview of the content
introduction of every module is included in the excerpt
The excerpt can be leveraged in other venues at no charge
the content is already freely available, may as well use it anywhere you can
Many organizations need to teach content but cannot develop content themselves
difficult to keep up with changing technologies and techniques
configurability of content meets differing needs of licensees
Public and private training needs
public courses for anyone to attend
private corporate courses for other companies
in-house corporate use
Should offer "train-the-trainer" opportunities for licensee's instructors
easy to have them attend a public class while already being equipped with all the materials
proper to charge full attendance fees to prevent using licensing as an excuse for free training
Treat your own company as your own licensee
all of the Crane branding is done as "the host company"
change the host company graphics for a licensee
A licensee's customer may feel more ownership when branded specifically for them
customer name and logo
date of delivery
An XML-based environment supports just-in-time publishing
the licensee doesn't inventory only one stock presentation
the latest rendition ensures the most up-to-date content
Protects a given configuration from being used for another customer
Configurations delivered to customer via password-protected web address
pull model ensures customer obtains content at their convenience
not just pushed at them by email
leaving a copy on the web site provides disaster recovery
instructor can obtain content again if machine crashes at or on the way to the customer's site
delivery done from any web browser
copy house can obtain content without using media
Customer information on public deliveries aggregated automatically using XSLT
customer maintains a private web page of XML according to Crane's document model
Recreating master schedule pulls in content from all licensees
schedule page regenerated every Monday or manually on request from the licensee
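The aggregation of licensee schedules can be sketched as below. In the presentation this is done with XSLT over each licensee's web page; here the pages are assumed to be already fetched as text, and the `schedule` and `delivery` elements with `date` and `course` attributes are hypothetical stand-ins for Crane's document model. Dates are assumed to be in ISO form so that sorting the strings sorts the dates.

```python
import xml.etree.ElementTree as ET

def master_schedule(licensee_pages):
    """Merge every licensee's deliveries into one schedule, ordered by date."""
    rows = []
    for licensee, xml_text in licensee_pages.items():
        for delivery in ET.fromstring(xml_text).iter("delivery"):
            rows.append((delivery.get("date"), licensee, delivery.get("course")))
    master = ET.Element("schedule")
    for date, licensee, course in sorted(rows):
        ET.SubElement(master, "delivery",
                      {"date": date, "licensee": licensee, "course": course})
    return master

pages = {
    "licensee-a": '<schedule><delivery date="2002-05-06" course="XSLT"/></schedule>',
    "licensee-b": '<schedule><delivery date="2002-04-08" course="XSLFO"/></schedule>',
}
print([(d.get("date"), d.get("licensee")) for d in master_schedule(pages)])
```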
Standards are "black and white"
definitive Recommendation documents
not always perfect, but sufficient for training purposes
no interpretation of vendor differences
safe to keep to Recommendation limitations
important to discuss how extensions operate
not important to discuss particular extensions from any vendor
Students can transfer Recommendation knowledge to different vendors
presumably the reason they are embracing standards
Success of training programme not dependent on a vendor
no non-disclosure problems
no hidden release schedules with unknown feature lists
no defense for non-conformance or buggy code
An electronic publishing model has many benefits to the training organization
no inventory of product
sales are made by the customer downloading
protection of downloading password through weekly changes
easy resale through licensees or any interested party
reseller provides collection service for discount in cost
may choose to pass on discount to customers
revised content can be made available to customers
technology always changing or new techniques and practices being developed
Some value in "practicing what you preach"
some may consider it heretical to teach XML using proprietary presentation technologies
maintaining HTML masters makes leverage far more difficult than maintaining XML masters
An electronic publishing model has many benefits to the customer
perpetually available no-charge updates to new editions of the material
XML technologies evolve
techniques and common understanding improve content of material
site license and world license sales model meets needs of larger organizations
all of the customer's staff have access to a copy of the material
not applicable to the customer's customers
content is electronically searchable
paper copies can be made if desired
tools available to create stackable and bindable renditions
binding services available at copy centers
content is hyperlinked both internally and externally
jumping elsewhere in the document
bringing up a browser with a web document
Caveat: not many people trust the electronic publishing model
we can't give up our "day jobs" because of low sales
Caveat: relies on the honor system
nothing preventing customer from proliferating copies
trust companies to pay for site and world licenses rather than buying and posting a single-user license