Print Production Turnaround to XML Content First
Migrating from print-based publishing to the feature-rich flexibility of structured content.
ABSTRACT
The journal publishing industry is deeply rooted in paper products, with the layout performed using graphical layout programs. This market is universally turning towards the internet, which is becoming the content delivery medium of choice.
To fill websites with content, publishers used to start with their print-purposed content, which was extracted and converted into mark-up, usually SGML. This was stored as SGML and then converted to HTML for the websites.
Increasingly sophisticated end users have driven the publication market towards immediate delivery of content to the web. Delivery has since expanded to WAP, PDA, email alerts and other channels.
The problem facing organisations is both technical and a matter of business practice, and in many instances it centres on the workflow. The workflow changes needed to implement a content management system can change the employment picture at the company at the same time as the technical implementation is posing serious problems. If the technicians can communicate better with the business side, there will be fewer problems.
This presentation gives an overview of the workflow, process and architecture. It explores the workflow alterations, as well as approaches to business-side problems. It also drills down into technical detail for areas such as XML document conversion, XML data storage in a content repository, metadata enrichment and data retrieval operations.
-
Print Production
-
The starting point is a publisher doing good business with a large circulation.
-
The printed page is physical; the technology harks back to Gutenberg.
-
Distribution is on physical paper; entitlement management requires human intervention.
-
Built a strong business, and became valuable to business.
-
Print production process
-
Author/Editor/Peer review exchanges: articles created in MSWord, tracked with CAP workflow, PII number generated
-
SGML Conversion by outside supplier
-
Layout in Quark XPress
-
To print
-
Postal delivery to user
-
Index
-
merge
-
Upload to staging server
-
Upload to production server
-
Upload to Embase, EW, etc.
-
Validate document & Dataset file
-
Business reasons for CMS
-
The nineties come, technology improves for delivery of content, and they build a few websites that have content from their print titles.
-
Readers want content online
-
Then the market gets more experienced, and the technology gets better, and they can do more things with their content.
-
Content archive wanted for searching
-
Facilitates offering integrated set of services
-
Business is complicated
-
Publishers find that they have more content than they realised, and need to marry the existing content to the current content to make a consistent offering.
-
They realise they need to structure the content so that it can be used and repurposed for all these new products and services.
-
They realise that repurposing requires separation of content from presentation.
-
SGML was good, but publishers need to use a content management system that handles XML, and is more flexible for repurposing and reuse.
-
Modern Business
-
Meets needs of clients, does not just give the clients whatever the company wants to give them.
-
Anticipates the wants and needs of clients.
-
Keeps pace with technology to add value to content offerings.
-
Pushes market competition with quality and functionality of offerings.
-
Objectives of CMS
-
Separate commissioning and creation of content from presentation.
-
Rationalise branding
-
Deliver content faster
-
Enable re-use at product level
-
Enable re-use at distribution level
-
Focus editorial resource on services rather than products
-
Streamline workflows
-
Enable reuse of all content
-
Continuous publication
-
Applies to original research articles
-
Articles will be made available as soon as they are approved for publication
-
Articles will be available for viewing continuously through the Author - Editor workflow
-
Meets the needs of the community and contributor for advanced exposure of research.
-
Content Management
-
Stabilises workflows
-
Separates content from presentation
-
Facilitates reuse - varied products and services
-
Facilitates repurposing and transformation
-
Extended archiving
-
Improved search and retrieval
-
Futureproofing
-
Flexibility
-
Content Administration
-
Content Data tracked from the beginning of the workflow
-
Metatagging is controlled and optimised for the search technology
-
Content Repository structure must be optimised to the data structure
-
Workflows
-
Authoring process both facilitated and controlled
-
Editorial functions made easier and faster
-
Information structure stabilised and maintained
-
Attachments linked and stored in stable way
-
Work product is immediately usable for any product and service
-
Content Acquisition Workflow
-
Tables & Equations
-
Figures
-
Print Issue
-
Accept commission
-
Notify author that corrections are needed
-
All content to disk conversion
-
M/s to admin
-
M/s to copy edit
-
figs
-
Graphics support, figures edited
-
M/s returned to author
-
Editor to check
-
Freelancer to check
-
Author corrections
-
Word text Quark tables separate figures separate equations
-
Proof readers
-
Word text Word tables & equations Figs embedded
-
TJO check Editorial support
-
Www server
-
Peer Review
-
Workflow of the Author - Editor procedures for an original research journal article.
-
Data output transformation
-
Content data is separated from form, and resides in a database.
-
Products and services can be created, repurposing the data as needed.
-
Content data is structured to a very granular level; the content model is very detailed.
-
Data transformations can be tailored to the business needs - business not limited by technology.
-
Data is sent to:
-
other databases
-
Quark for print production
-
PDF for article downloads
-
emails for alert services
-
web services
-
text output services
-
Data intake transformation
-
Data coming into the content management system starts out as MSWord files.
-
These are transformed into XML and validated.
-
Elsevier DTD is of a complex and granular nature.
-
Transformed by programs written in languages such as C++ and Perl.
-
The output is written as ASCII text in the XML structure.
-
Content is returned into the CMS
-
Tracked to the data repository
-
Stored there in a combined table/BLOB database.
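
The intake steps above can be sketched roughly as follows. This is a minimal illustration rather than the production system: the table name `articles`, its columns, the `<title>` element and the function `store_article` are all assumptions made for the example, and SQLite stands in for the unnamed repository database. The check shown covers well-formedness only, whereas the workflow described here validates against the DTD, which needs a validating parser.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_article(conn, article_id, xml_text):
    """Check that xml_text is well-formed, then store it with its metadata.

    Metadata goes into ordinary table columns and the full XML document
    goes into a BLOB column, mirroring a combined table/BLOB layout.
    """
    # ET.fromstring raises ET.ParseError on malformed XML.
    # It does NOT validate against a DTD (assumption: done elsewhere).
    root = ET.fromstring(xml_text)
    title = root.findtext("title", default="")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "article_id TEXT PRIMARY KEY, title TEXT, body BLOB)"
    )
    conn.execute(
        "INSERT INTO articles (article_id, title, body) VALUES (?, ?, ?)",
        (article_id, title, xml_text.encode("ascii")),
    )
    conn.commit()
    return title


conn = sqlite3.connect(":memory:")
sample = "<article><title>Sample study</title><para>Text.</para></article>"
store_article(conn, "A0001", sample)
row = conn.execute("SELECT title, body FROM articles").fetchone()
print(row[0], len(row[1]))
```

Keeping the searchable fields in columns and the document itself in a BLOB lets simple lookups run without parsing the XML on every query.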
-
CMS Processing system architecture
-
Author/Editor Workflow
-
Final Approval in MSWord
-
XML transformation
-
validation
-
Template validation
-
Log
-
Track
-
Manage
-
XML Content repository
-
Content Management system processing events.
-
MS Word Docs In
-
commission
-
Peer review
-
Errors
-
validating
-
Content Tracking
-
Content tracked from entry into system, until it's delivered to the end user
-
Tracking to integrate with external tracking system.
-
Assignment of article identification numbers to be picked up from external tracking system.
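
As a rough sketch of the tracking idea above: a record follows each article from entry to delivery, and takes its identifier from the external tracking system instead of generating one. The stage names, the `TrackedItem` class and the `track_new` helper are hypothetical, and the external system is represented by a plain dictionary lookup.

```python
from dataclasses import dataclass, field

# Hypothetical stage names, ordered from entry into the system to delivery.
STAGES = ["received", "in_review", "approved", "transformed", "stored", "delivered"]


@dataclass
class TrackedItem:
    article_id: str                 # picked up from the external tracking system
    stage: str = STAGES[0]
    history: list = field(default_factory=list)

    def advance(self):
        """Move the item to the next stage and remember the one it left."""
        idx = STAGES.index(self.stage)
        if idx == len(STAGES) - 1:
            raise ValueError("item has already been delivered")
        self.history.append(self.stage)
        self.stage = STAGES[idx + 1]
        return self.stage


def track_new(external_ids, manuscript_key):
    """Create a tracking record using the identifier the external system holds."""
    return TrackedItem(article_id=external_ids[manuscript_key])


item = track_new({"ms-17": "X-0001"}, "ms-17")
item.advance()
print(item.article_id, item.stage)
```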
-
Content Logging
-
Logging incorporates version control
-
Starts with the acceptance of the article for publication
-
Allows for content stability during the Author - Editor workflow
-
Ends when the article gets final approval
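
One way to picture this logging lifecycle is an append-only version list that opens at acceptance and closes at final approval. The sketch below is illustrative only; the `ContentLog` class and its method names are assumptions, and a real system would persist versions instead of holding them in memory.

```python
class ContentLog:
    """Append-only version log for one article."""

    def __init__(self, article_id, accepted_text):
        self.article_id = article_id
        # Version 1 is the manuscript as accepted for publication.
        self.versions = [accepted_text]
        self.closed = False

    def commit(self, new_text):
        """Record a new version; returns its 1-based version number."""
        if self.closed:
            raise RuntimeError("log is closed: article has final approval")
        self.versions.append(new_text)
        return len(self.versions)

    def get(self, version):
        """Return the text of a given 1-based version."""
        return self.versions[version - 1]

    def final_approval(self):
        """Close the log and return the approved text."""
        self.closed = True
        return self.versions[-1]


log = ContentLog("A0001", "accepted text")
log.commit("text after author corrections")
approved = log.final_approval()
print(approved)
```

Because earlier versions are never overwritten, the author and editor can exchange corrections while any prior state remains retrievable.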
-
Content output repurposing
-
Content will be sent to a variety of products and services.
-
New products and services will be brought on line as the market's needs evolve
-
Content Repository
-
holding as XML
-
ESBD
-
eMail alerts
-
Website feeds as XML
-
Quark Xpress
-
PDF
-
WAP, PDA, etc.
-
Data transforms
-
Data output
-
Data management for the various products and services that Elsevier Science makes available
-
Data access may be done by database calls in SQL or XQuery, which return data to a staging area.
-
Staging areas are unique to each product.
-
Because the products differ considerably, the form and structure of their data differ too.
-
The data for each product goes through different transformations.
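
The output path described above might look something like the following sketch: a SQL call fetches the stored XML, and each product gets its own staging directory and its own transformation. Everything named here is an assumption for illustration (the `articles` table, the two products, the transform functions and the file suffixes); XQuery access is not shown.

```python
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from html import escape
from pathlib import Path


def web_transform(xml_text):
    """Hypothetical web transform: render the title as an HTML heading."""
    root = ET.fromstring(xml_text)
    return "<h1>%s</h1>" % escape(root.findtext("title", default=""))


def text_transform(xml_text):
    """Hypothetical text transform: keep only the text, drop all mark-up."""
    root = ET.fromstring(xml_text)
    return " ".join(t.strip() for t in root.itertext() if t.strip())


# One entry per product: its transformation and its output file suffix.
PRODUCTS = {"web": (web_transform, ".html"), "email": (text_transform, ".txt")}


def stage_all(conn, staging_root):
    """Query the repository and write each product's output to its own area."""
    rows = conn.execute("SELECT article_id, body FROM articles").fetchall()
    written = []
    for product, (transform, suffix) in PRODUCTS.items():
        area = Path(staging_root) / product   # staging area unique to product
        area.mkdir(parents=True, exist_ok=True)
        for article_id, body in rows:
            target = area / (article_id + suffix)
            target.write_text(transform(body), encoding="utf-8")
            written.append(target)
    return written


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (article_id TEXT PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT INTO articles VALUES (?, ?)",
    ("A0001", "<article><title>Sample study</title><para>Text.</para></article>"),
)
staging_root = tempfile.mkdtemp()
written = stage_all(conn, staging_root)
print([p.name for p in written])
```

Adding a product then means adding one entry to `PRODUCTS`, without touching the query or the other transformations.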
-
Static or Dynamic Content?
-
Static content service is simpler and less expensive.
-
Lowered costs of implementation, licensing, maintenance and administration
-
Dynamic content service allows for personalisation and momentary updating
-
Greater flexibility and ability to tailor page service to user needs
-
Greater market focus can increase financial returns
-
Static Content Service
-
Content is managed through workflow to content repository
-
Web page templates on staging server pull content into layout page
-
Web page full of content is published to the web server as flat pages
-
Always the same page served
-
No information about user is incorporated into the page that's served.
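
A minimal sketch of the static model, assuming a single page template and a dictionary standing in for the content repository: content is merged into the template once, at publish time, and the result is written out as a flat file that is served unchanged to every user. The template text, field names and `publish_static` function are illustrative assumptions.

```python
import tempfile
from html import escape
from pathlib import Path
from string import Template

# Hypothetical layout template held on the staging server.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)


def publish_static(articles, web_root):
    """Merge each article into the template and write it as a flat page."""
    web_root = Path(web_root)
    web_root.mkdir(parents=True, exist_ok=True)
    pages = []
    for article_id, fields in articles.items():
        html = PAGE.substitute(
            title=escape(fields["title"]), body=escape(fields["body"])
        )
        page = web_root / (article_id + ".html")
        page.write_text(html, encoding="utf-8")
        pages.append(page)
    return pages


pages = publish_static(
    {"A0001": {"title": "Sample & study", "body": "Text."}},
    tempfile.mkdtemp(),
)
print(pages[0].name)
```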
-
Content models and conversions
-
Emails require both mark-up and text feeds, with narrative content stripped of citations and linkages.
-
PDF files will be full content, including citations, images and tables.
-
Database feeds will be full content, converted to SGML for backwards compatibility in first phase, with XML alone used in later versions.
-
Web feed will take full feed, and will convert to HTML for loading onto web sites.
-
WAP feed will take narrative content and tables, with citations.
-
Supplements will be XML alone, which is as they start out.
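
The per-feed content models above amount to filtering one source document differently for each product. The sketch below shows one possible way to do that; the element names (`cite`, `link`, `figure`, `table`) are invented for the example and are not taken from the actual DTD, and dropping figures from the WAP feed is an assumption read into "narrative content and tables, with citations".

```python
import copy
import xml.etree.ElementTree as ET

# Elements removed for each feed (hypothetical element names).
FEED_EXCLUDES = {
    "email": {"cite", "link"},    # narrative stripped of citations and linkages
    "wap": {"figure", "link"},    # narrative, tables and citations only
    "pdf": set(),                 # full content
}


def strip_elements(root, names):
    """Return a copy of root without elements whose tag is in names.

    Text that followed a removed element (its tail) is kept, so the
    surrounding sentence stays intact.
    """
    root = copy.deepcopy(root)
    for parent in list(root.iter()):
        previous = None
        for child in list(parent):
            if child.tag in names:
                tail = child.tail or ""
                if previous is None:
                    parent.text = (parent.text or "") + tail
                else:
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child
    return root


def build_feed(root, product):
    """Apply the content model for one product to the source document."""
    return strip_elements(root, FEED_EXCLUDES[product])


source = ET.fromstring(
    '<article><para>The effect was confirmed<cite> [1]</cite>.'
    '<link target="f1"/></para>'
    '<figure id="f1">chart</figure><table>rows</table></article>'
)
email = build_feed(source, "email")
wap = build_feed(source, "wap")
pdf = build_feed(source, "pdf")
print(email.find("para").text)
```

Working on a copy leaves the repository document untouched, so every feed starts from the same full content.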
-
Static Content architecture
-
Content repository
-
Stored pages
-
CMS
-
Web page templates
-
Web user
-
Web service
-
Web server
-
Staging server
-
Dynamically Served Content
-
Content is managed through workflow to content repository
-
Content is mirrored to production repository
-
Web page template on production server awaits page call to be filled and served
-
Different page served - depends on current content, and on profile of user
-
Dynamic service model
-
Personalisation possible, based on login, cookie or IP address.
-
Profiles kept in central repository enable continuous access to all varieties of content.
-
Usage tracking facilitated, and it makes targeted marketing easier as well.
-
Requires more machinery, administration and connectivity.
-
Requires more software licenses.
-
Has built-in redundancy aspect - the staging server is a backup for production.
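
To contrast with the static model, here is a small sketch of dynamic service: the page is assembled when it is requested, and the user's profile, found by login, cookie or IP address, decides what goes into it. The profile store, the IP range, the article list and all function names are assumptions for illustration; the addresses come from ranges reserved for documentation.

```python
import ipaddress
from html import escape
from string import Template

PAGE = Template("<h1>Welcome, $name</h1><ul>$items</ul>")

# Hypothetical central profile repository and institutional IP ranges.
PROFILES = {
    "alice": {"name": "Alice", "topics": {"cardiology"}},
    "library": {"name": "City Library", "topics": {"orthopaedics"}},
}
IP_RANGES = {"192.0.2.0/24": "library"}
ARTICLES = [
    {"title": "Heart study", "topic": "cardiology"},
    {"title": "Bone study", "topic": "orthopaedics"},
]


def resolve_profile(request):
    """Identify the user by login first, then cookie, then IP address."""
    key = request.get("login") or request.get("cookie")
    if key is None and request.get("ip"):
        address = ipaddress.ip_address(request["ip"])
        for network, profile_key in IP_RANGES.items():
            if address in ipaddress.ip_network(network):
                key = profile_key
                break
    # Unknown users get a guest profile with no topic restriction.
    return PROFILES.get(key, {"name": "Guest", "topics": None})


def render_page(request):
    """Fill the template at request time from current content and profile."""
    profile = resolve_profile(request)
    topics = profile["topics"]
    visible = [a for a in ARTICLES if topics is None or a["topic"] in topics]
    items = "".join("<li>%s</li>" % escape(a["title"]) for a in visible)
    return PAGE.substitute(name=escape(profile["name"]), items=items)


print(render_page({"login": "alice"}))
print(render_page({"ip": "192.0.2.17"}))
```

The same request handler yields a different page for each profile, which is the flexibility the dynamic model buys at the cost of the extra machinery listed above.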
-
Dynamic Content architecture
-
CMS
-
Staging content repository
-
Production content repository
-
Template & content merge
-
Web service
-
Web user
-
Staging server
-
Production server
-
Content mirrored

