XML-centric workflow offers benefits to scholarly publishers
Track: Case Studies, Publishing, Storing XML
Audience Level: High Level/Technical View
Time: Thursday, November 18 at 09:00
Keywords: Application Architecture, Case Studies, Content Repurposing, Data Interchange, Electronic Publishing, Java, Legacy Data Conversion, Metadata Middleware, Relational Database, Repository, Unicode, XSLT, Publishing, Citation Linking, Cross-Journal Collection, CrossRef, DTD, Math Markup, MathML, LaTeX, MDDB, Metadata Database, Metadata Loader, Metadata Management, Response Page, Special Characters, STM Publishing, Virtual Journal, XML-Centric Architecture And Workflow, XML-Centric Journal Production Process, XML Character Entities, XML Scholarly Publishing, XML Scientific Publishing, XML Validator, Workflow
Abstract:
During the transitional paper–electronic period, a nonprofit STM publisher faces the challenge of publishing a scientific journal in both digital and analog formats while controlling costs and ensuring consistency between the electronic and printed representations of an article. It must do so even as its sophisticated constituency expects a constantly expanding range of information products and services. In a few short years the American Geophysical Union (AGU) leapfrogged from the paste-up era, when authors prepared their own “camera-ready copy” to be pasted on boards for a printer, to the age of XML, when an article marked up in accordance with a custom-designed DTD serves both as the version of record and as the source for generating PDF and HTML representations of the article. Bibliographic and reference metadata are then extracted from the XML article instance into a relational database, which serves as the basis for generating online and print access products, including various tables of contents and author and subject indices.
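As a rough illustration of the extraction step described above, the following Java sketch pulls a few bibliographic fields out of an article instance with XPath and collects them into a flat record of the kind a loader could write to a relational table. The element names (`article`, `meta`, `doi`, `title`, `author`), the `ArticleRecord` class, and the sample DOI are all assumptions made for this example; they are not taken from the AGU DTD or from AGU's own loader.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MetadataExtractor {

    /** Hypothetical flat record; one row in an assumed "article" table. */
    static final class ArticleRecord {
        final String doi;
        final String title;
        final List<String> authors;

        ArticleRecord(String doi, String title, List<String> authors) {
            this.doi = doi;
            this.title = title;
            this.authors = authors;
        }
    }

    /** Parses the XML string and reads the assumed metadata elements via XPath. */
    static ArticleRecord extract(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();

        // Element paths below are illustrative, not the real DTD structure.
        String doi = xp.evaluate("/article/meta/doi", doc).trim();
        String title = xp.evaluate("/article/meta/title", doc).trim();

        NodeList nodes = (NodeList) xp.evaluate(
                "/article/meta/author", doc, XPathConstants.NODESET);
        List<String> authors = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            authors.add(nodes.item(i).getTextContent().trim());
        }
        return new ArticleRecord(doi, title, authors);
    }

    public static void main(String[] args) throws Exception {
        String sample = "<article><meta>"
                + "<doi>10.0000/example.001</doi>"   // made-up DOI
                + "<title>Sample title</title>"
                + "<author>A. Author</author><author>B. Author</author>"
                + "</meta></article>";
        ArticleRecord r = extract(sample);
        System.out.println(r.doi + " | " + r.title + " | " + r.authors);
    }
}
```

In a production loader the record would presumably be written through JDBC rather than printed; that step is omitted here because the abstract does not describe the database schema.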
Maintaining metadata in a database has allowed AGU to offer its journal subscribers a number of innovative information products in electronic form, among them a “virtual journal” that cuts across the boundaries of a traditional printed periodical. The database also serves as the source for metadata exchanges with abstracting and indexing (A&I) services. As a result of ongoing interaction with the publishers' consortium CrossRef, the metadata database stores a continuously growing list of DOIs for cited and citing publications, which enables dynamic linking to referenced materials as well as inbound and “forward” (i.e., “cited by”) linking.
As a result of implementing an XML-centric workflow and a suite of Java and XSLT applications, AGU has increased article production capacity and shortened publication time while reducing costs and labor. In addition, using the XML-tagged article as a source for all derivative formats, including print, coupled with automation of the production process, has enabled AGU to ensure consistency of style across its publications regardless of format and to improve accuracy of bibliographic and reference metadata. Such an approach has also allowed authors to concentrate solely on producing scientific content, leaving to the publisher the responsibility for presenting their papers in a variety of formats and for enriching the works' usability with value-added electronic features, such as reference and citation linking, multimedia files, article grouping by subsets, special sections, and index terms.
Architectural and workflow models are presented; rationales for selecting particular technologies (DTD, relational database, XSLT, Java) and representational formats (PDF, HTML) are explained; problems in converting legacy data from multiple sources are addressed; custom-written software is presented that ensures strict datatyping and enforces dependencies that neither W3C XML Schema nor validating parsers can check; and solutions for dealing with special characters, including math, are proposed. Lessons learned are shared, and benefits to a scholarly publisher are enumerated.
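To make the idea of custom checks beyond DTD validation concrete, here is a minimal Java sketch of a post-parse checker. It assumes two hypothetical rules: the `received` and `accepted` attributes must be valid ISO dates with `accepted` not earlier than `received`, and an article of type `special-section` must contain a `section-title` element. The attribute names, the element name, and the rules themselves are invented for illustration; the abstract does not say which constraints AGU's validator enforces.

```java
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class DependencyChecker {

    /** Returns a list of rule violations; an empty list means all checks passed. */
    static List<String> check(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        List<String> errors = new ArrayList<>();

        // Strict datatyping: a DTD sees these attribute values only as character data.
        LocalDate received = parseDate(root.getAttribute("received"), "received", errors);
        LocalDate accepted = parseDate(root.getAttribute("accepted"), "accepted", errors);

        // Cross-field dependency (hypothetical rule): compare two values.
        if (received != null && accepted != null && accepted.isBefore(received)) {
            errors.add("accepted date precedes received date");
        }

        // Co-occurrence dependency (hypothetical rule): attribute value requires a child.
        if ("special-section".equals(root.getAttribute("type"))
                && root.getElementsByTagName("section-title").getLength() == 0) {
            errors.add("special-section article lacks a section-title element");
        }
        return errors;
    }

    /** Parses an ISO-8601 date (yyyy-MM-dd); records an error and returns null on failure. */
    private static LocalDate parseDate(String value, String name, List<String> errors) {
        try {
            return LocalDate.parse(value);
        } catch (DateTimeParseException e) {
            errors.add(name + " is not a valid ISO date: '" + value + "'");
            return null;
        }
    }
}
```

Collecting every violation in a list, rather than failing on the first one, lets a production editor fix an article in a single pass; that is a design choice of this sketch, not something stated in the source.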