Abstract
Commercial publishers were some of the earliest adopters of SGML and XML technology, and the use of markup technologies for production has become an established and accepted norm in many parts of the publishing industry.
The emergence of new XML-based languages and technologies, together with the overturning of some of the old verities about what SGML was 'for', has opened the possibility for publishers to capitalise on the XML expertise they have acquired, and expand its use beyond production departments to many other areas within a publishing business.
Drawn from real-world experience of the transitions currently taking place, or planned, at a number of major players in the international publishing industry, this presentation gives a high-level overview of the expanded role XML can play in the publishing process, and of the benefits and problems that follow.
The presentation tracks the evolution of markup technologies in publishing from a position where deep semantic tagging was expected, to a more pragmatic business position in which semantically leaner content is produced, but with a compensating increase in the richness of metadata. The emergence of XML and the rise of XSLT and de facto 'off-the-shelf' DTD and Schema modules will be shown to have had some effect in homogenising publishers' content and driving down the cost of turning XML content into products.
Publishing companies have typically evolved into institutions in which 'production' is seen as a distinct and separable part of the business. Yet the wealth of XML data and knowledge now being accumulated in production departments has a wider application within the publishing business, particularly in the core areas of sales and marketing.
Sales and marketing departments are now increasingly being required to use XML for a number of eCommerce activities. Such eCommerce need not take place only between a publisher and other organisations, however: internal eCommerce between the business and production parts of a publishing enterprise can automate a number of data transfers that have traditionally been manual tasks.
Within manufacturing too, the arrival of JDF (Job Definition Format) has created the potential to specify information that carries a print job from genesis through to completion. Again, there is the possibility for internal eCommerce within a publishing company to exploit this kind of data and share it with other publishing processes.
Ultimately, it is possible to describe a publishing business in which XML informs the processes from product inception through to manufacturing and beyond, and to show how XML data can flow between departments, systems and suppliers within publishing to bring business benefits from increased automation and interoperability.
It could now be said with some confidence that there is a new orthodoxy in publishing production, in which workflows are typically streamlined, content is turned into XML as early as possible, and that content is, or is planned to be, multi-purposed into multiple delivery media. The recognition that this is how things should be done predates the reality of it actually happening, and an interesting slice of history is captured in Mike Shatzkin's remarks at a 1999 conference, which set out what many publishers now embrace, more or less, as an informing vision:
It has become obvious today that every book should be made in a file that can be flexibly altered to deliver what each commercial manifestation of the file requires. Exactly when it became obvious may be open to debate, but we are hereby declaring that it is obvious now. Publishers who can cost- and time-effectively deliver the files to make the offset-printed book, the digitally-printed book, the Web presentation, and the format for Rocket Books, Softbooks, and their proliferating competitors, will make sales at the expense of publishers who cannot.
The publisher who did not prepare files by this standard as they made books in 1996 or 1997, or even in 1998, failed to see the future as well as they might. The publisher who is not preparing files as they make books that way right now is open to valid questions about the company's mastery of a core competence.[Shatzkin]
On the one hand, what he said can be dismissed as naïve: after all, publishers with eBook content are hardly making sales at the expense of those without, in any meaningful business sense. On the other hand, his comments are logically appealing, and hold out the enticing (or fearful) prospect of a truly 'content-agile' publishing company that can re-purpose its content in whatever form the market dictates.
Commercial publishers were some of the earliest adopters of SGML and XML technology, and this was the business area in which markup technologies first thrived.
The possibilities of rich semantic tagging often gave rise, in the early days of SGML and XML, to a concept of structured content that was sometimes inappropriately rich, a phenomenon that has been referred to as the 'mega-markup model'[McGrath]. While there is nothing conceptually wrong with rich semantic tagging, in commercial publishing the costs attached to such richness need to be carefully controlled. There is clearly a difference in content value between, say, a prestigious multi-volume reference work and a recondite academic monograph. The former might be expected to sell in volume and to offer opportunities for multi-purposing its content; for the latter, profit margins are tight even with the lowest-cost workflow possible, and adding the cost of rich semantic tagging would make the product non-viable.
The factors in effect here can be thought of as represented by a content pyramid. At the top are the few very-high-value, brand-name products typified by long production cycles, big budgets, and correspondingly high retail prices. Examples might be the Grove Dictionary of Music, or the Dictionary of National Biography. At the bottom of the pyramid is the more day-to-day, disposable content: magazine and newspaper content, characterised by rapid production cycles and low unit retail prices.
What is changing is that the use of XML in production is spreading from the top of the pyramid downwards, and this spread has demanded changes in how XML is deployed.
The top of the pyramid was, of course, a very natural environment for a technology such as SGML. As well as having content ripe for 'mega-markup', it is a region often inhabited by people of an academic bent who are attracted to the conceptual purity of SGML solutions.
The emergence of new XML-based languages and technologies, together with the overturning of some of the old verities about what SGML was 'for', has opened the possibility for publishers to capitalise on the XML expertise they have acquired, and expand its use beyond production departments to many other areas within a publishing business.
SGML was 'for' (to quote from part 0 of the standard) 'publishing in its broadest definition'[SGML]; XML (by design, hype, marketing spend, accident or sheer force of will) is 'for' data as much as documents. In the past SGML consultants would talk sagely about the separation of structure from presentation. Within the XML world, technologies like XSL-FO use XML to describe nothing but presentation.
Thus for lower-value content it is now possible to find many publishers adopting a more presentation-centric approach to markup, typically combining fairly sparsely structured content with discrete blocks of descriptive metadata.
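To make the contrast concrete, the sketch below assumes a chapter marked up in this leaner style: a few generic body blocks alongside a separate metadata block. It uses Python's standard `xml.etree.ElementTree`; the element names (`chapter`, `meta`, `body`, `head`, `p`) are hypothetical and are not taken from any particular publisher's DTD. The point it illustrates is that the metadata can be read as discrete fields without any deep semantic interpretation of the body.

```python
import xml.etree.ElementTree as ET

# Hypothetical chapter: sparsely structured body plus a discrete metadata block.
# All element names here are illustrative assumptions, not a real DTD.
CHAPTER = """\
<chapter id="ch01">
  <meta>
    <title>Introduction</title>
    <author>A. N. Author</author>
    <subject>Music history</subject>
  </meta>
  <body>
    <head>Introduction</head>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </body>
</chapter>
"""


def read_metadata(xml_text: str) -> dict[str, str]:
    """Return the metadata block as a flat tag -> text dictionary."""
    root = ET.fromstring(xml_text)
    meta = root.find("meta")
    if meta is None:
        return {}
    return {child.tag: (child.text or "").strip() for child in meta}


def body_blocks(xml_text: str) -> list[tuple[str, str]]:
    """Return the body as a sequence of (generic block type, text) pairs."""
    root = ET.fromstring(xml_text)
    body = root.find("body")
    if body is None:
        return []
    return [(el.tag, "".join(el.itertext()).strip()) for el in body]


if __name__ == "__main__":
    print(read_metadata(CHAPTER))
    print(body_blocks(CHAPTER))
```

Because the body carries only generic block types, the same two functions would serve any title marked up this way; the richer descriptive information lives entirely in the metadata block.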
One of the most basic ways in which a publishing company can begin to integrate its business is to make its content interoperable. By this, I mean modelling the same type of content in the same way across the enterprise.
Back in the 20th century, when XML was a new technology, I remember sitting in a meeting with the senior management of a publishing company that was commissioning a new DTD, at which the view was seriously expressed that every author should have their own DTD.
Since then we have come a long way, and over the years there has been a pronounced tendency within publishing for DTDs (and hence the XML they govern) to grow ever broader in scope.
A notable, perhaps unique, development in this area was the mrwML DTD[mrwML], developed jointly in 1999 by John Wiley and Academic Press for modelling all major reference content.
However, while this cross-publisher cooperation may be exceptional, publishers should at least strive for consistency within their own content holdings. Simply put, where common content structures (such as lists, tables, paragraphs and notes) are modelled, they should be modelled the same way, or in a compatible way, each and every time they are created, for every type of content the publisher holds. Some publishers are already exploiting this practice to create products that cut across traditional print product boundaries: for example, a new product can be created on the fly by combining reference content, journal content and monograph content.
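A minimal sketch of that kind of on-the-fly product assembly follows, again in Python with `xml.etree.ElementTree`. It assumes a hypothetical shared vocabulary of block models (`p`, `list`, `table`, `note`) used by reference entries, journal articles and monograph chapters alike, even though each content type keeps its own root element; all names and sample content are illustrative. Blocks are placed into the new product only if they use the shared models, which is the guarantee that consistent modelling provides.

```python
import xml.etree.ElementTree as ET

# Hypothetical block models shared by every content type the publisher holds.
SHARED_BLOCKS = {"p", "list", "table", "note"}

# Three illustrative sources: different roots, same block models.
REFERENCE = "<entry><p>Sonata: an instrumental composition.</p><note>See also Symphony.</note></entry>"
JOURNAL = "<article><p>Recent studies of the sonata.</p><list><item>Form</item><item>Key</item></list></article>"
MONOGRAPH = "<chapter><p>The sonata in the eighteenth century.</p></chapter>"


def check_shared(root: ET.Element) -> list[ET.Element]:
    """Return the top-level blocks of a source, rejecting non-shared models."""
    blocks = list(root)
    unknown = {block.tag for block in blocks} - SHARED_BLOCKS
    if unknown:
        raise ValueError(f"{root.tag} uses non-shared models: {sorted(unknown)}")
    return blocks


def build_product(title: str, sources: list[str]) -> ET.Element:
    """Combine blocks from several content types into one new product."""
    product = ET.Element("product")
    ET.SubElement(product, "title").text = title
    for source in sources:
        root = ET.fromstring(source)
        # Record which content type each section came from.
        section = ET.SubElement(product, "section", origin=root.tag)
        section.extend(check_shared(root))
    return product


if __name__ == "__main__":
    product = build_product("The Sonata", [REFERENCE, JOURNAL, MONOGRAPH])
    print(ET.tostring(product, encoding="unicode"))
```

A source that introduced its own block model (say, a journal-specific `figure` element) would be rejected with a `ValueError`, signalling exactly where the modelling has drifted.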
Publishing companies have typically evolved into institutions in which 'production' is seen as a distinct and separable part of the business. Yet the wealth of XML data and knowledge now being accumulated in production departments has a wider application within publishing business.
A key application of XML outside production (by which I mean the turning of content into products) is in the field of eCommerce. 'eCommerce' can be a slippery term, but in this context I am using it to mean nothing more than information transactions. Such transactions of course take place all the time, within publishing companies and between publishing companies and other external entities, using a variety of means. The 'e' in eCommerce here signifies that the transaction is entirely electronic (i.e., between systems).
Sales and marketing departments are now increasingly being required to use XML for a number of eCommerce activities because of one of the most significant recent developments in eCommerce standards within the publishing industry: ONIX International[ONIX], an XML-based language that describes itself as the 'international standard for representing and communicating book industry product information in electronic form.' The standard has become a notable success, with wide adoption among the larger publishers and among potential recipients of its records: Nielsen Bookdata and Amazon, for example, will both take ONIX International records directly, allowing a product's visibility to suppliers and buyers to be controlled electronically from pre-release stages through to deletion, almost literally at the press of a button.
The scope of an ONIX record within the workflow (business process) gives it huge potential. A number of major publishers are taking advantage of the fact that ONIX records model the bibliographic details of products right through the workflow, from product approval (or even before) to the product's mature existence.
This enables the ONIX record for a product to act as a pivotal element in an integrated XML workflow. At first the ONIX record is the only data associated with a nascent product, but as production proceeds more and more data becomes available to enrich it. To take a simple example, the real title of the book can be drawn from the content XML (titles, of course, change during production). Later, manufacturing data can inform the ONIX record: the true number of pages, for example. Finally, post-production, it is the ONIX record that holds the authoritative bibliographic, metrics and pricing information for the now-physical products being managed in the supply chain.
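The enrichment steps described above can be sketched in Python with `xml.etree.ElementTree`. The record is deliberately simplified: its tag names (`Product`, `RecordReference`, `Title`, `TitleText`, `NumberOfPages`) loosely follow ONIX 2.1 reference names, but it is not a complete or valid ONIX message, and the content XML, record reference and page count are hypothetical stand-ins for data arriving from production and manufacturing systems.

```python
import xml.etree.ElementTree as ET

# Simplified product record. Tag names loosely follow ONIX 2.1 reference
# names, but this is NOT a complete or valid ONIX message.
ONIX_RECORD = """\
<Product>
  <RecordReference>com.example.0001</RecordReference>
  <Title><TitleType>01</TitleType><TitleText>Working title</TitleText></Title>
</Product>
"""

# Hypothetical content XML produced in the production department.
CONTENT = "<book><title>The Final Title</title><chapter><p>Text</p></chapter></book>"


def enrich_from_content(product: ET.Element, content_xml: str) -> None:
    """Replace the working title with the title found in the content XML."""
    title = ET.fromstring(content_xml).findtext("title")
    title_text = product.find("Title/TitleText")
    if title and title_text is not None:
        title_text.text = title.strip()


def enrich_from_manufacturing(product: ET.Element, pages: int) -> None:
    """Record the true page extent once manufacturing data is available."""
    extent = product.find("NumberOfPages")
    if extent is None:
        extent = ET.SubElement(product, "NumberOfPages")
    extent.text = str(pages)


if __name__ == "__main__":
    product = ET.fromstring(ONIX_RECORD)
    enrich_from_content(product, CONTENT)
    enrich_from_manufacturing(product, 256)  # hypothetical page count
    print(ET.tostring(product, encoding="unicode"))
```

Each function updates the record in place and can safely be re-run, so the same record can be refreshed whenever a downstream system reports new data.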
XML-based standards relevant to such workflows include:
JDF[JDF] - job tickets, modelling 'creative, prepress, press, postpress and delivery processes'
ONIX International[ONIX] - book and eBook product information standards
ONIX for Serials[OfS] - serial bibliographic information; alerting; subscription messaging
XBITS[XBITS] - manufacturing supply chain transactions
At some point in any discussion of whether XML can be used to integrate a publishing business, the question might be asked - 'so what?' Is there anything more significant in suggesting that XML (which is, after all, only a technology) should be the basis of information systems within a publishing business, than suggesting that relational databases should form that basis, or ASCII text, or Microsoft Office?
The answer is that XML alone, like relational databases and the other technologies mentioned, can achieve nothing. But XML offers the potential to create systems that, compared with those based on these other technologies, offer greater simplicity, longevity and interoperability. All those basic XML nostrums about vendor neutrality, while they may have had a less-than-expected effect in the arena of production, can really count for something when such neutrality applies across an entire enterprise, and even more so across an entire industry.
A challenge for publishers here lies in defining roles for people who can spot opportunities to integrate systems and data. Such a role (perhaps a Chief Information Officer or Data Strategist) needs to be cross-organisational to be effective.
I have asserted that a new orthodoxy of XML-driven workflows is now in place in production. I would suggest that in the coming months and years XML will become a more pervasive technology within publishing, applying not just to production workflows but to many different cross-organisational business processes.
Forward-looking publishers have the opportunity now, by planning their systems with the bigger picture in view, to create an enterprise that will benefit from greater automation and interoperability of all their business information.
[JDF] Job Definition Format. http://www.cip4.org/
[McGrath] McGrath, Sean. Zen and the Art of Motorcycle Manuals. XML in Practice, 22 August 2002. http://www.itworld.com/nl/xml_prac/08222002/
[mrwML] Academic Press / John Wiley & Sons Joint STM MRW DTD. http://jws-edck.wiley.com:3535/mrwmlorg/
[ONIX] ONIX Product Information Standards. http://www.editeur.org/onix.html
[OfS] ONIX for Serials. http://www.editeur.org/onixserials.html
[Shatzkin] Shatzkin, Mike. The Core Competencies of 21st Century Publishing. Paper given at the 'Information in Action' Conference, New York City, 9 June 1999. http://www.idealog.com/990609.html
[XBITS] XML Book Industry Transaction Standards. http://www.idealliance.org/xbits/