XML 2002

XML for Newcomers and Managers - Part I

Abstract

The XML Overview will provide the student with a broad overview of XML. This is a 45-minute session that will cover the following areas.

During the first section the speaker will discuss XML's background and its rich history. This section will also describe the relationship of the XML specification to the ISO 8879 SGML standard and the W3C HTML specification, and discuss the advantages of using XML.

The second section will provide a 'gentle' introduction to XML. It will discuss what XML is and what it is not, as well as some of the benefits of using XML. This section will also cover what constitutes a well-formed document as defined in the W3C XML Specification.

The last section will provide a brief discussion of XML information objects and reusable components. It will demonstrate how information objects can be processed using XSLT.

Keywords

CALS, CSS, DOM, DTD, EDI, HTML, ISO, RELAX, SAX, SGML, SOAP, STEP, UTF-16, UTF-8, W3C, WML, WWW, XML, XSL/XSLT, XSLT.

Table of Contents

1. Introduction
2. A Gentle Introduction to XML
2.1. What is XML
2.2. XML is Not
2.3. XML Family of Specifications
2.4. Benefits of XML
3. XML Background
3.1. XML History
3.2. XML History
3.3. What is XML
3.4. XML Document Components
3.5. What Does XML Look Like
3.6. A Little History of SGML
3.7. SGML Problems
3.8. Then Came The Web
3.9. HyperText Markup Language
3.10. XML Roots
3.11. Bragging About HTML
3.12. HTML Problems
3.13. Poem Tagged With HTML
3.14. Bragging About XML 1.0
3.15. XML Criticisms
3.16. HTML vs. XML tags
4. Well-formed Documents
4.1. Well-Formed Documents
4.2. XML Well-Formedness
5. XML Application
5.1. XML Document
5.2. XML Declaration
5.3. XML Document Components
5.4. What's in a DTD/Schema
5.5. XML Elements - Structural Building Blocks
5.6. What is an Attribute
6. Strengths of XML
6.1. XML is Reusable
6.2. XML - Reusable Objects - Showing Hierarchical Relationships
6.3. Information Objects
6.4. Common Structures
6.5. Reusable Information
7. Presentation of XML
7.1. Benefits of Standardized Presentation
7.2. Presentation of the XML
7.3. Cascading Stylesheet (CSS)
7.4. eXtensible Stylesheet Transformation (XSLT)
7.5. XSL-Formatting Object (FO)
7.6. Attaching a Stylesheet to XML
Biography

1. Introduction

  1. The session will cover the following areas of XML:

    1. A gentle introduction to XML;

    2. The background of XML;

    3. Strengths of XML;

    4. Information objects;

    5. XML application;

    6. XML document;

    7. Well-formed documents;

    8. Structure vs. content;

    9. The advantages of using XML.

  1. CALS - Continuous Acquisition and Lifecycle Support, Computer Aided Logistics Support (Commerce At LightSpeed)

  2. CSS - Cascading Stylesheet

  3. DOM - Document Object Model

  4. DTD - Document Type Definition

  5. EDI - Electronic Data Interchange

  6. HTML - HyperText Markup Language

  7. ISO - International Organization for Standardization

  8. RELAX - REgular LAnguage description for XML

  9. SAX - Simple API for XML

  10. SGML - Standard Generalized Markup Language

  11. SOAP - Simple Object Access Protocol

  12. STEP - Standard for the Exchange of Product model data

  13. W3C - World Wide Web Consortium; develops specifications for the WWW

  14. WWW - World Wide Web

  15. WML - Wireless Markup Language

  16. XML - eXtensible Markup Language

  17. XSL/XSLT - eXtensible Stylesheet Language/XSL Transformations

  18. UTF-16 - UCS Transformation Format 16

  19. UTF-8 - UCS Transformation Format 8

There are over 250 acronyms associated with XML and XML initiatives. The acronyms above are used in this tutorial. A list of common acronyms is available at http://www.eccnet.com/acronyms.

2. A Gentle Introduction to XML

2.1. What is XML

  1. XML is a generic data format

    1. Describes structure

    2. Describes content

    3. Describes the relationship of information

  2. Provides a standard syntax.

  3. Does not provide semantics!

    1. Semantics — The meaning of the tags.

  4. Provides the ability to include business rules (metadata) information within the data.

XML is a markup language that provides a standard syntax. It is important to understand that the XML specification provides only a standard syntax. Syntax can be thought of as the set of rules that governs how information is marked up.

XML does not provide the semantics (definition) of the elements and/or attributes used within an XML vocabulary. Instead, XML allows organizations to develop their own vocabularies for their specific applications.

For example, a publishing company might use an element called <title> for the title of a book, chapter, section, etc., whereas a legislative body might define a <title> element with a complex internal structure for legislation.
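To illustrate that XML supplies syntax but no semantics, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It parses two hypothetical documents (the `<book>`, `<bill>`, `<number>` and `<heading>` names are invented for illustration) that both use `<title>`. The parser accepts both and builds the same kind of generic tree; what `<title>` means is left entirely to the application.

```python
import xml.etree.ElementTree as ET

# Hypothetical publishing vocabulary: <title> holds plain text.
BOOK = "<book><title>The Raven and Other Poems</title></book>"

# Hypothetical legislative vocabulary: <title> has its own internal structure.
BILL = """<bill>
  <title>
    <number>Title IV</number>
    <heading>Consumer Protection</heading>
  </title>
</bill>"""


def title_children(xml_text):
    """Return the child tag names of <title>; the parser attaches no meaning to them."""
    root = ET.fromstring(xml_text)         # checks syntax (well-formedness) only
    title = root.find("title")
    return [child.tag for child in title]  # [] when <title> holds only text


book_children = title_children(BOOK)
bill_children = title_children(BILL)
print(book_children)  # []
print(bill_children)  # ['number', 'heading']
```

Both vocabularies are equally acceptable to the parser; only the program that consumes them decides how each `<title>` is used.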

2.2. XML is Not

  1. A programming language

  2. An automatic translation format between databases

    1. Transformation is provided through external processes, such as:

      1. XSLT - eXtensible Stylesheet Language Transformations

      2. OmniMark

XML can be created in many different ways, but software is required to create and process it. XML is an intelligent (self-describing) format that makes it easy for computers to generate and process data. XML data is also platform independent.
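As a small illustration of how easily programs generate and read XML, the Python sketch below builds a document with the standard `xml.etree.ElementTree` module, serializes it to bytes, and parses it back. The `<order>`, `<customer>` and `<item>` names and their values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Generate: build a small document in memory (element names are hypothetical).
order = ET.Element("order", id="po-1001")
ET.SubElement(order, "customer").text = "ACME Corp."
ET.SubElement(order, "item", sku="A-17").text = "Widget"

# Serialize to UTF-8 bytes; any XML-aware program on any platform can read these.
xml_bytes = ET.tostring(order, encoding="utf-8")

# Process: parse the bytes back and pull out values by element and attribute name.
parsed = ET.fromstring(xml_bytes)
customer = parsed.findtext("customer")
sku = parsed.find("item").get("sku")
print(customer, sku)  # ACME Corp. A-17
```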

2.3. XML Family of Specifications

XML Schema                                     XSL (eXtensible Stylesheet Language)
XML Query                                      XHTML
XPath                                          MathML
XPointer                                       SMIL (Synchronized Multimedia Integration Language)
XLink                                          SVG (Scalable Vector Graphics)
DOM (Document Object Model)                    XML Signature
RDF (Resource Description Framework)           ebXML
CSS (Cascading Style Sheets)                   UDDI
RELAX (Regular Language description for XML)

Table 1. XML Family of Specifications

Note

Over 300 different XML related acronyms (and counting) - http://www.eccnet.com/acronyms

The W3C XML Specification 1.0 relates only to syntax. It does not contain information about presentation, application or semantics of the XML. Other standards and specifications have been developed around the XML 1.0 specification.

It is important to understand that there are currently hundreds of XML specifications and initiatives. Some of these specifications and initiatives are currently in direct conflict with each other.

2.4. Benefits of XML

  1. XML provides greater levels of standardization.

    1. Industry

    2. Organization

    3. Company

  2. XML has an intelligent structural framework.

  3. XML will eventually be widely used on the Web.

  4. Single, extensible vocabulary

  5. Application integration

    1. Transport

    2. Transformation

  6. Data aggregation

  7. Data validation

  8. Intelligent searching

  9. Personalization

There are thousands of different applications using XML. XML standard vocabularies (DTD/Schema) have been developed for almost every industry, including:

  1. Legislation

  2. Insurance

  3. Airline

  4. Metadata

  5. Chemical

  6. Travel

  7. etc.

XML provides all of the benefits listed above, and more. It also allows metadata (information about the information) to be included in the XML, which provides flexible indexing and retrieval, as well as a knowledge base for information.

3. XML Background

The XML background section will describe XML's rich history. It will also describe the relationship of the XML specification to the ISO 8879 - Standard Generalized Markup Language standard and the W3C - HyperText Markup Language specification.

It is important to understand where XML came from. Most people think that XML is a 'newfangled' language that has been developed since the inception of the Web. In reality, XML has a long and robust history. In this section we will describe how we got XML. The 'SGML on the Web' initiative was the beginning of XML. SGML on the Web was the original vision of Yuri Rubinsky (1952-1996), President of SoftQuad (http://www.utoronto.ca/atrc/rd/Rubinsky/yuri/about-yuri.html). Charles Goldfarb is considered the 'Father of SGML' and Yuri Rubinsky is considered the 'Father of SGML on the Web'. Yuri worked hard to make SGML viable for the Web, and XML was first proposed as a version of SGML for the Web. Ten months after Yuri's untimely death, the first draft of the XML specification was released.

Yuri gave the SGML community its first SGML browser for the Web. Panorama was a free plug-in for Netscape and IE and became very popular with the SGML community.

3.1. XML History

  1. XML is a subset of the International Organization for Standardization (ISO) Standard Generalized Markup Language (SGML), ISO 8879:1986

    1. SGML is an ISO Standard - ISO 8879:1986

    2. SGML had been an established standard for 12 years when XML 1.0 was released.

  2. SGML was released as ISO 8879 in 1986

  3. Used in major industries

    1. Manufacturing (Automobile, Heavy Equipment, Semiconductors, etc.)

    2. Telecommunications

    3. Publishing

    4. Government

    5. Aviation

If you are new to the XML world or have been working with XML for only a short time, you are probably wondering why this section is included in this tutorial. It is important that companies who develop XML applications understand the history of XML and the role that SGML and HTML play in the world of XML.

SGML and HTML have had a profound impact on businesses, and they still play, and will continue to play, a significant role in the development of business documents. For example, a few years ago one major online legal publisher claimed to have more SGML information in its database than the entire WWW had in HTML. This organization continues to use SGML as a basis for its data. Organizations that are currently using SGML have kept their established business practices, and are using XML to publish their information on the Web.

3.2. XML History

  1. At the SGML '96 Conference, the first draft of the XML specification was released by a working group associated with the W3C.

  2. XML 1.0 is a W3C recommendation (32 pages)

    1. XML became a recommendation in February, 1998

Even with the success of Panorama, SGML was complex, and many people knew that it had to be simplified before it would be accepted by the world at large. XML was created by examining SGML and deciding which core functionality was required. This is important: XML is a subset of SGML!

Some of the functionality that was left out of XML DTDs for simplicity has been put back into W3C XML Schema.

3.3. What is XML

  1. The eXtensible Markup Language (Metalanguage)

    1. Metalanguage — A language used to talk about language.

  2. A simplified subset of the Standard Generalized Markup Language (SGML)

  3. A standard for describing different types of data

  4. A standard designed to extend the use of markup languages on the WWW

XML can be used to define a markup language for any kind of information.

3.4. XML Document Components

  1. Elements - Building Blocks

    1. <h1>This is an HTML/XML element</h1>

  2. Attributes - Qualifiers for elements

    1. <h1 align="center">This is an HTML/XML element aligned center using the attribute 'align'</h1>

  3. Entities - reusable components, links to external information, character encoding

    1. <h1>Here is a copyright &#x00A9; character reference</h1>

  4. Comments - internal comments not seen by presentation system

    1. <!--This is a comment-->

  5. Processing Instructions (PI) - system specific information.

    1. <?target This is a processing instruction?>

An XML document consists of the above components. An XML document must have at least one element; everything else is optional. For example, the following tagged information constitutes a well-formed XML document.

<?xml version="1.0"?>
<myXML/>

Although this example isn't a realistic document, it shows how simple an XML document can be.
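A minimal Python sketch can confirm both points: the one-element document above parses, and a slightly larger sample shows how a DOM parser (the standard `xml.dom.minidom` module) reports each document component. The PI target `target` and the sample text are placeholders chosen for illustration; note that the XML declaration itself is not exposed as a node.

```python
from xml.dom import Node, minidom

# The smallest useful document: a declaration and one (empty) element.
MINIMAL = '<?xml version="1.0"?>\n<myXML/>'

# A sample using the other components: a comment, a processing instruction
# (the target "target" is a placeholder), an attribute and a character reference.
SAMPLE = """<?xml version="1.0"?>
<!--This is a comment-->
<?target This is a processing instruction?>
<h1 align="center">Copyright &#x00A9; 2002</h1>"""

KIND = {
    Node.ELEMENT_NODE: "element",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

minimal_root = minidom.parseString(MINIMAL).documentElement.tagName

doc = minidom.parseString(SAMPLE)
# Top-level nodes in document order (the declaration is not one of them).
components = [KIND[n.nodeType] for n in doc.childNodes if n.nodeType in KIND]
h1 = doc.documentElement
align = h1.getAttribute("align")
# The character reference has already been replaced by the character it names.
text = "".join(n.data for n in h1.childNodes if n.nodeType == Node.TEXT_NODE)

print(minimal_root)  # myXML
print(components)    # ['comment', 'processing instruction', 'element']
print(align, text)
```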

3.5. What Does XML Look Like

<?xml version="1.0"?>
<poem id="poem1">
	<title>The Raven</title>
	<poet>Edgar Allan Poe</poet>
	<stanza id="stanza1">
		<line>Once upon a midnight dreary, while I pondered, weak and weary,</line>
		<line>Over many a quaint and curious volume of forgotten lore-</line>
		<line>While I nodded, nearly napping, suddenly there came a tapping</line>
		<line>As of some one gently rapping, rapping at my chamber door.</line>
		<line>"'Tis some visitor," I muttered, "tapping at my chamber door-</line>
		<line>Only this and nothing more."</line>
	</stanza>
	<stanza id="stanza2">
		<line>Ah, distinctly I remember it was in the bleak December;</line>
		<line>And each separate dying ember wrought its ghost upon the floor. </line>
		<line>Eagerly I wished the morrow; - vainly I had sought to borrow </line>
		<line>From my books surcease of sorrow-sorrow for the lost Lenore-</line>
		<line>For the rare and radiant maiden whom the angels name Lenore-</line>
		<line>Nameless <whisper>here</whisper> for evermore.</line>
	</stanza>
</poem>

This example demonstrates how information can be identified according to its content. Semantic understanding can be derived from the element names, and unique IDs can be assigned to elements.
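Because the poem is tagged by content, a program can ask direct questions of it, such as "who is the poet?", without any full-text searching. The Python sketch below uses the standard `xml.etree.ElementTree` module on an abridged copy of the poem above; it is an illustration, not part of any particular application.

```python
import xml.etree.ElementTree as ET

# An abridged copy of the poem above.
POEM = """<poem id="poem1">
  <title>The Raven</title>
  <poet>Edgar Allan Poe</poet>
  <stanza id="stanza1">
    <line>Once upon a midnight dreary, while I pondered, weak and weary,</line>
    <line>Over many a quaint and curious volume of forgotten lore-</line>
  </stanza>
  <stanza id="stanza2">
    <line>Nameless <whisper>here</whisper> for evermore.</line>
  </stanza>
</poem>"""

root = ET.fromstring(POEM)

# Query by content, not by formatting.
title = root.findtext("title")                            # name of the poem
poet = root.findtext("poet")                              # who wrote it
stanza_ids = [s.get("id") for s in root.iter("stanza")]   # the unique IDs
whispered = [w.text for w in root.iter("whisper")]        # specially marked words

print(title, "by", poet)  # The Raven by Edgar Allan Poe
print(stanza_ids)         # ['stanza1', 'stanza2']
print(whispered)          # ['here']
```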

3.6. A Little History of SGML

  1. Began at IBM as GML (Generalized Markup Language).

  2. Charles Goldfarb is considered the "Father of SGML".

  3. DoD adopted SGML early as part of CALS (Computer Aided Logistics Support, aka Continuous Acquisition and Lifecycle Support, aka Commerce at Lightspeed).

  4. Industry Standards Organizations Using SGML.

    1. European Association of Aerospace Industries (AECMA)

    2. Continuous Acquisition and Lifecycle Support (CALS) Defense Departments (U.S., U.K., Australia, Japan, NATO, etc.)

    3. Air Transport Association (ATA)

    4. Telecommunication Industry Forum (TCIF)

    5. Railroad Industry Forum (RIF)

    6. Society of Automotive Engineers (SAE)

  5. Many of the same industries also use traditional EDI.

Standard Generalized Markup Language - ISO 8879 (SGML) is an International Organization for Standardization (ISO) standard. SGML became an official ISO standard in 1986. IBM had originally created an internal IBM standard called GML (Generalized Markup Language), which was the basis for SGML. Dr. Charles Goldfarb, one of the original architects of GML at IBM, continued his work to ensure that SGML became an ISO standard. Charles Goldfarb is considered the "Father of SGML" by many.

SGML was an 'industrial strength' standard. It was complicated and was very flexible. Because of the complexity and the flexibility of the standard, vendors found it very difficult to write software to support the standard. Therefore, SGML software was and is still very expensive. However, we are seeing the costs of SGML/XML software products being driven down because of XML.

Major industries adopted SGML for their publishing standards after SGML became an ISO standard. The notable industries were:

  1. Government Defense Departments (U.S., Canada, U.K., Australia, Japan, to name a few)

  2. Manufacturing (aviation, automobile, semiconductor)

  3. Publishing (textbooks, medical, literary libraries). The University of Virginia and Oxford University (http://library.ox.ac.uk/), two notable universities out of many, maintain their literary collections online and archived in SGML.

  4. Telecommunications

SGML was developed to define structure and content of the information. SGML isn't concerned with the format of the information or how it is presented to potential users.

3.7. SGML Problems

  1. High initial investment

  2. Complexity

  3. Too many options/features

  4. Vendors supported a subset of features

  5. Applications weren't portable because of various feature sets

  6. Lack of intuitive end-user software

    1. Fear of "pointy brackets" (<>)

Note

Pointy Brackets™ is a technical term!

Because of the complexity and high initial cost of getting into the SGML market, many organizations that were looking at the technology coined a phrase for it: Sounds Good, Maybe Later. Because of the high investment, only large organizations with a large IT budget could use SGML. Small and medium-sized organizations used SGML only when they were forced to by larger business partners who required information in SGML. This was true for several industries, such as aircraft manufacturing and defense (US, UK, Australia, etc.).

3.8. Then Came The Web

  1. The Global Hypertext Project began in December 1990 at CERN, the European Laboratory for Particle Physics, under the direction of Tim Berners-Lee

  2. The Global Hypertext Project came to be known as the World Wide Web (WWW)

  3. Underlying data format for the WWW is HyperText Markup Language (HTML)

The Global Hypertext Project began in December 1990 at CERN, the European Laboratory for Particle Physics, under the direction of Tim Berners-Lee. The project needed a way to communicate between different buildings. HyperText Markup Language (HTML) was developed for this project as an application of SGML. HTML defines information based on its presentation. HTML information is not tagged according to its content or structure; however, it is still an SGML application because it has a defined DTD, elements and attributes.

3.9. HyperText Markup Language

  1. HTML is an SGML application.

    1. Largest SGML application in the world

    2. Most successful SGML application in the world

    3. Cheapest SGML application in the world

  2. HTML 4.01 released December 24, 1999 (367 pages)

  3. HTML specification describes the syntax and semantics of HTML.

    1. XML specification describes only syntax (32 pages)

HTML became the largest, most successful and cheapest SGML application in the world. It proved that SGML could be used by the 'common man'. HTML moved SGML into the mainstream of corporate computing because it was easy enough for everyone to learn. Many executives and information providers had been afraid of SGML because of the learning curve and the fear of the pointy brackets. HTML was a paradigm change.

3.10. XML Roots

  1. Yuri Rubinsky, President of SoftQuad, had a vision of "SGML on the Web".

  2. Panorama was the first effort to bring a full SGML browser to the Web in 1994.

    1. Features:

      1. Full SGML publishing on the Web

      2. Dynamic table of contents

      3. Easy-to-learn style sheet

      4. Support for HyTime

      5. Personal linking capability

    2. Limitations:

      1. Required DTD validation

      2. Plug-in browser

      3. Non-standard stylesheet

  3. The idea of XML came from the early SGML on the Web efforts.

Companies and corporations flocked to HTML. However, these companies soon realized that HTML was not robust enough to handle 'real business' information. SGML was still needed, but SGML was still complicated. At the second International WWW Conference in Chicago in October 1994, Yuri Rubinsky, President of a small Canadian company called SoftQuad, held a session called 'SGML on the Web'. SoftQuad was (and still is) a provider of SGML/XML authoring tools. During this conference Yuri announced Panorama, a browser for SGML. Panorama proved that you could provide real SGML on the Web.

Many companies that were using SGML flocked to Panorama. One major telecommunications company bought 30,000 copies of Panorama to put on all of its employees' desktops for access to corporate data. The U.S. Office of the Secretary of Defense used Panorama to access administrative data.

3.11. Bragging About HTML

  1. Cheap: lots of available tools

  2. ASCII editors will work

  3. Portable

  4. Easy to learn

    1. Users quickly lost their fear of "pointy brackets" (<>)

    2. Doesn't require a Computer Science degree to create web pages with HTML.

  5. Workable and consistent hypertext facility

  6. Browser support

HTML is an SGML vocabulary. HTML broke the SGML edict that SGML is about structure and content, not format; HTML tags were all about formatting. However, HTML proved to the world that you could do SGML in a cost-effective environment. Cheap tools for creating HTML flooded the marketplace very quickly.

3.12. HTML Problems

  1. Fixed formatting tags

    1. XML deals with structure and content.

  2. No reusability or modularity

  3. Browser wars

  4. No facility to personalize

    1. Not extensible

  5. Very little structure

    1. Data relationships cannot be established.

HTML has fixed formatting tags. You cannot create content tags that give meaning to the information. Some companies have tried to add semantic meaning to elements by using the 'class' attribute.

There is no modularity or reusability to HTML. You can do 'server-side' includes with HTML but this is hardware / software specific.

You cannot establish relationships within HTML using the hierarchy (parent/child structure). There are basically only two structures in an HTML document, <head> and <body>.

3.13. Poem Tagged With HTML

<HTML>
	<HEAD>
		<TITLE>The Raven</TITLE>
	</HEAD>
	<BODY>
		<H1 ALIGN="CENTER">The Raven</H1>
		<H2 ALIGN="CENTER"> Edgar Allan Poe</H2>
                <HR>
		<P>Once upon a midnight dreary, while I pondered, weak and weary,<BR>
	   	   Over many a quaint and curious volume of forgotten lore-<BR>
	   	   While I nodded, nearly napping, suddenly there came a tapping<BR>
	   	   As of some one gently rapping, rapping at my chamber door.<BR>
	   	   "'Tis some visitor," I muttered, "tapping at my chamber door-<BR>
	   	   Only this and nothing more.
        		</P>
		<P>Ah, distinctly I remember it was in the bleak December;<BR>
	   	   And each separate dying ember wrought its ghost upon the floor. <BR>
	   	   Eagerly I wished the morrow; - vainly I had sought to borrow <BR>
	   	   From my books surcease of sorrow-sorrow for the lost Lenore-<BR>
	   	   For the rare and radiant maiden whom the angels name Lenore-<BR>
	   	   Nameless <FONT color="red">here</FONT> for evermore.
		</P>
	</BODY>
</HTML>

Looking at the poem above tagged in HTML, you can see that you cannot identify the pieces of the poem. For example, you can't recognize the name of the poem, the author or the individual stanzas.

The lack of robust tagging limits the usability of the information. For example, with the poem tagged as HTML, you can't ask for the author of the poem 'The Raven'. You would have to do a full-text search for 'The Raven' then take your chances that you find the poem 'The Raven' and not some other reference to 'the raven'.

3.14. Bragging About XML 1.0

  1. Identifies content according to its type not its format

  2. Conveys information specific to an organization or application

  3. Communicates this information to both humans and computers

  4. Works for any type of information

  5. All the advantages of SGML without the complexity

  6. Portable

  7. XML provides the robust functionality of SGML without most of the complex feature-set

  8. Vendor support, e.g., Microsoft, Netscape, IBM, Sun, Oracle, etc.

  9. Easy to learn

  10. Less expensive to implement than SGML

  11. Internationalization

  12. Web Accessibility

  13. Easy to build applications

XML 1.0 provides most of the advantages of SGML without the complexity.

It is important to distinguish between XML 1.0 and all the specifications built around it. The surrounding specifications, such as XML Schema, RDF and XSL(T), can be difficult to understand. However, the core XML is very easy.

XML does not require the large investment that SGML did. A lot of good XML tools are open source, and it is inexpensive to create and present XML information. There are many times more tools available for XML than there were for SGML, where the choice of tools was very limited.

3.15. XML Criticisms

  1. A lot of hype (hype is dying down and reality is setting in)

  2. Hard to distinguish reality from hype

    1. W3C Schema Specification

    2. ebXML

    3. RDF

    4. and more ...

  3. Competing specifications

    1. Schema (W3C/RELAX NG)

    2. Repository (ebXML/UDDI)

    3. Transport (Simple Object Access Protocol [SOAP]/ebXML Transport, Routing and Packaging [TRP])

    4. Stylesheet

      1. eXtensible Stylesheet Language (XSL)

      2. Cascading Stylesheet (CSS)

      3. Document Style Semantics and Specification Language (DSSSL)

  4. Confusion about which rules-based specification to use

    1. DTD versus Schema

  5. Browser implementations slow

    1. Browser incompatibility (wars)

As noted before, the activities currently around XML are difficult to navigate. It is important that managers and developers understand the implications of any XML development before trudging ahead.

In some cases, the specification used for a particular project (e.g., SOAP vs. ebXML TRP, or Schema vs. DTD) may depend on the software that will be used to implement the XML project.

A good example would be an authoring tool. Currently, most authoring tools (Arbortext Epic, WordPerfect, FrameMaker+SGML) do not support schemas. Corel's XMetaL 3.0 currently supports the W3C Schema language, and the Epic editor is rumored to support schemas in a future release.

3.16. HTML vs. XML tags

HTML

  1. Format Driven

  2. Little Structure

  3. Mostly Formatting Tags

  4. No Intelligence in the Data

Figure 1. Hierarchical Diagram of HTML DTD

XML - Structure and Content Driven

Figure 2. Hierarchical Diagram of a Coffee Order DTD

Comparing the two hierarchical graphics above, you can see that HTML has only two levels of hierarchy, <head> and <body>. The <body> can contain any element in any order.

The <CoffeeOrder> element shows that you can order multiple types of <Latte>, <Mocha> or <Cappuchino>. You can also see that the model can be extended if the menu grows, for example to include an <ExtraGrande> size.

4. Well-formed Documents

There are two classes of XML documents: well-formed and valid. Well-formed documents are the less stringent of the two. A well-formed document requires that all elements are properly nested and that all attribute values are enclosed in quotes ("..." or '...').

Valid documents, on the other hand, must also be well-formed, and must be associated with a DTD or a Schema and adhere to it!

4.1. Well-Formed Documents

  1. Contains one or more elements.

  2. There is exactly one root element, also called the document element, which contains all other elements.

  3. Each element has a start tag.

  4. Each element has an end tag, or is written as a single empty-element tag such as <hr/>.

  5. Each attribute value is delimited using quotes (single or double).

  6. Element and attribute names are case sensitive, i.e., <p> and <P> are considered two separate and distinct elements.

A well-formed document requires that every element have a start tag and an end tag (an empty element may instead use a single empty-element tag, such as <hr/>). It also requires each attribute value to be delimited by quotes; double or single quotes are fine as long as the opening and closing quotes match.

If you are familiar with HTML authoring, this is different. In HTML, you can have a start tag without an end tag, and attribute values do not always have to be enclosed in quotes.
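A non-validating parser is enough to check the rules above. The minimal Python sketch below treats "parses without error in `xml.etree.ElementTree`" as the well-formedness test and feeds it one small, made-up sample per rule; it checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One made-up sample per rule listed above.
samples = {
    "well-formed":        '<doc><p class="a">Text</p></doc>',
    "no element at all":  '',
    "two root elements":  '<p>One</p><p>Two</p>',
    "missing end tag":    '<doc><p>Text</doc>',
    "unquoted attribute": '<doc><p class=a>Text</p></doc>',
    "case mismatch":      '<doc><p>Text</P></doc>',
}
results = {name: is_well_formed(xml) for name, xml in samples.items()}
for name, ok in results.items():
    print(f"{name:20} {ok}")
```

Only the first sample is reported as well-formed; each of the others breaks exactly one rule.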

4.2. XML Well-Formedness

<H1 align=center>HTML snippet</h1>
<hr>
<p>This is a paragraph
<p>Next paragraph

Figure 3. Valid HTML fragment

  1. Start tag for H1 is upper case whereas the end tag is lower case

  2. The attribute value is not enclosed in quotes.

  3. The <hr> element is an empty element so it doesn't have content or a closing tag.

  4. The <p> elements do not have an end tag

Below is the resulting well-formed document fragment.

<h1 align="center">HTML snippet</h1>
<hr/>
<p>This is a paragraph</p>
<p>Next paragraph</p>

Figure 4. Well-formed fragment

Although it is good form to enclose attribute values in quotes, HTML browsers are lenient in this respect; XML parsers are not.
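The conversion can be checked mechanically. Because each figure is a fragment with several top-level elements, the Python sketch below wraps it in a temporary root element (the `<fragment>` name is hypothetical) before parsing it with `xml.etree.ElementTree`. Figure 3 is rejected at its first violation, while Figure 4 parses cleanly.

```python
import xml.etree.ElementTree as ET

# Figure 3: accepted by HTML browsers.
HTML_FRAGMENT = """<H1 align=center>HTML snippet</h1>
<hr>
<p>This is a paragraph
<p>Next paragraph"""

# Figure 4: the well-formed version.
XML_FRAGMENT = """<h1 align="center">HTML snippet</h1>
<hr/>
<p>This is a paragraph</p>
<p>Next paragraph</p>"""


def parse_fragment(fragment):
    """Parse a fragment inside a temporary wrapper element.

    A document needs exactly one root, so the hypothetical <fragment> wrapper
    supplies it. Returns (root, None) on success or (None, message) on failure.
    """
    try:
        return ET.fromstring(f"<fragment>{fragment}</fragment>"), None
    except ET.ParseError as err:
        return None, str(err)


html_root, html_error = parse_fragment(HTML_FRAGMENT)
xml_root, xml_error = parse_fragment(XML_FRAGMENT)
print(html_error)                        # the parser's message for the first violation
print([child.tag for child in xml_root])  # ['h1', 'hr', 'p', 'p']
```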

5. XML Application

5.1. XML Document

  1. An XML document is composed of four components

    1. XML Declaration - <?xml version="1.0" encoding="UTF-8"?>

      1. The declaration is

    2. Document Type Definition (DTD) or Schema

      1. The rules that define an XML document type

    3. Tagged document (instance)

    4. Stylesheet

      1. eXtensible Stylesheet Language (XSL)

      2. Cascading Stylesheet (CSS)

      3. Document Style Semantics and Specification Language (DSSSL)

Figure 5. XML Document

A complete XML document is composed of the four components listed above. The DTD or Schema is used to create a rule-based document. It is also used to validate the document, to ensure that it has been created according to the defined rules.

The stylesheet is used to present the document in a human-readable format. A browser or publishing engine uses the stylesheet to represent the document visually.

5.2. XML Declaration

  1. Three parts:

    1. version number - required

    2. encoding - optional

    3. standalone - optional

  2. Can be used with both SGML and XML tools. See http://www.w3.org/TR/NOTE-sgml-xml.html

<?xml version="1.0"?>

Figure 6. Simple XML Declaration

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

Figure 7. Full XML Declaration

The version number is the only required component of the XML declaration. Currently the version is 1.0.

The encoding declaration describes what character encoding is used.

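The standalone declaration indicates whether the document depends on markup declarations outside the document entity (for example, in an external DTD subset) for complete processing; standalone="yes" means that it does not.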
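To see the three parts of the declaration as a parser reports them, the Python sketch below registers a handler on the standard `xml.parsers.expat` parser. expat passes `None` for an omitted version or encoding, and reports standalone as -1 (omitted), 0 ("no") or 1 ("yes"). The `<doc/>` root element is a placeholder.

```python
import xml.parsers.expat


def read_declaration(document):
    """Return (version, encoding, standalone) from a document's XML declaration,
    or None if the document has no declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        found["decl"] = (version, encoding, standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(document.encode("utf-8"), True)
    return found.get("decl")


simple = read_declaration('<?xml version="1.0"?><doc/>')
full = read_declaration('<?xml version="1.0" encoding="UTF-8" standalone="yes"?><doc/>')
print(simple)  # ('1.0', None, -1)
print(full)    # ('1.0', 'UTF-8', 1)
```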

5.3. XML Document Components

  1. Elements - building blocks

  2. Attributes

    1. qualifiers of elements

    2. properties of elements

  3. Entities

    1. Reusable components

    2. References to binary information

    3. Special character handling

5.4. What's in a DTD/Schema

  1. The rules in the DTD describe:

    1. the names of allowable elements (tags)

    2. the content of each element type

    3. the structure of the document, including:

      1. the order in which elements must appear

      2. how often the elements can appear

    4. the properties of the elements (which are called attributes)

<!ELEMENT poem (title, poet, stanza+)>
<!ATTLIST poem
          id    ID   #REQUIRED>

<!ELEMENT title (#PCDATA)>

<!ELEMENT poet (#PCDATA)>

<!ELEMENT stanza (line+)>
<!ATTLIST stanza
          id     ID    #REQUIRED>

<!ELEMENT line  (#PCDATA | whisper)*>

<!ELEMENT whisper (#PCDATA)>

Figure 8. Example DTD
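Each content model in a DTD is essentially a regular expression over child element names. Python's standard library cannot validate against a DTD, so the sketch below is a hypothetical, hand-written check of just two rules from Figure 8, (title, poet, stanza+) and (line+), plus the requirement that every poem and stanza carry a unique id. It illustrates what a validating parser enforces; it is not a substitute for one.

```python
import re
import xml.etree.ElementTree as ET

# Two content models from Figure 8, rewritten (by hand) as regular
# expressions over a space-terminated list of child element names.
CONTENT_MODELS = {
    "poem":   r"title poet (stanza )+",  # (title, poet, stanza+)
    "stanza": r"(line )+",               # (line+)
}


def check(root):
    """Return a list of violations of the rules above."""
    problems = []
    seen_ids = set()                      # ID values must be unique document-wide
    for elem in root.iter():
        model = CONTENT_MODELS.get(elem.tag)
        if model:
            children = "".join(child.tag + " " for child in elem)
            if not re.fullmatch(model, children):
                problems.append(f"<{elem.tag}> has children: {children.strip()}")
        if elem.tag in ("poem", "stanza"):  # id is ID #REQUIRED on both
            ident = elem.get("id")
            if ident is None:
                problems.append(f"<{elem.tag}> is missing its required id")
            elif ident in seen_ids:
                problems.append(f"duplicate id {ident!r}")
            else:
                seen_ids.add(ident)
    return problems


good = ET.fromstring('<poem id="p1"><title>T</title><poet>P</poet>'
                     '<stanza id="s1"><line>L</line></stanza></poem>')
bad = ET.fromstring('<poem id="p1"><poet>P</poet><title>T</title>'
                    '<stanza id="p1"><line>L</line></stanza></poem>')
print(check(good))  # []
print(check(bad))   # wrong child order, and a duplicate id
```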

5.5. XML Elements - Structural Building Blocks

  1. The DTD describes:

    1. What elements are allowed

    2. How the elements are related

    3. The allowable content of an element

    4. The properties (attributes) of the element

    5. Elements have unique names and lengths are not restricted

    6. The first NAME character must be a letter, “_” or “:”

5.6. What is an Attribute

An attribute is a property that is associated with an element.

  1. Examples:

    1. Unique identifier

    2. Revision level of a document

    3. Review status of a proposal

    4. Author of a review comment

    5. Size of a graph

    6. Internal audit designation of a repair manual

6. Strengths of XML

During this section we will outline why you would want to use XML. We will also talk about the capabilities of using XML.

6.1. XML is Reusable

  1. Information can be reused via addressing mechanisms.

  2. Information can be reused by external entities (reusable modules, i.e., boilerplate text).

  3. XML is currently being expanded to allow more robust addressing mechanisms (XLink/XPointer)

XML provides the powerful capability of reusing information objects. An information object is a logical piece of information. A good example of an information object is an address. An individual address can be used in many different contexts within a single document, transaction or message. For instance, the same address could be reused within a purchase order for requestor, shipping address, etc. This address can also be used in multiple transactions. XML provides the facility for these reusable objects. Create once - reuse many.
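One XML-native way to "create once, reuse many" is an entity: the address is declared once in the document's internal DTD subset and referenced wherever it is needed. The Python sketch below assumes that arrangement; the entity name `hqAddress` and all element names are hypothetical. The expat-based `xml.etree.ElementTree` parser expands each reference into a full copy of the address.

```python
import xml.etree.ElementTree as ET

# The address is declared once, as an internal entity, and referenced twice.
# Entity and element names are hypothetical.
PURCHASE_ORDER = """<?xml version="1.0"?>
<!DOCTYPE purchaseOrder [
  <!ENTITY hqAddress "<address><street>1 Main St.</street><city>Springfield</city></address>">
]>
<purchaseOrder>
  <requestor>&hqAddress;</requestor>
  <shipTo>&hqAddress;</shipTo>
</purchaseOrder>"""

root = ET.fromstring(PURCHASE_ORDER)

# Each reference has been replaced by its own <address> element.
cities = [addr.findtext("city") for addr in root.iter("address")]
ship_city = root.findtext("shipTo/address/city")
print(cities)  # ['Springfield', 'Springfield']
```

Changing the one entity declaration changes every place the address appears, which is the point of the reusable object.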

6.2. XML - Reusable Objects - Showing Hierarchical Relationships

Figure 9. Reusable Object

The graphic above shows how an information object can be reused. For example, the poem "The Raven" by Edgar Allan Poe can be used in multiple locations: in one chapter devoted to Edgar Allan Poe and in another chapter concerning birds in poetry. This becomes really useful for online information. Everywhere that the poem "The Raven" is referenced, there is only one source document.

6.3. Information Objects


Figure 10. Information Objects

This graphic shows how the purchaser information object can be used in both an invoice and a purchase order. It also shows how the catalog item can be used in the catalog, the invoice and the purchase order. In the following slides we will show how the information objects remain the same, even though they are processed differently depending on the particular document model (invoice, purchase order and catalog).
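One way to reuse an information object in code is to build it once and copy it into each document model. This Python sketch assumes hypothetical `purchaser`, `purchaseOrder` and `invoice` element names (the actual structures are shown in Figure 10). `copy.deepcopy` gives each document its own copy, so a later edit to the invoice cannot silently change the purchase order.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical purchaser object; element names are illustrative only.
purchaser = ET.fromstring(
    "<purchaser><name>Acme Corp</name><account>A-77</account></purchaser>"
)

def build(doc_type, number):
    """Wrap an independent copy of the shared purchaser object in a document."""
    doc = ET.Element(doc_type, number=number)
    doc.append(copy.deepcopy(purchaser))
    return doc

po = build("purchaseOrder", "PO-1")
invoice = build("invoice", "INV-1")
print(ET.tostring(po, encoding="unicode"))
print(ET.tostring(invoice, encoding="unicode"))
```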

6.4. Common Structures

  1. Common structures (information objects) allow you to:

    1. Use common structures across multiple information components.

    2. Reuse common fragments across an organization or industry.


Figure 11. Example of a Common Business Object

The above model shows how a common information object can be created for a standard structure. This model can be used in multiple contexts, as well as multiple documents.

An example of a common information object would be a table. There are currently two table models in wide use. The first and oldest is the CALS (Computer Aided Logistics Support) table model. The second is the HTML table model. The CALS table model is the more robust of the two; however, both have been widely used in XML applications. In some cases, the table models have been enhanced to support specialized tables, e.g., tables with source notes.
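The relationship between the two table models can be illustrated with a simplified converter from the core CALS structure (`tgroup`, `thead`/`tbody`, `row`, `entry`) to the HTML table model. This is a Python sketch under stated assumptions: it ignores `colspec`, `spanspec`, `tfoot` and cell spanning, all of which a real conversion would have to handle.

```python
import xml.etree.ElementTree as ET

CALS = """<table frame="all">
  <tgroup cols="2">
    <thead><row><entry>Item</entry><entry>Price</entry></row></thead>
    <tbody>
      <row><entry>Latte</entry><entry>$3.40</entry></row>
      <row><entry>Mocha</entry><entry>$4.40</entry></row>
    </tbody>
  </tgroup>
</table>"""

def cals_to_html(cals_table):
    """Map the core CALS structure onto the HTML table model.

    Simplified: colspec, spanspec, tfoot and spanning are ignored.
    """
    html = ET.Element("table", border="1")
    for tgroup in cals_table.findall("tgroup"):
        for section in ("thead", "tbody"):
            src = tgroup.find(section)
            if src is None:
                continue
            dst = ET.SubElement(html, section)
            # Header rows become <th> cells, body rows become <td> cells
            cell_tag = "th" if section == "thead" else "td"
            for row in src.findall("row"):
                tr = ET.SubElement(dst, "tr")
                for entry in row.findall("entry"):
                    ET.SubElement(tr, cell_tag).text = entry.text
    return html

print(ET.tostring(cals_to_html(ET.fromstring(CALS)), encoding="unicode"))
```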

6.5. Reusable Information


Figure 12. Common Business Object Is Reusable

  1. Example of Catalog

  2. Example of Purchase Order

  3. Example of Invoice

The graphic above shows how the information object <item> can be enhanced for the purchase order and invoice applications by wrapping the <item> element in a parent element, adding a <qty> element before the item and an <amount> element after it. An external program can then calculate the <amount> once the <qty> ordered is known.
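The calculation step described above might look like the following Python sketch. The `<orderLine>` wrapper and the `<price>` child of `<item>` are assumptions for illustration, since the actual structure appears only in Figure 12. `Decimal` avoids floating-point rounding in currency arithmetic.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical order lines: <qty> precedes the reused <item>, and an empty
# <amount> follows it, waiting to be filled in.
PO = """<purchaseOrder>
  <orderLine>
    <qty>2</qty>
    <item><name>Latte</name><price>3.40</price></item>
    <amount/>
  </orderLine>
  <orderLine>
    <qty>1</qty>
    <item><name>Mocha</name><price>4.40</price></item>
    <amount/>
  </orderLine>
</purchaseOrder>"""

def fill_amounts(order):
    """Set each <amount> to qty x price and return the order total.

    The <item> object itself is read but never modified.
    """
    total = Decimal("0")
    for line in order.findall("orderLine"):
        qty = int(line.findtext("qty"))
        price = Decimal(line.findtext("item/price"))
        amount = qty * price
        line.find("amount").text = f"{amount:.2f}"
        total += amount
    return total

order = ET.fromstring(PO)
print(fill_amounts(order))  # 11.20
```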

7. Presentation of XML

7.1. Benefits of Standardized Presentation

  1. The same stylesheets can work across all platforms

  2. One stylesheet language can be used for a class of documents

  3. Stylesheet code can be re-used for different document types

  4. Many different applications can process stylesheets that use the same standard

Presentation standards have been developed for XML, making it possible to create one stylesheet and apply it across all platforms. Two stylesheet specifications can currently be used with XML: CSS and XSL/XSLT.

7.2. Presentation of the XML

  1. Presentation of XML is based on the XML tags.

  2. Provides the flexibility to target other formats: HTML, CD-ROM, paper, etc.

  3. Standardized Stylesheets

    1. Cascading Style Sheets (CSS)

    2. XSL Transformations (XSLT)

    3. Extensible Stylesheet Language (XSL)



<!DOCTYPE CoffeeOrder SYSTEM "coffee.dtd">
<CoffeeOrder>
   <Type>
      <Name> <Latte/> </Name>
      <Size> <Grande/> </Size>
      <Cost>$3.40</Cost>
   </Type>
   <Type>
      <Name> <Mocha/> </Name>
      <Size> <Vente/> </Size>
      <Cost>$4.40</Cost>
   </Type>
   <Type>
      <Name> <Cappuchino/> </Name>
      <Size> <Tall/> </Size>
      <Cost>$2.40</Cost>
   </Type>
</CoffeeOrder>

Presentation of the XML is based on the element tags. The tagged example above contains no form elements and no presentation information at all. However, when the example is displayed with a stylesheet, it appears as a formatted order form generated from the XML elements. Presentation can also be attached to attributes.

7.3. Cascading Style Sheets (CSS)

  1. CSS2: W3C Recommendation - May 12, 1998

  2. Partial support for CSS2 in MS IE 5/6 and Netscape 6.

  3. Doesn’t require transformation of XML (down translation).

P {
   font-family: sans-serif;
   font-size: medium;
   color: black;
   overflow: visible;
   margin-left: 40px;
   margin-right: 40px;
}

CSS stylesheets can be used with both HTML and XML. CSS can also style the HTML produced by an XSLT transformation.

7.4. XSL Transformations (XSLT)

  1. XSL Transformations (XSLT) - Version 1.0

  2. W3C Recommendation - November 16, 1999

  3. Describes the syntax and semantics for transforming XML documents into other documents:

    1. HTML

    2. WML

    3. XML

  4. XSLT Processors

    1. XT by James Clark (www.jclark.com)

    2. MSXSL - Microsoft

    3. SAXON - Michael Kay

    4. Cocoon - Apache (www.apache.org)

<xsl:template match="CoffeeOrder">
  <html>
    <head>
      <title>Cool Coffee Menu</title>
    </head>
    <body style="font-family: Arial, Helvetica, sans-serif; font-size: 10pt"
          bgcolor="#EEEEEE">
      <h1 align="center">Cool Coffee Menu</h1>
      <hr width="75%"/>
      <form method="POST">
        <center>
          <table border="1" cellpadding="10">
            <tr>
              <th colspan="8" align="center">Place Your Order</th>
            </tr>
            <xsl:apply-templates/>
          </table>
        </center>
        <p align="center">
          <a href="order.xml">Place an Order</a>
        </p>
      </form>
    </body>
  </html>
</xsl:template>

XSLT defines how to transform XML documents into other XML documents, or into HTML or text documents. The XSLT specification defines how to filter, sort and transform XML information, so that presentation can be applied to the transformed document.
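Running XSLT requires an XSLT processor, which Python's standard library does not include. As a rough stand-in, this sketch uses `xml.etree.ElementTree` to produce the same kind of HTML table rows from the coffee order in section 7.2. It is not XSLT and does not reproduce the full template above; the function name `coffee_to_html` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The coffee order from section 7.2, without the DOCTYPE line.
COFFEE = """<CoffeeOrder>
  <Type><Name><Latte/></Name><Size><Grande/></Size><Cost>$3.40</Cost></Type>
  <Type><Name><Mocha/></Name><Size><Vente/></Size><Cost>$4.40</Cost></Type>
  <Type><Name><Cappuchino/></Name><Size><Tall/></Size><Cost>$2.40</Cost></Type>
</CoffeeOrder>"""

def coffee_to_html(order):
    """Build an HTML page with one table row per <Type> in the order."""
    html = ET.Element("html")
    ET.SubElement(ET.SubElement(html, "head"), "title").text = "Cool Coffee Menu"
    table = ET.SubElement(ET.SubElement(html, "body"), "table", border="1")
    for coffee in order.findall("Type"):
        tr = ET.SubElement(table, "tr")
        # Name and Size carry their value as the tag of an empty child element
        ET.SubElement(tr, "td").text = coffee.find("Name")[0].tag
        ET.SubElement(tr, "td").text = coffee.find("Size")[0].tag
        ET.SubElement(tr, "td").text = coffee.findtext("Cost")
    return html

print(ET.tostring(coffee_to_html(ET.fromstring(COFFEE)), encoding="unicode"))
```

As with the XSLT template, the source XML is untouched; only the generated output carries presentation markup.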

7.5. XSL Formatting Objects (XSL-FO)

  1. Part of the W3C XSL Specification

  2. Draws from the XSLT and CSS Specifications

  3. Provides formatting capability for print

  4. XSL-FO Processors

    1. RenderX (www.renderx.com)

    2. Antenna House (www.antennahouse.com)

    3. Apache FOP (Formatting Object Processor)

<xsl:template match="para">
  <fo:block font-family="Times"
            font-size="11pt"
            margin-left="25pt"
            margin-right="25pt"
            space-before.minimum="18pt">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>

XSL-FO is the formatting component of the XSL specification. It provides the mechanism to support complex page layout, similar to desktop publishing. Its formatting properties are largely based on CSS, and it uses XSLT as the mechanism to transform an XML document into an XSL-FO document.

The printed copies of these instructor notes were formatted using XSL-FO.
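The `para` template above maps each source paragraph to a styled `fo:block`. The Python sketch below mirrors that mapping with `xml.etree.ElementTree` as an illustration only; it is not an XSL-FO processor. It assumes the standard XSL-FO namespace `http://www.w3.org/1999/XSL/Format` and, unlike `xsl:apply-templates`, simply flattens any inline markup to plain text.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize with the usual "fo:" prefix

def para_to_fo_block(para):
    """Mirror the para template: wrap the paragraph text in a styled fo:block.

    Simplification: inline child elements are flattened to their text.
    """
    block = ET.Element(f"{{{FO_NS}}}block", {
        "font-family": "Times",
        "font-size": "11pt",
        "margin-left": "25pt",
        "margin-right": "25pt",
        "space-before.minimum": "18pt",
    })
    block.text = "".join(para.itertext())
    return block

para = ET.fromstring("<para>Once upon a <i>midnight</i> dreary</para>")
print(ET.tostring(para_to_fo_block(para), encoding="unicode"))
```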

7.6. Attaching a Stylesheet to XML

  1. Stylesheets are attached to XML files via the xml-stylesheet processing instruction.

  2. Defined by the W3C Recommendation "Associating Style Sheets with XML documents" (June 29, 1999).

  3. Recognized by current browsers

  4. Two common MIME types (text/xsl and text/css)

  5. The href attribute can be a URL/URI

<?xml-stylesheet href="poem.xsl" type="text/xsl"?>
<?xml-stylesheet href="poem.css" type="text/css"?>

You can attach a stylesheet to an XML document using an xml-stylesheet processing instruction. This mechanism is defined by a W3C Recommendation and is recognized by browsers.
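An application that wants to honor these processing instructions first has to read their pseudo-attributes. A minimal Python sketch with the standard `xml.dom.minidom` module follows; the regular expression handles only simple `name="value"` pairs, and the function name `stylesheets` is hypothetical.

```python
import re
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet href="poem.xsl" type="text/xsl"?>
<?xml-stylesheet href="poem.css" type="text/css"?>
<poem><title>The Raven</title></poem>"""

# Matches name="value" or name='value' inside the PI data
PSEUDO_ATTR = re.compile(r'(\w+)\s*=\s*(["\'])(.*?)\2')

def stylesheets(xml_text):
    """Return the pseudo-attributes of each xml-stylesheet PI in the prolog."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            found.append({name: value
                          for name, _quote, value in PSEUDO_ATTR.findall(node.data)})
    return found

for sheet in stylesheets(DOC):
    print(sheet["type"], sheet["href"])
```

The XML declaration on the first line is not a processing instruction, so only the two xml-stylesheet instructions are reported.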

Biography