Section Three: What is XML?

XML stands for Extensible Markup Language. Extensible means that the language is a shell, or skeleton, that can be extended by anyone who wants to create additional ways to use XML. Markup means that XML’s primary task is to give definition to text and symbols. Language means that XML is a method of presenting information that has accepted rules and formats.

According to the World Wide Web Consortium (W3C)(1), XML is described as:

The Extensible Markup Language (XML) is descriptively identified in the XML 1.0 W3C Recommendation as "an extremely simple dialect [or 'subset'] of SGML" the goal of which "is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML," for which reason "XML has been designed for ease of implementation, and for interoperability with both SGML and HTML." (2)

Where did XML Come From?

Markup languages were first developed by IBM and later implemented on a large scale within publishing companies. In 1986, the International Organization for Standardization (ISO) established the markup language SGML, or Standard Generalized Markup Language. SGML is an ISO standard for defining document structures for the application of markup schemes.(3) SGML is the base language used to create both HTML and XML.

Why was SGML developed? Publishing companies needed a means of marking up a document so that the words could be presented in a number of different ways. For example, a new product might require both a detailed instruction manual and a beginner’s guide, and both documents may require the publisher to re-use elements of the original. Coding a document in SGML permits the text to be re-used in many different formats. The standards process fine-tuned a rich document markup language that allowed the author to separate the content from the presentation of the document. Authors could write once, tag with SGML, and display the information in many different formats.
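The "write once, display in many formats" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses XML syntax rather than SGML (Python's standard library has no SGML parser), and the element names (product, name, step) and the two rendering functions are hypothetical, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged content; the element names are illustrative only.
SOURCE = """<product>
  <name>Widget</name>
  <step>Unpack the box.</step>
  <step>Press the power button.</step>
</product>"""

def to_plain_text(root):
    """Render the tagged content as a numbered plain-text guide."""
    lines = [root.findtext("name")]
    for number, step in enumerate(root.findall("step"), start=1):
        lines.append(f"{number}. {step.text}")
    return "\n".join(lines)

def to_html(root):
    """Render the same tagged content as an HTML fragment."""
    items = "".join(f"<li>{step.text}</li>" for step in root.findall("step"))
    return f"<h1>{root.findtext('name')}</h1><ol>{items}</ol>"

# The content is parsed once and presented twice.
root = ET.fromstring(SOURCE)
plain = to_plain_text(root)
html = to_html(root)
print(plain)
print(html)
```

The tagged source never changes; only the rendering function does, which is the separation of content from presentation described above.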

SGML is not a language in itself; it is a meta-language. A meta-language is a set of rules that defines how other things work. The main purpose of SGML is to describe a method for describing data, and it provides a way of depicting any logically structured set of information. As a meta-language, SGML provides the overall rules and procedures that permit a wealth of varied applications to exist.

Using the Olympics as an example, SGML would be similar to the rule-making body of the Olympics. No real games are held directly by the Olympic committee, but the structure and participation criteria are developed and enforced at this level. Taking a step closer to the actual games, such as the summer Olympic games, the markup language HTML (HyperText Markup Language) is similar to the rules developed for participation in demonstration sports: there are few rules, no official medals are awarded, and not many athletes participate. HTML is derived from SGML, and it simply describes a method for the presentation of information.

XML is a more complex derivative of SGML than HTML is. XML, by comparison to HTML, would be similar to the official rules that run the Olympic summer sports. Official rules, similar to the W3C’s release of the XML 1.0 recommendation in February 1998, govern how individual sports are conducted. In the summer Olympic men’s 100-meter dash, qualifying rules are sent to each participating country with information on how individual athletes need to perform to qualify for time trials. Any disputes or questions about the rules are handled by the Olympic committee.

The W3C recommendation(4) for XML laid out the rules for creating a markup system that would give context and structure to data. XML is actually a subset of the parent language, SGML. The rules by which XML operates are contained in the above-mentioned W3C recommendation.

HTML Characteristics
· A simplified version of SGML
· Presentation only
· A markup language

XML Characteristics
· A sub-section of SGML
· Complex; defines content and presentation
· A meta-language enabling uniform data structure and individual presentations

SGML is the parent language to both HTML and XML

SGML is a very complicated language that is difficult to work with. It was intended to account for every possible type of data format and presentation. The portion of SGML that has become XML is a small part of the parent markup language.

XBRL, or Extensible Business Reporting Language, is a fully compliant extension of the XML 1.0 recommendation. XBRL is presently taking its place alongside the work of other industry groups that are defining their own compliant extensions to XML. Registries for industry-specific XML extensions can be found at rosettanet.org and xml.org.

XML-Coded data:

<?xml version="1.0"?>
<!DOCTYPE Sales_Budget SYSTEM "http://www.webpro-ri.com/budgets/sales.dtd">
<Sales_Budget>
  <HEADER>
    <DEPARTMENT>
      <NAME>New England</NAME>
      <PERIOD>02312000</PERIOD>
      <CUSTOMER>Wal-Mart</CUSTOMER>
    </DEPARTMENT>
  </HEADER>
</Sales_Budget>
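As a rough sketch of how software might read data coded this way, the following Python uses the standard library's xml.etree.ElementTree module to pull the department fields out of the sample. It assumes the document is available as a string and leaves out the DOCTYPE line, since this sketch does not validate against the DTD; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The sample data from above, without the DOCTYPE declaration
# (no DTD validation is attempted in this sketch).
SALES_BUDGET = """<Sales_Budget>
  <HEADER>
    <DEPARTMENT>
      <NAME>New England</NAME>
      <PERIOD>02312000</PERIOD>
      <CUSTOMER>Wal-Mart</CUSTOMER>
    </DEPARTMENT>
  </HEADER>
</Sales_Budget>"""

root = ET.fromstring(SALES_BUDGET)

# Walk down to the DEPARTMENT element and collect each child tag with its text.
department = root.find("HEADER/DEPARTMENT")
record = {child.tag: child.text for child in department}
print(record)
```

Because every value is labeled by its tag, the program can pick out NAME, PERIOD and CUSTOMER by name rather than by position in the file.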

XML BASICS

XML, or Extensible Markup Language, is all about creating a universal way of both formatting and presenting data. Once data is coded, or marked up, with XML tags, it can then be used in many different ways.

According to the World Wide Web Consortium (W3C), the definition of XML is as follows:

XML is a set of rules, guidelines, conventions, whatever you want to call them, for designing text formats for such data, in a way that produces files that are easy to generate and read (by a computer), that are unambiguous, and that avoid common pitfalls, such as lack of extensibility, lack of support for internationalization/localization, and platform-dependency. (5)

The way XML works is that programmers mark up a text-based document with tags (similar to HTML tags) that tell what each word, number or group of words represents. For example, the tag <invoice_number> might be used to describe the number of an invoice (XML tag names cannot contain spaces, so an underscore joins the two words). Software can understand what <invoice_number> means if it has access to the information's key, or schema.
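The role of the key can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the tag names invoice_number and customer, the sample values, and the KEY dictionary, which is only a stand-in for a real schema (a DTD or an XML Schema is far richer than a lookup table).

```python
import xml.etree.ElementTree as ET

# Hypothetical "key": maps each tag name to a description and a converter.
# This dictionary stands in for a real schema and is not one.
KEY = {
    "invoice_number": ("Number identifying the invoice", int),
    "customer": ("Name of the customer billed", str),
}

# Hypothetical marked-up document.
DOCUMENT = (
    "<invoice>"
    "<invoice_number>1042</invoice_number>"
    "<customer>Wal-Mart</customer>"
    "</invoice>"
)

def interpret(xml_text, key):
    """Return {tag: typed value} for each child tag; reject tags not in the key."""
    values = {}
    for element in ET.fromstring(xml_text):
        if element.tag not in key:
            raise ValueError(f"tag <{element.tag}> is not defined in the key")
        _description, convert = key[element.tag]
        values[element.tag] = convert(element.text)
    return values

invoice = interpret(DOCUMENT, KEY)
print(invoice)
```

Without the key, the program would see only text between angle brackets; with it, the program knows that invoice_number holds a number and can reject tags it has never been told about.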

XML is the language of e-business

The market processes for e-Business require interoperability of software applications, and consistent protocols and formats for information interchange. XML is designed to accomplish these goals. In its basic format, XML enables information exchange inside and outside of organizations, as well as between individual users and different software applications. As a result of these abilities, XML is the foundation language for e-Business information exchanges.

Today, with the exception of Electronic Data Interchange (EDI), most Business-to-Business (B2B) and Business-to-Consumer (B2C) transactions involve the exchange of information through that product the Chinese invented over a thousand years ago, called paper. Paper enabled people to create "documents." The creation of documents, and the technology to produce them en masse as a result of the Gutenberg Press, was partially responsible for a Renaissance (and some might say several revolutions).

With XML, the (information) revolution moves one step further. XML is the technology enabler for the exchange of data contained within a document without requiring that a related document format be specified by either the information provider or user.

This technological change, a focus solely on content rather than on both content and appearance, is expected to have consequences similar to the invention of paper and the printing press. Its impact on the enhanced exchange of information and data upon our business world will be profound. In fact, the movement toward full-scale adoption of XML is already underway. Several industry (vertical) supply chains(6) are currently leveraging XML to drive efficiencies through the entire business channel.

Footnotes:

  1. The World Wide Web Consortium, or W3C, is a world-wide consortium that establishes protocols or rules for the Internet. Visit the W3C Web Site at www.w3c.org.
  2. The XML Cover Pages Extensible Markup Language (XML) By: Robin Cover Last modified: July 31, 2000 Found at http://www.oasis-open.org/cover/xml.html#overview   August 1, 2000.
  4. "W3C contributes to efforts to standardize Web technologies by producing specifications (called "Recommendations") that describe the building blocks of the Web. W3C makes these Recommendations and other technical reports freely available to all." www.w3.org/Consortium/
  5. XML in 10 Points, by Bert Bos, © 1999-2000 ®, All Rights Reserved. Created 27 Mar 1999 (last update: 2000/05/26 15:48:52), www.w3.org/XML/1999/XML-in-10-points, viewed July 31, 2000.
  6. See www.verticalnet.com, which has over 50 business communities and growing.

Articles for Further Study:

XML: Powering the Twenty-First Century, by Eric Cohen and Walter C. Schmidt. Eric Cohen is an XBRL Steering Committee member.

An Introduction to XML, Benefits and Applications, posted in Intranet Developer's Online Magazine, by Ken Sall.

XML, Java and the Future of the Web, by Jon Bosak, Sun Microsystems.  Describes in great detail how XML works and what applications are in store for XML.

The XML Cover Pages, by Robin Cover. Hundreds of links to official publications about XML. This page is a must for discovering the impact of XML, including applications.

Spreading Some XML, by Clint Boulton of InternetNews. Reports that IntraNet Solutions has joined OASIS, an organization dedicated to providing uniform XML standards for businesses.


Copyright 2008 Saeed Roohani, XBRL Education