A List Apart


What the Hell is XML?

XML (Extensible Markup Language) is the Eurodollar of web development. Both XML and the Euro bring order to chaos; both offer undeniable, wide–ranging   benefits; both are poised, in 2002, to change the way we do things. Frankly,   both scare the crap out of people.


For web developers, 2002 is the time to conquer those fears and take a first hands–on look at XML. It’s time to examine XML and realize the practical benefits that it can provide to web projects today.

The bankers can fend for themselves.

XML, HTML & Databases

If you need a good analogy to describe XML to other people, don’t mention HTML.  Although XML looks a lot like HTML, creating a good XML file is more like designing a database than creating a web page.

Databases and XML documents are both used   as a means to organize data. As a result, they share a lot of similarities.

A database table design for a table containing news stories would   look something like this:

Table Name:
Table Columns:
  • Headline
  • Category
  • Author
  • Date
  • Abstract
  • Body
  • Status

A basic XML document containing the same information might look like this:

<?xml version="1.0"?>
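As a sketch of how the rest of such a document could be structured, assume element names that simply mirror the table columns above. The element names and sample values are illustrative placeholders, not the author's own listing. Python's standard-library parser can then read the document back one field at a time:

```python
import xml.etree.ElementTree as ET

# Element names are assumed; they simply mirror the table columns above.
# All values are placeholders.
NEWS_XML = """
<?xml version="1.0"?>
<news>
  <story>
    <headline>Example headline</headline>
    <category>Announcements</category>
    <author>Jane Doe</author>
    <date>2002-01-15</date>
    <abstract>A short introduction to the story.</abstract>
    <body>The full text of the story goes here.</body>
    <status>pending</status>
  </story>
</news>
""".strip()  # the XML declaration must be the very first thing in the text

root = ET.fromstring(NEWS_XML)
story = root.find("story")

# Each child element plays the role of one column in the table.
fields = {child.tag: child.text for child in story}
print(fields["headline"])
```

Each `<story>` element corresponds to one row of the table, and each child element to one column.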



In addition to these similarities, both databases and XML represent a huge step forward in the ability to publish and manage web content.

XML everywhere

At any scale above that of the small, personal site, database–driven websites are indisputably better at managing, updating, and maintaining content than HTML–only sites. What everyone will discover in 2002 is that XML–driven database sites will prove to be indisputably better than database–driven sites. XML is going to be everywhere.

And as a web developer, you are going to love it.

XML is poised to eliminate more headaches than a bottle of Ibuprofen, improve productivity more than cans of Red Bull, and increase profitability more than we’ll want our clients to know about.

How? Two words: Content management.

Content management & migration

Before projects are initiated by a client, a website usually reaches a stage   of obsolescence, immediacy, or embarrassment. Web projects are big projects   with short time lines. It’s not surprising, then, that one of the biggest factors   influencing the profitability and success of web projects is the ability to   effectively manage content.

Separation of style, programming, and content

The ability to store a site’s content, programming, and design separately and   mix them together transparently, on demand, is the art of our craft. Each moment   eliminating rework and duplication is a dollar in our pocket. It’s time spent   adding new features to a site rather than rewriting, reworking, and “searching   and replacing.”

We’ve solved much of the problem with databases, templates, style sheets and   server–side includes. Much that remains, XML can address. It’s the best tool for managing content – the content itself, not the way text appears on screen.   XML is used to structure, store and send information in a platform–neutral,   object–oriented, plain text format.

Guerilla tactics

The power of XML is unleashed when it’s placed in the hands of content providers. However, since copywriters and clients are not accustomed to writing in platform–neutral, object–oriented, plain text formats, that means helping them do it unknowingly. Guerilla content management tactics, such as MS Word–to–XML migration, can be wildly successful.

The basic model for XML migration is to start in a word processor, such as MS Word, whose documents can be converted to XML directly, or via RTF, using third–party tools. After conversion to XML, the documents can be used by an XML–aware server, or converted to HTML using another third–party tool.

Successful migration requires providing content creators with a Microsoft   Word template and a set of basic instructions prior to Web development. The   template must include custom style tags based on the organization of the   pending website.

When using the template, content developers need to avoid   using MS Word formatting options that are not defined within the custom style   tags. If custom tags are insufficient, new tags must be added that reflect   the type of content being addressed.
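To make the idea of custom style tags concrete, here is a minimal sketch of the conversion step. It assumes the word-processor export has already been reduced to (style name, text) pairs. The style names, the element names, and the `paragraphs_to_xml` helper are all hypothetical and do not represent any particular conversion tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from template style names to XML element names.
STYLE_TO_TAG = {
    "Story Headline": "headline",
    "Story Abstract": "abstract",
    "Story Body": "body",
}

def paragraphs_to_xml(paragraphs):
    """Build a <story> element from (style_name, text) pairs.

    Raises ValueError for a style the template does not define, which is
    the situation the instructions to content developers try to prevent.
    """
    story = ET.Element("story")
    for style, text in paragraphs:
        if style not in STYLE_TO_TAG:
            raise ValueError(f"Style not defined in template: {style!r}")
        ET.SubElement(story, STYLE_TO_TAG[style]).text = text
    return story

# Placeholder content standing in for an exported document.
exported = [
    ("Story Headline", "Example headline"),
    ("Story Abstract", "A short introduction."),
    ("Story Body", "The full text of the story."),
]
story = paragraphs_to_xml(exported)
xml_text = ET.tostring(story, encoding="unicode")
print(xml_text)
```

A paragraph formatted with a style outside the mapping stops the conversion instead of slipping through as untagged text. That is why new content types call for new tags, not ad hoc formatting.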

While the process seems cumbersome, with enough practice, it takes significantly   less time to update site content than using processes without XML – particularly once you harness the power of XML validation.


Websites either evolve or suffer the slow, painful death of neglect. New content   needs to be added. Old content needs to be removed. Missing content needs to   be found. Clients are frustrated by their inability to maintain and manage their web content. Web developers are frustrated by the aftermath. XML can help.

XML–based documents make it easy to find outdated and missing content at a glance. This is achieved by using XML Document Type Definitions (DTDs) to identify the timeliness of information and determine what information “nuggets” must be present within the content.

Like databases, XML documents allow you to validate information, before you   use it, to make sure the content is timely, appropriate, and complete. Since   we’re used to talking about validation as it relates to databases, let’s take   a more detailed look at the database table we created to hold news stories.   In reality, a database table must include definitions for each column:

News Table:

Column     Type        Required   Notes
Headline   varchar     Yes        Max of 50 characters
Category   varchar     Yes        Selected from drop-down list
Date       date/time   Yes        Date added to table
Abstract   varchar     Yes        250 character intro.
Body       text        Yes        Allows text formatting in field
Status                            pending - No distribution
                                  public - Public distribution
                                  private - Internal distribution

By validating fields, the data table ensures that each news story contains   all of the required information. So, with the proper integration and a web–based   interface, the data table could be an efficient tool for publishing news on the   web.

The XML document with simple DTD validation used for the same information might   look like this:


<?xml version="1.0"?>
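A sketch of what a document with an embedded DTD could look like follows, assuming element names that mirror the news table and a `status` attribute restricted to the three distribution values. The DTD, the sample values, and the `check_story` helper are illustrative assumptions. Note that Python's standard-library parser reads the DTD but does not validate against it, so the hand-written check below is only a stand-in for what a validating parser would enforce. It does not check element order, for instance.

```python
import xml.etree.ElementTree as ET

# The DTD and element names are assumptions that mirror the news table.
NEWS_WITH_DTD = """
<?xml version="1.0"?>
<!DOCTYPE news [
  <!ELEMENT news (story+)>
  <!ELEMENT story (headline, category, author, date, abstract, body)>
  <!ATTLIST story status (pending | public | private) #REQUIRED>
  <!ELEMENT headline (#PCDATA)>
  <!ELEMENT category (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT abstract (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<news>
  <story status="public">
    <headline>Example headline</headline>
    <category>Announcements</category>
    <author>Jane Doe</author>
    <date>2002-01-15</date>
    <abstract>A short introduction to the story.</abstract>
    <body>The full text of the story goes here.</body>
  </story>
</news>
""".strip()

REQUIRED = ["headline", "category", "author", "date", "abstract", "body"]
ALLOWED_STATUS = {"pending", "public", "private"}

def check_story(story):
    """Return a list of problems; an empty list means the story passes."""
    problems = [f"missing <{tag}>" for tag in REQUIRED if story.find(tag) is None]
    if story.get("status") not in ALLOWED_STATUS:
        problems.append(f"bad status: {story.get('status')!r}")
    return problems

root = ET.fromstring(NEWS_WITH_DTD)  # well-formedness check only
problems = [check_story(s) for s in root.findall("story")]
print(problems)
```

With a validating parser, the same rules would come from the DTD itself and would not need to be repeated in code.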




The XML document makes significant contributions to web publishing when compared to the database alone. XML allows data to be validated based on the embedded DTDs, XML tags and attributes. This means that appropriate content can be extracted directly from the XML document based on selection criteria without requiring an interim database,    without requiring a database query,  and without being separated from the source   document.

Using DTD, XML documents suddenly become self–aware.
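Extracting content by selection criteria, with no database in between, can be sketched with the limited XPath support in Python's standard library. The document structure, the `status` attribute, and the `headlines_with_status` helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder stories; the status attribute carries the distribution level.
NEWS = ET.fromstring("""
<news>
  <story status="public"><headline>Site relaunch</headline></story>
  <story status="pending"><headline>Draft announcement</headline></story>
  <story status="private"><headline>Staff memo</headline></story>
  <story status="public"><headline>New office</headline></story>
</news>
""")

def headlines_with_status(root, status):
    """Return the headlines of all stories whose status attribute matches."""
    matches = root.findall(f"story[@status='{status}']")
    return [story.findtext("headline") for story in matches]

public = headlines_with_status(NEWS, "public")
print(public)
```

The selection runs against the source document itself, so the published subset can never drift out of sync with a separate copy of the data.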

Substance & Style

XML finds advocates on both sides of the ongoing “content” versus   “style” debate.

XSL (the eXtensible Stylesheet Language), the style sheet language of XML, packs   a wallop. It’s much more robust than Cascading Style Sheets (CSS). Instead   of using rules (as CSS does) to format content, XSL uses (.xsl) templates to describe   how to transform XML into other types of documents. When you implement an XML–based site, XML doesn’t replace HTML. If it sounds a bit confusing, here’s why. When you deal with XSL files, all is not as it appears:

  1. The .XSL file embeds HTML with XML tags and logic that define how information     should be displayed at run time.
  2. At run–time, the .XML file is displayed in the web browser on the fly.
  3. Although HTML formatting included in the .XSL file is applied, it won’t     appear in the source for the .XML document being displayed in the browser.  
  4. The appearance in HTML is based on the combination of XML tags and logic     within the .XSL file.
  5. Because the .XSL file can transform XML in the browser, the document that     appears in the browser may only be a subset of the content in the actual     XML file.

The ability to transform the XML conditionally in a web browser means that content   can be centralized. Parts of the document are displayed or ignored on an as–needed   basis.
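The effect of such a conditional transformation can be imitated in a few lines of Python. This is not XSL, and it runs outside the browser; it is a stand-in that shows the same idea of one central document with only the matching parts turned into HTML. The element names, the `status` attribute, and the `render` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# One central source document; values are placeholders.
SOURCE = ET.fromstring("""
<news>
  <story status="public">
    <headline>Site relaunch</headline>
    <abstract>We have a new look.</abstract>
  </story>
  <story status="private">
    <headline>Staff memo</headline>
    <abstract>Internal only.</abstract>
  </story>
</news>
""")

def render(root, status="public"):
    """Turn the stories with the given status into an HTML fragment."""
    parts = []
    for story in root.findall("story"):
        if story.get("status") != status:
            continue  # this part of the document is ignored
        parts.append(f"<h2>{escape(story.findtext('headline'))}</h2>")
        parts.append(f"<p>{escape(story.findtext('abstract'))}</p>")
    return "\n".join(parts)

html_fragment = render(SOURCE)
print(html_fragment)
```

Calling `render(SOURCE, "private")` produces the internal view from the same source, which is the centralization the paragraph above describes.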

Now is the Time

Web developers have been telling others that they are waiting to dabble in   XML until it becomes widely available. The truth is, it’s been widely available for months:

  • Internet Explorer 5 contains an XML engine that fully supports XML 1.0, as defined by the World Wide Web Consortium (W3C). This is a huge improvement       over the engine in IE4.

  • Netscape 6.0/Mozilla includes full XML support.

  • Flash 5 ActionScript supports XML–based data transfer to and from a server.

  • Director has offered an XML Parser Xtra since Director 7.0 that allows       Shockwave movies to read, parse, and make use of the contents of XML documents.

    (Ed. Note: Director’s somewhat buggy XML parser has put off many developers. Reader Hussein Boon recommends Andy White’s user–extensible Lingo scripts instead. Boon also recommends a DOM–Lingo binding that binds Director’s Lingo scripting language to the W3C DOM Level 2.)

  • IIS servers offer XML integration     via the Microsoft XML Parser. Version 4 of the parser supports XML 1.0.
  • SQL Server 2000 provides integrated XML support. It’s the first release       to do so.

  • Microsoft’s       XML technology preview runs under any SQL Server release. Although       the output is slightly different in a few cases, it’s a solid XML environment       for the pre–SQL Server 2000 crowd.

  • Version 2 of Apache Cocoon, a powerful framework for XML web publishing, has been released.

  • Expat, an XML 1.0 parser, can be used in cooperation with the XML parser functions for PHP. This toolkit lets you parse, but not validate, XML documents.

  • XML-RPC is a platform–neutral protocol for executing programs remotely, “designed to be as simple as possible, while allowing complex data structures to be transmitted, processed and returned.”

This means we’ve all run out of excuses for putting off XML. Today, the benefits   of developing web projects in XML aren’t merely imaginable. They are achievable.
