What the Hell is XML?

XML (Extensible Markup Language) is the Euro of web development. Both XML and the Euro bring order to chaos; both offer undeniable, wide-ranging
  benefits; both are poised, in 2002, to change the way we do things. Frankly,
  both scare the crap out of people.


For web developers, 2002 is a time to conquer fears and take their first hands-on
  approach to XML. It’s time to examine XML and realize the practical benefits
  that it can provide to web projects today.

The bankers can fend for themselves.

XML, HTML & Databases

If you need a good analogy to describe XML to other people, don’t mention HTML. Although XML looks a lot like HTML, creating a good XML file is more like designing a database than creating a web page.

Databases and XML documents are both used
  to organize data, so they share many similarities.

A database table designed to hold news stories would
  look something like this:

Table Name:
Table Columns:

  • Headline
  • Category
  • Author
  • Date
  • Abstract
  • Body
  • Status

A basic XML document containing the same information might look like this:

<?xml version="1.0"?>
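To see the parallel in practice, here’s a minimal sketch in Python (standard library only) of one news story as XML, read back into something that looks like a database row. The element names, the `status` attribute, and the sample text are hypothetical, chosen to mirror the table columns above; they are not a standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names mirror the News table
# columns and are assumptions, not a standard vocabulary.
STORY_XML = """<?xml version="1.0"?>
<story status="public">
  <headline>City Council Approves Budget</headline>
  <category>Local</category>
  <author>Jane Doe</author>
  <date>2002-01-15</date>
  <abstract>The council passed next year's budget on Tuesday.</abstract>
  <body>Full text of the story goes here.</body>
</story>"""


def story_to_row(xml_text):
    """Flatten one <story> element into a dict, like a row from the News table."""
    story = ET.fromstring(xml_text)
    # Each child element plays the role of a column.
    row = {child.tag: (child.text or "").strip() for child in story}
    # Status is modeled as an attribute here; in the table it is a column.
    row["status"] = story.get("status")
    return row


row = story_to_row(STORY_XML)
print(row["headline"], "/", row["status"])
```

Whether a field becomes a child element or an attribute is a design choice; attributes suit short, enumerated values like status, while elements suit free text.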



In addition to these similarities, both databases and XML represent a huge step forward in the ability to publish and manage web content.

XML everywhere

At any scale above that of the small, personal site, database-driven websites are indisputably better at managing, updating, and maintaining content than HTML-only sites. What everyone will discover in 2002 is that XML-driven database sites will prove to be indisputably better than database-driven sites. XML is going to be everywhere.

And as a web developer, you are going to love it.

XML is poised to eliminate more headaches than a bottle of ibuprofen, improve
  productivity more than cans of Red Bull, and increase profitability more than
  we’ll want our clients to know about.

How? Two words: Content management.

Content management & migration

By the time a client initiates a project, the existing website has usually reached a stage
  of obsolescence, urgency, or embarrassment. Web projects are big projects
  with short timelines. It’s not surprising, then, that one of the biggest factors
  influencing the profitability and success of web projects is the ability to
  manage content effectively.

Separation of style, programming, and content

The ability to store a site’s content, programming, and design separately and
  mix them together transparently, on demand, is the art of our craft. Every moment
  saved by eliminating rework and duplication is a dollar in our pocket: time spent
  adding new features to a site rather than rewriting, reworking, and “searching
  and replacing.”

We’ve solved much of the problem with databases, templates, style sheets, and
  server-side includes. XML can address much of what remains. It’s the best tool for managing content: the content itself, not the way text appears on screen.
  XML is used to structure, store, and send information in a platform-neutral,
  object-oriented, plain-text format.

Guerrilla tactics

The power of XML is unleashed when it’s placed in the hands of content providers.
  However, since copywriters and clients aren’t accustomed to writing in platform-neutral, object-oriented, plain-text formats, the trick is helping them do it without knowing it. Guerrilla content management tactics, such as MS Word-to-XML migration, can be wildly successful.

The basic model for XML migration is to start in a word processor such as MS Word,
  whose documents can be converted to XML directly, or via RTF, using third-party tools.
  After conversion, the XML documents can be used by an XML-aware server or
  converted to HTML using another third-party tool.

Successful migration requires providing content creators with a Microsoft
  Word template and a set of basic instructions prior to Web development. The
  template must include custom style tags based on the organization of the
  pending website.

When using the template, content developers need to avoid
  using MS Word formatting options that are not defined within the custom style
  tags. If custom tags are insufficient, new tags must be added that reflect
  the type of content being addressed.
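At its core, the template is a lookup from custom paragraph styles to XML elements. This Python sketch assumes the Word document has already been reduced to (style, text) pairs, since reading real .doc or RTF files is the job of the third-party converters; the style names and the `STYLE_TO_TAG` table are hypothetical. Paragraphs in undefined styles are reported rather than converted, which is why content developers need to stay inside the template.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the template's custom paragraph styles
# to XML element names for the pending site.
STYLE_TO_TAG = {
    "News Headline": "headline",
    "News Category": "category",
    "News Author": "author",
    "News Abstract": "abstract",
    "News Body": "body",
}


def paragraphs_to_xml(paragraphs):
    """Convert (style, text) pairs into a <story> element.

    Returns the element plus a list of styles the template does not define.
    """
    story = ET.Element("story")
    unknown = []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            # Formatting outside the custom styles has no XML meaning.
            unknown.append(style)
            continue
        ET.SubElement(story, tag).text = text
    return story, unknown


doc = [
    ("News Headline", "New Office Opens"),
    ("News Body", "Our new office opens Monday."),
    ("Heading 1", "Stray Word formatting"),
]
story, unknown = paragraphs_to_xml(doc)
print(ET.tostring(story, encoding="unicode"))
print("Undefined styles:", unknown)
```

Flagging unknown styles, rather than guessing, is what tells you when the template needs a new tag.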

The process may seem cumbersome, but with practice, updating site content takes significantly
  less time than it does without XML, particularly once you harness the power of XML validation.


Websites either evolve or suffer the slow, painful death of neglect. New content
  needs to be added. Old content needs to be removed. Missing content needs to
  be found. Clients are frustrated by their inability to maintain and manage their
web content. Web developers are frustrated by the aftermath. XML can help.

XML-based documents make it easy to find outdated and
  missing content at a glance. This is achieved by using XML Document Type Definitions
  (DTDs) to identify the timeliness of information and determine what information
  “nuggets” must be present within the content.
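A DTD can insist that a nugget such as a date is present; deciding whether that date is stale takes a few lines of code on top. Here’s a hedged Python sketch that audits a collection of stories for both problems. The element names, the `id` attribute, and the 90-day cutoff are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import date, timedelta

# Hypothetical "nuggets" every story must contain.
REQUIRED = ("headline", "category", "date", "abstract", "body")

# Sample data for illustration only; s3 is missing two nuggets.
NEWS_XML = """<news>
  <story id="s1"><headline>A</headline><category>Local</category>
    <date>2002-01-10</date><abstract>x</abstract><body>x</body></story>
  <story id="s2"><headline>B</headline><category>Local</category>
    <date>2001-06-01</date><abstract>x</abstract><body>x</body></story>
  <story id="s3"><headline>C</headline><date>2002-01-12</date>
    <body>x</body></story>
</news>"""


def audit(xml_text, today, max_age_days=90):
    """Return {story id: [problems]} for stories that are incomplete or stale."""
    report = {}
    cutoff = today - timedelta(days=max_age_days)
    for story in ET.fromstring(xml_text).iter("story"):
        problems = [f"missing {tag}" for tag in REQUIRED if story.find(tag) is None]
        when = story.findtext("date")
        if when and date.fromisoformat(when) < cutoff:
            problems.append("outdated")
        if problems:
            report[story.get("id")] = problems
    return report


print(audit(NEWS_XML, today=date(2002, 2, 1)))
```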

Like databases, XML documents allow you to validate information, before you
  use it, to make sure the content is timely, appropriate, and complete. Since
  we’re used to talking about validation as it relates to databases, let’s take
  a more detailed look at the database table we created to hold news stories.
  In reality, a database table must include definitions for each column:






News Table:

Column     Type        Required?   Notes
Headline   varchar     Yes         Max of 50 characters
Author     varchar     No
Category   varchar     Yes         Selected from drop-down list
Date       date/time   Yes         Date added to table
Abstract   varchar     Yes         250-character intro
Body       text        Yes         Allows text formatting in field
Status     varchar     Yes         pending: no distribution
                                   public: public distribution
                                   private: internal distribution
By validating fields, the data table ensures that each news story contains
  all of the required information. So, with the proper integration and a web-based
  interface, the data table could be an efficient tool for publishing news.

The XML document with simple DTD validation used for the same information might
  look like this:



<?xml version="1.0"?>
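Here’s a sketch of what such a document might carry: a hypothetical `news` document with an internal DTD subset, plus a Python check that applies the same rules. Python’s standard-library parsers read the DTD but don’t enforce it, so the rules are repeated by hand; a validating parser (for example, lxml with `dtd_validation=True`) would reject this document outright. The 50-character headline limit comes from the table above and has to be checked in code, because a DTD can’t express lengths.

```python
import xml.etree.ElementTree as ET

# Hypothetical news document with an internal DTD subset. The DTD says which
# elements a story must contain and which status values are allowed.
NEWS_XML = """<?xml version="1.0"?>
<!DOCTYPE news [
  <!ELEMENT news (story*)>
  <!ELEMENT story (headline, category, author?, date, abstract, body)>
  <!ATTLIST story status (pending | public | private) #REQUIRED>
  <!ELEMENT headline (#PCDATA)>
  <!ELEMENT category (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT abstract (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<news>
  <story status="public">
    <headline>New Office Opens</headline><category>Company</category>
    <date>2002-01-15</date><abstract>Short intro.</abstract><body>Text.</body>
  </story>
  <story status="draft">
    <headline>Untitled</headline><date>2002-01-16</date><body>Text.</body>
  </story>
</news>"""

REQUIRED = ("headline", "category", "date", "abstract", "body")
STATUSES = {"pending", "public", "private"}


def check(xml_text):
    """Repeat the DTD's rules by hand, plus the table's 50-character headline limit."""
    errors = []
    for n, story in enumerate(ET.fromstring(xml_text).iter("story"), start=1):
        errors += [f"story {n}: missing {t}" for t in REQUIRED if story.find(t) is None]
        if story.get("status") not in STATUSES:
            errors.append(f"story {n}: bad status {story.get('status')!r}")
        if len(story.findtext("headline", "")) > 50:
            errors.append(f"story {n}: headline over 50 characters")
    return errors


for line in check(NEWS_XML):
    print(line)
```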





The XML document makes significant contributions to web publishing compared with the database alone. XML allows data to be validated against the embedded DTD, XML tags, and attributes. This means that appropriate content can be extracted directly from the XML document based on selection criteria, without an interim database,
  without a database query, and without being separated from the source.
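Selecting content straight from the XML can be as simple as a path expression. In this Python sketch, ElementTree’s limited XPath support (`[@attr='value']` predicates) picks out public stories with no database in between; the document and its `status` values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: three stories with different status values.
NEWS_XML = """<news>
  <story status="public"><headline>Office Opens</headline></story>
  <story status="private"><headline>Staff Memo</headline></story>
  <story status="public"><headline>New Product</headline></story>
</news>"""


def headlines(xml_text, status):
    """Select headlines by status directly from the XML, no database query."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a small XPath subset, including [@attr='value'].
    # (The status string must not contain a single quote.)
    return [s.findtext("headline") for s in root.findall(f"story[@status='{status}']")]


print(headlines(NEWS_XML, "public"))  # ['Office Opens', 'New Product']
```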

With DTDs, XML documents suddenly become self-aware.

Substance & Style

XML finds advocates on both sides of the ongoing “content” versus
  “style” debate.

XSL (the eXtensible Stylesheet Language), the style sheet language of XML, packs
  a wallop. It’s much more robust than Cascading Style Sheets (CSS). Instead
  of using rules (as CSS does) to format content, XSL uses templates (.xsl files) to describe
  how to transform XML into other types of documents. When you implement an XML-based site, XML doesn’t replace HTML; the XSL templates still produce it. If that sounds confusing, it’s because when you deal with XSL files, all is not as it appears:

  1. The .XSL file combines HTML with XML tags and logic that define how information
       should be displayed at run time.
  2. At run time, the .XML file is transformed and displayed in the web browser on the fly.
  3. Although HTML formatting included in the .XSL file is applied, it won’t
       appear in the source for the .XML document being displayed in the browser.
  4. The appearance in HTML is based on the combination of XML tags and logic
       within the .XSL file.
  5. Because the .XSL file can transform XML in the browser, the document that
       appears in the browser may only be a subset of the content in the actual
       XML file.

The ability to transform the XML conditionally in a web browser means that content
  can be centralized. Parts of the document are displayed or ignored on an as-needed basis.
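Python’s standard library has no XSLT processor, so this sketch stands in for an XSL template with a plain function. It shows the same idea: the XML file holds every story, but only the public ones come out as HTML. In real XSL the selection would be an XPath expression such as `news/story[@status='public']` inside an `xsl:for-each`; the markup and sample text here are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source document: one public story, one still pending.
NEWS_XML = """<news>
  <story status="public"><headline>Office Opens</headline>
    <abstract>Doors open Monday.</abstract></story>
  <story status="pending"><headline>Draft Story</headline>
    <abstract>Not ready yet.</abstract></story>
</news>"""


def render_public(xml_text):
    """Stand-in for an XSL template: turn public stories into HTML, skip the rest."""
    items = []
    for story in ET.fromstring(xml_text).iter("story"):
        if story.get("status") != "public":
            continue  # stays in the XML file but is never displayed
        items.append(
            "<li><h2>{}</h2><p>{}</p></li>".format(
                escape(story.findtext("headline", "")),
                escape(story.findtext("abstract", "")),
            )
        )
    return "<ul>" + "".join(items) + "</ul>"


print(render_public(NEWS_XML))
```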

Now is the Time

Web developers have been telling others that they are waiting to dabble in
  XML until it becomes widely available. The truth is, it’s been widely available for months:

  • Internet Explorer 5 contains an XML engine that fully supports XML 1.0,
    as defined by the World Wide Web Consortium (W3C). This is a huge improvement
          over the engine in IE4.

  • Netscape 6.0/Mozilla includes full XML support.

  • Flash 5 ActionScript supports XML-based data transfer to and from a server.

  • Director has offered an XML Parser Xtra since Director 7.0 that allows
          Shockwave movies to read, parse, and make use of the contents of XML documents.

    (Ed. note: Director’s somewhat buggy XML parser has put off many developers. Reader Hussein Boon recommends Andy White’s user-extensible Lingo scripts instead. Boon also recommends a DOM-Lingo binding that binds Director’s Lingo scripting language to the W3C DOM Level 2.)

  • IIS servers offer XML integration
       via the Microsoft XML Parser. Version 4 of the parser supports XML 1.0.
  • SQL Server 2000 provides integrated XML support. It’s the first release
          to do so.

  • Microsoft’s
          XML technology preview runs under any SQL Server release. Although
          the output is slightly different in a few cases, it’s a solid XML environment
          for the pre-SQL Server 2000 crowd.

  • Version 2 of Apache Cocoon, a powerful framework for XML web publishing,
          has been released.

  • Expat, an XML 1.0 parser, can be used with the XML parser functions
          for PHP. This toolkit lets you parse, but not validate, XML documents.

  • is a platform-neutral protocol for executing programs remotely, “designed to be as simple as possible, while allowing complex data structures to be transmitted, processed and returned.”
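What “parse, but not validate” means is easy to demonstrate. Python’s standard library wraps the same Expat C library as `xml.parsers.expat`, so this sketch uses it in place of PHP: a document that breaks its own DTD still parses, while one that isn’t well-formed raises an error. Both sample documents are made up for the demonstration.

```python
import xml.parsers.expat

# Well-formed, but invalid: the DTD requires <headline>, and the story has none.
INVALID_BUT_WELL_FORMED = """<?xml version="1.0"?>
<!DOCTYPE story [
  <!ELEMENT story (headline, body)>
  <!ELEMENT headline (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<story><body>No headline here.</body></story>"""

# Not well-formed: <body> is never closed.
MALFORMED = "<story><body>Unclosed tags</story>"


def parses(xml_text):
    """Return True if Expat accepts the text, False on a well-formedness error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


print(parses(INVALID_BUT_WELL_FORMED))  # True: the DTD is not enforced
print(parses(MALFORMED))                # False: mismatched tag
```

In other words, a non-validating parser guarantees well-formed markup, not complete content; the DTD rules still need a validating parser or your own checks.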

This means we’ve all run out of excuses for putting off XML. Today, the benefits of developing web projects in XML aren’t merely imaginable. They are achievable.

About the Author

Troy Janisch

Troy Janisch is president and founder of the Icon Interactive Group. He built his first XML-based site in January, 2002.
