Using XML

During my second lecture to an XML class at a local
community college, I explained how XML lets you define your own markup language with custom tags and attributes. I had finished defining a simple markup language for use
with a list of amateur sports clubs, and had displayed a sample document
written with that markup. At that point, one student asked:

“Isn’t it inefficient to have to type all those tags for
every club? What good is this? It looks nice, but what can I
do with this document? How can I put this in a web page or use it with
other programs? Wouldn’t it be easier to just use HTML or a
database/word processor/fill-in-the-blank?”

The reason that we use XML instead of a specific application is that
XML is not just a pretty face, living in isolation from the rest
of the computing world. XML is more than a rulebook for generating
custom markup languages. It is part of a family of technologies, which,
working together, make your XML-based documents very useful indeed. To
demonstrate what I mean, I decided to create a new XML-based markup
language from scratch, and show what you can do with a document written
in that language, using off-the-shelf tools.

Creating a New Markup Language

The language that I created stores the nutritional
information that you find on food labels in the United States. The
document starts with a <nutrition> tag, followed by
a <daily-values> element that gives the maximum
amounts of fat, sodium, etc. for a 2000-calorie-a-day diet, and the
units in which the amount is measured.

The daily values are followed by a series of
<food> elements, each of which gives information
about a specific food and its nutritional categories. Because the
<daily-values> element has already defined the units
in which each category is measured, we don’t need to repeat them
for every food; we just enter the numbers for that particular
food’s total fat, sodium, etc. After the last food, we close the
document with a closing </nutrition> tag.

<nutrition>

<!-- Establish the daily values -->
<daily-values>
    <total-fat units="g"> 65 </total-fat>
    <saturated-fat units="g"> 20 </saturated-fat>
    <cholesterol units="mg"> 300 </cholesterol>
    <sodium units="mg"> 2400 </sodium>
    <carb units="g"> 300 </carb>
    <fiber units="g"> 25 </fiber>
    <protein units="g"> 50 </protein>
</daily-values>

<!-- Now list the individual foods -->
<food>
    <name>Avocado Dip</name>
    <mfr>Sunnydale</mfr>
    <serving units="g"> 29 </serving>
    <calories total="110" fat="100"/>
    <total-fat> 11 </total-fat>
    <saturated-fat> 3 </saturated-fat>
    <cholesterol> 5 </cholesterol>
    <sodium> 210 </sodium>
    <carb> 2 </carb>
    <fiber> 0 </fiber>
    <protein> 1 </protein>
    <vitamins>
        <a> 0 </a>
        <c> 0 </c>
    </vitamins>
    <minerals>
        <ca> 0 </ca>
        <fe> 0 </fe>
    </minerals>
</food>

<!-- etc. -->
</nutrition>

You may see the entire document
that is used for the examples in this article. All the numbers
are real; only the manufacturers’ names have been changed
to protect the innocent and avoid lawsuits.

A quick note: vitamins and minerals are measured in percentages, not
grams or milligrams. That’s why we don’t need to establish
any units or maximums for them in the <daily-values>
element.

I entered the data by hand using the nedit program on
Linux. I could have used any editor that lets me save files
as plain ASCII text; notepad on Windows or vi on Linux would have done
equally well. To make data entry easier, I created an empty
“template” for a food, which you see at the bottom of the
file. I copied and pasted it for each new food, so that I didn’t
have to type the tags over and over again.
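A blank template of this kind might look like the sketch below. The author's actual template is not reproduced in this article, so the layout here is simply inferred from the Avocado Dip entry; treat it as an illustration rather than the real file contents.

```xml
<!-- Hypothetical blank template, inferred from the sample entry.
     Copy it, then fill in the text and attribute values. -->
<food>
    <name></name>
    <mfr></mfr>
    <serving units="g"></serving>
    <calories total="" fat=""/>
    <total-fat></total-fat>
    <saturated-fat></saturated-fat>
    <cholesterol></cholesterol>
    <sodium></sodium>
    <carb></carb>
    <fiber></fiber>
    <protein></protein>
    <vitamins>
        <a></a>
        <c></c>
    </vitamins>
    <minerals>
        <ca></ca>
        <fe></fe>
    </minerals>
</food>
```

An unfilled copy like this would not pass a grammar that requires numbers, so it presumably has to be filled in, commented out, or removed before the file is validated.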

Immediate Benefits

What have we bought by creating this XML file in a text
editor rather than creating an HTML document, a spreadsheet, or a
database? First, the data is structured; it’s not just a mass of
numbers in an HTML table or a text file of tab-separated values.
Because of the custom tags, it’s something that humans can read
and understand. It’s also open; we don’t need some
expensive, proprietary software to extract the information from a
binary file. So, as a transport medium, XML already serves us
nicely.

Validating the Document

Even if you’re the only person who ever enters
data into the document, you’d like to be able to check that you
haven’t left out any information or added extra tags.
Additionally, you’d like to be sure that your percentages are all
between 0 and 100.

This becomes even more important if many people enter data. Even if
you give other folks instructions on the proper format, they may ignore
it or make errors. In short, you would like to have the computer help
you determine that the data in your documents is valid.

You do this by creating a machine-readable grammar which
specifies which tags and attributes are valid, and in what
combinations, and what values your tags and attributes may contain.
You then hand your document and the grammar to a program called a
validator, and it checks that the document matches your
specifications.

One machine-readable form of specifying such a grammar is a notation
called Relax NG. Relax NG is, itself, an XML-based markup
language. Its purpose is to specify what is valid in other
markup languages. This isn’t as crazy or impossible as it
sounds. After all, books that tell you how to use English grammar
correctly are also written in English.

For example, one of the specifications of our nutritional markup
language is that the <calories> element is an empty
element, and it has two attributes, the total attribute
and the fat attribute. These must both have decimal
numbers in them. We say this in Relax NG as follows:

<element name="calories">
    <empty/>
    <attribute name="total">
        <data type="decimal"/>
    </attribute>
    <attribute name="fat">
        <data type="decimal"/>
    </attribute>
</element>

When we pass nutrition documents through the validator with this
grammar, the validator will tell us that the first tag below
is correct, but the second one isn’t.

<calories total="100" fat="10"/>
<calories total="217" fat="don't ask!"/>
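The range check on percentages mentioned earlier can be written in the same style. The following is a minimal sketch, assuming the grammar uses the W3C XML Schema datatype library (which is where the decimal type comes from); the element name a is the vitamin A element from the sample document, and the facet values are my own illustration, not an excerpt from the article's grammar file.

```xml
<!-- Sketch: vitamin A as a percentage restricted to 0..100.
     The min/max facets are an assumption about how the full
     grammar might express the range check. -->
<element name="a"
    xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
    <data type="decimal">
        <param name="minInclusive">0</param>
        <param name="maxInclusive">100</param>
    </data>
</element>
```

With a pattern like this, a value such as 250 or "lots" should be reported as invalid, in the same way the second calories tag is.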

You may see the entire grammar specification for the nutrition markup
here. You may also find out more about Relax NG. By the way,
Relax NG is not the only game in town if you want to specify
grammar. You may use something called a DTD (Document
Type Definition), which is not as powerful
as Relax NG; or you may use XML Schema, which is
about as powerful as Relax NG, but far more complex to learn.

Try it!

If you are feeling adventurous, you may want to try these
files yourself. You will need some XML tools in order to
do this. Here is how to set up the tools
for Windows, and here’s the setup for Linux.

To validate a file, go to the command prompt if you are using
Windows, or go to a console window and get a shell prompt if you
are using Linux. Then use the batch/shell file described
in the setup instructions to invoke
the Multi-Schema Validator:

msvalidate nutrition.rng nutrition.xml

Now What?

Although we can enter readable data and check to see if it’s
OK, we still can’t do anything with it. If we display it
in a browser, we just see the text all squeezed together. That’s
because the browser doesn’t know how to display a
<food> or <vitamins> tag.

Displaying the XML

If you are using the very latest browsers, you can
attach a stylesheet to the XML file. We have done that in
this example by putting an xml-stylesheet processing instruction
at the top of the file nutrition.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" 
 href="nutrition.css"?>
<nutrition></nutrition>

The style sheet that we write for file nutrition.css
looks very much like the style sheets that you use with your HTML
files. The difference is that we assign styles to our new nutrition
tags, not to the standard HTML tags. For example, to say that a
food’s manufacturer should appear in 16 point italic type without
starting a new line, you would write:

mfr {
    display: inline;
    font-size: 16pt;
    font-style: italic;
 }

Once you have created
the entire stylesheet in the same
directory as the XML file, you can open the XML file in a
modern browser such as Mozilla, and it will display the information.

Transformation: A Better Way

The problems with the stylesheet are that:

  • It only works with the very latest browsers that handle
    Cascading Style Sheets Level 2.
  • It can’t extract all the information (for example, the units
    don’t show up in the output document because they are
    “hidden” in the attribute values).
  • It can’t calculate percentages.

Additionally, the markup we’ve invented here is data-oriented;
it is designed to describe data to be stored or to be transmitted to
other programs. In these documents, the order of elements and the type
of data in each element is fairly rigid. Stylesheets work better with
narrative-oriented markup documents. These are documents which are
generally meant for human reading, and are more “free-form”
than data-oriented documents. Examples of narrative-oriented markup are
XHTML, DocBook (a markup for writing books and articles), and NewsML
(for writing news reports).

In order to get around these problems, we can use XSLT,
Extensible Stylesheet Language Transformations, to convert the
nutrition file into other forms. XSLT is, again, another XML-based
markup language. Its purpose is to describe how to take input from one
XML file (the “source document”) and output it to a result
document. XSLT has the flexibility to extract data from attributes as
well as element content, and it can do calculation and sorting upon the
data in the source document.
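To make this concrete, here is a minimal sketch of an XSLT 1.0 stylesheet written against the nutrition markup shown earlier. It is not one of the article's stylesheets; it only illustrates pulling one value from element content and another from an attribute.

```xml
<?xml version="1.0"?>
<!-- Minimal sketch, not the author's stylesheet: list each food's
     name and serving size, taking the unit from the units attribute. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html"/>

<xsl:template match="/nutrition">
    <html>
    <head><title>Foods (sketch)</title></head>
    <body>
    <ul>
        <xsl:for-each select="food">
            <li>
                <xsl:value-of select="name"/>
                <xsl:text>: </xsl:text>
                <!-- element content, with surrounding spaces trimmed -->
                <xsl:value-of select="normalize-space(serving)"/>
                <xsl:text> </xsl:text>
                <!-- attribute value -->
                <xsl:value-of select="serving/@units"/>
            </li>
        </xsl:for-each>
    </ul>
    </body>
    </html>
</xsl:template>

</xsl:stylesheet>
```

For the sample entry, the list item should read "Avocado Dip: 29 g".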

This power makes XSLT a key technology in the XML family of
technologies. For a good introduction, read
Norman Walsh’s excellent presentation on the subject or
this hands-on tutorial.

Transformation to HTML

The first XSLT file, which you may see here, converts the nutrition document
into a very plain HTML file suitable for display on any browser on a
desktop or PDA. To do the transformation, you’d type this
command:

transform nutrition.xml nutrition_plain.xslt nutrition_plain.html

The result of the transformation is an HTML file named nutrition_plain.html,
which you may open in any browser you like. Even this simple
transformation has done two things that we could not do with CSS:
it uses the information in attributes to display the units for each
nutritional category, and it calculates percentages of the daily
values.
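The percentage calculation can be sketched as follows. This is my own illustration of the idea, not an excerpt from nutrition_plain.xslt: for each entry in daily-values it looks up the same-named element in the current food, prints the amount with its units, and divides by the daily value. The variable names food, daily, and amount are hypothetical.

```xml
<?xml version="1.0"?>
<!-- Sketch only: amount, units, and percentage of the daily value
     for every nutrient that appears in <daily-values>. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html"/>

<xsl:template match="/nutrition">
    <html>
    <head><title>Nutrition (sketch)</title></head>
    <body>
    <xsl:for-each select="food">
        <xsl:variable name="food" select="."/>
        <h2><xsl:value-of select="name"/></h2>
        <table>
        <xsl:for-each select="/nutrition/daily-values/*">
            <xsl:variable name="daily" select="."/>
            <!-- the food's element with the same name as this daily value -->
            <xsl:variable name="amount"
                select="$food/*[name() = name($daily)]"/>
            <tr>
                <td><xsl:value-of select="name($daily)"/></td>
                <td>
                    <xsl:value-of select="normalize-space($amount)"/>
                    <xsl:text> </xsl:text>
                    <xsl:value-of select="$daily/@units"/>
                </td>
                <td>
                    <xsl:value-of
                        select="round(100 * number($amount) div number($daily))"/>
                    <xsl:text>%</xsl:text>
                </td>
            </tr>
        </xsl:for-each>
        </table>
    </xsl:for-each>
    </body>
    </html>
</xsl:template>

</xsl:stylesheet>
```

For the Avocado Dip entry, the total-fat row works out to 11 g and 17% (11 divided by 65, rounded).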

Fancy Transformation

OK, so maybe you want something a bit fancier. Here’s a more complex
transformation
which sorts the data by the ratio of fat calories to
total calories per serving; sort of a “healthiness
index.”

If you have saved the XSLT in a file called
nutrition_fancy.xslt you can type this command:

transform nutrition.xml nutrition_fancy.xslt nutrition_fancy.html

That produces a file named
nutrition_fancy.html,
which looks remarkably different from the plain version. It uses
Cascading Style Sheets to produce the little bar graphs; you’ll
need a modern browser like Internet Explorer 5+ or Mozilla/Netscape 6
to see the effect. Notice that XSLT lets you pick and choose the data
you want to display; the information about carbohydrates, fiber,
vitamins, and minerals is omitted in the fancy version. (It
could, of course, be added by changing the XSLT file.)
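The sorting step can be sketched with xsl:sort. The stylesheet below is an illustration only, not the contents of nutrition_fancy.xslt; in particular, the ascending order (lowest fat ratio first) is my assumption, and the bar graphs are left out.

```xml
<?xml version="1.0"?>
<!-- Sketch only: order foods by fat calories / total calories. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="html"/>

<xsl:template match="/nutrition">
    <html>
    <head><title>Foods by fat ratio (sketch)</title></head>
    <body>
    <ol>
        <xsl:for-each select="food">
            <!-- numeric sort on the computed ratio; order is assumed -->
            <xsl:sort select="calories/@fat div calories/@total"
                data-type="number" order="ascending"/>
            <li>
                <xsl:value-of select="name"/>
                <xsl:text>: </xsl:text>
                <xsl:value-of
                    select="round(100 * calories/@fat div calories/@total)"/>
                <xsl:text>% of calories from fat</xsl:text>
            </li>
        </xsl:for-each>
    </ol>
    </body>
    </html>
</xsl:template>

</xsl:stylesheet>
```

For Avocado Dip (100 fat calories out of 110), the list item should show 91%.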

We have used XSLT to take the source XML file and transform
it to two different HTML files: a plain version that is suitable for
display on old browsers and PDAs, and a fancier version that is
suitable for use with desktop computers and modern browsers.

Non-HTML Transformation

But wait, maybe you don’t want HTML;
there’s more than just browsers in the world, you know. You might
want to take the data and convert it to a text file of tab–separated
values for import into a spreadsheet or database program.

Here is a transformation file that does this, using this command:

transform nutrition.xml nutrition_csv.xslt nutrition.csv

And here’s the resulting text file.
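A text-producing transformation of this kind could be sketched as below. It is not the article's nutrition_csv.xslt; the choice of columns is my own, and the point is only to show the text output method with tab (&#9;) and newline (&#10;) character references.

```xml
<?xml version="1.0"?>
<!-- Sketch only: one tab-separated line per food.
     The four columns chosen here are an assumption. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<xsl:template match="/nutrition">
    <!-- header row -->
    <xsl:text>name&#9;mfr&#9;calories&#9;total-fat&#10;</xsl:text>
    <xsl:for-each select="food">
        <xsl:value-of select="name"/>
        <xsl:text>&#9;</xsl:text>
        <xsl:value-of select="mfr"/>
        <xsl:text>&#9;</xsl:text>
        <xsl:value-of select="calories/@total"/>
        <xsl:text>&#9;</xsl:text>
        <xsl:value-of select="normalize-space(total-fat)"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Literal tabs and newlines are wrapped in xsl:text because whitespace-only text elsewhere in a stylesheet is stripped before the transformation runs.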

Conversion to Print

Let’s say you want to create a PDF file from your
XML. That’s possible by using a transformation to change the XML to
another markup language: XSL-FO (Extensible Stylesheet
Language – Formatting Objects). This is a page layout language. A tool
called FOP (Formatting Objects Processor) takes that markup and
creates PDF files for you.

Here is a transformation file which
takes the nutrition data and converts it to formatting objects. If
you save it in nutrition_fo.xslt, you can use FOP to do
the conversion to PDF:

fop -xml nutrition.xml -xsl nutrition_fo.xslt -pdf nutrition.pdf

The result is a PDF file; its pages are
approximately 8 centimeters wide and 9
centimeters high, and fit comfortably into a shirt pocket.
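To give a feel for the page layout language, here is a small hand-written formatting objects document using the 8 by 9 centimeter page size mentioned above. It is a sketch under assumptions: the master name pocket, the margins, and the font sizes are all invented for illustration, it follows the XSL 1.0 Recommendation, and it has not been run through the FOP release used for this article.

```xml
<?xml version="1.0"?>
<!-- Sketch only: the kind of output a transformation to XSL-FO
     might produce for one food. Names and sizes are hypothetical. -->
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="pocket"
            page-width="8cm" page-height="9cm"
            margin-top="0.5cm" margin-bottom="0.5cm"
            margin-left="0.5cm" margin-right="0.5cm">
            <fo:region-body/>
        </fo:simple-page-master>
    </fo:layout-master-set>

    <fo:page-sequence master-reference="pocket">
        <fo:flow flow-name="xsl-region-body">
            <fo:block font-size="10pt" font-weight="bold">Avocado Dip</fo:block>
            <fo:block font-size="8pt">Total fat: 11 g</fo:block>
            <fo:block font-size="8pt">Sodium: 210 mg</fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>
```

In the real workflow, an XSLT file generates markup like this from nutrition.xml, and FOP then renders it to PDF.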

Generating Graphics

Finally, you may wish to create an interactive, graphic
version of the data. Another XML-based markup,
SVG (Scalable Vector Graphics), gives you this
capability. SVG has elements like the following, which draw a black
diagonal line and a yellow circle with a green outline:

<line x1="0" y1="0" x2="50" y2="50" style="stroke: black;"/>
<circle cx="100" cy="100" r="30" style="fill: yellow; stroke: green;"/>

By using a transformation file that produces SVG, we can construct a graphic that shows a bar graph for
the food whose name you click. Here’s what you type:

transform nutrition.xml nutrition_svg.xslt nutrition.svg
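One bar of such a graph could be drawn with two rectangles, as in the sketch below. This is a hand-written illustration, not the output of nutrition_svg.xslt; the coordinates, colors, and the 200-unit scale are assumptions, and the click handling is left out.

```xml
<?xml version="1.0"?>
<!-- Sketch only: a single bar showing total fat as a share of
     the daily value (11 g of 65 g, about 17%). -->
<svg xmlns="http://www.w3.org/2000/svg" width="220" height="60">
    <text x="10" y="15" style="font-size: 10pt;">Total fat: 17%</text>
    <!-- outline: the full daily value, drawn 200 units wide -->
    <rect x="10" y="20" width="200" height="20"
        style="fill: none; stroke: black;"/>
    <!-- filled part: 17% of 200 units = 34 units -->
    <rect x="10" y="20" width="34" height="20"
        style="fill: green;"/>
</svg>
```

A transformation would compute the width of the filled rectangle from the data instead of hard-coding it.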

You may display the result with the SVG browser that is part of the
Batik toolkit. If you have installed Batik as per the instructions
given for Linux or for Windows, you type
batik nutrition.svg. I have not tested the file with
the latest version of the Adobe SVG Viewer, but it should work nicely.

Screenshot: bar chart showing categories for a given food

Other Ways to Use the XML Tools

In this article, we’ve used the Multi-Schema
Validator, Xalan Transformer, FOP converter, and Batik viewer from the
command prompt. That’s the fastest and easiest way to get things
working so that you can get a feel for what XML can do.

The batch or shell file approach would work in a production
environment where you generate a whole website’s worth of HTML
files from one or more XML files at regular time intervals. You just
set up a batch job to run at scheduled times (a cron job
in Unix terms) to generate the files you need.

What if you need to generate HTML pages or PDF files dynamically
in response to user requests? Obviously, you don’t want the overhead
of starting a Java process every time a request comes in, and a
static batch file certainly won’t do the trick. Both the
Multi-Schema Validator and Xalan have an API (Application Programming
Interface) and can thus become part of a Java servlet running on
your server and handling dynamic user requests. Once a servlet is
loaded, it stays in memory, so there is no extra overhead for
subsequent uses of a transformation.

If you are interested in running servlets, one option is to use the
Jakarta Tomcat servlet container. It can run as a stand-alone server for testing or as a
module for either Apache or Microsoft IIS.

Timing

There are two aspects to timing: how long it takes to
write the grammars and transformations, and how fast they run.

Designing the markup
language took me about 25 minutes, and entering the data took me
another 25 minutes, some of it running out to the kitchen to grab items
from the shelf or refrigerator. Writing and testing the Relax NG
grammar required 30 minutes.

The Cascading Style Sheet for displaying the XML directly in Mozilla
took all of 15 minutes to write. The “plain HTML”
transformation took about 50 minutes, including time for looking up
some XSLT constructs and doing some experimentation. The
“fancy” transformation took 45 minutes. I needed 20 minutes
to figure out how to do the bar graphs with stylesheets in the first
place, and I used another 5 minutes for minor aesthetic adjustments.
The file for conversion to tab–separated values was a fifteen-minute
job.

The transformation for PDF took an hour. The first time through, I
designed it for paper the size of a compact disc insert. I thought
better of it, and decided to reduce it to shirt-pocket size. That took
another 30 to 45 minutes of tweaking and getting the font sizes just
the way I wanted them. I also had to make some changes to avoid using
parts of XSL Formatting Objects that FOP does not implement yet.

Finally, the SVG transformation took an hour and a half to write.
About half that time was experimenting to get everything positioned
nicely and making the ECMA Script interaction work properly.

You don’t have to be an expert at Relax NG, XSLT, XSL
Formatting Objects, or SVG to do this. I don’t use any of these
technologies on a daily basis. I just know enough about each of them
to get things to work. In this case, my philosophy was “the first
way you think of that works is the right way.” That is why XSLT
experts will be shocked when they see an inefficient construct like
this in the plain HTML transform file:

select="/nutrition/daily-values/*[name(.)=name($node)]/@units"
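For readers curious about what a tidier lookup might look like, one common XSLT 1.0 option is xsl:key, which lets the processor index the daily-values entries by element name. The sketch below is my own suggestion, not something from the article's files; the key name daily is hypothetical, and whether it is actually faster depends on the processor and the size of the document.

```xml
<?xml version="1.0"?>
<!-- Sketch only: look up units through a key instead of
     scanning daily-values with a name() comparison each time. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<!-- index the children of daily-values by their element name -->
<xsl:key name="daily" match="daily-values/*" use="name()"/>

<xsl:template match="/nutrition">
    <xsl:for-each select="food/sodium">
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text> </xsl:text>
        <!-- name() is "sodium" here, so this finds daily-values/sodium -->
        <xsl:value-of select="key('daily', name())/@units"/>
        <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

For the sample entry, this should print "210 mg".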

This is not to say that there is no learning involved here; you will
need to spend some time on that. You don’t need to spend a
lifetime on it, though. It is definitely possible to learn enough about
these technologies to put them to effective use in a short time.

Performance

I tested all of these files on a 400MHz AMD K-6 with
128MB of memory running SuSE Linux
7.2. For the transformations, I modified the SimpleTransform.java
sample program that comes with Xalan. This program records the total
time to generate the output and the time involved in transformation
after the XSLT file has been parsed. If you are running transformations
on a server, you can cache the parsed XSLT file, so the overhead for
parsing occurs only once.

Transformation          Time in seconds
                        Total    Transform
Plain HTML              3.691    1.018
Fancy HTML              4.057    1.409
Tab-separated Values    3.057    0.548
SVG                     3.386    0.689

I measured the time for the PDF transformation with the
Linux time command. Generating the file took
15.115 seconds real time, with 10.920 seconds of user CPU time.

Of course, these are not the only tools available. There are other
XSLT processors and other programs for converting XSL Formatting
Objects to PDF. I chose MSV, Xalan, FOP, and Batik because they are
free and easy to use, and because I was already familiar with them.

Summary

  • Using XML-based markup gives your document structure,
    and makes it readable and open.

  • XML is part of a family of technologies.

  • You can use grammar markup languages like Relax NG
    or XML Schema to validate
    your documents.

  • You can use XSLT transformations to repurpose a document.
    A single document can serve as the source for XHTML, plain text,
    PDF, or other XML markup languages like SVG.

  • Programs which do validation and transformation are freely
    available and easy to use.

These capabilities exist right now, and they are easy to learn and
utilize. That is why XML is good, and why people are so excited about
it once they start to use it.

You may download the
XML files and the resulting HTML, text, and PDF files.

69 Reader Comments

  1. Wow! I have had so many questions regarding XML and how it should be effectively used. This “tutorial” answered most of my questions right off. Zeldman would have me believe that the next tutorial was weeks away when it would be unveiled the next day. What a pleasant surprise. My thanks to you.

  2. Kinda thought this is not allowed:
    style="background-color:green; width:5">
    Values cannot stand alone in CSS, can they? So it should say 5px, 5% or something.

  3. Nope, the units must be specified. IE in quirks mode automatically assumes you want px, and I suspect other programs do so as well.

  4. Nice article thanks.

    One question, is there an app which takes an XML file and allows for flexible and easy data entry? There is no chance in hell I can get our sales team or non-techy guys to use Notepad to enter data into the XML file. Sure I could go and write ASP apps for each XML file, but that is a chore. And it seems to me as if XML is quite a good language to support a flexible generic data-entry app.

  5. Scary case of deja vu here. The new work website that (finally) went online uses very similar techniques, XML articles that are transformed by XSLT stylesheets (depending on the viewer’s browser) to produce the output. Reading this reminded me of the fun I had making the system.
    One really good reason I found for self-specified markup was it stops people thinking “h1 looks like this” and instead makes them think about the structure.

  6. just been reading loads of stuff about XML / XSL / XSLT / XPATH, this is the clearest and most enjoyable article(yes i used the E word) i have read yet. trust the guys @ alistapart to get the goods. now i just have to learn it….

  7. Nice article, unfortunately the setup instructions for batik don’t lead anywhere!

    Overall though a well written introduction to the power of XML and XSLT.

  8. Use something like:

    serving:after {
    content: attr(units)
    }

    This technique can be especially helpful for, for example, printable stylesheets, where link URIs, image titles, acronym definitions and so on can be displayed, à la:

    @media print {
    acronym:after {
    content: " (" attr(title) ")"
    }
    }

    Failing adding this, Mr Eisenberg should at least close the brackets where he says CSS can’t do this 😉

  9. Mr. Eisenberg, you’ve done a great thing here!

    I’ve a number of friends I’ve been trying to turn on to XML, and I’ve had some grey areas myself.

    While you gave the executive summary on technical details, you have given enough information to really dig in.

    On the positive side, you have demonstrated a large number of great XML facets in a practical manner.

    Thanks for the excellent article! I’ll be sending my friends to read it. 🙂

  10. Freaky,
    attr(title) is CSS3, which is poorly (or not at all) implemented in even modern browsers. Mr. Eisenberg was trying to give a -practical- application, and glossed over this detail for the clarity of the point– CSS is not sufficient for all rendering needs.
    Even accepting full CSS3 support, the reflowing and calculations often needed for XML to be presented in various representations requires a more robust language– a programming language.
    CSS is elegant and powerful for its purpose, but it (intentionally) can’t do everything. Modularization, baby!

  11. Finally! I’ve been asking for YEARS now if XML meant double-markup (one for XML, one for XHTML), and the answer is NO. Just link to a style sheet with identical tags.

    Dude. Sweet. Good article.

  12. Great article. I had been following XML for a while, and I found this article to be well-written and informative. One question I have is why he didn’t discuss client-side transformations using modern browsers and XSLT. I know it’s possible, heck I’ve done it, but he didn’t mention it at all. Something else I’m curious about is whether or not browsers for the Palm Pilots can handle XSLT. I know they parse XML but I don’t know how they would handle XSLT. Well, great article (once again). I really enjoyed it.

  13. Oops. You are correct. I forgot the “px”. In file nutrition_fancy.xslt, change line 113 to: background-color:green; width:px

    Change line 116 to: background-color:red; width:px and run the transform again. That’s another nice thing about generating HTML via XSLT; you don’t have to fix a website’s worth of files by hand – just run a batch file again.

  14. I don’t know if I’ll get flamed for adding this but I thought it should be mentioned.

    Flash is also a front-end option for XML data. Just goes to show the beauty of XML, that you can use it any way you see fit.

  15. Another use for XML: a general purpose container for data.

    It’s easy to add/remove/modify elements in the XML data (on the server side.) Once you have the XML data in a DOM object, you can massage the data quite easily. It’s also easy to save these changes by serializing the document.

    Of course, all of the transformations that David presents can be applied to any updated data.

    For a content management product that I’m working on now, we are using a DOM to cache information from searches and other expensive backend services. Keeping this cached object at the web tier reduces round-trips to the application server and provides a great deal of flexibility in presentation (ie. the cache can be transformed via XSL to different types of views.)

    Another point: many of the backend services (like databases) can be configured to produce output in XML. And any data that’s retrieved over a SOAP connection will be in XML. Creating the cache in XML takes very little effort.

  16. Using XSLT to transform XML documents into (X)HTML for different browsers is mentioned as a possibility. In fact, one of those who posted mentions that they do just that [http://www.alistapart.com/stories/usingxml/discuss/#ala-705].

    How is this done? I assume that some sort of browser detection is required, but I’m not sure how one would implement browser detection on the server side?

    I suppose you could parse the HTTP request’s user agent string before sending a response. But deciding which browser is which from the user agent string is a complex guessing game. How do we simplify it?

  17. I’m getting involved with an XML to HTML project… and this is how I’m doing it.

    A complex newspaper software generates XML documents (like articles) with a lot of great markup.

    I then parse that XML and stuff it into a database, defining fields based on tags.

    Then, on the web, someone comes to my database driven web site and I define styles as I wish, usually based on the customer’s custom specifications. I just don’t see the pre-application of styles to an XML document as efficient. I also think that I’m not quite getting the big picture… but thanks to articles like this one, things are beginning to come together.

  18. Good article, though I find the idea that manufacturer names had been changed to be ridiculous to the extreme. Lawsuits? The innocent? Now we are at the point where plainly descriptive facts cannot even be used. Harumph.

    Anyway, more to the point, I would love it if there was a similar tutorial expanding on the theme of generating multiple documents from a single XML file. There was a tantalizing hint in the mention of cron jobs, but I would love more.

    I have several XML files that include many records, each of which I would like to put on separate pages. (And, of course, also create navigation among the pages.) Right now I use a server-side kludge to dynamically create a page based only on a single part of the XML tree. I would love to generate separate pages to reduce server load, facilitate indexing, etc…

    Anyone know of another good article on this?

  19. The mention of cron jobs was with the intention of running all your 231 XML files through an XSLT transform to create 231 HTML files. On the other hand, you want to take one XML file (say, containing a three-chapter book) and generate four HTML files, one per chapter plus an index file.
    This is easily possible with XSLT extensions. Xalan, Saxon, and XT (three well-known XSLT processors) all ship with extensions that allow you to generate multiple output files from a single input file. This is detailed in Chapter 8 of XSLT, by Doug Tidwell, published by O’Reilly. I’ve used it before, and it works great.

  20. In fact, what you’re asking for is called server-side content negotiation.

    HTTP was designed to allow negotiation of content on both the client and server sides.

    In this case, you wish to negotiate on the basis of whether a browser accepts a particular representational language. MIME Types may be sufficient for this purpose (though they are insufficient for other types of negotiation).

    Here’s an RFC on the topic.
    http://www.ietf.org/rfc/rfc2295.txt

    Unfortunately, content negotiation is still a largely un-standardized and un-implemented facet of the web.

    I feel that it is under-addressed. While it is largely a technical issue (in that no one user will likely cry out for the need for resources to have alternate representations), it is, IMHO, vital to the long-term health of the web.

    It is the geek version of accessibility. I mustn’t crank out purely-XHTML sites, even if it is current and “cool”. I must continue to provide HTML. I mustn’t use PNG exclusively, I must continue to provide JPG. And so on.

    Thus far, browser sniffing has been good enough to get the job done, but as more browsers on various devices come out, and as new web languages proliferate (XSLT, anyone?), it will be necessary to provide alternate representations based on automatically negotiated user agent capabilities.

    I’d like to see WaSP take up this torch. If they are for the long-term health of the web, then surely backwards-compatibility must *at some point* be established and agreed to.

    It’s fine to say that we will let NN4 quietly die for its maverick implementations, but what about when *standards compatible* browsers are old, and the standards they implement are no longer the flavor of the week?

    …I’m off the soap box, now.

  21. Excellent article. Timely, well written, and easily understandable.

    One thing I was wondering about: with google and amazon releasing API’s for their services and the same being accessible through SOAP, etc., I was wondering if someone could speak quickly to how the contents of this article relate to the use of XML in service based applications.

    Any help would be greatly appreciated.

    Jason

  22. What are the advantages/disadvantages of using xml over a db?

    Why not just store your data in a db and use whatever you want to produce your doc of choice (php, cf, asp, etc.)? Seems much easier, especially when it comes to browsers. Isn’t that the best solution for 99% of the cases?

    If you really need to pass some generic text representation of the data, you could output an xml doc, though I would think in most cases you could simply output the end product directly. Services, I can see, might need a generic text representation.

    (Is it fair to call xml a text db?)

  23. Because they’re doing different things. A lot of data-types that can be represented well by xml, can’t be done well by dbs and vice-versa. You know, use each tool where it works best?

  24. The one problem I had with this article was that it claims you can’t show attribute values via CSS. That’s wrong (unless you’re using IE).

    For example, `element:after {content: " [" attr(name) "] "}`, would display the value of the element tag’s name attribute value, in between square brackets. This works in at least Mozilla 1.0, and Opera 6.04. Instead of using the :after pseudo class, one could choose the :before pseudo class, to display the attribute value before the element. There’s a whole section in the CSS 2 spec about text or content generation – http://www.w3.org/TR/CSS2/generate.html

    Overall, I liked this article.

  25. I’ve been struggling to understand the heady world of xml, and this put the final bridges together in my mind to understand the different bits, thank you squire!

  26. “what about when *standards compatible* browsers are old, and the standards they implement are no longer the flavor of the week?” – Jeremy Dunck

    Standards are made in such a way that when a standards compliant browser receives a page that contains things it doesn’t understand (such as a new CSS property, or a new style sheet language entirely, or a new scripting language), it will ignore the parts that it doesn’t understand. If the markup is well written, however, then the browser will still be able to display the *content* of the page, which, I hope we’ll all agree, is the most important part. All the user will miss out on is some nice visuals.

    The standards themselves are what are backwards compatible. For instance, XHTML was created in such a way that a browser that doesn’t understand XML will still recognize it as HTML 4.01.

    About the article: I’m happy to see this article, ’cause there are a lot of people who only half understand XML, and this shows a lot of its power. Well written =)

  27. Standards are made in such a way that when a standards-compliant browser receives a page that contains things it doesn’t understand (such as a new CSS property, or a new style sheet language entirely, or a new scripting language), it will ignore the parts that it doesn’t understand. – Slime

    Within a particular Recommendation’s evolution, I can agree with this statement. The rules for evolving HTML were well understood. The rules for evolving XML are well understood. The rules for evolving CSS are well understood. Yep.

    But an old standards-compliant browser that doesn’t understand XSLT will be unable to do anything with an XML file which was intended to be transformed. Indeed, the “content” rendered by untransformed XML will likely be unusable, even if the browser chooses to render it, because logical filtering, calculations, and tree restructuring will not have occurred.

    Likewise, that browser which doesn’t understand XML namespacing will not render an XHTML document correctly, should it also contain namespaced SVG, for example.

    In such a case, different representations must be presented to the old browser on the basis of client-side or server-side content negotiation. Backwards compatibility of HTML 4.01 with HTML 2.0 is irrelevant in the situation I am describing.

    ==============================
    The standards themselves are what are backwards compatible. For instance, XHTML was created in such a way that a browser that doesn’t understand XML will still recognize it as HTML 4.01. — Slime

    Actually, this is incorrect for a couple of reasons.

    First, XHTML 1.0 -can be- made to also conform to HTML 4.01, if the XHTML complies with restrictions made in the XHTML spec [XHTML].

    Such a document can be served as either text/html or application/xhtml+xml. [XHTMLMediaTypes]

    Any XHTML document which does not adhere to the restrictions made in the previous reference (for whatever reason) may cause problems with HTML browsers.

    In this case, it is not valid to describe the content as text/html, and it should be served as application/xhtml+xml [XHTMLMediaTypes]. A browser which does not understand the application/xhtml+xml MIME type will generally pop up a Save As dialog, or fall back to some similar catch-all behavior.

    Slime, thanks for taking the time to respond, but your rebuttal only illustrates my point: people don’t seem to understand the purpose of MIME types, or the concept and necessity of content negotiation.

    -Jeremy

    [XHTML]
    http://www.w3.org/TR/xhtml1/#guidelines

    [XHTMLMediaTypes]
    http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020430/

  28. >>A lot of data-types that can be represented well by xml, can’t be done well by dbs and vice-versa.

    Can you give some examples? Why can’t you have columns and rows in a db that are the same as the data in XML?

    Looking at the examples in the article, all of them seem far easier to draw out of a db using php/cf/asp. I can’t imagine implementing them on any of my sites. When is it really necessary to use XML? (Again, I can see how services might need a generic text representation of the data.)

  29. I’m looking forward to these examples, as well.

    However, another benefit of XML over a DB is that the XML can be repurposed. What I mean is, you’re not going to send your proprietary DB over to anyone that’d like to use your data.

    What XML gives you is a well-structured way to easily transport any old data in a rigorous format, which is well supported by tools.

    CSVs can’t hold a candle to XML, I hope you’ll agree.

  30. I’ve been looking forward to ALA 147 and have not been disappointed. Thanks Mr. Eisenberg and everyone at ALA for their continued good work!

    I’ve been looking (unsuccessfully) for easy validation for Mac OS X, similar to the setups mentioned for Linux and Windows. I wouldn’t be surprised if it’s built into the OS and I just haven’t found it–a lot of the files (e.g., .plists) are XML and there are directories for DTDs. Any suggestions?

  31. Looks like that’s a part of CSS that I hadn’t read yet. You can extract the attributes for the element in question (show the units="g" on the element), but you can’t use CSS2 to reach up and grab the units from the matching element in <daily-values> when showing the value for the potato chips.

  32. Okay, I’m curious. I keep seeing XML info, and articles devoted to implementing it.. But there’s an underlying question, at least to my mind.

    Why?

    Yes, I’m serious. I currently use a MySQL database for data storage, and PHP to access that data and “transform” it into whatever I need. I can’t see using XML as any better – In fact, it seems to be much more difficult, to me. So why should I switch?

    It seems to me that this is just another “buzzword” for the marketing types: “Ooh! Let’s use XML!”

  33. Thanks for the help! I’m working through it and msvalidate works fine. I ended up putting the shell scripts in ~/bin, where they worked fine. (They didn’t seem to work while in ~/xmlapps even though I used chmod and rehash. I’m not sure why not, but there’s always something to learn!)

  34. The key concept to understand is that an xml document actually performs the role of a database (or datasource) which can be queried (using xpath) to find whatever data is desired, and then quickly output and formatted (using xsl). XML is always presented as a document format, which is wrong. XML has nothing to do with documents except that it happens to be easy to create an xml-encoded file with a text editor.

    XML is very good at representing recursion, which makes it unique when compared to other database formats or data representations. But recursion does not have much to offer designers. It is the darling of programmer types like me.

    In the real world it is much more difficult to create useful webapps out of XML/XSLT than it might appear from the article, which glosses over many very problematic issues, the main one being where you will get your xml-encoded data from. If from a database, you might as well work directly with JDBC/JSP. XML only becomes practical as a middle layer in a very sophisticated project where the participants understand how to plan extensively, something that almost never happens in the real world! Don’t get bogged down with xml unless you know what you are doing and exactly what benefits your project is supposed to derive from the extra effort involved in supporting an additional layer between a SQL database and your presentation layer.
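
As a minimal sketch of the querying idea in comment 34 (and of the units lookup that comment 31 says CSS2 cannot do), the following Python uses the standard library's ElementTree, which supports a limited XPath subset. Only `<nutrition>`, `<daily-values>`, and `<food>` come from the article; the child element names, the placement of the `units` attribute, and the sample numbers are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: child element names and values are assumed.
NUTRITION_XML = """\
<nutrition>
  <daily-values>
    <total-fat units="g">65</total-fat>
    <sodium units="mg">2400</sodium>
  </daily-values>
  <food>
    <name>Potato Chips</name>
    <total-fat>10</total-fat>
    <sodium>180</sodium>
  </food>
  <food>
    <name>Carrots</name>
    <total-fat>0</total-fat>
    <sodium>35</sodium>
  </food>
</nutrition>
"""


def amounts_with_units(xml_text, category):
    """Return (food name, 'amount units') pairs for one category."""
    root = ET.fromstring(xml_text)
    # The units are stored once, under <daily-values>, not on each food.
    units = root.find(f"./daily-values/{category}").get("units")
    return [
        (food.findtext("name"), f"{food.findtext(category)} {units}")
        for food in root.findall("./food")
    ]


def foods_at_or_below(xml_text, category, limit):
    """Return the names of foods whose amount is at most `limit`."""
    root = ET.fromstring(xml_text)
    return [
        food.findtext("name")
        for food in root.findall("./food")
        if float(food.findtext(category)) <= limit
    ]
```

Calling `amounts_with_units(NUTRITION_XML, "sodium")` pairs each food's number with the unit declared once in `<daily-values>`, which is the kind of cross-reference a path query makes straightforward.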

  35. «Yes, I’m serious. I currently use a MySQL database for data storage, and PHP to access that data and “transform” it into whatever I need. I can’t see using XML as any better – In fact, it seems to be much more difficult, to me. So why should I switch?»

    That’s a good question, and one I’ve thought about at great length. As has been mentioned a little further up the page, the main benefits are in cross-platform compatibility, ease of data transfer, and the benefits of the data entry style.

    Why not compromise? Have you read up on the PHP functions relating to XML files? If not, take a look at http://www.php.net/manual/en/ref.xml.php. I’ve begun experimenting, just out of interest, with a CMS using XML for data storage and PHP for parsing and displaying the data. It’s quite nifty.

  36. I’ve been programming in PHP for the past three years and haven’t used XML in that time. A colleague of mine gave me the URL of this article because a friend had an idea for something that had to use XML, so I started reading.

    I think XML is simple to use (it sounds simple to my ears), and I think it’s easy to pick up if you already know HTML. It’s much the same as HTML, with the advantage that you define your own tags. Your examples were a great help because I could see how it worked and what it did (I’m a visual person; I like to see things happening).

    Thanks for offering this great article to us!

    Bas (Netherlands)

  37. seems like the right place to ask for help…

    I am trying to send an XML formatted job to Jobserve.co.uk without success.

    I have the details of how to do it but it doesn’t explain exactly the syntax required for ASP to connect to their server.

    The manual says that you build up the XML string and then says:

    “Using an HTTP POST program add a HTTP header called ‘SOAPMETHODNAME’ with value ‘PostAdvert_IT’. Post the string representing the SOAP XML to the specified URL.”

    So far I have got:
    Set xmlHTTP = Server.CreateObject("MSXML2.ServerXMLHTTP")
    xmlHTTP.open "POST", strAddress, False
    xmlHTTP.setRequestHeader "SOAPMethodName", strMethod
    xmlHTTP.send strXML
    POST = xmlHTTP.responseText
    Set xmlHTTP = Nothing

    when I run the code I get:

    Error Type:
    msxml3.dll (0x80072EE6)
    The URL does not use a recognized protocol
    /xmltest/process.asp, line 84

    Any Ideas anyone ???? Please help
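
The error text in comment 37 suggests that the address passed to `open` may lack a scheme such as `http://`; that is an inference from the message, not something the comment confirms. As a hedged sketch of the same steps in Python, the function below builds (but does not send) a POST request carrying the `SOAPMethodName` header, and rejects addresses without an HTTP scheme up front. The `Content-Type` value and the example address are assumptions, not taken from the Jobserve manual.

```python
from urllib.parse import urlparse
from urllib.request import Request


def build_soap_request(address, method_name, soap_xml):
    """Build a POST request for a SOAP body without sending it."""
    scheme = urlparse(address).scheme
    if scheme not in ("http", "https"):
        # An address like "www.example.com/post" has no scheme at all.
        raise ValueError(f"address must start with http:// or https://: {address!r}")
    return Request(
        address,
        data=soap_xml.encode("utf-8"),
        headers={
            "SOAPMethodName": method_name,
            # Assumed content type; check the service's documentation.
            "Content-Type": "text/xml; charset=utf-8",
        },
        method="POST",
    )


# Hypothetical address and body, for illustration only.
request = build_soap_request(
    "http://example.com/soap", "PostAdvert_IT", "<Envelope/>"
)
```

Passing the resulting object to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch runs without network access.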

  38. I have been involved with web design, to some varying degree, since its inception. I have always been “shy” of XML due to its lack of real “community” acceptance. After walking through the examples and using the technologies that Mr. Eisenberg presented, I feel LIBERATED!

    Thank you for showing me an example of XML implementation in terms that even someone such as myself could understand. I now share the exuberance that so many of my colleagues have felt for some time: XML is the way!

    I have plans to redesign several of the “content management” systems that I have written in ASP, Perl and Java to reflect my new-found wisdom and revelations about this wonderful technology.

    William Dodson

  39. In the article, Mr. Eisenberg says the following:

    “Once you have created the entire stylesheet in the same directory as the XML file, you can open the XML file in a modern browser such as Mozilla, and it will display the information.”

    What other browsers besides Mozilla support this? I found that Opera 6.0 was the only browser aside from Mozilla that was able to support this option. Explorer 6.0 did display SOME of it, but nothing usable.

  40. This is a really helpful article—thanks for writing it!
    One thing, however—You’ve included links to XML Tools for Linux & Windows (no surprise there), but what about XML Tools for use on Mac OS X?

    Thanks.

    Ethan

  41. Well, I fully stand behind the idea of separating presentation and content, and this can be seen on my website (which you can get to by going to the posted URL), but instead of using someone else’s programs to transform the content into the presentation, I do it myself on my website…or at least I am in the process of doing so on my site…
    I parse the corresponding file for the page, and depending on what content is in between which tags, I display it somewhere, somehow on the page. This approach is time-consuming because it requires the content and the presentation to be separate, with the programming tying them together (correctly!). Nonetheless, this is my preferred approach…until I can learn to use XSL/XSLT to do the CSS and the PHP/Perl’s work for me!!!

    please feel free to e-mail me

  42. Quote:
    Why not compromise? Have you read up on the PHP functions relating to XML files? If not, take a look at http://www.php.net/manual/en/ref.xml.php. I’ve begun experimenting, just out of interest, with a CMS using XML for data storage and PHP for parsing and displaying the data. It’s quite nifty.
    ————————

    hmm. See, it’s not the implementation I’m having problems with, it’s the whole concept. I don’t need to use the XML functions – I already use the MySQL functions. I can’t see any kind of real-world usage for this. If I need XML, I generate it from the DB. If I need to customize the display for a particular browser, I do so with PHP.

    I really can’t see a reason to change what I already know quite well, that works very well for every instance that I can come up with, in order to use XML.

    And if you say “You don’t always have access to PHP and a DB”.. Well, if you don’t have DB access on your webhost… what are the chances of getting them to add the PHP extensions? It’s not installed by default for PHP..

    (shrug)

  43. The decision to use XML or a relational DB should ideally be based on the nature of the data. Some data fits better into a relational DB, some fits better into XML, some can be represented using either pretty much equally.

    Data that can be represented using just a single table can generally be represented using either technology with no compelling wins either way. (More speed with a DB, more portability with XML, but nothing much beyond that.)

    Data that would be represented in a relational DB using two or more tables that get joined together is probably best kept in a DB. The DB will handle all the join operations, referential integrity, etc., more easily than will be possible using XML. (You could do it with XML, but you’d probably end up writing a lot of code yourself.)

    Data that consists largely of text with markup is best represented using XML or some other form of markup language. If you’ve got a document that could have an arbitrary number of sections, or styles appearing at arbitrary points in the text, you really need to use a markup language of some sort. There isn’t any way to represent that information using rows and columns. That goes double for recursive data structures such as subsections.

    Relational DBs and markup languages represent two different philosophies about the structure of data. Neither of them is suitable for all data.

    Apart from that, I personally like text files that I can hack by hand. I get worried when I have important data that requires a particular program to access.
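
To illustrate the point in comment 43 about recursive structures such as subsections, here is a small sketch in Python. The `<section>` element, its `title` attribute, and the sample titles are hypothetical; the idea is only that arbitrarily deep nesting maps directly onto a recursive walk of the tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with sections nested to an arbitrary depth.
SECTIONS_XML = """\
<section title="Using XML">
  <section title="Creating a New Markup Language">
    <section title="Daily values"/>
  </section>
  <section title="Validation"/>
</section>
"""


def outline(section, depth=0):
    """Return an indented outline, one line per (sub)section."""
    lines = ["  " * depth + section.get("title")]
    # Each child <section> is handled exactly like its parent.
    for child in section.findall("section"):
        lines.extend(outline(child, depth + 1))
    return lines


outline_lines = outline(ET.fromstring(SECTIONS_XML))
```

Joining `outline_lines` with newlines gives a table of contents whose indentation follows the nesting, with no fixed limit on depth.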

  44. As a follow-up to my previous post [Post], I’d like to point out that XHTML 2 is incompatible with XHTML 1.0 [XHTML2], and is certainly incompatible with HTML. Further, CSS 2.1 is not backwards compatible with even CSS 2.0 [CSS2.1].

    Further, here’s evidence that even people that arguably “Get it” don’t understand MIME types. [DiveInto XHTML2] “My fresh IE 5.5 install asks to download the page…”. The Save As dialog in IE is popped up for any unknown MIME Type (after IE’s sniffing algorithm fails). [IEMIME]

    (As a side note, it appears the URL Mark references returns text/html now. I am pretty positive that he was getting the dialog because at the time he tested, it was (properly) returning application/xhtml+xml, and it has since been changed to return text/html.)

    It is not my intention to harm anyone in these statements. I am simply trying to call attention to the need for content negotiation, and to the fact that “forward compatibility” can’t be strictly counted on.

    A mechanism for negotiating representations based on client capabilities is -necessary-. In fact, Mark’s closing (sarcastic) note ” Looks great in Opera and Mozilla, though. That does it. I’m converting all my pages to XHTML 2.0. Accessibility be damned. Backward compatibility be damned. IE 5 be damned.” points to this fact, though he may not realize it.

    Please… think about it.

    -Jeremy

    [Post]
    http://www.alistapart.com/stories/usingxml/discuss/2/#ala-731

    [XHTML2]
    http://www.w3.org/TR/xhtml2/
    (Sorry, I can’t point out specific examples of non-conformance here.. They’ve not included a change summary, and I can’t do the research needed to gather evidence just now)

    [CSS2.1]
    http://www.w3.org/TR/2002/WD-CSS21-20020802/about.html#q1

    [DiveInto XHTML2]
    http://diveintomark.org/archives/2002/08/06.html#changes_in_xhtml_20

    [IEMIME]
    http://msdn.microsoft.com/library/default.asp?url=/workshop/networking/moniker/overview/appendix_a.asp

  45. Hi
    I get a Microsoft Jscript runtime error saying Null is not a null object when I try to run Nutrition.svg. Help!

  46. This XML article was recommended to me as I am in a hurry to do R&D on it for our web division. I was given a 500-page book on the subject, a very good one, but J. David’s article does what that book does in much less time and without any unnecessary jargon or hype. Now I can say I really get what XML is about! I just hope I can understand XML’s role in Flash as well, in the future.

  47. I went through the process of creating all of the parsed files. I can be a goon on the computer, but I found it to be a breeze. Very cool application of the technologies. Well written article too!

  48. This was a good overview. An example with DTD and XML Schema could also shed some light on those grammars.

  49. Hi,
    Good article – much food for thought (pardon, no pun intended).
    However, the msvalidate reported a JRE clash problem. I’ve got version 1.4.1_01 of the J2sdk & JRE on my machine. Running the msvalidate.bat reported that I needed the JRE1.3.
    I resolved the issue by going into regedit & changing the JRE current version from 1.4 to 1.3. Obviously not ideal but it works.
    Thanks for your hard work,
    Eddie

  50. So that’s how you use XML. I keep hearing how great it is but, up until this point, had no idea why creating your own markup was a good thing. Great article. Thanks.

    Even so, I question the usefulness of it. If you ask me XML seems to be a fancy way of managing data in text files. For someone who uses only text files that may be a good thing. But as Twyst says “I don’t need to use the XML functions – I already use the MySQL functions. I can’t see any kind of real-world usage for this. If I need XML, I generate it from the DB. If I need to customize the display for a particular browser, I do so with PHP.”

    I read colin_zr’s response with interest. He said: “There isn’t any way to represent that information using rows and columns.” Ok, can someone provide an example of this? Any examples I’ve seen could all easily be represented in a DB. In fact, I seem to remember reading a tutorial somewhere that described how to use XML to display data in an HTML table (using php, I think). Kind of pointless, if you ask me.

    Now, I’m not saying all XML is pointless. This article showed the value of XML for those who may not have access to a DB for whatever reason, or those who do not want to go beyond markup (in other words: those who shy away from scripting languages such as php, asp, …). I just don’t see its value for those who do, such as myself. Maybe, someday, somewhere, someone will provide an example that simply can not be implemented in a relational DB. Until then, I will set XML aside.

  51. I would like to give a nod to Jeremy Dunck for coming to the same conclusion I did about this article. It is a perfect primer for an explanation of the use of content negotiation. Based on a user agent’s (i.e. browser’s) capabilities you can serve the document as any one of the types listed.

    These capabilities include SUPPORTED MIME TYPES and supported languages. Therefore if a browser says it supports ‘en-us’ (United States English) and your site has THE SAME content in two languages, say en (English) and fr (French), you can serve the appropriate one to the user (in this case the English one). No need for a new URI or to ask the user which version they prefer.

    In terms of the article, you can also serve documents by MIME type based not only on whether a type is supported but also on the quality of that support.

    In a real-world example, IE supports text/html (HTML) and text/plain (Text), and Mozilla supports text/html, application/xhtml+xml (XHTML), and text/plain (Text). Mozilla supports XHTML with a quality of 1 (best) and HTML with a quality of 0.9. Therefore in IE your only options are to transform the XML document into HTML or text based on stated support, but in Mozilla you have more options: you could send the document as XHTML, HTML, or text. Since Mozilla gives XHTML the higher quality, you would probably want to transform the document to XHTML and send it as such. Since (X)HTML is usually preferred over raw text, we won’t send the text version to either user agent.

    To further extend this example, if a user agent supports image/svg+xml (SVG) you can send it the SVG document instead, or application/x-pdf (PDF) for the PDF document.

    I’m sure much of this post is somewhat short-sighted, but the bottom line is that one URI can serve multiple versions of a RESOURCE based on what’s available and what the user agent’s capabilities are. For the most part this goes unused, but this is how HTTP is DESIGNED to be used. To be honest, this can all be done today and is supported; unfortunately, most servers make it difficult, as they ARE NOT designed to work this way. But like most things, there are ways around it.
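
The selection logic described in comment 51 can be sketched roughly as follows. This is a simplified illustration in Python, not a full implementation of HTTP content negotiation: it reads the `q` values from an `Accept` header and picks the best type the server can produce, honoring `*/*` but ignoring partial wildcards such as `text/*`. The function names and the sample header values are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into {media type: quality}."""
    preferences = {}
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        preferences[media_type] = quality
    return preferences


def choose_representation(accept_header, available):
    """Pick the available type with the highest client quality.

    `available` is listed in the server's own order of preference,
    which breaks ties. Returns None if nothing is acceptable.
    """
    preferences = parse_accept(accept_header)
    best_type, best_quality = None, 0.0
    for media_type in available:
        quality = preferences.get(media_type, preferences.get("*/*", 0.0))
        if quality > best_quality:
            best_type, best_quality = media_type, quality
    return best_type


# Types this hypothetical server can produce from one XML source.
AVAILABLE = ["application/xhtml+xml", "text/html", "text/plain"]
```

With a header such as `text/html;q=0.9, application/xhtml+xml, text/plain;q=0.8` the function selects XHTML, while a client that lists only `text/html, text/plain` gets HTML from the same URI.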

  52. Is a DTD file always needed?

    From the xml file I saw something like:

    which seems to be a template.

    Can we automate the “record” generation process by having a definition file?


  53. “fop -xml nutrition.xml -xsl nutrition_fo.xslt -pdf nutrition.pdf

    The result is a PDF file; it produces pages that are approximately 8 centimeters wide and 9 centimeters high, which fits comfortably into a shirt pocket.” (from the article)

    Now how can I use this with, say, ASP or ASP.Net or Java to generate a PDF file on the fly? Say, for example, a customer orders an item from an e-commerce site. I want to be able to generate a PDF version of a pre-designed invoice template with their details filled in dynamically.

    Is this possible? If so, how?

  54. I’ve only just stumbled across this article, and it has been great help. The software the author linked to is very useful on PC and *nix, but I’m using a Mac. Does anyone know of any comparable software for me? (I suspect the Linux stuff can be made to work in OS X, but I don’t know how). Please email me if you have any ideas.

  55. Grebmil: A response a few months late…

    Ok, fair enough. Most of the examples you see in these introductory articles involve record-oriented data. But that’s not the only kind of data.

    Here’s an example of some data that really needs to be stored in markup rather than in a relational model:

    Hello, my name is <name>colin</name>. I like <abbr>XML</abbr>.

    You can’t take data like that, make a field for names and a field for abbreviations, and force it into third normal form. That’s just not the structure of the data.

    To give you another illustration, think about how you’d take an HTML file and represent it in a database. Would you have a table of div elements, a table of h1 elements, a table of p elements? What would the records in those tables look like? How would you indicate all the p elements that belonged within a specific div? And those are the easy bits. Just wait till we get to inline elements…

    Obviously that’s silly.

    What you might well do is take the contents of the HTML file, or perhaps a fragment of it, and put it into a field of a database. But then you’ve still got all the HTML markup within that field. Markup just happens to be the best way to represent that data.
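
Comment 55's example of text with markup scattered through it can be made concrete with a short sketch in Python. The `<p>`, `<name>`, and `<abbr>` element names are assumptions chosen to match the comment's mention of names and abbreviations; the point is that a parser keeps both the running text and the marked spans, in order, without forcing them into fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the sentence quoted in the comment.
SENTENCE_XML = (
    "<p>Hello, my name is <name>colin</name>. "
    "I like <abbr>XML</abbr>.</p>"
)


def plain_text(xml_text):
    """Return the sentence with all markup removed."""
    return "".join(ET.fromstring(xml_text).itertext())


def marked_spans(xml_text):
    """Return (tag, text) for each marked-up span, in document order."""
    return [(child.tag, child.text) for child in ET.fromstring(xml_text)]
```

The same string yields both the readable sentence and the list of tagged spans, so nothing about the sentence's word order is lost when the names and abbreviations are pulled out.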

  56. Hello, my name is <name>colin</name>. I like <abbr>XML</abbr>.

    Is that the data itself, or is it really a list of people that like abbreviations?

    I personally do not see xml as a way to deal with millions of records over hundreds of tables. I will more than happily export a subset of data to someone else in xml in any way they want it. However, that does not make my application any more a user of xml than if the two of us had agreed to use pig-latin or parenthesis delimited text files with a header row in rot13.

    I do think xml has its place, it just doesn’t overlap with my domain except as another export format. I haven’t really had to deal with importing xml because everybody else uses databases which means they’re just as happy to give me a few csv files or a direct tap into their database.

    For database dumps, a CSV file with a header row is far more space- and bandwidth-efficient than XML.

    To people like me, who use SQL, XML seems very clunky and broken.
    To people who like XML, I think SQL and RDBs look big and overpowered for their needs.

    IMarv
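
The size claim in comment 56 is easy to check for flat records. The sketch below, in Python, serializes the same rows as CSV with a header row and as element-per-field XML, then compares byte counts. The field names, element names, and sample rows are hypothetical, and the size ratio will vary with the data and with any compression applied in transit.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical flat records, as a database export might produce them.
FIELDS = ["name", "total_fat", "sodium"]
ROWS = [
    {"name": "Potato Chips", "total_fat": "10", "sodium": "180"},
    {"name": "Carrots", "total_fat": "0", "sodium": "35"},
]


def to_csv(rows):
    """Serialize rows as CSV with a single header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    """Serialize rows as XML with one element per field."""
    root = ET.Element("foods")
    for row in rows:
        food = ET.SubElement(root, "food")
        for field in FIELDS:
            ET.SubElement(food, field).text = row[field]
    return ET.tostring(root, encoding="unicode")


csv_size = len(to_csv(ROWS).encode("utf-8"))
xml_size = len(to_xml(ROWS).encode("utf-8"))
```

For these rows `csv_size` is smaller than `xml_size`, because the XML repeats every field name twice per record while the CSV names each column once.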

  57. Twyst(e) is right, I’d say,

    I certainly don’t want to rely on client side functions
    (ever heard of browser quirks? do you really think there’ll be no more in times to come?)
    when I can access reliable server side functions (PHP, MySQL).

    >You don’t always have access to PHP and a DB…
    I guess angelfire/lycos account holders sharing their pastry recipes with the world
    are not the target audience here.
    (no offense: private homepages/pastries are OK)

    Marek
