Using XML

During my second lecture to an XML class at a local
community college, I explained how XML lets you define your own markup language with custom tags and attributes. I had finished defining a simple markup language for use
with a list of amateur sports clubs, and had displayed a sample document
written with that markup. At that point, one student asked:

“Isn’t it inefficient to have to type all those tags for
every club? What good is this? It looks nice, but what can I
do with this document? How can I put this in a web page or use it with
other programs? Wouldn’t it be easier to just use HTML or a
database/word processor/fill-in-the-blank?”

The reason that we use XML instead of a specific application is that
XML is not just a pretty face, living in isolation from the rest
of the computing world. XML is more than a rulebook for generating
custom markup languages. It is part of a family of technologies, which,
working together, make your XML-based documents very useful indeed. To
demonstrate what I mean, I decided to create a new XML-based markup
language from scratch, and show what you can do with a document written
in that language, using off-the-shelf tools.

Creating a New Markup Language

The language that I created stores the nutritional
information that you find on food labels in the United States. The
document starts with a <nutrition> tag, followed by
a <daily-values> element that gives the maximum
amounts of fat, sodium, etc. for a 2000-calorie-a-day diet, and the
units in which the amount is measured.

The daily values are followed by a series of
<food> elements, each of which gives information
about a specific food and its nutritional categories. Because the
<daily-values> element has already defined the units
in which each category is measured, we don’t need to repeat them
for every food; we just enter the numbers for that particular
food’s total fat, sodium, etc. After the last food, we close the
document with a closing </nutrition> tag.

<nutrition>

<!-- Establish the daily values -->
<daily-values>
    <total-fat units="g"> 65 </total-fat>
    <saturated-fat units="g"> 20 </saturated-fat>
    <cholesterol units="mg"> 300 </cholesterol>
    <sodium units="mg"> 2400 </sodium>
    <carb units="g"> 300 </carb>
    <fiber units="g"> 25 </fiber>
    <protein units="g"> 50 </protein>
</daily-values>

<!-- Now list the individual foods -->
<food>
    <name>Avocado Dip</name>
    <mfr>Sunnydale</mfr>
    <serving units="g"> 29 </serving>
    <calories total="110" fat="100"/>
    <total-fat> 11 </total-fat>
    <saturated-fat> 3 </saturated-fat>
    <cholesterol> 5 </cholesterol>
    <sodium> 210 </sodium>
    <carb> 2 </carb>
    <fiber> 0 </fiber>
    <protein> 1 </protein>
    <vitamins>
        <a> 0 </a>
        <c> 0 </c>
    </vitamins>
    <minerals>
        <ca> 0 </ca>
        <fe> 0 </fe>
    </minerals>
</food>

<!-- etc. -->
</nutrition>

You may see the entire document
that is used for the examples in this article. All the numbers
are real; only the manufacturers’ names have been changed
to protect the innocent and avoid lawsuits.

A quick note: vitamins and minerals are measured in percentages, not
grams or milligrams. That’s why we don’t need to establish
any units or maximums for them in the <daily-values>
element.

I entered the data by hand using the nedit program on
Linux. I could have used any editor that lets me save files
as plain ASCII text; notepad on Windows or vi on Linux would have done
equally well. To make data entry easier, I created an empty
“template” for a food, which you see at the bottom of the
file. I copied and pasted it for each new food, so that I didn’t
have to type the tags over and over again.
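
The template itself is not reproduced in this article, so the following
is only a guess at what such an empty food template might look like. It
is built from the element names in the sample above; the empty values
are placeholders to be filled in for each new food.

```xml
<!-- Hypothetical empty template: copy, paste, and fill in for each food -->
<food>
    <name></name>
    <mfr></mfr>
    <serving units="g"></serving>
    <calories total="" fat=""/>
    <total-fat></total-fat>
    <saturated-fat></saturated-fat>
    <cholesterol></cholesterol>
    <sodium></sodium>
    <carb></carb>
    <fiber></fiber>
    <protein></protein>
    <vitamins>
        <a></a>
        <c></c>
    </vitamins>
    <minerals>
        <ca></ca>
        <fe></fe>
    </minerals>
</food>
```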

Immediate Benefits

What have we bought by creating this XML file in a text
editor rather than creating an HTML document, a spreadsheet, or a
database? First, the data is structured; it’s not just a mass of
numbers in an HTML table or a text file of tab–separated values.
Because of the custom tags, it’s something that humans can read
and understand. It’s also open; we don’t need some
expensive, proprietary software to extract the information from a
binary file. So, as a transport medium, XML already serves us
nicely.

Validating the Document

Even if you’re the only person who ever enters
data into the document, you’d like to be able to check that you
haven’t left out any information or added extra tags.
Additionally, you’d like to be sure that your percentages are all
between 0 and 100.

This becomes even more important if many people enter data. Even if
you give other folks instructions on the proper format, they may ignore
it or make errors. In short, you would like to have the computer help
you determine that the data in your documents is valid.

You do this by creating a machine-readable grammar which
specifies which tags and attributes are valid, and in what
combinations, and what values your tags and attributes may contain.
You then hand your document and the grammar to a program called a
validator, and it checks that the document matches your
specifications.

One machine-readable form of specifying such a grammar is a notation
called Relax NG. Relax NG is, itself, an XML-based markup
language. Its purpose is to specify what is valid in other
markup languages. This isn’t as crazy or impossible as it
sounds. After all, books that tell you how to use English grammar
correctly are also written in English.

For example, one of the specifications of our nutritional markup
language is that the <calories> element is an empty
element, and it has two attributes, the total attribute
and the fat attribute. These must both have decimal
numbers in them. We say this in Relax NG as follows:

<element name="calories">
    <empty/>
    <attribute name="total">
        <data type="decimal"/>
    </attribute>
    <attribute name="fat">
        <data type="decimal"/>
    </attribute>
</element>

When we pass nutrition documents through the validator with this
grammar, the validator will tell us that the first tag below
is correct, but the second one isn’t.

<calories total="100" fat="10"/>
<calories total="217" fat="don't ask!"/>
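
The same mechanism can express the earlier requirement that percentages
stay between 0 and 100. The fragment below is a sketch and is not taken
from the article's grammar file. It assumes the grammar uses the W3C XML
Schema datatype library (named in the datatypeLibrary attribute), and it
covers only the vitamin A element, <a>.

```xml
<!-- Sketch: constrain the vitamin A percentage to the range 0 to 100.
     Assumes the XML Schema datatype library; the article's own grammar
     file may express this differently. -->
<element name="a"
    xmlns="http://relaxng.org/ns/structure/1.0"
    datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="decimal">
    <param name="minInclusive">0</param>
    <param name="maxInclusive">100</param>
  </data>
</element>
```

With a pattern like this, a value such as 150 or "lots" in a vitamin A
element would be reported as invalid, while 0 and 100 would pass.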

You may see the entire grammar specification for the nutrition
markup, and you may also find out more about Relax NG. By the way,
Relax NG is not the only game in town if you want to specify a
grammar. You may use something called a DTD (Document
Type Definition), which is not as powerful
as Relax NG; or you may use XML Schema, which is
about as powerful as Relax NG, but far more complex to learn.
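
For comparison, here is roughly how the same <calories> rule might look
in XML Schema. This is a sketch written for this comparison, not part of
the article's files; it declares only the one element.

```xml
<?xml version="1.0"?>
<!-- Sketch: the calories element as an XML Schema declaration.
     A complexType with attributes and no child content is empty. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="calories">
    <xs:complexType>
      <xs:attribute name="total" type="xs:decimal" use="required"/>
      <xs:attribute name="fat" type="xs:decimal" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A DTD could declare the same two attributes, but only as character
data; it has no way to say that their values must be decimal numbers.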

Try it!

If you are feeling adventurous, you may want to try these
files yourself. You will need some XML tools in order to
do this; setup instructions are available for both Windows
and Linux.

To validate a file, go to the command prompt if you are using
Windows, or go to a console window and get a shell prompt if you
are using Linux. Then use the batch/shell file described
in the setup instructions to invoke
the Multi-Schema Validator:

msvalidate nutrition.rng nutrition.xml

Now What?

Although we can enter readable data and check to see if it’s
OK, we still can’t do anything with it. If we display it
in a browser, we just see the text all squeezed together. That’s
because the browser doesn’t know how to display a
<food> or <vitamins> tag.

Displaying the XML

If you are using the very latest browsers, you can
attach a stylesheet to the XML file. We have done that in
this example by putting an xml-stylesheet processing instruction
at the top of the file nutrition.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/css" 
 href="nutrition.css"?>
<nutrition></nutrition>

The style sheet that we write for file nutrition.css
looks very much like the style sheets that you use with your HTML
files. The difference is that we assign styles to our new nutrition
tags, not to the standard HTML tags. For example, to say that a
food’s manufacturer should appear in 16 point italic type without
starting a new line, you would write:

mfr {
    display: inline;
    font-size: 16pt;
    font-style: italic;
 }

Once you have created
the entire stylesheet in the same
directory as the XML file, you can open the XML file in a
modern browser such as Mozilla, and it will display the information.

Transformation: A Better Way

The problems with the stylesheet are that:

  • It only works with the very latest browsers that handle
    Cascading Style Sheets Level 2.
  • It can’t extract all the information (for example, the units
    don’t show up in the output document because they are
    “hidden” in the attribute values).
  • It can’t calculate percentages.

Additionally, the markup we’ve invented here is data-oriented;
it is designed to describe data to be stored or to be transmitted to
other programs. In these documents, the order of elements and the type
of data in each element is fairly rigid. Stylesheets work better with
narrative-oriented markup documents. These are documents which are
generally meant for human reading, and are more “free-form”
than data-oriented documents. Examples of narrative-oriented markup are
XHTML, DocBook (a markup for writing books and articles), and NewsML
(for writing news reports).

In order to get around these problems, we can use XSLT,
Extensible Stylesheet Language Transformations, to convert the
nutrition file into other forms. XSLT is, again, another XML-based
markup language. Its purpose is to describe how to take input from one
XML file (the “source document”) and output it to a result
document. XSLT has the flexibility to extract data from attributes as
well as element content, and it can do calculation and sorting upon the
data in the source document.

This power makes XSLT a key technology in the XML family of
technologies. For a good introduction, read
Norman Walsh’s excellent presentation on the subject or
this hands-on tutorial.
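
To give a feel for the notation, here is a minimal XSLT 1.0 stylesheet
written against the nutrition markup shown earlier. It is an
illustrative sketch, not one of the article's transformation files: it
lists each food by name and manufacturer (element content) together
with its total calories (an attribute value).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Start at the root element and build the HTML skeleton -->
  <xsl:template match="/nutrition">
    <html>
      <head><title>Foods</title></head>
      <body>
        <ul>
          <xsl:apply-templates select="food"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- One list item per food: element content plus an attribute value -->
  <xsl:template match="food">
    <li>
      <xsl:value-of select="name"/>
      <xsl:text> (</xsl:text>
      <xsl:value-of select="mfr"/>
      <xsl:text>), </xsl:text>
      <xsl:value-of select="calories/@total"/>
      <xsl:text> calories</xsl:text>
    </li>
  </xsl:template>

</xsl:stylesheet>
```

For the sample food above, the list item would read "Avocado Dip
(Sunnydale), 110 calories".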

Transformation to HTML

The first XSLT file converts the nutrition document
into a very plain HTML file suitable for display on any browser on a
desktop or PDA. To do the transformation, you’d type this
command:

transform nutrition.xml nutrition_plain.xslt nutrition_plain.html

The result of the transformation is an HTML file named nutrition_plain.html,
which you may open in any browser you like. Even this simple
transformation has done two things that we could not do with CSS:
it uses the information in attributes to display the units for each
nutritional category, and it calculates percentages of the daily
values.
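
Those two steps can be sketched in a few lines of XSLT 1.0. The
stylesheet below is an assumption about how one might do it, not the
article's nutrition_plain.xslt: for every child of a <food> that has a
same-named entry in <daily-values>, it prints the amount, the units
taken from the attribute, and the rounded percentage of the daily value.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/nutrition">
    <html>
      <body>
        <xsl:apply-templates select="food"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="food">
    <h2><xsl:value-of select="name"/></h2>
    <ul>
      <xsl:for-each select="*">
        <!-- Find the daily value whose element name matches this one -->
        <xsl:variable name="daily"
            select="/nutrition/daily-values/*[name() = name(current())]"/>
        <!-- Skip name, mfr, and other children with no daily value -->
        <xsl:if test="$daily">
          <li>
            <xsl:value-of select="name()"/>
            <xsl:text>: </xsl:text>
            <xsl:value-of select="normalize-space(.)"/>
            <xsl:text> </xsl:text>
            <xsl:value-of select="$daily/@units"/>
            <xsl:text> (</xsl:text>
            <xsl:value-of select="round(100 * . div $daily)"/>
            <xsl:text>% of daily value)</xsl:text>
          </li>
        </xsl:if>
      </xsl:for-each>
    </ul>
  </xsl:template>

</xsl:stylesheet>
```

For the Avocado Dip, total fat is 11 g against a daily value of 65 g,
so that line comes out as "total-fat: 11 g (17% of daily value)".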

Fancy Transformation

OK, so maybe you want something a bit fancier. Here’s a more complex
transformation which sorts the data by the ratio of fat calories to
total calories per serving; sort of a “healthiness
index.”

If you have saved the XSLT in a file called
nutrition_fancy.xslt you can type this command:

transform nutrition.xml nutrition_fancy.xslt nutrition_fancy.html

That produces a file named
nutrition_fancy.html,
which looks remarkably different from the plain version. It uses
Cascading Style Sheets to produce the little bar graphs; you’ll
need a modern browser like Internet Explorer 5+ or Mozilla/Netscape 6
to see the effect. Notice that XSLT lets you pick and choose the data
you want to display; the information about carbohydrates, fiber,
vitamins, and minerals is omitted in the fancy version. (They
could, of course, be added by changing the XSLT file.)

We have used XSLT to take the source XML file and transform
it to two different HTML files: a plain version that is suitable for
display on old browsers and PDAs, and a fancier version that is
suitable for use with desktop computers and modern browsers.
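
The sorting and the bar graphs can be sketched as follows. This is not
the article's nutrition_fancy.xslt; the bar width (two pixels per
percentage point) and the colors are arbitrary choices made for the
illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="/nutrition">
    <html>
      <body>
        <xsl:for-each select="food">
          <!-- Lowest ratio of fat calories to total calories first -->
          <xsl:sort select="calories/@fat div calories/@total"
              data-type="number" order="ascending"/>
          <xsl:variable name="percent"
              select="round(100 * calories/@fat div calories/@total)"/>
          <p>
            <xsl:value-of select="name"/>
            <xsl:text>: </xsl:text>
            <xsl:value-of select="$percent"/>
            <xsl:text>% of calories from fat</xsl:text>
          </p>
          <!-- A CSS-styled div serves as the bar -->
          <div style="width: {$percent * 2}px; height: 10px;
                      background-color: red;"/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

One caveat of this sketch: a food with zero total calories makes the
division produce a non-number, so a real stylesheet would need to test
for that case first.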

Non-HTML Transformation

But wait, maybe you don’t want HTML;
there’s more than just browsers in the world, you know. You might
want to take the data and convert it to a text file of tab–separated
values for import into a spreadsheet or database program.

A transformation file does this; you run it with this command:

transform nutrition.xml nutrition_csv.xslt nutrition.csv

And here’s the resulting text file.
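
A transformation to text differs from the HTML ones mainly in its
output method and in writing the tabs and newlines explicitly. The
sketch below is an assumption about the approach, not the article's
nutrition_csv.xslt, and it writes only four columns.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Plain text output: no tags, no escaping -->
  <xsl:output method="text"/>

  <xsl:template match="/nutrition">
    <!-- Header row; &#9; is a tab and &#10; is a newline -->
    <xsl:text>Name&#9;Manufacturer&#9;Calories&#9;Total fat&#10;</xsl:text>
    <xsl:for-each select="food">
      <xsl:value-of select="name"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="mfr"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:value-of select="calories/@total"/>
      <xsl:text>&#9;</xsl:text>
      <!-- normalize-space trims the blanks around the number -->
      <xsl:value-of select="normalize-space(total-fat)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```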

Conversion to Print

Let’s say you want to create a PDF file from your
XML. That’s possible by using a transformation to change the XML to
another markup language: XSL-FO (Extensible Stylesheet
Language – Formatting Objects). This is a page layout language. A tool
called FOP (Formatting Objects to PDF) takes that markup and
creates PDF files for you.

Here is a transformation file which
takes the nutrition data and converts it to formatting objects. If
you save it in nutrition_fo.xslt, you can use FOP to do
the conversion to PDF:

fop -xml nutrition.xml -xsl nutrition_fo.xslt -pdf nutrition.pdf

The result is a PDF file; it
produces pages that are approximately 8 centimeters wide and 9
centimeters high, which fits comfortably into a shirt pocket.
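
To show what formatting objects look like, here is a small sketch of a
transformation that sets up an 8 cm by 9 cm page and writes one block
per food. It is not the article's nutrition_fo.xslt. The attribute
names follow the XSL 1.0 Recommendation; the FOP release used for this
article may have expected slightly different ones.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <xsl:template match="/nutrition">
    <fo:root>
      <!-- Describe the shirt-pocket page size -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="pocket"
            page-width="8cm" page-height="9cm">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>

      <!-- Flow the content onto pages of that size -->
      <fo:page-sequence master-reference="pocket">
        <fo:flow flow-name="xsl-region-body">
          <xsl:for-each select="food">
            <fo:block font-size="8pt" font-weight="bold">
              <xsl:value-of select="name"/>
            </fo:block>
            <fo:block font-size="8pt">
              <xsl:value-of select="calories/@total"/>
              <xsl:text> calories, </xsl:text>
              <xsl:value-of select="calories/@fat"/>
              <xsl:text> from fat</xsl:text>
            </fo:block>
          </xsl:for-each>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

</xsl:stylesheet>
```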

Generating Graphics

Finally, you may wish to create an interactive, graphic
version of the data. Another XML-based markup,
SVG (Scalable Vector Graphics), gives you this
capability. SVG has elements like the following, which draw a black
diagonal line and a yellow circle with a green outline:

<line x1="0" y1="0" x2="50" y2="50" style="stroke: black;"/>
<circle cx="100" cy="100" r="30" style="fill: yellow; stroke: green;"/>

By using a transformation file that produces SVG, we can construct
a graphic that shows a bar graph for
the food whose name you click. Here’s what you type:

transform nutrition.xml nutrition_svg.xslt nutrition.svg

You may display the result with the SVG browser that is part of the
Batik toolkit. If you have installed Batik as per the instructions
given for Linux or for Windows, you type
batik nutrition.svg. I have not tested the file with
the latest version of the Adobe SVG Viewer, but it should work nicely.

[Screenshot: bar chart showing categories for a given food]
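
The interactive version relies on scripting, which is beyond a short
example, but the drawing side can be sketched statically. The
stylesheet below is an illustration only, not the article's
nutrition_svg.xslt: it draws one bar per food whose length reflects the
share of calories from fat. All sizes and positions are arbitrary
choices.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/2000/svg">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/nutrition">
    <!-- Make the drawing tall enough for one 25-unit row per food -->
    <svg width="400" height="{count(food) * 25 + 10}">
      <xsl:for-each select="food">
        <!-- Vertical position of this food's row -->
        <xsl:variable name="y" select="(position() - 1) * 25 + 10"/>
        <text x="5" y="{$y + 12}" style="font-size: 10pt;">
          <xsl:value-of select="name"/>
        </text>
        <!-- Bar length: 200 units would mean all calories are from fat -->
        <rect x="150" y="{$y}" height="15"
            width="{round(200 * calories/@fat div calories/@total)}"
            style="fill: yellow; stroke: green;"/>
      </xsl:for-each>
    </svg>
  </xsl:template>

</xsl:stylesheet>
```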

Other Ways to Use the XML Tools

In this article, we’ve used the Multi-Schema
Validator, Xalan Transformer, FOP converter, and Batik viewer from the
command prompt. That’s the fastest and easiest way to get things
working so that you can get a feel for what XML can do.

The batch or shell file approach would work in a production
environment where you generate a whole website’s worth of HTML
files from one or more XML files at regular time intervals. You just
set up a batch job to run at scheduled times (a cron job
in Unix terms) to generate the files you need.

What if you need to generate HTML pages or PDF files dynamically
in response to user requests? Obviously, you don’t want the overhead
of starting a Java process every time a request comes in, and a
static batch file certainly won’t do the trick. Both the
Multi-Schema Validator and Xalan have an API (Application Program
Interface) and can thus become part of a Java servlet running on
your server and handling dynamic user requests. Once a servlet is
loaded, it stays in memory, so there is no extra overhead for
subsequent uses of a transformation.

If you are interested in running servlets, one option is to use the
Jakarta Tomcat servlet container. It can run as a stand-alone server for testing or as a
module for either Apache or Microsoft IIS.

Timing

There are two aspects to timing: how long it takes to
write the grammars and transformations, and how fast they run.

Designing the markup
language took me about 25 minutes, and entering the data took me
another 25 minutes, some of it spent running out to the kitchen to grab items
from the shelf or refrigerator. Writing and testing the Relax NG
grammar required 30 minutes.

The Cascading Style Sheet for displaying the XML directly in Mozilla
took all of 15 minutes to write. The “plain HTML”
transformation took about 50 minutes, including time for looking up
some XSLT constructs and doing some experimentation. The
“fancy” transformation took 45 minutes. I needed 20 minutes
to figure out how to do the bar graphs with stylesheets in the first
place, and I used another 5 minutes for minor aesthetic adjustments.
The file for conversion to tab–separated values was a fifteen-minute
job.

The transformation for PDF took an hour. The first time through, I
designed it for paper the size of a compact disc insert. I thought
better of it, and decided to reduce it to shirt-pocket size. That took
another 30 to 45 minutes of tweaking and getting the font sizes just
the way I wanted them. I also had to make some changes to avoid using
parts of XSL Formatting Objects that FOP does not implement yet.

Finally, the SVG transformation took an hour and a half to write.
About half that time was experimenting to get everything positioned
nicely and making the ECMAScript interaction work properly.

You don’t have to be an expert at Relax NG, XSLT, XSL
Formatting Objects, or SVG to do this. I don’t use any of these
technologies on a daily basis. I just know enough about each of them
to get things to work. In this case, my philosophy was “the first
way you think of that works is the right way.” That is why XSLT
experts will be shocked when they see an inefficient construct like
this one in the plain HTML transform file:

select="/nutrition/daily-values/*[name(.)=name($node)]/@units"
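
The article does not say which alternative an expert would prefer. One
commonly used option in XSLT 1.0 is to declare a key, which many
processors turn into an index so that the daily values are not searched
again for every lookup. The sketch below is an assumption, not a
correction taken from the article's files; the key name daily-by-name
is made up, and the lookup uses the current node where the original
expression uses the $node variable.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Index each child of daily-values by its element name -->
  <xsl:key name="daily-by-name" match="daily-values/*" use="name()"/>

  <xsl:template match="/nutrition">
    <xsl:apply-templates select="food/total-fat | food/sodium"/>
  </xsl:template>

  <!-- Print the amount followed by the units found through the key -->
  <xsl:template match="food/*">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="key('daily-by-name', name())/@units"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

For the Avocado Dip this prints "11 g" and "210 mg" on separate lines.
Whether the key is measurably faster depends on the processor and on
the size of the document; for a file this small the difference is
unlikely to matter.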

This is not to say that there is no learning involved here; you will
need to spend some time on that. You don’t need to spend a
lifetime on it, though. It is definitely possible to learn enough about
these technologies to put them to effective use in a short time.

Performance

I tested all of these files on a 400 MHz AMD K-6 with
128 MB of memory running SuSE Linux
7.2. For the transformations, I modified the SimpleTransform.java
sample program that comes with Xalan. This program records the total
time to generate the output and the time involved in transformation
after the XSLT file has been parsed. If you are running transformations
on a server, you can cache the parsed XSLT file, so the overhead for
parsing occurs only once.

Transformation          Time in seconds
                        Total    Transform
Plain HTML              3.691    1.018
Fancy HTML              4.057    1.409
Tab–separated Values    3.057    0.548
SVG                     3.386    0.689

I measured the time for the PDF transformation with the
Linux time command. Generating the file took
15.115 seconds real time, with 10.920 seconds of user CPU time.

Of course, these are not the only tools available. There are other
XSLT processors and other programs for converting XSL Formatting
Objects to PDF. I chose MSV, Xalan, FOP, and Batik because they are
free and easy to use, and because I was already familiar with them.

Summary

  • Using XML-based markup gives your document structure,
    and makes it readable and open.

  • XML is part of a family of technologies.

  • You can use grammar markup languages like Relax NG
    or XML Schema to validate
    your documents.

  • You can use XSLT transformations to repurpose a document.
    A single document can serve as the source for XHTML, plain text,
    PDF, or other XML markup languages like SVG.

  • Programs which do validation and transformation are freely
    available and easy to use.

These capabilities exist right now, and they are easy to learn and
utilize. That is why XML is good, and why people are so excited about
it once they start to use it.

You may download the
XML files and the resulting HTML, text, and PDF files.

69 Reader Comments

  1. Quote:
    Why not compromise? Have you read up on the PHP functions relating to XML files? If not, take a look at http://www.php.net/manual/en/ref.xml.php. I’ve begun experimenting, just out of interest, with a CMS using XML for data storage and PHP for parsing and displaying the data. It’s quite nifty.
    ————————

    hmm. See, it’s not the implementation I’m having problems with, it’s the whole concept. I don’t need to use the XML functions – I already use the MySQL functions. I can’t see any kind of real-world usage for this. If I need XML, I generate it from the DB. If I need to customize the display for a particular browser, I do so with PHP.

    I really can’t see a reason to change what I already know quite well, that works very well for every instance that I can come up with, in order to use XML.

    And if you say “You don’t always have access to PHP and a DB”.. Well, if you don’t have DB access on your webhost… what are the chances of getting them to add the PHP extensions? It’s not installed by default for PHP..

    (shrug)

  2. The decision to use XML or a relational DB should ideally be based on the nature of the data. Some data fits better into a relational DB, some fits better into XML, some can be represented using either pretty much equally.

    Data that can be represented using just a single table can generally be represented using either technology with no compelling wins either way. (More speed with a DB, more portability with XML, but nothing much beyond that.)

    Data that would be represented in a relational DB using two or more tables that get joined together is probably best kept in a DB. The DB will handle all the join operations, referential integrity, etc., more easily than will be possible using XML. (You could do it with XML, but you’d probably end up writing a lot of code yourself.)

    Data that consists largely of text with markup is best represented using XML or some other form of markup language. If you’ve got a document that could have an arbitrary number of sections, or styles appearing at arbitrary points in the text, you really need to use a markup language of some sort. There isn’t any way to represent that information using rows and columns. That goes double for recursive data structures such as subsections.

    Relational DBs and markup languages represent two different philosophies about the structure of data. Neither of them is suitable for all data.

    Apart from that, I personally like text files that I can hack by hand. I get worried when I have important data that requires a particular program to access.

  3. As a follow-up to my previous post [Post], I’d like to point out that XHTML 2 is incompatible with XHTML 1.0 [XHTML2], and is certainly incompatible with HTML. Further, CSS 2.1 is not backwards compatible with even CSS 2.0 [CSS2.1].

    Further, here’s evidence that even people that arguably “Get it” don’t understand MIME types. [DiveInto XHTML2] “My fresh IE 5.5 install asks to download the page…”. The Save As dialog in IE is popped up for any unknown MIME Type (after IE’s sniffing algorithm fails). [IEMIME]

    (As a side note, it appears the URL Mark references returns text/html now. I am pretty positive that he was getting the dialog because at the time he tested, it was (properly) returning application/xhtml+xml, and it has since been changed to return text/html.)

    It is not my intention to harm anyone in these statements. I am simply trying to call attention to the need for content negotiation, and to the fact that “forward compatibility” can’t be strictly counted on.

    A mechanism for negotiating representations based on client capabilities is -necessary-. In fact, Mark’s closing (sarcastic) note ” Looks great in Opera and Mozilla, though. That does it. I’m converting all my pages to XHTML 2.0. Accessibility be damned. Backward compatibility be damned. IE 5 be damned.” points to this fact, though he may not realize it.

    Please… think about it.

    -Jeremy

    [Post]
    http://www.alistapart.com/stories/usingxml/discuss/2/#ala-731

    [XHTML2]
    http://www.w3.org/TR/xhtml2/
    (Sorry, I can’t point out specific examples of non-conformance here.. They’ve not included a change summary, and I can’t do the research needed to gather evidence just now)

    [CSS2.1]
    http://www.w3.org/TR/2002/WD-CSS21-20020802/about.html#q1

    [DiveInto XHTML2]
    http://diveintomark.org/archives/2002/08/06.html#changes_in_xhtml_20

    [IEMIME]
    http://msdn.microsoft.com/library/default.asp?url=/workshop/networking/moniker/overview/appendix_a.asp

  4. Hi
    I get a Microsoft Jscript runtime error saying Null is not a null object, while I try to run Nutrition.svg. Help!

  5. This XML article was reccomended to me as I am in a hurry to make its R&D for our web division. I was given a 500-page book on the subject, very good nonetheless, but J.David’s article does what that book does in much less time and without any unnecesary jargon speak or hype. Now I can say I really get what XML is about! I just hope I can understand XML’s role in Flash as well, in the future.

  6. I went through the process of creating all of the parsed files. I can be a goon on the computer, but i found it to be a breeze. Very cool application of the technologies. Well written article too!

  7. This was good overview. An example with DTD and XML Schema could also throw some light to those grammars.

  8. Hi,
    Good article – much food for thought (pardon, no pun intended).
    However, the msvalidate reported a JRE clash problem. I’ve got version 1.4.1_01 of the J2sdk & JRE on my machine. Running the msvalidate.bat reported that I needed the JRE1.3.
    I resolved the issue by going into regedit & changing the JRE current version from 1.4 to 1.3. Obviously not ideal but it works.
    Thanks for your hard work,
    Eddie

  9. So that’s how you use XML. I keep hearing how great it is but, up until this point, had no idea why creating your own markup was a good thing. Great article. Thanks.

    Even so, I question the usefulness of it. If you ask me XML seems to be a fancy way of managing data in text files. For someone who uses only text files that may be a good thing. But as Twyst says “I don’t need to use the XML functions – I already use the MySQL functions. I can’t see any kind of real-world usage for this. If I need XML, I generate it from the DB. If I need to customize the display for a particular browser, I do so with PHP.”

    I read colin_zr’s response with interest. He said: “There isn’t any way to represent that information using rows and columns. ” Ok, can someone provide an example of this. Any examples I’ve seen could all easily be represented in a DB. In fact, I seem to remember reading a tutorial somewhere that described how to use XML to display data in a HTML table (using php, I think). Kind of pointless, if you ask me.

    Now, I’m not saying all XML is pointless. This article showed the value of XML for those who may not have access to a DB for whatever reason; or those you do not want to go beyond markup (in other words: those who shy away from scripting lanuages such as php, asp, …). I just don’t see it’s value for those who do, such as myself. Maybe, someday, somewhere, someone will provide an example that simply can not be implemented into a relational DB. Until then, I will set XML aside.

  10. I would like to give a nod to Jeremy Dunck for coming to the same conclusion I did about this ariticle. It is a perfect primer for an explanation on the use of content negotiation. Based on a user agent’s (i.e. brower) capabilties you can serve the document as any one of the types listed.

    These capabilties include SUPPORTED MIME TYPES and supported languages. Therefore if a browers says it supports ‘en-us’ (United Sates English) and your site has THE SAME content in two languages, say en (English), fr (French), you can serve the apportiate one to the user (in this case the english one). No need for a new URI or to ask the user which version they prefer.

    In terms of the article you can also serve documents by MIME type based not only on if a type is supported but also by the qualty of that support.

    In a real world example IE supports text/html (HTML) and text/plain (Text) and Mozilla supports text/html, application/xhtml+xml (XHTML) and text/plain (Text) . Mozilla supports XHTML with a quality of 1 (Best) and HTML with a quality of 0.9. Therefore in IE your only options are to transform the xml document into HTML or text based on stated support, but in Mozilla you have more options. You could send the document either as XHTML, HTML or text. Since XHTML has a higher quailty for XHTML you would probably want to transform the document to XHTML and send it as such. Since (X)HTML is usally preffered over raw text we won’t send the text version to either user agent.

    To further extend this exmaple if user agent supports image/svg+xml (SVG) you can send it the SVG document instead or application/x-pdf (PDF) for the PDF document.

    I sure much of this post is somewhat short sided but the bottom line is one URI can serve multple versions of a RESOURCE based on what’s avalible and what the user agent’s capbilties are. For the most part this goes unused, but this is how HTTP is DESGINED to be used. And to be honest this can all be done today and is support unfortunly most servers make this diffucult as they ARE NOT desgined to work this way, but like most things their are ways around it.

  11. Is a DTD file always needed?

    From the xml file I saw something like:

    which seems like to be a template.

    Can we automate the “record” generation process by having a definition file?


  12. fop -xml nutrition.xml -xsl nutrition_fo.xslt -pdf nutrition.pdf

    The result is a PDF file; it produces pages that are approximately 8 centimeters wide and 9 centimeters high, which fits comfortably into a shirt pocket.
    ” — from the article

    Now how can I use this with say ASP or ASP.Net or Java to generate pdf file on the fly… say for example a customer order some item from a e-commerce site… I want to be able to generate a pdf version of pre-designed templated invoice with their details filled in dynamiclly.

    Is this possible is so how?

  13. I’ve only just stumbled across this article, and it has been great help. The software the author linked to is very useful on PC and *nix, but I’m using a Mac. Does anyone know of any comparable software for me? (I suspect the Linux stuff can be made to work in OS X, but I don’t know how). Please email me if you have any ideas.

  14. Grebmil: A response a few months late…

    Ok, fair enough. Most of the examples you see in these introductory articles involve record-oriented data. But that’s not the only kind of data.

    Here’s an example of some data that really needs to be stored in markup rather than in a relational model:

    Hello, my name is colin. I like XML.

    You can’t take data like that, make a field for names and a field for abbreviations, and force it into third normal form. That’s just not the structure of the data.

    To give you another illustration, think about how you’d take an HTML file and represent it in a database. Would you have a table of div elements, a table of h1 elements, a table of p elements? What would the records in those tables look like? How would you indicate all the p elements that belonged within a specific div? And those are the easy bits. Just wait till we get to inline elements…

    Obviously that’s silly.

    What you might well do is take the contents of the HTML file, or perhaps a fragment of it, and put it into a field of a database. But then you’ve still got all the HTML markup within that field. Markup just happens to be the best way to represent that data.

  15. Hello, my name is colin. I like XML.

    Is that the data itself, or is it really a list of people that like abbreviations?

    I personally do not see xml as a way to deal with millions of records over hundreds of tables. I will more than happily export a subset of data to someone else in xml in any way they want it. However, that does not make my application any more a user of xml than if the two of us had agreed to use pig-latin or parenthesis delimited text files with a header row in rot13.

    I do think xml has its place, it just doesn’t overlap with my domain except as another export format. I haven’t really had to deal with importing xml because everybody else uses databases which means they’re just as happy to give me a few csv files or a direct tap into their database.

    For database dumps, a CSV file with a header row is far more space/bandwidth contientious than XML.

    To people like me, who use SQL, XML seems very clunky and broken.
    To people who like XML, I think SQL and RDBs look big and overpowered for their needs.

    IMarv

  16. Twyst(e) is right, I’d say,

    I certainly don’t want to rely on client side functions
    (ever heard of browser quirks? do you really think there’ll be no more in times to come?)
    when I can access reliable server side functions (PHP, MySQL).

    >You don’t always have access to PHP and a DB…
    I guess angelfire/lycos account holders sharing there pastry recipes with the world
    are not the target audience here.
    (no offense: private homepages/pastries are OK)

    Marek
