Using XML

by J. David Eisenberg

69 Reader Comments

  1. Wow! I have had so many questions regarding XML and how it should be effectively used. This “tutorial” answered most of my questions right off. Zeldman would have me believe that the next tutorial was weeks away when it would be unveiled the next day. What a pleasant surprise. My thanks to you.

  2. Kinda thought this is not allowed:
    style="background-color:green; width:5">
    Values cannot stand alone in CSS, can they? So it should say 5px, 5% or something.

  3. Nope, the units must be specified. IE in quirks mode automatically assumes you want px, and I suspect other programs do so as well.

  4. Nice article thanks.

    One question: is there an app that takes an XML file and allows for flexible and easy data entry? There is no chance in hell I can get our sales team or non-techy guys to use Notepad to enter data into the XML file. Sure, I could go and write ASP apps for each XML file, but that is a chore. And it seems to me that XML is quite a good language to support a flexible, generic data-entry app.

  5. Scary case of deja vu here. The new work website that (finally) went online uses very similar techniques: XML articles that are transformed by XSLT stylesheets (depending on the viewer’s browser) to produce the output. Reading this reminded me of the fun I had making the system.
    One really good reason I found for self-specified markup was that it stops people thinking “h1 looks like this” and instead makes them think about the structure.

  6. just been reading loads of stuff about XML / XSL / XSLT / XPATH, this is the clearest and most enjoyable article(yes i used the E word) i have read yet. trust the guys @ alistapart to get the goods. now i just have to learn it….

  7. Nice article, unfortunately the setup instructions for batik don’t lead anywhere!

    Overall though a well written introduction to the power of XML and XSLT.

  8. Paul:
    You want XML-Spy (not just for your salesteam).

    http://www.xmlspy.com/

  9. Use something like:

    serving:after {
    content: attr(units)
    }

    This technique can be especially helpful for printable stylesheets, for example, where link URIs, image titles, acronym definitions and so on can be displayed, à la:

    @media print {
    acronym:after {
    content: " (" attr(title) ")"
    }
    }

    Failing adding this, Mr Eisenberg should at least close the brackets where he says CSS can’t do this ;)

  10. Mr. Eisenberg, you’ve done a great thing here!

    I’ve a number of friends I’ve been trying to turn on to XML, and I’ve had some grey areas myself.

    While you gave the executive summary on technical details, you have given enough information to really dig in.

    On the positive side, you have demonstrated a large number of great XML facets in a practical manner.

    Thanks for the excellent article! I’ll be sending my friends to read it. :)

  11. Freaky,
    attr(title) is CSS3, which is poorly (or not at all) implemented in even modern browsers. Mr. Eisenberg was trying to give a practical application, and glossed over this detail for the clarity of the point: CSS is not sufficient for all rendering needs.
    Even accepting full CSS3 support, the reflowing and calculations often needed for XML to be presented in various representations require a more robust language, that is, a programming language.
    CSS is elegant and powerful for its purpose, but it (intentionally) can’t do everything. Modularization, baby!

  12. Finally! I’ve been asking for YEARS now if XML meant double-markup (one for XML, one for XHTML), and the answer is NO. Just link to a style sheet with identical tags.

    Dude. Sweet. Good article.

  13. Sorry, I was incorrect. *:content {attr()} is not CSS3, it is CSS2:
    http://www.w3.org/TR/CSS2/generate.html#x16

    Even so, browser support for it is not good, and the need for XSLT stands, regardless of the support for CSS. :P

  14. Great article. I had been following XML for a while, and I found this article to be well-written and informative. One question I have is why he didn’t discuss client-side transformations using modern browsers and XSLT. I know it’s possible, heck, I’ve done it, but he didn’t mention it at all. Something else I’m curious about is whether or not browsers for the Palm Pilots can handle XSLT. I know they parse XML but I don’t know how they would handle XSLT. Well, great article (once again). I really enjoyed it.

  15. Oops. You are correct. I forgot the “px”. In nutrition_fancy.xslt, change line 113 to: <xsl:attribute name="style">background-color:green; width:<xsl:value-of select="$pct"/>px</xsl:attribute>

    Change line 116 to: <xsl:attribute name="style">background-color:red; width:<xsl:value-of select="$pct"/>px</xsl:attribute> and run the transform again. That’s another nice thing about generating HTML via XSLT; you don’t have to fix a website’s worth of files by hand. Just run a batch file again.

  16. The setup links under the “Try It!” section (nearer the beginning of the article) do go to the correct places. I’ll make this fix, and then when an editor has time to make the change, it will happen. The correct URLs are http://www.alistapart.com/stories/usingxml/windows_setup.html for Windows and
    http://www.alistapart.com/stories/usingxml/linux_setup.html for Linux.

  17. Cocoon and Axkit! Don’t forget about them!
    Both Apache projects and both great XML server side technologies. I prefer Cocoon, actually—Java or Perl, it’s your choice.

    http://xml.apache.org/cocoon/
    http://axkit.org

  18. I don’t know if I’ll get flamed for adding this but I thought it should be mentioned.

    Flash is also a front-end option for XML data. Just goes to show the beauty of XML, that you can use it any way you see fit.

  19. Another use for XML: a general purpose container for data.

    It’s easy to add/remove/modify elements in the XML data (on the server side.) Once you have the XML data in a DOM object, you can massage the data quite easily. It’s also easy to save these changes by serializing the document.

    Of course, all of the transformations that David presents can be applied to any updated data.

    For a content management product that I’m working on now, we are using a DOM to cache information from searches and other expensive backend services. Keeping this cached object at the web tier reduces round-trips to the application server and provides a great deal of flexibility in presentation (ie. the cache can be transformed via XSL to different types of views.)

    Another point: many of the backend services (like databases) can be configured to produce output in XML. And any data that’s retrieved over a SOAP connection will be in XML. Creating the cache in XML takes very little effort.
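
A minimal sketch of the "XML in a DOM as a general-purpose container" idea from the comment above, using Python's standard-library `xml.dom.minidom`. The element names (`results`, `hit`), the sample data, and the helpers `add_hit` and `remove_hit` are hypothetical and exist only for illustration; the commenter's product and its schema are not shown in this thread.

```python
from xml.dom.minidom import parseString

# Hypothetical cached search result; the element names are illustrative only.
CACHED = "<results><hit id='1'>XML basics</hit><hit id='2'>XSLT intro</hit></results>"


def add_hit(doc, hit_id, title):
    """Append a new <hit> element to the document's root element."""
    hit = doc.createElement("hit")
    hit.setAttribute("id", hit_id)
    hit.appendChild(doc.createTextNode(title))
    doc.documentElement.appendChild(hit)
    return hit


def remove_hit(doc, hit_id):
    """Remove the first <hit> whose id attribute matches; return True if found."""
    for hit in doc.getElementsByTagName("hit"):
        if hit.getAttribute("id") == hit_id:
            hit.parentNode.removeChild(hit)
            return True
    return False


doc = parseString(CACHED)
add_hit(doc, "3", "RELAX NG")
remove_hit(doc, "1")

# Serializing the DOM gives back plain XML that can be saved or transformed.
serialized = doc.documentElement.toxml()
```

The serialized string is ordinary XML again, so it could be written to disk or handed to whichever XSLT processor is already in use.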

  20. Using XSLT to transform XML documents into (X)HTML for different browsers is mentioned as a possibility. In fact, one of those who posted mentions that they do just that [http://www.alistapart.com/stories/usingxml/discuss/#ala-705].

    How is this done? I assume that some sort of browser detection is required, but I’m not sure how one would implement browser detection on the server side?

    I suppose you could parse the HTTP request’s user agent string before sending a response. But deciding which browser is which from the user agent string is a complex guessing game. How do we simplify it?

  21. I’m getting involved with an XML to HTML project… and this is how I’m doing it.

    A complex newspaper software generates XML documents (like articles) with a lot of great markup.

    I then parse that XML and stuff it into a database, defining fields based on tags.

    Then, on the web, someone comes to my database driven web site and I define styles as I wish, usually based on the customer’s custom specifications. I just don’t see the pre-application of styles to an XML document as efficient. I also think that I’m not quite getting the big picture… but thanks to articles like this one, things are beginning to come together.

  22. Good article, though I find the idea that manufacturer names had been changed to be ridiculous to the extreme. Lawsuits? The innocent? Now we are at the point where plainly descriptive facts cannot even be used. Harumph.

    Anyway, more to the point, I would love it if there was a similar tutorial expanding on the theme of generating multiple documents from a single XML file. There was a tantalizing hint in the mention of cron jobs, but I would love more.

    I have several XML files that include many records, each of which I would like to put on separate pages. (And, of course, also create navigation among the pages.) Right now I use a server-side kludge to dynamically create a page based only on a single part of the XML tree. I would love to generate separate pages to reduce server load, facilitate indexing, etc…

    Anyone know of another good article on this?

  23. The mention of cron jobs was with the intention of running all your 231 XML files through an XSLT transform to create 231 HTML files. On the other hand, you want to take one XML file (say, containing a three-chapter book) and generate four HTML files, one per chapter plus an index file.
    This is easily possible with XSLT extensions. Xalan, Saxon, and XT (three well-known XSLT processors) all ship with extensions that allow you to generate multiple output files from a single input file. This is detailed in Chapter 8 of XSLT, by Doug Tidwell, published by O’Reilly. I’ve used it before, and it works great.
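
For readers who want to see the "one input, several outputs" shape before reaching for a processor extension, here is a rough sketch in plain Python with the standard-library `xml.etree.ElementTree`. It is not the XSLT-extension approach described above, and it is not taken from the book; the `book`/`chapter`/`para` vocabulary, the file names, and the `split_book` helper are all assumptions made up for this example. It returns the pages in a dictionary rather than writing files, so nothing touches the disk.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-chapter book; the vocabulary is invented for this sketch.
BOOK = """<book title="Sample">
  <chapter title="One"><para>First.</para></chapter>
  <chapter title="Two"><para>Second.</para></chapter>
  <chapter title="Three"><para>Third.</para></chapter>
</book>"""


def split_book(xml_text):
    """Return {filename: html_text}: one page per chapter plus index.html."""
    book = ET.fromstring(xml_text)
    pages = {}
    links = []
    for number, chapter in enumerate(book.findall("chapter"), start=1):
        filename = f"chapter{number}.html"
        html = ET.Element("html")
        body = ET.SubElement(html, "body")
        ET.SubElement(body, "h1").text = chapter.get("title")
        for para in chapter.findall("para"):
            ET.SubElement(body, "p").text = para.text
        pages[filename] = ET.tostring(html, encoding="unicode")
        links.append((filename, chapter.get("title")))

    # The index page links to every chapter page generated above.
    index = ET.Element("html")
    body = ET.SubElement(index, "body")
    ET.SubElement(body, "h1").text = book.get("title")
    listing = ET.SubElement(body, "ul")
    for filename, title in links:
        item = ET.SubElement(listing, "li")
        ET.SubElement(item, "a", href=filename).text = title
    pages["index.html"] = ET.tostring(index, encoding="unicode")
    return pages


pages = split_book(BOOK)
```

A cron job or batch file could then loop over the returned dictionary and write each entry to its own file.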

  24. In fact, what you’re asking for is called server-side content negotiation.

    HTTP was designed to allow negotiation of content on both the client and server sides.

    In this case, you wish to negotiate on the basis of whether a browser accepts a particular representational language. MIME Types may be sufficient for this purpose (though they are insufficient for other types of negotiation).

    Here’s an RFC on the topic.
    http://www.ietf.org/rfc/rfc2295.txt

    Unfortunately, content negotiation is still a largely un-standardized and un-implemented facet of the web.

    I feel that it is under-addressed. While it is largely a technical issue (in that no one user will likely cry out for the need for resources to have alternate representations), it is, IMHO, vital to the long-term health of the web.

    It is the geek version of accessibility. I mustn’t crank out purely-XHTML sites, even if it is current and “cool”. I must continue to provide HTML. I mustn’t use PNG exclusively, I must continue to provide JPG. And so on.

    Thus far, browser sniffing has been good enough to get the job done, but as more browsers on various devices come out, and as new web languages proliferate (XSLT, anyone?), it will be necessary to provide alternate representations based on automatically negotiated user agent capabilities.

    I’d like to see WaSP take up this torch. If they are for the long-term health of the web, then surely backwards-compatibility must at some point be established and agreed to.

    It’s fine to say that we will let NN4 quietly die for its maverick implementations, but what about when standards compatible browsers are old, and the standards they implement are no longer the flavor of the week?

    …I’m off the soap box, now.
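
To make the negotiation idea a little more concrete, here is a simplified sketch of server-side selection based on the HTTP `Accept` request header, written in Python. It is not an implementation of the RFC linked above; it handles only exact media types and the `*/*` wildcard, ignores partial wildcards such as `text/*`, and the names `parse_accept`, `choose`, and `SERVER_TYPES` are hypothetical.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal-q entries keep the order the client sent.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose(header, available):
    """Return the first available media type the client accepts, else None."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
    return None


# Hypothetical list of representations this server can produce.
SERVER_TYPES = ["text/html", "application/xhtml+xml"]
choice = choose("application/xhtml+xml,text/html;q=0.9,*/*;q=0.1", SERVER_TYPES)
```

A client that lists only types the server cannot produce gets `None` back, which is the point at which a server would have to fall back to a default or report that nothing acceptable is available.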

  25. Excellent article. Timely, well written, and easily understandable.

    One thing I was wondering about: with Google and Amazon releasing APIs for their services and the same being accessible through SOAP, etc., I was wondering if someone could speak quickly to how the contents of this article relate to the use of XML in service-based applications.

    Any help would be greatly appreciated.

    Jason

  26. What are the advantages/disadvantages of using xml over a db?

    Why not just store your data in a db and use whatever you want to produce your doc of choice (php, cf, asp, etc.)? Seems much easier, especially when it comes to browsers. Isn’t that the best solution for 99% of the cases?

    If you really need to pass some generic text representation of the data, you could output an xml doc, though I would think in most cases you could simply output the end product directly. Services, I can see, might need a generic text representation.

    (Is it fair to call xml a text db?)

  27. Because they’re doing different things. A lot of data-types that can be represented well by xml, can’t be done well by dbs and vice-versa. You know, use each tool where it works best?

  28. The one problem I had with this article was that it claims you can’t show attribute values via CSS. That’s wrong (unless you’re using IE).

    For example, `element:after {content: " [" attr(name) "] "}` would display the value of the element’s name attribute between square brackets. This works in at least Mozilla 1.0 and Opera 6.04. Instead of using the :after pseudo-element, one could choose :before to display the attribute value before the element. There’s a whole section in the CSS 2 spec about text or content generation: http://www.w3.org/TR/CSS2/generate.html

    Overall, I liked this article.

  29. I’ve been struggling to understand the heady world of xml, and this put the final bridges together in my mind to understand the different bits, thank you squire!

  30. “what about when standards compatible browsers are old, and the standards they implement are no longer the flavor of the week?” – Jeremy Dunck

    Standards are made in such a way that when a standards compliant browser receives a page that contains things it doesn’t understand (such as a new CSS property, or a new style sheet language entirely, or a new scripting language), it will ignore the parts that it doesn’t understand. If the markup is well written, however, then the browser will still be able to display the content of the page, which, I hope we’ll all agree, is the most important part. All the user will miss out on is some nice visuals.

    The standards themselves are what are backwards compatible. For instance, XHTML was created in such a way that a browser that doesn’t understand XML will still recognize it as HTML 4.01.

    About the article: I’m happy to see this article, ‘cause there are a lot of people who only half understand XML, and this shows a lot of its power. Well written =)

  31. “Standards are made in such a way that when a standards compliant browser receives a page that contains things it doesn’t understand (such as a new CSS property, or a new style sheet language entirely, or a new scripting language), it will ignore the parts that it doesn’t understand.” – Slime

    Within a particular Recommendation’s evolution, I can agree with this statement. The rules for evolving HTML were well understood. The rules for evolving XML are well understood. The rules for evolving CSS are well understood. Yep.

    But an old standards-compliant browser that doesn’t understand XSLT will be unable to do anything with an XML file which was intended to be transformed. Indeed, the “content” rendered by untransformed XML will likely be unusable, even if the browser chooses to render it, because logical filtering, calculations, and tree restructuring will not have occurred.

    Likewise, that browser which doesn’t understand XML namespacing will not render an XHTML document correctly, should it also contain namespaced SVG, for example.

    In such a case, different representations must be presented to the old browser on the basis of client-side or server-side content negotiation. Backwards compatibility of HTML 4.01 with HTML 2.0 is irrelevant in the situation I am describing.

    ==========================
    The standards themselves are what are backwards compatible. For instance, XHTML was created in such a way that a browser that doesn’t understand XML will still recognize it as HTML 4.01.—Slime

    Actually, this is incorrect for a couple of reasons.

    First, XHTML 1.0 can be made to also conform to HTML 4.01, if the XHTML complies with restrictions made in the XHTML spec [XHTML].

    Such a document can be served as either text/html or application/xhtml+xml. [XHTMLMediaTypes]

    Any XHTML document which does not adhere to the restrictions made in the previous reference (for whatever reason) may cause problems with HTML browsers.

    In this case, it is not valid to describe the content as text/html, and should be served as application/xhtml+xml [XHTMLMediaTypes]. In this case, a browser which does not understand the application/xhtml+xml MIME type will generally popup a Save As dialog, or some such catch-all behavior.

    Slime, thanks for taking the time to respond, but your rebuttal only illustrates my point—people don’t seem to understand the purpose of MIME types, or the concept and necessity of content negotiation.

    -Jeremy

    [XHTML]
    http://www.w3.org/TR/xhtml1/#guidelines

    [XHTMLMediaTypes]
    http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020430/

  32. >>A lot of data-types that can be represented well by xml, can’t be done well by dbs and vice-versa.

    Can you give some examples? Why can’t you have columns and rows in a db that are the same as the data in XML?

    Looking at the examples in the article, all of them seem far easier to draw out of a db using php/cf/asp. I can’t imagine implementing them on any of my sites. When is it really necessary to use XML? (Again, I can see how services might need a generic text representation of the data.)

  33. I’m looking forward to these examples, as well.

    However, another benefit of XML over a DB is that the XML can be repurposed. What I mean is, you’re not going to send your proprietary DB over to anyone that’d like to use your data.

    What XML gives you is a well-structured way to easily transport any old data in a rigorous format, which is well supported by tools.

    CSVs can’t hold a candle to XML, I hope you’ll agree.

  34. I’ve been looking forward to ALA 147 and have not been disappointed. Thanks Mr. Eisenberg and everyone at ALA for their continued good work!

    I’ve been looking (unsuccessfully) for easy validation for Mac OS X, similar to the setups mentioned for Linux and Windows. I wouldn’t be surprised if it’s built into the OS and I just haven’t found it—a lot of the files (e.g., .plists) are XML and there are directories for DTDs. Any suggestions?

  35. Looks like that’s a part of CSS that I hadn’t read yet. You can extract the attributes for the element in question (show the units=“g” on the <serving> element), but you can’t use CSS2 to reach up and grab units from the <sodium> in <daily-values> when showing the <sodium> for the potato chips.
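
The "reach up and grab the units" lookup is the kind of thing a transform or a few lines of code can do. Below is a hedged sketch in Python using `xml.etree.ElementTree`. Only the names `serving`, `sodium`, `daily-values`, and the `units` attribute come from this thread; the surrounding document shape, the `nutrition`, `food`, and `name` elements, the numbers, and the `sodium_label` helper are assumptions invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumed document shape and made-up values; see the note above.
NUTRITION = """<nutrition>
  <daily-values>
    <sodium units="mg">2400</sodium>
  </daily-values>
  <food>
    <name>Potato chips</name>
    <serving units="g">28</serving>
    <sodium>180</sodium>
  </food>
</nutrition>"""


def sodium_label(root, food):
    """Format a food's sodium value with the units declared in daily-values."""
    # The units live on a different branch of the tree than the food itself.
    units = root.find("daily-values/sodium").get("units")
    return f"{food.findtext('sodium')} {units}"


root = ET.fromstring(NUTRITION)
label = sodium_label(root, root.find("food"))
```

The key step is the path lookup from the document root, which a CSS2 `attr()` call on the food's own `<sodium>` element has no way to express.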

  36. The tools I used in this article are all written in Java; I don’t see a compelling reason that they would not work on OS X with an appropriate Java runtime. http://developer.apple.com/technotes/tn/tn2031.html says you can use the standard JDK command line tools. My suggestion would be to download the tools, create a shell file, and give it a try.

  38. Okay, I’m curious. I keep seeing XML info, and articles devoted to implementing it.. But there’s an underlying question, at least to my mind.

    Why?

    Yes, I’m serious. I currently use a MySQL database for data storage, and PHP to access that data and “transform” it into whatever I need. I can’t see using XML as any better – In fact, it seems to be much more difficult, to me. So why should I switch?

    It seems to me that this is just another “buzzword” for the marketing types: “Ooh! Let’s use XML!”

  39. My bad. I didn’t see the “next” link, for the second page.

  40. Thanks for the help! I’m working through it and msvalidate works fine. I ended up putting the shell scripts in ~/bin, where they worked fine. (They didn’t seem to work while in ~/xmlapps even though I used chmod and rehash. I’m not sure why not, but there’s always something to learn!)

  41. Is anyone else having trouble getting to the Relax NG spec? The link http://www.oasis-open.org/committees/relax-ng/ returns a “Not found” page.

  42. It works for me.

    I suppose you could be the victim of poorly configured server-side content negotiation.

    Using a tool [DelorieResponse] on the OASIS link, I got the following headers:

    HTTP/1.1 200 OK
    Server: Netscape-Enterprise/4.1
    Date: Tue, 23 Jul 2002 20:09:29 GMT
    Content-type: text/html
    Connection: close

    [DelorieResponse]
    http://www.delorie.com/web/headers.html

    [DelorieRequest]
    http://www.delorie.com:81/some/url.html

  43. The key concept to understand is that an xml document actually performs the role of a database (or datasource) which can be queried (using xpath) to find whatever data is desired, and then quickly output and formatted (using xsl). XML is always presented as a document format, which is wrong. XML has nothing to do with documents except that it happens to be easy to create an xml-encoded file with a text editor.

    XML is very good at representing recursion, which makes it unique when compared to other database formats or data representations. But recursion does not have much to offer designers. It is the darling of programmer types like me.

    In the real world it is much more difficult to create useful webapps out of XML/XSLT than it might appear from the article, which glosses over many very problematic issues, the main one being where you will get your xml-encoded data from. If from a database, you might as well work directly with JDBC/JSP. XML only becomes practical as a middle layer in a very sophisticated project where the participants understand how to plan extensively, something that almost never happens in the real world! Don’t get bogged down with xml unless you know what you are doing and exactly what benefits your project is supposed to derive from the extra effort involved in supporting an additional layer between a SQL database and your presentation layer.

  44. «Yes, I’m serious. I currently use a MySQL database for data storage, and PHP to access that data and “transform” it into whatever I need. I can’t see using XML as any better – In fact, it seems to be much more difficult, to me. So why should I switch?»

    That’s a good question, and one I’ve thought about to great length. As has been mentioned, a little further up the page, the main benefits are in cross-platform compatibility, ease of data transfer, and the benefits of the data entry style.

    Why not compromise? Have you read up on the PHP functions relating to XML files? If not, take a look at http://www.php.net/manual/en/ref.xml.php. I’ve begun experimenting, just out of interest, with a CMS using XML for data storage and PHP for parsing and displaying the data. It’s quite nifty.

  45. I’ve been programming in PHP for the past three years and haven’t used XML in that time. A colleague of mine gave me the URL of this article because a friend had an idea for something that had to use XML, so I started reading.

    I think XML is simple to use (it sounds simple to my ears), and I think it’s easy to pick up if you know HTML. It’s much the same as HTML, with the advantage of using your own defined tags. Your examples were a great help, because I could see how it worked and what it did (I’m a visual person; I like to see things happening).

    Thanks for offering this great article to us!

    Bas (Netherlands)

  46. seems like the right place to ask for help…

    I am trying to send an XML formatted job to Jobserve.co.uk without success.

    I have the details of how to do it but it doesn’t explain exactly the syntax required for ASP to connect to their server.

    The manual says that you build up the XML string and then says:


    “Using an HTTP POST program add a HTTP header called ‘SOAPMETHODNAME’ with value ‘PostAdvert_IT’. Post the string representing the SOAP XML to the specified URL.”

    So far I have got:
    Set xmlHTTP = server.Createobject("MSXML2.ServerXMLHTTP")
    xmlHTTP.open "POST", strAddress, False
    xmlHTTP.setRequestHeader "SOAPMethodName", strMethod
    xmlHTTP.send strXML
    POST = xmlHTTP.responseText
    Set xmlHTTP = Nothing

    when I run the code I get:

    Error Type:
    msxml3.dll (0x80072EE6)
    The URL does not use a recognized protocol
    /xmltest/process.asp, line 84

    Any Ideas anyone ???? Please help
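
One possible reading of the error text is that the address being opened has no scheme the HTTP stack recognizes, for example a host name without a leading `http://`. That is only a guess from the message; the value of `strAddress` is not shown. The Python sketch below illustrates the same steps (set the custom header, POST the XML string) with an explicit scheme check up front. It only builds the request and never sends it. The `build_soap_request` helper, the `example.invalid` address, the sample XML, and the `Content-Type` value are all assumptions, not details from the Jobserve manual.

```python
from urllib.parse import urlsplit
from urllib.request import Request


def build_soap_request(address, method_name, xml_text):
    """Build (but do not send) a POST carrying a SOAPMethodName header."""
    scheme = urlsplit(address).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"address needs an http:// or https:// prefix: {address!r}")
    return Request(
        address,
        data=xml_text.encode("utf-8"),
        headers={
            "SOAPMethodName": method_name,
            # Assumed content type; check the service's documentation.
            "Content-Type": "text/xml; charset=utf-8",
        },
        method="POST",
    )


# Placeholder address and payload, purely for illustration.
request = build_soap_request("http://example.invalid/post", "PostAdvert_IT", "<advert/>")
```

Passing `"example.invalid/post"` without the scheme raises `ValueError` here, which is roughly the check worth making on the ASP side before calling `open`.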

  47. I have been involved with web design, to some varying degree, since its inception. I have always been “shy” of XML due to its lack of real “community” acceptance. After walking through the examples and using the technologies that Mr. Eisenberg presented, I feel LIBERATED!

    Thank you for showing me an example of XML implementation in terms that even someone such as myself could understand. I now share the exuberance that so many of my colleagues have felt for some time: XML is the way!

    I have plans to redesign several of the “content management” systems that I have written in ASP, Perl and Java to reflect my new-found wisdom and revelations about this wonderful technology.

    William Dodson

  48. In the article, Mr. Eisenberg says the following:

    “Once you have created the entire stylesheet in the same directory as the XML file, you can open the XML file in a modern browser such as Mozilla, and it will display the information.”

    Which browsers besides Mozilla support this? I found that Opera 6.0 was the only browser aside from Mozilla that was able to support this option. Explorer 6.0 did display SOME of it, but nothing usable.

  49. This is a really helpful article—thanks for writing it!
    One thing, however—You’ve included links to XML Tools for Linux & Windows (no surprise there), but what about XML Tools for use on Mac OS X?

    Thanks.

    Ethan

  50. Well, I fully stand behind the idea of separating presentation and content, and this can be seen on my website (which you can get to by going to the posted URL), but instead of using someone else’s programs to transform the content into the presentation, I do it myself on my website…or at least I am in the process of doing so on my site…
    I parse the corresponding file for the page, and depending on what content is in between which tags, I display it somewhere, somehow on the page. This approach is time-consuming because it requires the content and the presentation to be separate, with the programming tying them together (correctly!). Nonetheless, this is my preferred approach…until I can learn to use XSL/XSLT to do the CSS and the PHP/Perl’s work for me!!!

    please feel free to e-mail me

  51. Quote:
    Why not compromise? Have you read up on the PHP functions relating to XML files? If not, take a look at http://www.php.net/manual/en/ref.xml.php. I’ve begun experimenting, just out of interest, with a CMS using XML for data storage and PHP for parsing and displaying the data. It’s quite nifty.
    ————————————

    hmm. See, it’s not the implementation I’m having problems with, it’s the whole concept. I don’t need to use the XML functions – I already use the MySQL functions. I can’t see any kind of real-world usage for this. If I need XML, I generate it from the DB. If I need to customize the display for a particular browser, I do so with PHP.

    I really can’t see a reason to change what I already know quite well, that works very well for every instance that I can come up with, in order to use XML.

    And if you say “You don’t always have access to PHP and a DB”.. Well, if you don’t have DB access on your webhost… what are the chances of getting them to add the PHP extensions? It’s not installed by default for PHP..

    (shrug)

  52. The decision to use XML or a relational DB should ideally be based on the nature of the data. Some data fits better into a relational DB, some fits better into XML, some can be represented using either pretty much equally.

    Data that can be represented using just a single table can generally be represented using either technology with no compelling wins either way. (More speed with a DB, more portability with XML, but nothing much beyond that.)

    Data that would be represented in a relational DB using two or more tables that get joined together is probably best kept in a DB. The DB will handle all the join operations, referential integrity, etc., more easily than will be possible using XML. (You could do it with XML, but you’d probably end up writing a lot of code yourself.)

    Data that consists largely of text with markup is best represented using XML or some other form of markup language. If you’ve got a document that could have an arbitrary number of sections, or styles appearing at arbitrary points in the text, you really need to use a markup language of some sort. There isn’t any way to represent that information using rows and columns. That goes double for recursive data structures such as subsections.

    Relational DBs and markup languages represent two different philosophies about the structure of data. Neither of them is suitable for all data.

    Apart from that, I personally like text files that I can hack by hand. I get worried when I have important data that requires a particular program to access.
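To make the point about recursive structures concrete, here is a minimal Python sketch. The document, the element names (`section`, `para`) and the `outline` helper are all made up for illustration; none of them come from the article. It only shows that sections nested to any depth can be walked with one short recursive function, with no join logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only.
DOC = """
<section title="Using XML">
  <para>Intro text.</para>
  <section title="Markup">
    <para>Nested once.</para>
    <section title="Inline elements">
      <para>Nested twice.</para>
    </section>
  </section>
  <section title="Databases">
    <para>A sibling section.</para>
  </section>
</section>
"""


def outline(section, depth=0):
    """Return (depth, title) pairs for a section and all of its subsections."""
    rows = [(depth, section.get("title"))]
    # findall("section") matches direct children only, so each level
    # of nesting is handled by one recursive call.
    for child in section.findall("section"):
        rows.extend(outline(child, depth + 1))
    return rows


root = ET.fromstring(DOC)
for depth, title in outline(root):
    print("  " * depth + title)
```

A relational version would probably need a self-referencing table (each row pointing at its parent) plus an ordering column, and would still have to keep the paragraph text somewhere.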

  53. As a follow-up to my previous post [Post], I’d like to point out that XHTML 2 is incompatible with XHTML 1.0 [XHTML2], and is certainly incompatible with HTML. Further, CSS 2.1 is not backwards compatible with even CSS 2.0 [CSS2.1].

    Further, here’s evidence that even people that arguably “Get it” don’t understand MIME types. [DiveInto XHTML2] “My fresh IE 5.5 install asks to download the page…”. The Save As dialog in IE is popped up for any unknown MIME Type (after IE’s sniffing algorithm fails). [IEMIME]

    (As a side note, it appears the URL Mark references returns text/html now. I am pretty positive that he was getting the dialog because at the time he tested, it was (properly) returning application/xhtml+xml, and it has since been changed to return text/html.)

    It is not my intention to harm anyone in these statements. I am simply trying to call attention to the need for content negotiation, and to the fact that “forward compatibility” can’t be strictly counted on.

    A mechanism for negotiating representations based on client capabilities is necessary. In fact, Mark’s closing (sarcastic) note ” Looks great in Opera and Mozilla, though. That does it. I’m converting all my pages to XHTML 2.0. Accessibility be damned. Backward compatibility be damned. IE 5 be damned.” points to this fact, though he may not realize it.

    Please… think about it.

    -Jeremy

    [Post]
    http://www.alistapart.com/stories/usingxml/discuss/2/#ala-731

    [XHTML2]
    http://www.w3.org/TR/xhtml2/
    (Sorry, I can’t point out specific examples of non-conformance here.. They’ve not included a change summary, and I can’t do the research needed to gather evidence just now)

    [CSS2.1]
    http://www.w3.org/TR/2002/WD-CSS21-20020802/about.html#q1

    [DiveInto XHTML2]
    http://diveintomark.org/archives/2002/08/06.html#changes_in_xhtml_20

    [IEMIME]
    http://msdn.microsoft.com/library/default.asp?url=/workshop/networking/moniker/overview/appendix_a.asp

  54. Hi
    I get a Microsoft Jscript runtime error saying Null is not a null object, while I try to run Nutrition.svg. Help!

  55. This XML article was recommended to me, as I am in a hurry to do R&D on it for our web division. I was given a 500-page book on the subject, which is very good, but J. David’s article does what that book does in much less time and without any unnecessary jargon or hype. Now I can say I really get what XML is about! I just hope I can understand XML’s role in Flash as well, in the future.

  56. I’ve fixed the XSLT transformation to SVG and the SVG file.
    Until I can get everything uploaded, the new version is at

    http://catcode.com/nutrition.zip
  57. I went through the process of creating all of the parsed files. I can be a goon on the computer, but I found it to be a breeze. Very cool application of the technologies. Well-written article, too!

  58. Do you know of an engine that would take an xslt style sheet and parse the data into a word doc too?

  59. This was a good overview. An example with a DTD and an XML Schema could also shed some light on those grammars.

  60. Hi,
    Good article – much food for thought (pardon, no pun intended).
    However, the msvalidate reported a JRE clash problem. I’ve got version 1.4.1_01 of the J2sdk & JRE on my machine. Running the msvalidate.bat reported that I needed the JRE1.3.
    I resolved the issue by going into regedit & changing the JRE current version from 1.4 to 1.3. Obviously not ideal but it works.
    Thanks for your hard work,
    Eddie

  61. So that’s how you use XML. I keep hearing how great it is but, up until this point, had no idea why creating your own markup was a good thing. Great article. Thanks.

    Even so, I question the usefulness of it. If you ask me XML seems to be a fancy way of managing data in text files. For someone who uses only text files that may be a good thing. But as Twyst says “I don’t need to use the XML functions – I already use the MySQL functions. I can’t see any kind of real-world usage for this. If I need XML, I generate it from the DB. If I need to customize the display for a particular browser, I do so with PHP.”

    I read colin_zr’s response with interest. He said: “There isn’t any way to represent that information using rows and columns.” OK, can someone provide an example of this? Any examples I’ve seen could all easily be represented in a DB. In fact, I seem to remember reading a tutorial somewhere that described how to use XML to display data in an HTML table (using PHP, I think). Kind of pointless, if you ask me.

    Now, I’m not saying all XML is pointless. This article showed the value of XML for those who may not have access to a DB for whatever reason, or those who do not want to go beyond markup (in other words, those who shy away from scripting languages such as PHP, ASP, …). I just don’t see its value for those who do, such as myself. Maybe, someday, somewhere, someone will provide an example that simply cannot be implemented in a relational DB. Until then, I will set XML aside.

  62. I would like to give a nod to Jeremy Dunck for coming to the same conclusion I did about this article. It is a perfect primer for an explanation of content negotiation. Based on a user agent’s (i.e. browser’s) capabilities, you can serve the document as any one of the types listed.

    These capabilities include SUPPORTED MIME TYPES and supported languages. So if a browser says it supports ‘en-us’ (United States English) and your site has THE SAME content in two languages, say en (English) and fr (French), you can serve the appropriate one to the user (in this case the English one). No need for a new URI or to ask the user which version they prefer.

    In terms of the article, you can also serve documents by MIME type based not only on whether a type is supported but also on the quality of that support.

    In a real-world example, IE supports text/html (HTML) and text/plain (text), and Mozilla supports text/html, application/xhtml+xml (XHTML) and text/plain (text). Mozilla supports XHTML with a quality of 1 (best) and HTML with a quality of 0.9. So in IE your only options are to transform the XML document into HTML or text, based on stated support, but in Mozilla you have more options: you could send the document as XHTML, HTML or text. Since Mozilla states a higher quality for XHTML, you would probably want to transform the document to XHTML and send it as such. Since (X)HTML is usually preferred over raw text, we won’t send the text version to either user agent.

    To extend this example further, if a user agent supports image/svg+xml (SVG) you can send it the SVG document instead, or the PDF document if it supports application/x-pdf (PDF).

    I’m sure much of this post is somewhat short-sighted, but the bottom line is that one URI can serve multiple versions of a RESOURCE based on what’s available and what the user agent’s capabilities are. For the most part this goes unused, but this is how HTTP is DESIGNED to be used. And to be honest, this can all be done today and is supported; unfortunately, most servers make it difficult, as they ARE NOT designed to work this way. But, like most things, there are ways around it.
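The selection logic described in this comment can be sketched in a few lines of Python. This is a simplified illustration, not a full HTTP implementation: the function names are hypothetical, the two `Accept` header strings are invented to mirror the quality values quoted in the comment (real browsers send different headers), and extra media-type parameters other than `q` are ignored.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_type, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # HTTP default when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    return prefs


def quality(media_type, prefs):
    """Quality the client assigns to media_type; the most specific match wins."""
    main = media_type.split("/")[0]
    for pattern in (media_type, main + "/*", "*/*"):
        for candidate, q in prefs:
            if candidate == pattern:
                return q
    return 0.0


def choose(header, available):
    """Pick the available type with the highest q.

    Ties go to the type listed first in `available` (the server's own
    preference). Returns None if nothing is acceptable.
    """
    prefs = parse_accept(header)
    best, best_q = None, 0.0
    for media_type in available:
        q = quality(media_type, prefs)
        if q > best_q:
            best, best_q = media_type, q
    return best


# Representations the server can produce, in its own order of preference.
AVAILABLE = ["application/xhtml+xml", "text/html", "text/plain"]

# Invented headers that mirror the comment's example, not real browser output.
MOZILLA_LIKE = "application/xhtml+xml;q=1.0, text/html;q=0.9, text/plain;q=0.8"
IE_LIKE = "text/html, text/plain"

print(choose(MOZILLA_LIKE, AVAILABLE))
print(choose(IE_LIKE, AVAILABLE))
```

Once a type has been chosen, the server would run the matching transformation and send the result with that `Content-Type`.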

  63. :)

    Thank you

  64. Is a DTD file always needed?

    From the XML file I saw something like:

    <!--
    <food>
    <name></name>
    <mfr></mfr>
    <serving units="g"></serving>
    <calories total="" fat=""/>
    <total-fat></total-fat>
    <saturated-fat></saturated-fat>
    <cholesterol></cholesterol>
    <sodium></sodium>
    <carb></carb>
    <fiber></fiber>
    <protein></protein>
    <vitamins>
    <a></a>
    <c></c>
    </vitamins>
    <minerals>
    <ca></ca>
    <fe></fe>
    </minerals>
    </food>
    -->

    which seems to be a template.

    Can we automate the “record” generation process by having a definition file?


  65. “fop -xml nutrition.xml -xsl nutrition_fo.xslt -pdf nutrition.pdf

    The result is a PDF file; it produces pages that are approximately 8 centimeters wide and 9 centimeters high, which fits comfortably into a shirt pocket.”
    (from the article)

    Now, how can I use this with, say, ASP, ASP.NET or Java to generate a PDF file on the fly? For example, say a customer orders an item from an e-commerce site: I want to be able to generate a PDF version of a pre-designed invoice template with their details filled in dynamically.

    Is this possible, and if so, how?
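One possible approach, sketched here in Python rather than ASP or Java, is to have the server-side code write the order out as XML and then start the same `fop` command the article uses. This assumes `fop` is installed and on the server's PATH; the function names and the invoice file names are hypothetical, and only the command-line options are taken from the article's example.

```python
import subprocess


def fop_command(xml_path, xsl_path, pdf_path):
    """Build the argument list for the fop command line quoted from the article."""
    return ["fop", "-xml", xml_path, "-xsl", xsl_path, "-pdf", pdf_path]


def render_pdf(xml_path, xsl_path, pdf_path):
    """Run fop; raises CalledProcessError if it exits with an error status.

    Assumes the `fop` launcher is on PATH (not verified here).
    """
    subprocess.run(fop_command(xml_path, xsl_path, pdf_path), check=True)


# Hypothetical file names for a per-order invoice; nothing is executed here,
# the command is only printed.
print(" ".join(fop_command("order-1001.xml", "invoice_fo.xslt", "order-1001.pdf")))
```

The same idea should carry over to any server-side language that can start an external process: generate the order's XML, call `render_pdf`, then send the resulting file to the customer.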

  66. I’ve only just stumbled across this article, and it has been great help. The software the author linked to is very useful on PC and *nix, but I’m using a Mac. Does anyone know of any comparable software for me? (I suspect the Linux stuff can be made to work in OS X, but I don’t know how). Please email me if you have any ideas.

  67. Grebmil: A response a few months late…

    Ok, fair enough. Most of the examples you see in these introductory articles involve record-oriented data. But that’s not the only kind of data.

    Here’s an example of some data that really needs to be stored in markup rather than in a relational model:


    <paragraph>
    Hello, my name is <name>colin</name>. I like <abbreviation>XML</abbreviation>.
    </paragraph>

    You can’t take data like that, make a field for names and a field for abbreviations, and force it into third normal form. That’s just not the structure of the data.

    To give you another illustration, think about how you’d take an HTML file and represent it in a database. Would you have a table of div elements, a table of h1 elements, a table of p elements? What would the records in those tables look like? How would you indicate all the p elements that belonged within a specific div? And those are the easy bits. Just wait till we get to inline elements…

    Obviously that’s silly.

    What you might well do is take the contents of the HTML file, or perhaps a fragment of it, and put it into a field of a database. But then you’ve still got all the HTML markup within that field. Markup just happens to be the best way to represent that data.
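As a rough illustration of why this kind of data resists rows and columns, the following Python sketch parses the example paragraph (written on one line, with the opening `<paragraph>` tag assumed) and lists its pieces in document order. The `pieces` helper is hypothetical. The point is that the plain text between the elements, and the order of everything, is part of the data.

```python
import xml.etree.ElementTree as ET

# The example paragraph from the comment, with its opening tag assumed.
PARAGRAPH = (
    "<paragraph>Hello, my name is <name>colin</name>. "
    "I like <abbreviation>XML</abbreviation>.</paragraph>"
)


def pieces(element):
    """Return the element's content, in document order, as (kind, text) pairs.

    Plain text runs are labeled "text"; child elements are labeled
    with their tag name.
    """
    out = []
    if element.text:
        out.append(("text", element.text))
    for child in element:
        out.append((child.tag, "".join(child.itertext())))
        if child.tail:  # text that follows the child inside the parent
            out.append(("text", child.tail))
    return out


para = ET.fromstring(PARAGRAPH)
for kind, text in pieces(para):
    print(f"{kind:>12}: {text!r}")
```

Pulling the `name` and `abbreviation` values into their own columns would lose the three text runs around them and their positions in the sentence.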


  68. <paragraph>
    Hello, my name is <name>colin</name>. I like <abbreviation>XML</abbreviation>.
    </paragraph>

    Is that the data itself, or is it really a list of people that like abbreviations?

    I personally do not see xml as a way to deal with millions of records over hundreds of tables. I will more than happily export a subset of data to someone else in xml in any way they want it. However, that does not make my application any more a user of xml than if the two of us had agreed to use pig-latin or parenthesis delimited text files with a header row in rot13.

    I do think xml has its place, it just doesn’t overlap with my domain except as another export format. I haven’t really had to deal with importing xml because everybody else uses databases which means they’re just as happy to give me a few csv files or a direct tap into their database.

    For database dumps, a CSV file with a header row is far more space- and bandwidth-efficient than XML.

    To people like me, who use SQL, XML seems very clunky and broken.
    To people who like XML, I think SQL and RDBs look big and overpowered for their needs.

    IMarv

  69. Twyst(e) is right, I’d say,

    I certainly don’t want to rely on client side functions
    (ever heard of browser quirks? do you really think there’ll be no more in times to come?)
    when I can access reliable server side functions (PHP, MySQL).

    >You don’t always have access to PHP and a DB…
    I guess angelfire/lycos account holders sharing their pastry recipes with the world
    are not the target audience here.
    (no offense: private homepages/pastries are OK)

    Marek
