Using XML

by J. David Eisenberg

69 Reader Comments

  1. Quote:
    Why not compromise? Have you read up on the PHP functions relating to XML files? If not, take a look at http://www.php.net/manual/en/ref.xml.php. I’ve begun experimenting, just out of interest, with a CMS using XML for data storage and PHP for parsing and displaying the data. It’s quite nifty.
    ————————————

    Hmm. See, it’s not the implementation I’m having problems with; it’s the whole concept. I don’t need to use the XML functions – I already use the MySQL functions. I can’t see any kind of real-world usage for this. If I need XML, I generate it from the DB. If I need to customize the display for a particular browser, I do so with PHP.
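    The “generate it from the DB” route can be sketched in a few lines. This is a minimal illustration using Python’s standard sqlite3 and xml.etree.ElementTree modules rather than PHP and MySQL; the food table, its columns and the menu/food tag names are hypothetical, and it assumes the column names are valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn: sqlite3.Connection, query: str, root_tag: str, row_tag: str) -> str:
    """Run a query and wrap each result row in an element with one child per column.

    Assumes every column name is a valid XML element name.
    """
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        record = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            # SQL NULL becomes an empty element; everything else is stringified.
            ET.SubElement(record, name).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE food (name TEXT, calories INTEGER)")
conn.executemany("INSERT INTO food VALUES (?, ?)", [("Bagel", 250), ("Apple", 95)])
print(rows_to_xml(conn, "SELECT name, calories FROM food ORDER BY name", "menu", "food"))
```

    The database stays the system of record here; XML is produced only on the way out, which is exactly the position taken in this comment.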

    I really can’t see a reason to change what I already know quite well, that works very well for every instance that I can come up with, in order to use XML.

    And if you say “You don’t always have access to PHP and a DB”... well, if you don’t have DB access on your webhost, what are the chances of getting them to add the PHP extensions? They’re not installed by default with PHP.

    (shrug)

  2. The decision to use XML or a relational DB should ideally be based on the nature of the data. Some data fits better into a relational DB, some fits better into XML, some can be represented using either pretty much equally.

    Data that can be represented using just a single table can generally be represented using either technology with no compelling wins either way. (More speed with a DB, more portability with XML, but nothing much beyond that.)

    Data that would be represented in a relational DB using two or more tables that get joined together is probably best kept in a DB. The DB will handle all the join operations, referential integrity, etc., more easily than will be possible using XML. (You could do it with XML, but you’d probably end up writing a lot of code yourself.)

    Data that consists largely of text with markup is best represented using XML or some other form of markup language. If you’ve got a document that could have an arbitrary number of sections, or styles appearing at arbitrary points in the text, you really need to use a markup language of some sort. There isn’t any way to represent that information using rows and columns. That goes double for recursive data structures such as subsections.
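    To make the recursive case concrete, here is a small sketch assuming a hypothetical document/section/title vocabulary. Sections nest to whatever depth the author likes, the markup records that nesting directly, and a short recursive walk (Python’s standard ElementTree) recovers the outline without any join logic.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical document: sections nested to an arbitrary depth.
DOC = """<document>
  <section><title>Intro</title>
    <section><title>Background</title>
      <section><title>History</title></section>
    </section>
  </section>
  <section><title>Usage</title></section>
</document>"""


def outline(element: ET.Element, depth: int = 0) -> list[str]:
    """Walk direct <section> children recursively, returning indented titles."""
    lines = []
    for section in element.findall("section"):  # direct children only
        lines.append("  " * depth + section.findtext("title", default=""))
        lines.extend(outline(section, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(DOC))))
```

    In a relational schema the same structure needs a self-referencing parent column plus recursive queries or application code to rebuild the tree.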

    Relational DBs and markup languages represent two different philosophies about the structure of data. Neither of them is suitable for all data.

    Apart from that, I personally like text files that I can hack by hand. I get worried when I have important data that requires a particular program to access.

  3. As a follow-up to my previous post [Post], I’d like to point out that XHTML 2 is incompatible with XHTML 1.0 [XHTML2], and is certainly incompatible with HTML. Further, CSS 2.1 is not even backward compatible with CSS 2.0 [CSS2.1].

    Further, here’s evidence that even people who arguably “get it” don’t understand MIME types [DiveInto XHTML2]: “My fresh IE 5.5 install asks to download the page…”. IE pops up the Save As dialog for any unknown MIME type (after IE’s sniffing algorithm fails) [IEMIME].

    (As a side note, it appears the URL Mark references now returns text/html. I am fairly sure that he was getting the dialog because, at the time he tested, it was (properly) returning application/xhtml+xml, and it has since been changed to return text/html.)

    It is not my intention to harm anyone in these statements. I am simply trying to call attention to the need for content negotiation, and to the fact that “forward compatibility” can’t be strictly counted on.

    A mechanism for negotiating representations based on client capabilities is necessary. In fact, Mark’s closing (sarcastic) note, “Looks great in Opera and Mozilla, though. That does it. I’m converting all my pages to XHTML 2.0. Accessibility be damned. Backward compatibility be damned. IE 5 be damned.”, points to this fact, though he may not realize it.

    Please… think about it.

    -Jeremy

    [Post]
    http://www.alistapart.com/stories/usingxml/discuss/2/#ala-731

    [XHTML2]
    http://www.w3.org/TR/xhtml2/
    (Sorry, I can’t point out specific examples of non-conformance here. They’ve not included a change summary, and I can’t do the research needed to gather evidence just now.)

    [CSS2.1]
    http://www.w3.org/TR/2002/WD-CSS21-20020802/about.html#q1

    [DiveInto XHTML2]
    http://diveintomark.org/archives/2002/08/06.html#changes_in_xhtml_20

    [IEMIME]
    http://msdn.microsoft.com/library/default.asp?url=/workshop/networking/moniker/overview/appendix_a.asp

  4. Hi
    I get a Microsoft JScript runtime error saying “Null is not a null object” when I try to run Nutrition.svg. Help!

  5. This XML article was recommended to me, as I am in a hurry to research XML for our web division. I was given a 500-page book on the subject, very good nonetheless, but J. David’s article does what that book does in much less time and without any unnecessary jargon or hype. Now I can say I really get what XML is about! I just hope I can understand XML’s role in Flash as well in the future.

  6. I’ve fixed the XSLT transformation to SVG and the SVG file.
    Until I can get everything uploaded, the new version is at

    http://catcode.com/nutrition.zip
  7. I went through the process of creating all of the parsed files. I can be a goon on the computer, but I found it to be a breeze. Very cool application of the technologies. Well-written article too!

  8. Do you know of an engine that would take an XSLT stylesheet and transform the data into a Word document too?

  9. This was a good overview. An example with a DTD and an XML Schema would also shed some light on those grammars.

  10. Hi,
    Good article – much food for thought (pardon, no pun intended).
    However, msvalidate reported a JRE version conflict. I’ve got version 1.4.1_01 of the J2SDK and JRE on my machine, and running msvalidate.bat reported that I needed JRE 1.3.
    I resolved the issue by going into regedit and changing the JRE current version from 1.4 to 1.3. Obviously not ideal, but it works.
    Thanks for your hard work,
    Eddie

  11. So that’s how you use XML. I keep hearing how great it is but, up until this point, had no idea why creating your own markup was a good thing. Great article. Thanks.

    Even so, I question the usefulness of it. If you ask me, XML seems to be a fancy way of managing data in text files. For someone who uses only text files, that may be a good thing. But as Twyst says, “I don’t need to use the XML functions – I already use the MySQL functions. I can’t see any kind of real-world usage for this. If I need XML, I generate it from the DB. If I need to customize the display for a particular browser, I do so with PHP.”

    I read colin_zr’s response with interest. He said: “There isn’t any way to represent that information using rows and columns.” OK, can someone provide an example of this? Any examples I’ve seen could all easily be represented in a DB. In fact, I seem to remember reading a tutorial somewhere that described how to use XML to display data in an HTML table (using PHP, I think). Kind of pointless, if you ask me.

    Now, I’m not saying all XML is pointless. This article showed the value of XML for those who may not have access to a DB for whatever reason, or those who do not want to go beyond markup (in other words, those who shy away from scripting languages such as PHP, ASP, …). I just don’t see its value for those who do, such as myself. Maybe, someday, somewhere, someone will provide an example that simply cannot be implemented in a relational DB. Until then, I will set XML aside.

  12. I would like to give a nod to Jeremy Dunck for coming to the same conclusion I did about this article. It is a perfect primer for an explanation of content negotiation. Based on a user agent’s (i.e. browser’s) capabilities, you can serve the document as any one of the types listed.

    These capabilities include SUPPORTED MIME TYPES and supported languages. Therefore, if a browser says it supports ‘en-us’ (United States English) and your site has THE SAME content in two languages, say en (English) and fr (French), you can serve the appropriate one to the user (in this case the English one). There is no need for a new URI or to ask the user which version they prefer.

    In terms of the article, you can also serve documents by MIME type based not only on whether a type is supported but also on the quality of that support.

    In a real-world example, IE supports text/html (HTML) and text/plain (Text), while Mozilla supports text/html, application/xhtml+xml (XHTML) and text/plain (Text). Mozilla supports XHTML with a quality of 1 (best) and HTML with a quality of 0.9. Therefore, in IE your only options are to transform the XML document into HTML or text based on its stated support, but in Mozilla you have more options: you could send the document as XHTML, HTML or text. Since Mozilla rates XHTML higher than HTML, you would probably want to transform the document to XHTML and send it as such. Since (X)HTML is usually preferred over raw text, we won’t send the text version to either user agent.

    To extend this example further, if a user agent supports image/svg+xml (SVG), you can send it the SVG document instead, or the PDF document if it supports application/pdf (PDF).
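    The server-side choice described above can be sketched as follows. This is a simplified, hypothetical Python version, not any particular server’s implementation: it reads the q value for each media range in the Accept header, lets the most specific range win (exact type, then type/*, then */*), ignores media-type parameters other than q, and breaks ties in the server’s own preference order. Both Accept strings are illustrative, not captured from real browsers.

```python
from __future__ import annotations


def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an Accept header to its q value (default 1.0).

    Simplified: media-type parameters other than q are ignored.
    """
    prefs: dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media_range] = q
    return prefs


def quality(media_type: str, prefs: dict[str, float]) -> float:
    """The most specific matching range wins: exact type, then type/*, then */*."""
    major = media_type.split("/")[0]
    for candidate in (media_type, major + "/*", "*/*"):
        if candidate in prefs:
            return prefs[candidate]
    return 0.0


def choose(available: list[str], accept_header: str) -> str | None:
    """Return the representation the client rates highest, or None if none is acceptable.

    Ties go to whichever type the server lists first in `available`.
    """
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        q = quality(media_type, prefs)
        if q > best_q:
            best, best_q = media_type, q
    return best


# Server's own preference order; both Accept strings are illustrative.
AVAILABLE = ["text/html", "application/xhtml+xml", "text/plain"]
print(choose(AVAILABLE, "application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,*/*;q=0.1"))
print(choose(AVAILABLE, "text/html, text/plain, */*"))
```

    Listing text/html first matters: a client that only sends */* for everything else ties XHTML with HTML, and the tie goes to HTML. A response chosen this way should also carry a Vary: Accept header so that shared caches do not hand one client’s representation to another.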

    I’m sure much of this post is somewhat shortsighted, but the bottom line is that one URI can serve multiple versions of a RESOURCE based on what’s available and what the user agent’s capabilities are. For the most part this goes unused, but this is how HTTP is DESIGNED to be used. And to be honest, this can all be done today and is supported; unfortunately, most servers make it difficult because they ARE NOT designed to work this way, but like most things, there are ways around it.

  13. :)

    Thank you

  14. Is a DTD file always needed?

    In the XML file I saw something like:

    <!--
    <food>
        <name></name>
        <mfr></mfr>
        <serving units="g"></serving>
        <calories total="" fat=""/>
        <total-fat></total-fat>
        <saturated-fat></saturated-fat>
        <cholesterol></cholesterol>
        <sodium></sodium>
        <carb></carb>
        <fiber></fiber>
        <protein></protein>
        <vitamins>
            <a></a>
            <c></c>
        </vitamins>
        <minerals>
            <ca></ca>
            <fe></fe>
        </minerals>
    </food>
    -->

    which seems to be a template.

    Can we automate the “record” generation process by having a definition file?
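    One possible answer, sketched in Python with the standard ElementTree module: keep a small “definition” listing the template’s fields in order and build each record from it. The FOOD_FIELDS list, its "parent/child" and "element@attribute" notation, and the make_record helper are all hypothetical, invented for this illustration; missing values become empty elements or attributes, mirroring the blank template.

```python
import xml.etree.ElementTree as ET

# Hypothetical "definition file" contents: one entry per field, in template order.
# "a/b" means a nested element; "tag@attr" means an attribute on that element.
FOOD_FIELDS = [
    "name", "mfr", "serving", "serving@units", "calories@total", "calories@fat",
    "total-fat", "saturated-fat", "cholesterol", "sodium", "carb", "fiber",
    "protein", "vitamins/a", "vitamins/c", "minerals/ca", "minerals/fe",
]


def make_record(fields, values, root_tag="food"):
    """Build one record element, filling each field from `values` (missing -> empty)."""
    root = ET.Element(root_tag)
    for field in fields:
        path, _, attr = field.partition("@")
        parent = root
        for tag in path.split("/"):
            # Reuse an existing child with this tag, or create it on first use.
            child = next((c for c in parent if c.tag == tag), None)
            if child is None:
                child = ET.SubElement(parent, tag)
            parent = child
        value = str(values.get(field, ""))
        if attr:
            parent.set(attr, value)
        else:
            parent.text = value
    return root


record = make_record(FOOD_FIELDS, {"name": "Bagel", "serving": 100, "serving@units": "g"})
print(ET.tostring(record, encoding="unicode"))
```

    A DTD or XML Schema could play the role of the definition instead, but reading field order out of one is more work than this flat list.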


  15. “fop -xml nutrition.xml -xsl nutrition_fo.xslt -pdf nutrition.pdf

    The result is a PDF file; it produces pages that are approximately 8 centimeters wide and 9 centimeters high, which fits comfortably into a shirt pocket.”
    (from the article)

    Now how can I use this with, say, ASP, ASP.NET or Java to generate a PDF file on the fly? For example, when a customer orders an item from an e-commerce site, I want to generate a PDF version of a pre-designed invoice template with their details filled in dynamically.

    Is this possible? If so, how?
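    A rough sketch of one approach, in Python rather than ASP or Java and reusing only the command-line flags quoted from the article: build the invoice XML from the order, write it to a temporary file, and run fop with an XSL-FO stylesheet. The order layout, the element names and the invoice_fo.xslt stylesheet are hypothetical, and the sketch assumes the fop launcher is installed and on the PATH.

```python
from __future__ import annotations

import subprocess
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def invoice_xml(order: dict) -> ET.ElementTree:
    """Build an XML invoice from a (hypothetical) order dictionary."""
    root = ET.Element("invoice", number=str(order["number"]))
    ET.SubElement(root, "customer").text = order["customer"]
    items = ET.SubElement(root, "items")
    for name, qty, price in order["items"]:
        ET.SubElement(items, "item", qty=str(qty), price=f"{price:.2f}").text = name
    return ET.ElementTree(root)


def fop_command(xml_path, xsl_path, pdf_path) -> list[str]:
    """Same flags as the article's command line: fop -xml ... -xsl ... -pdf ..."""
    return ["fop", "-xml", str(xml_path), "-xsl", str(xsl_path), "-pdf", str(pdf_path)]


def render_invoice(order: dict, xsl_path: str, pdf_path: str) -> None:
    """Write the order as XML to a temp dir, then let FOP apply the XSL-FO stylesheet."""
    with tempfile.TemporaryDirectory() as tmp:
        xml_path = Path(tmp) / "invoice.xml"
        invoice_xml(order).write(xml_path, encoding="utf-8", xml_declaration=True)
        # check=True raises CalledProcessError if fop exits with an error.
        subprocess.run(fop_command(xml_path, xsl_path, pdf_path), check=True)


# Usage (requires fop and a real stylesheet; both names here are hypothetical):
# render_invoice({"number": 42, "customer": "A. Customer", "items": [("Bagel", 2, 1.5)]},
#                "invoice_fo.xslt", "invoice-42.pdf")
```

    Starting a separate fop process per order is the simplest route; Apache FOP can also be embedded in a Java application, which avoids the per-request startup cost.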

  16. I’ve only just stumbled across this article, and it has been great help. The software the author linked to is very useful on PC and *nix, but I’m using a Mac. Does anyone know of any comparable software for me? (I suspect the Linux stuff can be made to work in OS X, but I don’t know how). Please email me if you have any ideas.

  17. Grebmil: A response a few months late…

    Ok, fair enough. Most of the examples you see in these introductory articles involve record-oriented data. But that’s not the only kind of data.

    Here’s an example of some data that really needs to be stored in markup rather than in a relational model:


    <paragraph>
    Hello, my name is <name>colin</name>. I like <abbreviation>XML</abbreviation>.
    </paragraph>

    You can’t take data like that, make a field for names and a field for abbreviations, and force it into third normal form. That’s just not the structure of the data.
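    A quick way to see the point, sketched with Python’s standard ElementTree: flatten colin’s paragraph into its pieces in document order. The meaning lies as much in where each name or abbreviation sits between runs of ordinary text as in the values themselves, which is what a name column and an abbreviation column would throw away. The pieces helper is hypothetical and only looks one level deep, which is enough for this example.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

para = ET.fromstring(
    "<paragraph>Hello, my name is <name>colin</name>. "
    "I like <abbreviation>XML</abbreviation>.</paragraph>"
)


def pieces(element: ET.Element) -> list[tuple[str, str]]:
    """Flatten one level of mixed content into (kind, text) pairs in document order."""
    out = []
    if element.text:
        out.append(("text", element.text))
    for child in element:
        out.append((child.tag, child.text or ""))
        if child.tail:  # text that follows the child, up to the next tag
            out.append(("text", child.tail))
    return out


for kind, text in pieces(para):
    print(f"{kind:>12}: {text!r}")
```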

    To give you another illustration, think about how you’d take an HTML file and represent it in a database. Would you have a table of div elements, a table of h1 elements, a table of p elements? What would the records in those tables look like? How would you indicate all the p elements that belonged within a specific div? And those are the easy bits. Just wait till we get to inline elements…

    Obviously that’s silly.

    What you might well do is take the contents of the HTML file, or perhaps a fragment of it, and put it into a field of a database. But then you’ve still got all the HTML markup within that field. Markup just happens to be the best way to represent that data.


  18. <paragraph>
    Hello, my name is <name>colin</name>. I like <abbreviation>XML</abbreviation>.
    </paragraph>

    Is that the data itself, or is it really a list of people that like abbreviations?

    I personally do not see XML as a way to deal with millions of records over hundreds of tables. I will more than happily export a subset of data to someone else in XML in any way they want it. However, that does not make my application any more a user of XML than if the two of us had agreed to use Pig Latin or parenthesis-delimited text files with a header row in ROT13.

    I do think XML has its place; it just doesn’t overlap with my domain except as another export format. I haven’t really had to deal with importing XML, because everybody else uses databases, which means they’re just as happy to give me a few CSV files or a direct tap into their database.

    For database dumps, a CSV file with a header row is far more economical with space and bandwidth than XML.
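    The size difference is easy to check. This sketch uses Python’s standard csv and ElementTree modules to serialize the same made-up rows both ways: CSV names each column once in the header row, while the element-per-field XML layout chosen here repeats every column name twice (opening and closing tag) for every value. Exact byte counts depend on the data and the XML layout, so no particular ratio is claimed.

```python
import csv
import io
import xml.etree.ElementTree as ET


def as_csv(header, rows) -> str:
    """Serialize rows as CSV with a single header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def as_xml(header, rows, root_tag="rows", row_tag="row") -> str:
    """Serialize rows as XML, one child element per column per row."""
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for name, value in zip(header, row):
            ET.SubElement(record, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


header = ["name", "calories", "fat"]
rows = [("Bagel", 250, 1.5), ("Apple", 95, 0.3), ("Muffin", 420, 18)]  # made-up sample
csv_text, xml_text = as_csv(header, rows), as_xml(header, rows)
print(len(csv_text.encode("utf-8")), "bytes as CSV")
print(len(xml_text.encode("utf-8")), "bytes as XML")
```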

    To people like me, who use SQL, XML seems very clunky and broken.
    To people who like XML, I think SQL and RDBs look big and overpowered for their needs.

    IMarv

  19. Twyst(e) is right, I’d say,

    I certainly don’t want to rely on client side functions
    (ever heard of browser quirks? do you really think there’ll be no more in times to come?)
    when I can access reliable server side functions (PHP, MySQL).

    >You don’t always have access to PHP and a DB…
    I guess Angelfire/Lycos account holders sharing their pastry recipes with the world
    are not the target audience here.
    (no offense: private homepages/pastries are OK)

    Marek
