Validating a Custom DTD

by J. David Eisenberg

25 Reader Comments

  1. Please bear in mind, I’ve been a standards buff for quite a while, so I’m not ragging on the ideal of clean code that validates. That having been said:

    The idea behind the standards movement was to ensure that developers could code efficiently by leveraging a common set of languages that all browsers would respond to predictably (and identically). This is obviously an ideal, and we aren’t there yet, but we’re a heckuva lot closer than we were 5 years ago (thanks in no small part to this magazine and its founders).

    Now that we have these standards, why would we then splinter, and create “personal” standards?

    Validation in and of itself has little practical value. It is a goal I strive for, to be sure, and it certainly feels good (it’s a job well done on large-scale sites, that’s certain).

    But if the reason a page doesn’t validate is either outside my control (CMS output) or something that makes the site better (as in the custom attributes mentioned in the previous article, or hacks required to overcome browser bugs), I’d rather forgo validation and uphold the common standard we’ve fought for.

  2. Tim’s got an important point about the danger of splintering standards that took a lot of work to get in place.  On the other hand, I see a very practical use for this technique.

    My team produces e-learning modules.  The practical value of the standards for us is quality control: if every html page in the module validates, that’s one more assurance the client isn’t getting something that’s broken.

    But: our work requires that we use <embed>.  One of our clients still has some of its employees using Netscape 4.7.  <object> won’t cut it.  So when we validate, we ignore the multiple errors generated by the <embed> tag.

    That means: every time we validate we need to manually read the error messages and make sure that they came from <embed> and not from something else.  Which means that someday, we’re going to trip up and ignore an error we should have fixed.

    Here’s where the custom DTD comes in: we validate against a custom DTD that allows the <embed> tag.  When we validate, <embed> is silently passed by, and we know that any errors we see are errors we need to fix.  We’ll likely do the same for the custom tags generated by Macromedia’s CourseBuilder (although placing them in a custom namespace would be more in the spirit of XHTML).

    That said, it’s still troubling to work this way.  I haven’t tried it out, but I’m guessing that some browsers will remain in quirks mode unless they see one of the public doctypes.  For cross-platform consistency, we need to avoid quirks mode.  For this reason, we may end up validating against our custom DTD but delivering pages with the XHTML Strict or Transitional doctype.  It ain’t perfect, but it’ll do what we need to do for our clients.

  3. Tim Murtaugh wrote:

    >>>Now that we have these standards, why would we then splinter, and create “personal” standards?<<<

    It’s not splintering any more than using classes and ids in regular HTML is. XML is the standard, and the whole point of XML (well, one of them) is that you can create your own elements and define your own languages. This is not in any way against the ideal of common standards.

    The reason we’ve been taught not to use proprietary markup like <embed> and <marquee> isn’t that inventing elements without W3C’s blessing is evil in itself, but because they weren’t in the doctype. If you write your own doctype, the new elements are in the doctype. That’s the only important difference; the XML reader knows how to treat your elements.

    However, writing your own doctypes is, with a few exceptions like this article, probably of little use if you’re just targeting common web browsers, since the most common of them all, and many others, won’t know what to do with them. “Doctype switching” (browsers changing their rendering depending on the doctype) probably won’t work either.

    Footnote: Of course, what we refer to as “standards” on the web usually aren’t actual standards, but that’s beside the point.

  4. Peterman wrote:
    >Footnote: Of course, what we refer to as “standards” on the web usually aren’t actual standards, but that’s beside the point.

    True. But if The WaSP hadn’t persuaded designers, developers, and browser makers to treat these W3C and ECMA specifications as “baseline standards,” our support for CSS, ECMAScript, XHTML and the DOM would probably be little better than it was in 1998, and we’d be coding our sites 28 ways, to accommodate multiple generations of proprietary browserdom.

  5. What’s the standard here?

    A built-in DTD or extending one in the intended manner?

    As long as everything remains internally consistent I don’t see the problem.

  6. Just as an example of a site that does use a custom DTD.

    The Swedish IBM site uses a custom DTD (ibmxhtml1-transitional); the newer American version uses a standard W3C DTD (xhtml1-transitional). When I visit the two sites in Firefox and look at the page properties, they clearly state which mode the browser is in. On the Swedish site, Firefox is in quirks mode. So using a custom DTD would make the browser slower and render the pages with quirks mode in mind, right?

    http://www.ibm.com/us/ 
    http://www.ibm.com/se/ 


    Jens Wedin

  7. I’m no XML guru, but why can’t we do this with an XSD? I’m envisioning something like this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns:myforms="somexsd.xsd">
    <head>

    </head>
    <body>
    <form>
    <input type="text" name="bob" myforms:required="whatever"/>
    </form>
    </body>
    </html>

    To me, that would be the optimal approach, since we’re still including the original DTD, just extending it with our own XSD. Is there a reason this won’t work?

  8. Actually, custom doctypes will trigger standards mode in Mozilla/Firefox (as well as in most other browsers, although I haven’t tested this with Safari).

    The IBM doctype in question is listed as recognized by Mozilla and is set to trigger quirks mode.

    http://www.mozilla.org/docs/web-developer/quirks/doctypes.html

    On the same page under Full Standards Mode you will see listed:

    “Any ‘DOCTYPE HTML SYSTEM’ as opposed to ‘DOCTYPE HTML PUBLIC’, except for the IBM doctype noted below”
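
    For anyone who wants to check other browsers (Safari, say) without digging through page properties, a small script can report the mode. This is only a sketch: `renderingMode` is a made-up helper name, and it relies on `document.compatMode`, which reports "BackCompat" in quirks mode and "CSS1Compat" otherwise. Browsers that lack the property come back as "unknown".

```javascript
// Hypothetical helper: report which rendering mode a document ended up in.
// document.compatMode is "BackCompat" in quirks mode and "CSS1Compat" in
// both almost-standards and full standards mode, so the two are not
// distinguished here.
function renderingMode(doc) {
  if (typeof doc.compatMode !== "string") {
    return "unknown"; // property not supported by this browser
  }
  return doc.compatMode === "BackCompat" ? "quirks" : "standards";
}

// In a browser, for example:
//   alert(renderingMode(document));
```

    Loading a page with a custom doctype and running this in each target browser would show whether that doctype drops the browser into quirks mode.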

  9. Coincidentally, I’ve just released a cross-platform graphical tool called GooeySAX that is a wrapper for the Xerces tool referenced in this article. GooeySAX allows you to validate your custom DTDs easily. It is also great for those times when your document is only available on a private network, and thus unreachable by the W3C’s web-based validation tool.

    http://ditchnet.org/gooeysax

  10. Would it be possible to create a custom DTD that will validate an XHTML document containing Flash?

  11. Try to validate this site: http://bednarz.nl Link to validation: http://validator.w3.org/check?uri=http://bednarz.nl/
    Funny, isn’t it?

  12. Hey,

    Just off the top of my head, removing the ugly “]>” at the top of documents with internal subsets can be done with a little script:

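    var kids = document.body.childNodes;
    for (var i = Math.min(2, kids.length) - 1; i >= 0; i--) // backwards, so a removal can't shift the next index
      if (/\]>/.test(kids[i].data)) document.body.removeChild(kids[i]);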

    Of course, it’s not ideal, but seems to do the job. It loops over the first two elements since different browsers place the characters in different positions. Originally I thought to simply strip the first two body child nodes, but realized that’s more prone to worst-case scenarios.

    The script can even be placed in the head tags without being attached to an onload event, since it depends on contents that’re already parsed. Should remove the node as soon as it hits the script.

  13. Why go through all this trouble of creating custom stuff? Is your client really ready to pay for all this? Is it justified over time? Will the site structure stay the same long enough for this to work? I think for smaller websites this takes way too much time.

    I do like the idea, though.

  14. I’m confused. Why not use attributes in a different namespace? E.g.:

    <input type="text" name="yourName" myns:required="true" />

    That way you aren’t touching the XHTML DTDs at all…
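
    A rough sketch of how a script could pick up such an attribute, assuming the page marks fields with `myns:required="true"` as in the example above. The helper name `missingRequired` and the rule that only the literal value "true" counts are assumptions for illustration. Note also that DTD validation is not namespace-aware, so a validator working from the stock XHTML DTD would still report the prefixed attribute as undeclared.

```javascript
// Hypothetical helper: return the names of fields that carry the custom
// attribute with the value "true" but whose value is empty or whitespace.
// `fields` is any array-like of form controls (e.g. form.elements);
// `attrName` is the prefixed attribute name exactly as written in the markup.
function missingRequired(fields, attrName) {
  var missing = [];
  for (var i = 0; i < fields.length; i++) {
    var field = fields[i];
    var flagged = field.getAttribute(attrName) === "true";
    var trimmed = String(field.value || "").replace(/^\s+|\s+$/g, "");
    if (flagged && trimmed === "") {
      missing.push(field.name);
    }
  }
  return missing;
}

// In a browser, for example:
//   var missing = missingRequired(document.forms[0].elements, "myns:required");
//   if (missing.length) alert("Please fill in: " + missing.join(", "));
```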

  15. So where can I find a module, library, or app that will take some of the database schemas from my content manglement system and generate a valid DTD so I don’t shoot my foot clean off (something a bit … well, a lot less expensive than XMLSpy, ya’know)?
