Validating a Custom DTD

by J. David Eisenberg

25 Reader Comments

  1. For implementation of certain parts of Web Forms 2.0 (http://www.whatwg.org/specs/)? Most of these attributes could be processed by JavaScript and servers instead of the UA. Now we’re starting to see the true benefits of XML. But I wonder what the benefits are here for sites serving text/html? Technically it’s allowed for XHTML 1.0, and JavaScript should work the same, but all this effort really does is trick the validating programs: tag soup is tag soup is tag soup. Altering your DTD won’t make a difference one way or another unless browsers look at your page as XML, which only the most zen developers even dare think about.
  2. I agree with Ryan! This solution is really interesting with XML technology (and not HTML)... unfortunately, few servers are able to deliver HTML pages as XML! (It’s a pity, because it’s not so difficult.) Just a small criticism of the article itself: it would have been better to build your examples with the XHTML 1.1 recommendation, which is technically built for such things. Modularization is an interesting way to customize a DTD.
  3. To validate a custom DTD you don’t have to go searching for a stand-alone validator. You can still validate online. I’ve discussed it at ASP.NET Cure: Custom DTD for XHTML. The Web Design Group validator handles custom DTDs just fine.
  4. Of course your site validates against your own custom DTD. But I thought the idea behind validating your online document against a public schema was to show that we care about COMMON standards. Since the local part of a DTD can override ANY element or attribute declaration in your original DTD, and designing your own DTD gives you the same powers, anyone could add, modify or delete ANY element or attribute and their documents would still validate perfectly.
    As browsers tend to ignore unknown tags and even allow you to style them with CSS, everyone can happily design his own markup language. This is what the XML idea was all about…. The only interesting question is: “Are there any good reasons to create your documents in a way that they validate against PUBLIC COMMON schemas?” Greetings, Benjamin. Anyway, the best SGML/XML/DTD parser/validator is still James Clark’s “nsgmls”:
    http://www.jclark.com/sp/index.htm
  5. Support for XHTML and “at-the-browser” XML parsing is very limited right now, which is why in the battle of “HTML vs. XHTML,” HTML is the winner: it is more widely supported. I completely agree that XML is very powerful and flexible, but you should parse your XML and output it as HTML 4.01 Strict until at-the-browser parsing is not so iffy. Otherwise, an excellent article explaining something that is not very well known.
  6. After I added my
    <!DOCTYPE html SYSTEM "dtd/xhtml1-custom.dtd" >
    tag to the top of my document, a curious oddity manifested itself: all my class attribute values became case sensitive. I suppose it’s my own fault for capitalising my class name (coretable vs. CoreTable), but it’s still a bit weird, IMHO.
  7. If I ever wanted to use this tip, I’d really want to use the inline DTD stuff instead of creating a new page and referencing that. But of course the issue with ]> showing up in browsers is bad. I haven’t actually tested this yet, but I would expect that serving your page with a MIME type of ‘application/xhtml+xml’ would fix this issue for at least Safari and Mozilla, since, IIRC, using that MIME type causes those browsers to use a real XML parser. Of course, the downside is that if the page isn’t well-formed then it’s not displayed at all, but the upside is that you can do whatever you want that’s legal in XML, including things like declaring new entities inline in the DOCTYPE and using them in the page (which might be handy).
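A minimal sketch of the inline-entity idea in comment 7, using Python's standard-library XML parser as a stand-in for a browser's real XML parser. The entity name `site`, its value, and the page content are made up for illustration; the sketch only shows that a conforming XML parser consumes the internal subset and expands the entity, so no stray "]>" ends up in the document text.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: the "site" entity and its value are illustrative only.
PAGE = """<!DOCTYPE html [
  <!ENTITY site "Example Widgets Ltd.">
]>
<html xmlns="http://www.w3.org/1999/xhtml">
  <body><p>Welcome to &site;</p></body>
</html>"""

XHTML = "{http://www.w3.org/1999/xhtml}"

# A real XML parser reads the internal subset as declarations, not as text.
root = ET.fromstring(PAGE)
paragraph = root.find(f"{XHTML}body/{XHTML}p")

# The entity reference has been replaced by its declared value.
print(paragraph.text)

# Collect all character data to confirm the "]>" delimiter never became content.
all_text = "".join(root.itertext())
print("]>" in all_text)
```

A tag-soup HTML parser treats the same bytes differently, which is where the visible "]>" comes from.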
  8. Just to tickle Kevin: the fact that a malformed XML document is not displayed is not a downside but a plus. Every other programming language refuses to compile or execute faulty code. Everything is simpler that way; there’s no guesswork involved. Yes, at first it seems harder for the programmer, but in the end it makes everything easier because you don’t waste time battling different browser interpretations of a missing </div>.
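The "no guesswork" behaviour described in comment 8 can be sketched with Python's standard-library XML parser, which (like any conforming XML parser) rejects a document with a missing </div> instead of guessing. The helper name `is_well_formed` and the two sample fragments are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # A conforming XML parser stops at the first well-formedness error.
        return False
    return True


good = "<body><div><p>Hello</p></div></body>"
bad = "<body><div><p>Hello</p></body>"  # the closing </div> is missing

print(is_well_formed(good))
print(is_well_formed(bad))
```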
  9. How do custom DTDs affect doctype sniffing (for purposes of deciding between rendering in quirks or standard mode)? Do browsers sniff for *any* doctype or specific ones? I.e. will using custom DTDs make the rendering model behave differently? Also, why can’t we use namespace switching instead? As long as IE (and other browsers) simply ignore the switching, and the validator is pacified by them, everyone is happy, right?
  10. Ah, that explains why I thought I was going crazy in 2003 and could not figure out why the “]>” appeared. In the end I used a separate DTD fragment file containing:
    <![INCLUDE[
    <!ENTITY % xhtmldtd
      PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" >
    %xhtmldtd;]]>
    However, you’re correct: when served as “application/xhtml+xml”, no “]>” appears on the canvas. Still, I am happy with the method I settled on in 2003, as it was cleaner for multiple files.
  11. Please bear in mind, I’ve been a standards buff for quite a while, so I’m not ragging on the ideal of clean code that validates. That having been said: the idea behind the standards movement was to ensure that developers could code efficiently by leveraging a common set of languages that all browsers would respond to predictably (and identically). This is obviously an ideal, and we aren’t there yet, but we’re a heckuva lot closer than we were 5 years ago (thanks in no small part to this magazine and its founders).
    Now that we have these standards, why would we then splinter, and create “personal” standards? Validation in and of itself has little *practical* value. It is a goal I strive for, to be sure, and it certainly feels good (it’s a job well done on large-scale sites, that’s certain).
    But if the reason I don’t make it is either not under my control (CMS output) or something that makes the site better (as in the custom attributes mentioned in the previous article, or hacks required to overcome browser bugs), I’d rather forgo validation and uphold the common standard we’ve fought for.
  12. Tim’s got an important point about the danger of splintering standards that took a lot of work to get in place.  On the other hand, I see a very practical use for this technique. My team produces e-learning modules.  The practical value of the standards for us is quality control: if every html page in the module validates, that’s one more assurance the client isn’t getting something that’s broken. But: our work requires that we use <embed>.  One of our clients still has some of its employees using Netscape 4.7.  <object> won’t cut it.  So when we validate, we ignore the multiple errors generated by the <embed> tag. That means: every time we validate we need to manually read the error messages and make sure that they came from <embed> and not from something else.  Which means that someday, we’re going to trip up and ignore an error we should have fixed. Here’s where the custom DTD comes in: we validate against a custom DTD that allows the <embed> tag.  When we validate, <embed> is silently passed by, and we know that any errors we see are errors we need to fix.  We’ll likely do the same for the custom tags generated by Macromedia’s CourseBuilder (although placing them in a custom namespace would be more in the spirit of XHTML). That said, it’s still troubling to work this way.  I haven’t tried it out, but I’m guessing that some browsers will remain in quirks mode unless they see one of the public doctypes.  For cross-platform consistency, we need to avoid quirks mode.  For this reason, we may end up validating against our custom DTD but delivering pages with the XHTML Strict or Transitional doctype.  It ain’t perfect, but it’ll do what we need to do for our clients.
  13. Tim Murtaugh wrote: >>>Now that we have these standards, why would we then splinter, and create “personal” standards?<<< It’s not splintering any more than using classes and ids in regular HTML is. XML *is* the standard, and the whole point of XML (well, one of them) is that you can create your own elements and define your own languages. This is *not* in *any* way against the ideal of common standards. The reason we’ve been taught not to use proprietary markup like <embed> and <marquee> isn’t that inventing elements without W3C’s blessing is evil in itself, but that they weren’t in the doctype. If you write your own doctype, the new elements *are* in the doctype. That’s the only important difference; the XML reader knows how to treat your elements. However, writing your own doctypes is, with a few exceptions like this article, probably of little use if you’re just targeting common web browsers, since the most common of them all, and many others, won’t know what to do with them. “Doctype switching” (browsers changing their rendering depending on the doctype) probably won’t work either. Footnote: Of course, what we refer to as “standards” on the web usually *aren’t* actual standards, but that’s beside the point.
  14. Peterman wrote:
    >Footnote: Of course, what we refer to as “standards” on the web usually *aren’t* actual standards, but that’s beside the point.
    True. But if The WaSP hadn’t persuaded designers, developers, and browser makers to treat these W3C and ECMA specifications as “baseline standards,” our support for CSS, ECMAScript, XHTML and the DOM would probably be little better than it was in 1998, and we’d be coding our sites 28 ways, to accommodate multiple generations of proprietary browserdom.
  15. What’s the standard here? A built-in DTD or extending one in the intended manner? As long as everything remains internally consistent I don’t see the problem.
  16. Just as an example of a site that does use a custom DTD: the Swedish IBM site uses a custom DTD (ibmxhtml1-transitional), while the newer American version uses a standard W3C DTD (xhtml1-transitional). When I visit the two sites in FF and look at the page properties, it clearly states what mode the browser is in. Visiting the Swedish site, FF is in quirks mode. So using a custom DTD would make the browser slower and render the pages with quirks mode in mind, right?
    http://www.ibm.com/us/
    http://www.ibm.com/se/
    Jens Wedin
  17. I’m no XML guru, but why can’t we do this with an XSD? I’m envisioning something like this:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns:myforms="somexsd.xsd">
    <head>
    ...
    </head>
    <body>
    <form>
    <input type="text" name="bob" myforms:required="whatever"/>
    </form>
    </body>
    </html>
    To me, that would be the optimal approach, since we’re still including the original DTD, just extending it with our own XSD. Is there a reason this won’t work?
  18. Actually, custom doctypes will trigger standards mode in Mozilla/Firefox (as well as in most other browsers, although I haven’t tested this with Safari). The IBM doctype in question is listed as recognized by Mozilla and is set to trigger quirks mode: http://www.mozilla.org/docs/web-developer/quirks/doctypes.html On the same page, under Full Standards Mode, you will see listed: “Any ‘DOCTYPE HTML SYSTEM’ as opposed to ‘DOCTYPE HTML PUBLIC’, except for the IBM doctype noted below”
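The single rule quoted in comment 18 can be sketched as a small Python function. This is a rough model of that one sentence only, not Mozilla's actual sniffing code: the function name `mode_for_system_doctype`, the regular expression, and the exception identifier `http://example.com/dtd/legacy.dtd` are all assumptions for illustration, and doctypes the rule does not cover (such as PUBLIC ones) are deliberately left undecided.

```python
import re

# Matches a doctype of the form <!DOCTYPE html SYSTEM "..."> with no PUBLIC id.
SYSTEM_DOCTYPE = re.compile(
    r"""<!DOCTYPE\s+html\s+SYSTEM\s+(["'])(.*?)\1\s*>""",
    re.IGNORECASE,
)


def mode_for_system_doctype(doctype, quirks_exceptions):
    """Apply the quoted rule: SYSTEM doctypes mean standards mode, minus exceptions.

    Returns "standards", "quirks", or None when the rule does not apply.
    """
    match = SYSTEM_DOCTYPE.search(doctype)
    if match is None:
        return None  # not a SYSTEM-only doctype; this sketch makes no claim
    system_id = match.group(2)
    return "quirks" if system_id in quirks_exceptions else "standards"


# Hypothetical exception list standing in for the one doctype the page singles out.
EXCEPTIONS = {"http://example.com/dtd/legacy.dtd"}

custom = '<!DOCTYPE html SYSTEM "dtd/xhtml1-custom.dtd" >'
legacy = '<!DOCTYPE html SYSTEM "http://example.com/dtd/legacy.dtd">'
public = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
          '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">')

print(mode_for_system_doctype(custom, EXCEPTIONS))
print(mode_for_system_doctype(legacy, EXCEPTIONS))
print(mode_for_system_doctype(public, EXCEPTIONS))
```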
  19. Coincidentally, I’ve just released a cross-platform graphical tool called GooeySAX that is a wrapper for the Xerces tool referenced in this article. GooeySAX allows you to validate your custom DTDs easily. It is also great for those times when your document is only available on a private network, and thus unreachable by the W3C’s web-based validation tool. http://ditchnet.org/gooeysax
  20. Would it be possible to create a custom DTD that will validate an XHTML document containing Flash?
  21. Try to validate this site: http://bednarz.nl Link to validation: http://validator.w3.org/check?uri=http://bednarz.nl/
    Funny, isn’t it?
  22. Hey, just off the top of my head: removing the ugly “]>” at the top of documents with internal subsets can be done with a little [removed]
    with (document.body)
      for (var i=0;i<2;++i)
        if (/]>/.test(childNodes[ i ].data)) removeChild(childNodes[ i ]);
    Of course, it’s not ideal, but it seems to do the job. It loops over the first two child nodes, since different browsers place the characters in different positions. Originally I thought to simply strip the first two body child nodes, but realized that’s more prone to worst-case scenarios. The script can even be placed in the head tags without being attached to an onload event, since it depends on contents that’re already parsed. It should remove the node as soon as it hits the script.
  23. Why go through all this trouble of creating custom stuff? Is your client really ready to pay for all this? Is it justified over time? Will the site structure stay the same long enough for this to work? I think for smaller websites this takes way too much time. I do like the idea, though.
  24. I’m confused. Why not use attributes in a different namespace? E.g.:
    <input type="text" name="yourName" myns:required="true" />
    That way you aren’t touching the XHTML DTDs at all…
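A small sketch of how a namespace-aware XML parser sees the namespaced attribute from comment 24, using Python's standard library. The namespace URI `http://example.com/ns/forms` is a made-up placeholder, and the `xmlns:myns` declaration is added here because a namespace-aware parser rejects an undeclared prefix.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MYNS = "http://example.com/ns/forms"  # hypothetical namespace URI

# The prefix must be bound with xmlns:myns somewhere above the attribute.
FORM = f"""<form xmlns="{XHTML_NS}" xmlns:myns="{MYNS}">
  <input type="text" name="yourName" myns:required="true" />
</form>"""

form = ET.fromstring(FORM)
field = form.find(f"{{{XHTML_NS}}}input")

# The custom attribute lives under its own namespace, apart from XHTML's attributes.
required = field.get(f"{{{MYNS}}}required")
plain = field.get("required")

print(required)
print(plain)
```

Keeping the attribute in a separate namespace avoids name clashes, but DTD validation is not namespace-aware, so a validating parser would most likely still need `xmlns:myns` and `myns:required` declared in the DTD before the page validates.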
  25. So where can I find a module, library or app that will take some of my database schemas from my content manglement system and generate a valid DTD, so I don’t shoot my foot clean off (something a bit ... well, a lot less expensive than XMLSpy, ya’know)?