More About Custom DTDs

by The W3C QA Group

34 Reader Comments

  1. A real-world example of the interoperability issues discussed here would be the W3C markup validator itself. Take some random German HTML document. Chances are that it’s in ISO-8859-1, ISO-8859-15 or Windows-1252 encoding. None of the three can simply be assumed for an HTML document, so the encoding must be made known somehow.
    The best option is always the server, which should send an encoding along with the content type. It may not be able to do so, however, especially on large servers with many accounts for individual users (e.g. Geocities). There, each user might have a different encoding in their files, so the server can’t make assumptions.
    In XML files, the next place is the XML declaration.
    <?xml version="1.0" encoding="..."?>
    But the file in question is supposed to be HTML, so there’s no such thing.
    The validator accepts a third source, and that’s the meta tag.
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    This works fine, as long as you use a default HTML DTD. Modify or even extend it with a local subset:
    <!DOCTYPE html PUBLIC "..." "..." [
      <!ENTITY words "some words">
    ]>
    and suddenly the validator will no longer use the meta data - because the document is no longer HTML.
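The lookup order described in this comment can be sketched roughly as follows. This is only an illustration of the precedence the comment reports (HTTP header, then XML declaration, then meta tag, with the meta tag skipped once the DOCTYPE carries a local subset). The function name `resolveCharset` and its return shape are hypothetical and are not taken from the validator's actual code.

```javascript
// Hypothetical helper: pick a charset using the precedence the comment describes.
// Returns { charset, source }, where source is "http", "xml", "meta" or "none".
function resolveCharset(httpContentType, body) {
  // 1. The server: a charset parameter on the HTTP Content-Type header wins.
  const fromHeader = /charset\s*=\s*"?([^\s;"]+)/i.exec(httpContentType || "");
  if (fromHeader) return { charset: fromHeader[1], source: "http" };

  // 2. The XML declaration, if the document starts with one.
  const fromXml = /^\s*<\?xml[^>]*\bencoding\s*=\s*["']([^"']+)["']/i.exec(body);
  if (fromXml) return { charset: fromXml[1], source: "xml" };

  // 3. The meta tag, but only while the document is still treated as HTML.
  //    Assumption (from the comment): a DOCTYPE with a local subset "[ ... ]"
  //    means the meta tag is no longer consulted.
  const hasLocalSubset = /<!DOCTYPE[^>\[]*\[/i.test(body);
  if (!hasLocalSubset) {
    const fromMeta = /<meta[^>]+charset\s*=\s*["']?([^\s"'>;]+)/i.exec(body);
    if (fromMeta) return { charset: fromMeta[1], source: "meta" };
  }

  return { charset: null, source: "none" };
}
```

With no charset in the header and a plain DOCTYPE, the meta tag decides; add a local subset to the same document and the sketch reports no encoding at all, which is the surprise the comment points out.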
  2. It seems to me that what Peter-Paul Koch did was to give other developers a compelling reason to use his techniques. They work well, and if many developers use the same technique it may put pressure on the members of the W3C to incorporate those attributes or tags, or to provide a similar way to do the same thing in the next release of a particular recommendation. I’m not entirely familiar with how the W3C is run, but I suspect it is a slow-moving beast and it’s probably quite difficult for a sole developer with a useful idea to encourage any changes from within. By doing what he did, PPK got some developers excited and the W3C nervous. I think the article is well founded, but PPK showed that perhaps if a technique that is philosophically wrong becomes popular, the standards makers will take notice.
  3. Good to hear the W3C say validation is not a goal in itself! But was it a warning or an opportunity when the article said: «The content of attributes, for instance, are generally defined in the DTD as being CDATA (more or less equivalent to “any kind of text”) [...] Nothing in a DTD, for instance, can enforce that the content of the lang attribute must be a language code from RFC1766…»? According to this, and to relate it to Koch’s article about JavaScript triggers, one should be able to “invent” things like style="[removed]triggervalue;" and still get validated XHTML documents. And, yes, the W3C validator does indeed validate such documents. So, is it a limitation of the validator or a feature of (X)HTML? I tend to think it is the latter.
  4. Very complete and thoroughly thought-out article! While DTDs can be custom defined, it doesn’t mean browsers will support them all. Pretty soon, as each platform (mobile, PSP, etc.) is set out, custom DTDs will become standard.
  5. I wrote a (pretty much unknown) article on using modular XHTML for an internal project several months ago (http://jeremy.marzhillstudios.com/programming/archives/000067.html). It was really useful internally, although not so much to the outside world. I used modular XHTML to build a templating engine.
  6. While I appreciate the authors’ point of view and recognize their expertise, I’d like to take a step back and consider the points of the original articles by Mr Koch and Mr Eisenberg. The original Koch article was about separating behavior from structure and style—a desirable W3C objective if I ever heard one. Mr Koch provided a quick way to achieve that W3C objective. But achieving a W3C objective at the cost of validation wouldn’t be right, so Mr Eisenberg stepped in with a way to validate pages that use Mr Koch’s method. The authors of the present article seem to have ignored or discounted the original objective (separating behavior from presentation and structure). Unless I’m being thick and slow, the authors’ alternative (modularizing XHTML) doesn’t seem to address the original objective of separating behavior from presentation and structure. It also seems harder than using JavaScript triggers, at least to me. Just one designer’s opinion.
  7. Look, this article goes completely off the rails when it uses a hypothetical example of a poetry element. (Typical of the W3C to use a hypothetical example rather than solving a problem we actually are dealing with.) Where standardistas want a custom DTD to make an element valid is the case of embed and pretty much nothing else. W3C made a mistake in not incorporating embed into the spec. (It works everywhere but Lynx, while object barely works anywhere. More theory over practice.) We don’t really care that a resulting XHTML+embed DTD won’t be XHTML Transitional or one of the cherished few family members in the W3C. It doesn’t have to be. The W3C doesn’t write all the rules; by the nature of XHTML, *we* can write the rules. Under WCAG Level AA, all we have to do is produce valid documents according to a published specification, not the specification that W3C nabobs want us to use. So why don’t we have a nice tidy article—an *applied* article—from the W3C on how to make embed a valid element in our pages? By the way, I’m at the point where if the only validation error on your page is the use of embed, as far as I’m concerned your page *is* valid because every reasonable device is going to understand it and you’ve already avoided tag soup.
  8. My own interest in this is for adding custom HTML attributes that are recognised by JavaScript. Essentially, I want to serve my “customised XHTML” to the big wide world, as is. Should I be using a custom DTD or not? It seems to me that using namespaces would be the most “semantically valid” way of adding such functionality, but these still require a custom DTD to validate correctly. In the light of this new information, what do people think?
  9. The problem is that the W3C is not moving fast enough to meet the needs of web developers, so we’re forced to invent things like <input required="1" /> to solve today’s problems now. You assert that these extensions are “proprietry” and should be avoided, but fail to credit the previous author for openly documenting his extensions. In fact, these extensions could hardly be considered “proprietry” since the author is not imposing ownership of the technique. Rather, it’s out there, available for anybody to use, and available to any user-agent developer to learn its semantics. Yes, it was a simple XSLT solution you provided. Show me how you would transform <input required="1" minvalue="10" maxvalue="100" /> into standard XHTML and then I’ll be impressed.
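For readers wondering what a script might actually do with attributes like these, here is a minimal sketch. It is not the XSLT transformation this comment asks for, and it is not taken from Koch's article. The function name `validateField`, the error strings, and the reading of `required="1"`, `minvalue` and `maxvalue` are all assumptions made for illustration.

```javascript
// Hypothetical interpretation of the custom attributes from the comment.
// attrs: plain object of attribute name -> string value, as a script might
//        collect them from an <input> element.
// value: the text the user typed.
// Returns a list of error codes; an empty list means the value passed.
function validateField(attrs, value) {
  const errors = [];
  const text = String(value).trim();

  // required="1": an empty value is an error, and nothing else is checked.
  if (text === "") {
    if (attrs.required === "1") errors.push("required");
    return errors;
  }

  // minvalue / maxvalue only make sense for numeric input.
  const hasRange = attrs.minvalue !== undefined || attrs.maxvalue !== undefined;
  if (hasRange) {
    const number = Number(text);
    if (Number.isNaN(number)) {
      errors.push("not-a-number");
      return errors;
    }
    if (attrs.minvalue !== undefined && number < Number(attrs.minvalue)) {
      errors.push("below-minvalue");
    }
    if (attrs.maxvalue !== undefined && number > Number(attrs.maxvalue)) {
      errors.push("above-maxvalue");
    }
  }
  return errors;
}
```

Because the function takes a plain object rather than a DOM element, the same rules could in principle be reused for the server-side fallback when JavaScript is unavailable.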
  10. err, “proprietary” :)
  11. Sanchez said: «The original Koch article was about separating behavior from structure and style - a desirable W3C objective if I ever heard one.» Koch’s point sounds desirable – and maybe it is. But what’s the diff between behaviour and style? Much of what is done through JavaScript manipulation can be done with CSS. So why call it ‘behaviour’ when it uses JavaScript instead of CSS?
  12. Personally I don’t know if custom DTDs and namespaces make sense when you could just use an XML document with a proper schema or RELAX NG and go nuts on your attributes and elements. This would also allow you to create proper sections of the document rather than relying on H1-H6 to introduce but not encompass your sections.
    An XSL transformation can spit out the proper XHTML and a JavaScript with the right validation, and instruct the backend how to work out the fallback validation should JavaScript not be available.
    Putting our own inventions on the backend will also make it safer - if I can see all required information in the markup I have a chance to spoof them a lot easier. By the way - Interesting spot for a typo:
    “It does, however, remind us of a very important point: validation only one step in checking the correctness of” Correctness is achieved by adding andother “is” after validation :-)
  13. another of course, not andother
  14. CSS cannot create and remove elements from the document. That is what the DOM is for.
    Visual changes are not the same as changes in the structure. JavaScript also allows you to test a lot more than CSS does, even with the CSS3 selectors. How would you validate a form entry in CSS? This is what behaviour is about.
  15. There are some examples around, I gather, of elements created through pseudo-elements and generated content. And for CSS3 selectors you yourself note the similarity. More advanced, but still much of the same. Viewing it like that should have implications for how to continue. New content [new CDATA] for e.g. the style attribute would then be quite logical: new CSS attributes instead of new (X)HTML attributes. E.g. style="-jsrequired:'true';". Valid XHTML but invalid CSS, until the new CSS attributes are accepted. Is that so bad? Eventually you could create a /* behaviour */ section in your stylesheet.
  16. This is a bit off topic, but given any apparent confusion of CSS with behavior, perhaps ALA should look into publishing an article on why :hover is a bad idea (or not) in CSS. A lot of people complain that IE doesn’t support :hover for anything but a link, but shouldn’t we be looking at JavaScript for that anyway? What does the W3C have to say about this? (I’ll admit, I haven’t looked.)
  17. While I agree with most of your post, the use of ::before and ::after, as well as the (poorly supported) content attribute, does allow for the creation and modification of simple elements (anything complicated will likely break cross-browser compatibility). Opera’s content attribute is fairly powerful and I believe Opera is closest to achieving CSS counters (if not already achieved) thanks to it. That said, yes, DOM manipulation is much more reliable, as all browsers have their respective problems with CSS generated content (for example, I stayed up until 2:30 last night trying, and failing, to figure out problems with ::before and ::after in Opera. I used them to generate content in my default Style Sheet, and in another Style Sheet that is only loaded when JavaScript is available they were set to display: none;, but Opera would not remove these elements). Or am I missing something here?
  18. One person solving a real-world problem and then exposing it to the world is good. 15 people solving the same real-world problem and exposing it to the world is confusing. The W3C is only cautioning you that custom DTDs might make your page validate but won’t necessarily make your page readable by your targeted audience. If a usable implementation already exists, then use it. If one doesn’t, then yes, build one. Custom DTDs are not a panacea though. That’s why standards are important. They increase the chances that your document will be readable by the widest range of client platforms.
  19. «Behaviour» seems to be simply another word for «script», or is it a particular way of using the scripts? Anyway, Koch got his inspiration from the way CSS stylesheets work. And to use «script selectors» seems very much in line with the thinking behind CSS stylesheets. To introduce new markup attributes (even introducing more than one such markup «behaviour» attribute instead of having just one «behaviour» attribute with many content (CDATA) options) is, in my mind, not in line with the thinking behind CSS stylesheets. That is how I evaluate Koch’s article against the background of the W3C’s article.
  20. The fact of the matter is that arbitrary attributes are fully supported in (X)HTML - in the sense that all browsers have supported them since the DOM was introduced. The fact that the standards have NO clear way of recognising this is a huge shortcoming. My own solution is simply to ignore those validation errors; this is hardly ideal, particularly when we are trying to promote our CMS as standards compliant.
  21. It’s not clear to me why creating custom attributes for scripting is bad. They are just hooks to simplify JavaScript scripts. Web browser rendering engines don’t need to do anything with them since only the scripts will pay any attention to them. I think :hover is fine. It’s being used to change styles when an event occurs. The lines are blurred between presentation and behavior anyway since JavaScript can make modifications to CSS through the style and styleSheets objects. P.S. The quote from the HTML 4.01 Specs is here <http://www.w3.org/TR/1998/REC-html40-19980424/appendix/notes.html>.
  22. The line between presentation and behaviour *can* be blurred, if you choose to do so. The whole point about the “is :hover bad?” line of argument is that it *shouldn’t* be, i.e., don’t use JavaScript to modify the style attribute / styleSheets object. Use JavaScript to manipulate className instead. Doing this will *prevent* the line from being blurred. Regarding hover, however, this isn’t a case of CSS implementing behaviour, so much as the browser coming with some behaviours built in. As such, :hover isn’t ... I mean, ultimately you’re going to have li:hover or li.hover, which aren’t substantially different. Of course, IE6 support is still important :P
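The className approach recommended in this comment can be sketched as two small string helpers. The names `addClass` and `removeClass` are hypothetical, and the functions work on the class string itself rather than on an element, so the sketch stays independent of any particular browser.

```javascript
// Return className with `name` added, without creating duplicates.
function addClass(className, name) {
  const names = className.split(/\s+/).filter(Boolean);
  if (!names.includes(name)) names.push(name);
  return names.join(" ");
}

// Return className with every occurrence of `name` removed.
function removeClass(className, name) {
  return className
    .split(/\s+/)
    .filter((entry) => entry !== "" && entry !== name)
    .join(" ");
}
```

In a page, a mouseover handler might then do `li.className = addClass(li.className, "hover")` and a mouseout handler the reverse, leaving the visual result entirely to an `li.hover` rule in the stylesheet. The script only says what state the element is in; the stylesheet decides how that state looks.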
  23. I want the “target” attribute back so I don’t have to fuss with JavaScript to open new windows.
    I also want the “wrap” attribute in textarea tags because it defines how to do line-wrapping and there are at least 3 ways the browsers deal with it! IMarv
  24. They are all in there.
  25. I’m so happy to see ALA back with a new article.  Keep ‘em coming!
  26. (i) respect the work of the w3c, yet wish that we’d gain a little ambition and a greater vision of what together we sellers/users/actual custodians can deliver the world. Tags and attributes are nothing if not identity attached to various well defined semantics, but clearly it is a schoolboy error to forget that identity exists, and needs to exist, separate from defined semantics. Arbitrary attribute, prefixed with O_, i.e. O_required="1" would solve they’re philosophical interoperability wobbles. We thousands demand you fix the spec by tomorrow. Go on then. I have nails and walls and a hammer. You are the masters of my windows and doors. Tell me I can’t bang these in! Tell me I can’t hang my picture! Anon stomps off grumbling with a murderous look in his eyes.
  27. *Their* just in it to lie with the i’s
  28. Just a few words about the obvious confusion between behaviour and presentation. It is wrong to say that CSS=presentation and JS=behaviour. Both CSS and JS are tools to make the browser do what you want. Behaviour and presentation are concepts. It is true that CSS is intended to define presentation, and that JS is most suited for behaviour. Does that mean, though, that everything else is anathema? Am I not allowed to introduce a little style through JS?
    The important part is separation, not language binding. If I make a behaviour.js and a presentation.js, I have properly separated the two concepts. I probably have also done a lot of unnecessary work, such as trying to monitor changes to the class attribute so that changes from the behaviour side will reflect in the visible page. In other words, you should use CSS for presentation because it is better at it than JS. Similarly, you should use JS for behaviour because it is better at it.
    There was, for example, a W3C effort called XML Events. (I believe the effort has stopped due to lack of interest.) It allowed binding of events by using XML tags instead of some script language. Was that wrong? Isn’t it said that markup is for structure? No. The main document is for structure. Putting XML Events tags into the main file would have been wrong, because it would have mingled structure and behaviour. Using, for example, XInclude to separate the XML Events stuff would have been fine.
    Another example: SVG. Web sites could look extremely cool if we used SVG for their layout. With the advent of native SVG support in Mozilla and Opera, this even becomes possible. Is it a good idea, though? Sure, why not? Just because SVG is an XML language doesn’t mean it can’t be used for presentation. The problem, once again, is that the naive approach would be to mix SVG elements into the main XHTML document. Mixing of structure and presentation. Bad. The key in this case would be to use a technology such as XBL to separate the SVG out of the structure. And we’re fine once again.
    All this has very little to do with the article in question. The article is about when to and when not to create a custom DTD. I think the point of the article is that Peter-Paul Koch’s idea of custom attributes is good, but creating a DTD just to make them validate is not. The point is that, due to the document no longer being HTML (not even invalid HTML), the downsides outweigh the questionable benefit of correct validation.
    Use your custom attributes. (As long as they don’t go into the behavioural area - in that case, bind from the JS side.) Just make sure that these attributes do not collide with possible future HTML attributes: give them unique prefixes, or if you’re using XHTML, put them into their own namespace. Let that namespace refer to your own site, so that nobody is tempted to copy it. In recognition of Appendix C, make the namespace prefix reasonably unique as well, so that HTML parsers won’t get confused. Your document may no longer validate. But it is understood by everything but validating XML parsers. And quite frankly, these are not of much concern in the reality of the web.
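The advice about unique prefixes can be illustrated with a small helper that lets a script look only at its own attributes and ignore everything else on the element. This is a sketch under stated assumptions: the function name `readPrefixed` and the prefix `acme-` are hypothetical, and the attributes are passed in as a plain object rather than read from a live element.

```javascript
// Collect only the attributes that carry our own prefix, with the prefix
// stripped from the returned names. Anything unprefixed (including present
// or future standard attributes of the same name) is ignored.
function readPrefixed(attrs, prefix) {
  const result = {};
  for (const [name, value] of Object.entries(attrs)) {
    if (name.startsWith(prefix) && name.length > prefix.length) {
      result[name.slice(prefix.length)] = value;
    }
  }
  return result;
}
```

For an element carrying `type="text"`, `required="required"` and `acme-required="1"`, calling `readPrefixed(attrs, "acme-")` yields only the script's own `required: "1"`, so a later standard attribute called `required` cannot change what the script sees. The same helper would work with a namespace-style prefix such as `acme:`.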
  29. What this article tells me is that the W3C are scared that people will not use the ability to create custom DTDs properly, and thus put up big neon “this is bad” signs. Custom DTDs in the real world are like guns. If they are used carefully and responsibly then they present no threat, but when used without thought they have the potential to do a lot of damage.
    If you are going to create custom attributes, then think about what will happen to the user’s experience in user agents where the custom attributes won’t work. A good example of this would be JavaScript validation attributes in browsers that don’t support JavaScript. Will the absence of support prevent use of your site? If it doesn’t, then go ahead and use those custom attributes. If absence of support breaks the site, then you’d better rethink what you are doing.
    Now, if you’ve used custom attributes then you’d better tell the browser you are doing so and what it can expect to see. This is the purpose of DTDs after all, to tell programs reading our document what they can expect to find there in terms of structure. This includes both validation programs and browsers, even if the popular ones don’t actually bother to look at it. Of course the document won’t strictly be a 100% W3C standard XHTML document, but if you construct the DTD correctly it will, for all intents and purposes, still be XHTML.
    Don’t just use random prefixes for your attributes; put them in a separate XML namespace, titled appropriately. Your company name would be a good namespace. This separates your additions cleanly from XHTML with no threat of conflicting with anything and has more meaning to it than random prefixes. If a lot of people are using the same attributes which have *exactly* the same meaning, then it would be wise to use the same namespace (but still separate from the XHTML namespace), as it would identify to supporting user agents that they can expect the same thing from all these uses of the attribute.
    Ultimately the way forward is to treat extensions to W3C standards like treading on egg shells. It can be done safely, but only if you are very careful not to break anything. ;-)
  30. What people seem to keep missing is that behavior means a change of the information immediately available to the user, while style means the presentation of all information. Style should not affect the way the information is perceived, only give an aesthetically pleasing surrounding for it. As an example, a hover button with a different-colored background is not behavior, but style—no matter whether you do it with CSS or JavaScript. A drop-down menu with options is behavior, as it presents more information than the original view. Someone is going to ask “Well, smartypants, where’s the line between content and behavior then?” The line is that behavior controls the displayed content i.e. information. In the most degraded form (as in no css, js, etc for web pages) of the media, all content that can be made available through behavior should be shown. (Disclaimer: These are all my own thoughts and strictly IMO, but if you feel like giving constructive criticism, feel free.)
  31. If you want to add new tags and attributes, take a look at the WHATWG specs (proposed to W3C):
    http://whatwg.org/specs/web-apps/current-work/
    http://whatwg.org/specs/web-forms/current-work/
    ...at least you can have some (non-)standard for your non-standard elements.
  32. This is preposterous… XML means eXtensible Markup Language. As in, you should extend it and create new vocabularies according to its syntax rules to fit your business need. To suggest that we should not use XML to do the things XML was created for just doesn’t make sense.
  33. Great! Let us be thankful for XHTML’s modular design, oh the wisdom, and add a module describing a flag attribute using our own DTD. It’s not-trivial, we are told. Not-trivial, repeat that. And then hear the truth. “It’s not trivial either because we’ve obfuscated it, or because there isn’t a teacher/writer/actualuser amongst us”. “But at least it’s jobs for the boys for a few years yet.” Practical accessible reading material on the subject appears not to exist. Give up now and save yourself some time. Some messiah will come along with the subject on a stick one day, maybe. You lucky lazy jobless web mechanics. The use of language betrays the w3c as technocratic elitists more inclined to ponder their own cleverness than to encapsulate XHTML’s logic for anyone with more than one thing to do. Of course it’s astonishingly simple when you know how, but don’t worry your pretty little heads about it.
  34. ALA Writers, I have read this web site religiously since the late 90s and I am a believer in and follower of web standards. While I do understand some of the lingo and jargon in this article, some of it is just alien to me, and thus I open and quickly close the document. How techy does an article really need to be? Content is key, and if you want to really educate people, start talking to them in a manner that a broad range of people can understand, and not just us code geeks. Sure, this site is a great resource, but is it really at its full potential? I recommend comparing the time a user spends on a page that is easy to understand vs. pages like this. I am sure there are some drastic results. Thanks,
    Anthony Armendariz
    Partner/Creative Director
    Design Dialogue
    Houston, TX