Great article and great discussion. I feel like I’m in a conference workshop with some of the most insightful web monkeys in the world.
I like the concept of creating mechanisms over widgets as a philosophy in general. They enable. They allow for creativity, individuality, and cleverness that might never have existed otherwise. It’s the difference between building a Lego city with roads for your Hot Wheels and cruising Need for Speed. It’s knowing how to sear a pork chop, cut and saute an onion and make a pan sauce with some leftover white wine vs. following a recipe step by step (and having to read it a few times over to make sure you’re doing it just right).
I do design and front-end development solely, working around CMSs and back-end logic to make things look pretty. So here is a perspective from someone who is less inclined to care much about DTDs, schemas and whatnots.
I’ve written enough sites where I have to ID divs as header, content, footer, nav, sidebar, etc., but I don’t always. I’ve done plenty of top-level, secondary, and tertiary navs, but I’ve also designed sites that were 10 pages that could have been 1 (if the client was hipper, it would have been), where the concept of navigation would be meaningless. I strive to be as valid and accessible as possible.
So what good is it to add tags that I don’t need to use all the time? What problem are we solving? I see canned code being handed out instead of taking the time to learn how you can get a lot done just by tossing in some IDs and classes.
Let’s standardize the vocabulary, sorta how microformats is doing. It’s data about data and it’s ready to go. Build smarter parsers.
That dataset attribute in HTML5 is about the only thing I’m really excited for.
Oh, and the argument that new tags clean up divitis is a terrible one; you’ve cured one ailment and injected tagitis. At least a div is a div and when I think it’s special, I can add an ID to it.
Firefox 2 seems to be able to style HTML5 elements just fine when being served XHTML. That’s what we at “Shepherd Interactive”:http://shepherd-interactive.com/ have done on our own site and on client sites like “ReBath of Oregon”:http://rebathoregon.com/ and “CCAA”:http://theccaa.net/
Benjamin Hawkes-Lewis
“Eric Fields”:http://eric@ericdfields.com/ asks
So what good is it to add tags that I don’t need to use all the time? What problem are we solving?
It’s a good question. :) If you look at the “WHATWG process for adding new features to HTML5”:http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_the_spec.3F , you’ll find that contributors are supposed to begin with a user problem not a technical solution. For example, a recent contributor suggested an ‘author’ element for use in citations without explaining what user problem it would solve, and so was told to go back and explain the actual problem. I believe the new structural elements do solve important user problems, but if you disagree, there’s also a “process for removing current features from HTML5”:http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_removing_bad_ideas_from_the_spec.3F .
Every site doesn’t need to feature a semantic for it to be in common use and benefit from explicit markup: not every site has a table, but I’d assume you wouldn’t suggest replacing table markup with class and ID names? These semantics for the new structural elements are in common use, as the “Google Web Authoring Statistics”:http://code.google.com/webstats/2005-12/classes.html show.
‘h’ solves the problem of expressing a seventh-level heading in HTML (I’ve run across this problem once or twice). It also makes it easier to copy and paste code from one place to another (since you don’t have to adjust the heading numbers) or to allow contributors to your site to include headings in posts, comments, or wiki articles without hard-coding particular heading levels.
One simple use-case for the other new structural elements (‘header’, ‘footer’, ‘nav’, ‘article’, ‘section’, ‘aside’, ‘dialog’) is for user-agents to provide keyboard or verbal shortcuts for moving around the page quickly. For example, marking up your navigation with a ‘nav’ element means that user agents can implement a more reliable “Skip to Content” command (at the moment they can only guess where content begins by looking at link density, looking at visual blocks, or looking for common class and ID names). Rather than remembering what heading level this site uses for article titles, you can simply use the ‘Next Article’ command. Conversely, when you have finished going through the site, you can use a ‘Jump to Navigation’ command to explore the rest of the site. These represent substantial improvements for people with mobility or visual disabilities.
Another simple use-case is user-agents being able to supply alternate presentation options or users being able to customize their user experience directly. Rather than styling an array of class and ID names, you can style ‘nav’ as a dropdown menu. You can rely on ‘title’ to tell you where you are, hide the main ‘header’ and main ‘footer’, and move the main ‘nav’ to the bottom of the screen to put content first.
Yet another simple use-case is making it easier to spider content. For example, if you want the articles from a site that doesn’t syndicate, you can simply extract each ‘article’.
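As a rough illustration of the spidering case, here is a minimal sketch that pulls the text out of each ‘article’ element. It assumes Python's standard `html.parser` module; the class name `ArticleExtractor`, the helper `extract_articles`, and the sample markup are all made up for this example and do not come from any real spider.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collect the text content of each top-level <article> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many <article> elements are currently open
        self.articles = []  # text of each finished top-level article
        self._chunks = []   # text pieces of the article being read

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join the pieces and collapse runs of whitespace.
                text = " ".join(" ".join(self._chunks).split())
                self.articles.append(text)
                self._chunks = []

    def handle_data(self, data):
        # Text of nested articles is folded into the outermost one.
        if self.depth > 0:
            self._chunks.append(data)


def extract_articles(html):
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return parser.articles


# Hypothetical page: only the two articles should be picked up.
page = """
<nav><a href="/">Home</a></nav>
<article><h1>First post</h1><p>Hello world.</p></article>
<article><h1>Second post</h1><p>More text.</p></article>
<footer>Site footer</footer>
"""
print(extract_articles(page))
```

The point of the sketch is that the extractor needs no knowledge of the site's class or ID names; it only looks for the element itself.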
Let’s standardize the [class and ID] vocabulary, sorta how microformats is doing.
Depends what you mean by “standardize”. HTML5 initially tried to standardize certain class names such as ‘copyright’. However, it turned out that sites actually use the class in different ways; microformats tend to rely on opaque ancestor classes like ‘hcard’ or ‘hatom’ to distinguish them from similar class sets. The microformats community said they didn’t require standardization of class names. So predefinition of class names was dropped from the specification.
I’m all in favour of extensibility via microformats, but the microformats community is a spec-writing not a standards organization. Microformats are never going to be a requirement for writing conforming HTML; they are never going to be taught as an intrinsic part of the HTML standard. If we want a standard encoding of a semantic to be used as widely as possible – if it serves a common, important user need – it must be an HTML element or attribute in the HTML standard.
For many specialized vocabularies, what’s needed is simply the ability to add specific semantics through the addition of elements, attributes or attribute values which can then be presented with a default stylesheet that spans all media types. So the specialized vocabulary (links, vector graphics, mathematics, user interface) would still be developed in a centralized way. However, the distributed extensibility would take place in a way that made significant re-use of those abstracted vocabularies for other more concrete vocabularies.
Hmm. Assuming that SVG and MathML are provided with text/html serializations, can you give a concrete example of the distributed extensibility you mean and how it would be more usefully, efficiently, and accessibly implemented using an XML element than a set of HTML ‘class’ and ‘content’/‘equivalent’ attributes?
(Incidentally, I don’t really agree that a ‘datagrid’ is simply another presentation of a list; I think it actually represents an editable dataset; but that’s not crucial to this discussion.)
Benjamin Hawkes-Lewis
I wrote:
‘h’ solves the problem of expressing a seventh-level heading in HTML (I’ve run across this problem once or twice). It also makes it easier to copy and paste code from one place to another (since you don’t have to adjust the heading numbers) or to allow contributors to your site to include headings in posts, comments, or wiki articles without hard-coding particular heading levels.
Ahem. Some little birds have just reminded me that there is no ‘h’ element in HTML5 (that’s XHTML2). Instead the heading algorithm has been redefined to allow the combination of ‘section’ and ‘h1’ to solve the problems I mentioned. Sorry for the confusion. See the “draft spec”:http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#headings-and-sections for details.
“If we lay aside the considerations that HTML5 includes ‘canvas’ and may include a text/html serialization of SVG in the future, and pretend that it also included a ‘content’/‘equivalent’ attribute, I don’t see why it would be an order of magnitude harder to implement a language with the same meanings and functionality as SVG in HTML plus CSS plus JS than in (a) HTML plus attributes for different semantic types, or (b) a brand new XML language. I think all would be sort-of possible, but none would be performant (element per pixel anyone?). I think what made SVG practical was user agents implementing the necessary high-performance drawing code; it’s not an example of distributed extensibility.”
You’re conflating the canvas element and SVG and the two share nothing other than they are graphical, and can be used in web pages.
And when you say one can emulate SVG with HTML, CSS, and JavaScript, I have to assume you haven’t worked with SVG overmuch. No insult intended, but there is a large difference between a vector markup with support for declarative animation, and a script-based bitmap element. Just turn off JavaScript to see what I mean.
And when you say it’s not an example of distributed extensibility, again, I’m not sure where you’re coming from. The concept of distributed extensibility has nothing to do with semantics, or graphics for that matter, and everything to do with incorporating a capacity for change without having to modify the underlying parser, and mitigating the effects of naming collisions through the use of some sort of namespace mechanism.
With XHTML, I can incorporate SVG inline, RDFa, MathML, and whatever other extension I want to incorporate at some future time. Currently I use SVG for design, and RDFa for semantics.
The browser may process the data, as most do with SVG. Or it may not, as most don’t with RDFa. That’s not important. What is important is that when a new extension comes along that’s formatted in whatever format is valid for browser consumption, it doesn’t have to go to committee. Doesn’t have to be somewhat merged into the underlying data domain of the web page specification. Can be used immediately.
With this concept, not only is the underlying page markup kept clean, and as simple as possible, we don’t have to wait years in order to make use of the extension. Most browsers don’t do anything with RDFa, but there are Firefox and other extensions that can use it.
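To make the naming-collision point concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. Both XHTML and SVG define a ‘title’ element; the namespace URIs are the real ones, while the sample document and the helper `titles_in` are made up for this example.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# Hypothetical XHTML page with an inline SVG graphic.
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Host page title</title></head>
  <body>
    <svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">
      <title>Title of the graphic</title>
      <circle cx="5" cy="5" r="4"/>
    </svg>
  </body>
</html>"""

root = ET.fromstring(doc)


def titles_in(namespace):
    """Return the text of every 'title' element in the given namespace."""
    return [el.text for el in root.iter("{%s}title" % namespace)]


print(titles_in(XHTML))
print(titles_in(SVG))
```

A generic XML parser keeps the two ‘title’ elements apart without knowing anything about either vocabulary, which is the part of the mechanism that needs no committee.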
I use XHTML, but it has come with a cost in time, because it’s unforgiving. I’m willing to make the time, but a lot of people are not. Unfortunately some browsers like Firefox take the concept of returning an XML error literally, and provide the most awful error page in the world. Others, though, like Opera and Safari, return errors that are much more helpful. That’s why many of us were hoping that we could have the best of both worlds—the forgiveness of HTML, with the extensibility of XHTML.
(And it’s hard to write all of this in a dinky comment box, so apologies in advance for typos.)
Benjamin Hawkes-Lewis
And when you say it’s not an example of distributed extensibility, again, I’m not sure where you’re coming from.
If all you’re saying is that vector graphics markup is a useful feature to have in any web technology stack, then I entirely agree as do the people who want a text/html serialization of SVG to be included in HTML5.
I was trying to use SVG as an example of the practical limitations of distributed extensibility. Just to recap, my argument in a nutshell is that:
1. Neither HTML nor XML (plus JS plus CSS plus ARIA) yet give you distributed extensibility that provides meaning or functionality to users of user agents not specifically programmed with an interface for those meanings or functions, except for such meanings and functions as can be derived additively.
2. Nobody’s identified any practical advantages – from a pure distributed extensibility perspective – that XML plus JS plus CSS plus ARIA has over HTML plus JS plus CSS plus ARIA.
SVG is an example of a technology that might have been possible, but would not have been very practical, for developer communities beyond the standards organizations and consuming software vendors to have designed and implemented in webpages in popular browsers using HTML, JS, and CSS, or using XML, JS, and CSS. You’ve pointed to just one of the problems that would have entailed: dependence on publisher JS.
Today, SVG has already been implemented – by browsers and plugins. Perhaps accessible 3D environments might be a more current example of a technology which is not very practical in an XML language without implementation by consuming software.
With XHTML, I can incorporate SVG inline, RDFa, MathML, and whatever other extension I want to incorporate at some future time. Currently I use SVG for design, and RDFa for semantics.
The browser may process the data, as most do with SVG. Or it may not, as most don’t with RDFa. That’s not important. What is important is that when a new extension comes along that’s formatted in whatever format is valid for browser consumption, it doesn’t have to go to committee.
… That’s why many of us were hoping that we could have the best of both worlds: the forgiveness of HTML, with the extensibility of XHTML.
But you haven’t established that text/html is less extensible than XHTML! You’ve just pointed out that some XML languages have some functionality (vector graphics and mathematical typesetting) implemented in browsers that text/html doesn’t.
XML formats don’t have vector graphics or mathematical typesetting abilities because XML is better for distributed extensibility, but because W3C only defined XML serializations of MathML and SVG. W3C once drafted “mathematical extensions for text/html”:http://www.w3.org/MarkUp/html3/maths.html and Microsoft originally proposed and implemented a “vector graphics language using XML data islands within text/html”:http://www.w3.org/TR/NOTE-VML.html . If the text/html serialization of HTML5 included SVG and MathML, as some people want, that difference would disappear. Note that the current draft includes “embedded MathML”:http://www.whatwg.org/specs/web-apps/current-work/#mathml and “embedded SVG”:http://www.whatwg.org/specs/web-apps/current-work/#svg .
Depending on your point of view, draconian error handling of XML is either a major benefit (‘Yay it’s stricter’) or a major cost (‘but now my site is broke’). If you think it’s a major cost, you might be interested in the “XML5 project”:http://code.google.com/p/xml5/ .
text/html parsing will likely always be more complicated than XML parsing, although by the time HTML5 is finished it will hopefully be better specified. But this complication is something you don’t need to worry about if all you’re doing is adding class names and hidden values.
Formats serialized via HTML5’s existing extension points plus ‘content’/‘equivalent’ wouldn’t have to “go to committee” either.
My issue is not about whether all these new formats are “valid”; but whether they actually work for end-users. It seems to me they run a serious risk of making the accessibility problems with machine data in microformats look like small potatoes.
Rob Burns
Hmm. Assuming that SVG and MathML are provided with text/html serializations, can you give a concrete example of the distributed extensibility you mean and how it would be more usefully, efficiently, and accessibly implemented using an XML element than a set of HTML ‘class’ and ‘content’/‘equivalent’ attributes?
I’m glad you asked. I meant to provide some examples in the earlier post and forgot. What I’m thinking of with distributed extensibility is various specialized disciplines providing extended semantics on top of HTML. For example, a society of poets or musicians or astrophysicists might add new elements, new attributes and new attribute values to express the semantics particular to their respective specialties. In addition, the society of Einsteinian astrophysicists might develop a vocabulary of elements, attributes and attribute values that uses similar or identical names to the quantum physicists. Any of these vocabularies can make use of SVG, MathML, XLink, XInclude, XForms and other behaviors provided through other vocabularies.
The idea then is that we not only have a namespacing mechanism (which is very cumbersome when relying on class names), but that we have a rich vocabulary of abstracted behaviors for namespaced vocabularies to draw upon. That means any arbitrary vocabulary can include vector graphics, hyperlink activation, frames / split views, mathematics, embedded content, and user interface widgets. Moreover, with a better extensibility mechanism new extended abstracted behaviors could be added to the host language without spending years or decades waiting for a new HTML recommendation. None of this is possible with the HTML5 approach.
However, with a namespace mechanism (and one has already been implemented in IE for text/html), and abstracted and namespaced behaviors, new semantics can be added to the host vocabulary (and the host serialization) while using ARIA and CSS to provide any accessibility and presentational mappings. Again, this cannot be accomplished by simply adding MathML and SVG to text/html as the only two anointed behaviors in HTML.
(Incidentally, I don’t really agree that a ‘datagrid’ is simply another presentation of a list; I think it actually represents an editable dataset; but that’s not crucial to this discussion.)
Certainly datagrid is editable, but I don’t think we should start adding separate elements for editable and read-only semantics for lists or anything else. We already have an attribute to change any element to an editable element, and if lists have a need to fine-tune that editing, it would be better handled through an attribute (another excellent example of the point this article makes). Another example is OUTPUT, which is borrowed from XForms and XHTML2. This is really a read-only version of the INPUT element that has a different presentation in its read-only state. Is there really a need to add more elements to the vocabulary for that subtle distinction? I think not. A read-only attribute could work in that case (perhaps on other UI controls too). The METER and PROGRESS elements are two more examples already included in the article, where the UA provides a graphical presentation of a proportion or fraction. From a device-independent point of view, these elements are the same thing. This functionality really belongs at the presentation level in CSS or other such specifications.
With regard to Eric Fields’ remarks in comment @50/51, we should really be working towards the goal where Eric does all of his front-end work in CSS (supplemented by SVG, bitmap graphic editing and an occasional XSL). There may be some need to reorder the backend elements, though this could be done using XSLT as another front-end tool. Eric doesn’t want to take the time to understand the semantics of the back-end vocabulary and that’s fine. It is an excellent division of labor. However, it’s important to understand that using a DIV and giving it an ID does not substitute for the diverse device-independent, accessible and abstracted capabilities of the specialized elements and attributes. But as a front-end author, Eric shouldn’t even need to deal with those issues.
Rob Burns
Depending on your point of view, draconian error handling of XML is either a major benefit (‘Yay it’s stricter’) or a major cost (‘but now my site is broke’). If you think it’s a major cost, you might be interested in the XML5 project.
I don’t really think that’s a fair way to put this. The major benefit of XML error-handling is that errors are discovered immediately (a relief) rather than after deployment (embarrassing). Also, any error-free element can be made a child of any other error-free element without creating any new errors (at least well-formedness errors, which is what this discussion is referring to).
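Both claims can be tried out with any strict XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `check_well_formed` and the sample fragments are invented for this example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(markup):
    """Return None if markup is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


good = "<p>Some <em>emphasised</em> text.</p>"
bad = "<p>Some <em>emphasised</p> text.</em>"  # tags closed in the wrong order

# The error surfaces the moment the fragment is parsed.
print(check_well_formed(good))
print(check_well_formed(bad))

# Wrapping a well-formed element in another well-formed element
# does not introduce a new well-formedness error.
print(check_well_formed("<div>" + good + "</div>"))
```

Note that this only checks well-formedness, as in the discussion above; whether the result is valid against a particular DTD or schema is a separate question.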
if (say) the Swiss society really cannot agree on a common vocabulary with the American Astronomical Society, who naturally prefer
american-astronomical-society:wave-variety
- but I don’t want to debate the aesthetics, only the technical potential. The key point here is that you agree that it is possible to ‘namespace’ with class names alone, just as JS libraries manage with simple formulations like ‘YAHOO.util.Dom’ to ‘namespace’ variables and functionality, even if you feel it is “cumbersome”.
… but that we have a rich vocabulary of abstracted behaviors for namespaced vocabularies to draw upon.
This seems fundamentally the same as Shelley Powers’s argument in favour of distributed extensibility using XML: you want to make use of functionality only available in XML languages. However, if you can make use of the same functionality (vector graphics, mathematics, sophisticated forms) in text/html, then that difference disappears – especially if when this functionality finally appears as native features in IE (the browser most people use) they are available in text/html.
Moreover, with a better extensibility mechanism new extended abstracted behaviors could be added to the host language without spending years or decades waiting for a new HTML recommendation.
Who is going to specify these “new extended abstracted behaviors”? Who is going to implement them? Who is going to ensure they safeguard security and accessibility? How are pages using them going to degrade gracefully in user agents that have not implemented them? Why should “a new HTML recommendation” take significantly longer to produce than specifications for “new extended abstracted behaviors”? In the meantime, why couldn’t these “new extended abstracted behaviors” be attached using the HTML5 text/html extensibility mechanisms plus ‘content’/‘equivalent’, rather than attached to XML elements?
Rob Burns
This seems fundamentally the same as Shelley Powers’s argument in favour of distributed extensibility using XML: you want to make use of functionality only available in XML languages. However, if you can make use of the same functionality (vector graphics, mathematics, sophisticated forms) in text/html, then that difference disappears — especially if when this functionality finally appears as native features in IE (the browser most people use) they are available in text/html.
That is certainly not what I am saying. I’m not sure about Shelley Powers. I want to see the same (or similar) namespace extensibility mechanism brought to text/html. It is the WhatWG that opposes this.
Yes, I do agree that class names can express new semantics. However, why are we sitting around trying to justify the increasingly cumbersome process (using class names and negotiating potential conflicts for authors who want to draw on two different vocabularies)? We already have an XML namespace solution that IE has largely implemented for text/html. And as you say, this is the browser used by the majority of users. So why aren’t we inviting the other browser makers to implement XML namespaces in text/html, so that authors can use them in whichever serialization they choose? So what is gained by using class names instead of the much more elegant and much more flexible solution of namespaces (XML or otherwise)?
Incidentally this again speaks to the predatory monopolistic practices I spoke about before. Why would anyone in their right mind be quibbling over serializations? Who cares whether it is XML or text/html? Well the reason these issues are up for quibbling is that some predatory monopolies want to make it difficult to develop these standard formats which they do not own (which also partly explains why it takes decades, by the HTML5 editor’s own estimation, to develop an incrementally updated standard).
Who is going to specify these “new extended abstracted behaviors”? Who is going to implement them? Who is going to ensure they safeguard security and accessibility? How are pages using them going to degrade gracefully in user agents that have not implemented them? Why should “a new HTML recommendation” take significantly longer to produce than specifications for “new extended abstracted behaviors”? In the meantime, why couldn’t these “new extended abstracted behaviors” be attached using the HTML5 text/html extensibility mechanisms plus ‘content’/‘equivalent’, rather than attached to XML elements?
The idea behind distributed extensibility is that any community of authors could implement a new vocabulary. Any author can then opt to join in that community and make use of that vocabulary and mix it with other vocabularies without any concern for conflicts. By improving text/html parsing, we will generally not have the same graceful degradation problems we have today (where elements will not even parse correctly). I’m not entering into a debate over which serialization an author should use. However, XML namespaces are now widely implemented (in every major browser, except that IE implemented them for the HTML namespace only in the text/html serialization, and all of the other browsers support the HTML namespace only in the XML serialization). We need to make them available in either (or any) serialization and allow author communities to make use of them. Authors and authoring communities would still have the option to use class names and other mechanisms, but my guess is that, given the choice and widespread interoperability, they would choose to use XML or XML-like namespaces.
As for the abstracted vocabularies, my sense is that most of what we need has already been provided by the W3C. We just need broader implementation of those recommendations. I think more could be done with CSS so that we reach the goal I suggested before, where front-end work is done almost entirely with CSS, SVG, and bitmap images, and semantics are properly handled by the rest of the recommendations (HTML needs to be rounded out a bit for semantics too). So this means: 1) incrementally better CSS, 2) incrementally better HTML, and 3) better text/html parsing algorithms. With that, much of these debates over serializations or extensibility mechanisms, etc. would all be moot (though we’d undoubtedly have something else to discuss).
Benjamin Hawkes-Lewis
Yes, I do agree that class names can express new semantics. However, why are we sitting around trying to justify the increasingly cumbersome process (using class names and negotiating potential conflicts for authors who want to draw on two different vocabularies)?
I think you can solve most of the vocabulary-mixing problem without adding new features to HTML5 by using prefixes, just like JS libraries do, and sharing information about what class names you are using with the rest of the web community.
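As a sketch of what prefix-based isolation could look like to a consumer, the function below groups the tokens of a ‘class’ attribute by vocabulary prefix. The colon separator follows the ‘american-astronomical-society:wave-variety’ example earlier in the thread; the function name `split_class_names`, the ‘swiss-society’ prefix, and the sample attribute value are assumptions made up for illustration, not an agreed convention.

```python
from collections import defaultdict


def split_class_names(class_attr, separator=":"):
    """Group class tokens by vocabulary prefix.

    Tokens without the separator are collected under the empty prefix ''.
    """
    groups = defaultdict(list)
    for token in class_attr.split():
        prefix, sep, name = token.partition(separator)
        if sep:
            groups[prefix].append(name)
        else:
            groups[""].append(token)
    return dict(groups)


# Hypothetical class attribute mixing two vocabularies with the same term.
attr = "note american-astronomical-society:wave-variety swiss-society:wave-variety"
print(split_class_names(attr))
```

Two societies can both use ‘wave-variety’ here without colliding, but only because authors and tools agree on the prefixes out of band; nothing in the parser enforces it.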
What I wanted to verify is that new features are not absolutely required for distributed extensibility of vocabulary, and from what you’re saying it seems they aren’t.
It’s perfectly reasonable to go on from that conclusion (as you do) to argue for XML-namespaces-in-HTML, but on the basis of trying to automate vocabulary isolation when vocabularies are mixed, rather than on the basis of actually enabling distributed extensibility of vocabulary.
I don’t have that much enthusiasm for that argument, partly because I’m not sure XML-namespaces-in-HTML are the simplest way to implement such ‘namespacing’ and partly because I think other issues are more urgent, like the omission of a generic machine-data attribute from HTML5.
We already have an XML namespace solution that IE has largely implemented for text/html. And as you say, this is the browser used by the majority of users. So why aren’t we inviting the other browser makers to implement XML namespaces in text/html, so that authors can use them in whichever serialization they choose? So what is gained by using class names instead of the much more elegant and much more flexible solution of namespaces (XML or otherwise)?
Isn’t one gain that class names can be parsed, styled, and scripted in all current popular user agents, whereas XML-namespaces-in-HTML can only be parsed by one very popular current user agent? (The article proposed HTML5 should add features using attributes rather than elements for precisely this sort of backwards compatibility reasoning.)
As for the abstracted vocabularies, my sense is that most of what we need has already been provided by the W3C.
But if we really don’t need to add new behaviors to the XML behavior set, and implementing vector graphics, mathematical typesetting, and more powerful forms in text/html removes the functionality gap between text/html and the XML world, then the subsequent speed of adding “new extended abstracted behaviors … to the host language” isn’t an important consideration when asking what we need to enable distributed extensibility – since all we really want to extend is vocabulary without changing interface.
John Allsopp
Firstly, thanks for the thoughtful, detailed responses, and apologies for being so slow to participate in the conversation. It’s incredibly gratifying after having put considerable effort into a piece to have such an intelligent and in-depth conversation emerge from it. Above all, the goal of the piece was not to prove the point I was making, but rather to start an important conversation that is not taking place and which I think should be.
Some responses to the various intelligent thoughts, observations and so on.
To Jeremy Keith – thanks for the JS workaround. I became aware of that toward the end of the process of putting the article together. I think the general position of my argument holds regardless - having to use JavaScript in this way is not a general solution to the problem.
Here to me is the key problem (and I clearly didn’t articulate this nearly well enough in the article, as it is the crux of my focus on the importance of backwards compatibility).
Technologies flourish when adopted by developers, and die when not. If you look at the “chasm” model of technology adoption, often technologies appear to take off like wildfire - among early adopters. Where technologies really struggle is with their adoption by mainstream users – and the way in which mainstream adopters decide whether to adopt a technology is very different from the early adopters – early adopters are experimenters, they like to try cool stuff, see what works, and so on. Mainstream adopters simply aren’t like that. They are far more pragmatic. The speed and even the extent to which a technology is taken up by early adopters doesn’t correlate with its “crossing the chasm” to mainstream adoption.
With web technologies (and here CSS is a very interesting and relevant prior example) a key determinant of their adoption among mainstream, pragmatic developers is that they work ubiquitously. After all, even as of late 2008, over 25% of early adopter profile web developers stated clearly that “Pages should look as near to identical as possible across browsers” – despite a decade or more of advocating for adaptive designs. Based on the experience of the slow adoption of CSS in the 1990s – time and time and time again, among developers, educators, writers, you would hear the phrase “but CSS doesn’t work”. This in my opinion undoubtedly held back the uptake of even the CSS that worked very well by years – and given the ongoing prevalence of the use of the font element [1], coupled with conversations I’ve had with professional developers in the last year or two, to this day, this belief is not entirely eradicated, and continues to have its effect.
Now, given CSS had effectively no competition (a good deal of what CSS provided was not possible with presentational HTML), whereas HTML5 does (it’s called HTML/XHTML) – if there are perceived or actual backwards compatibility issues for even the most simple aspects of the language (new elements like section) – I’d predict the chances of its widespread adoption happening anytime soon is pretty much non-existent. You only have to look to XHTML2 for a very near parallel example.
That there’s a JavaScript workaround that extremely well informed and skilled web developers might be aware of is simply not going to address that issue. I foresee everyday web developers trying to use the simplest aspects of HTML5, such as using the section element (which introduces the problem of a semantic mismatch between the meaning of H1 in HTML5 and in older versions of HTML, for what it is worth), try styling it, see that it doesn’t work in IE7, and then simply abandon any attempt to get up to speed with HTML5, as, a la CSS, “it doesn’t work”.
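For readers who haven't seen it, the workaround in question is usually described as calling document.createElement once for each new element name before the page renders, after which older versions of IE will apply CSS to those elements. A minimal sketch, assuming that behaviour; the function name `registerElements` and the list of element names are illustrative and not taken from the article:

```javascript
// Sketch of the createElement workaround for older IE versions.
// `registerElements` is a hypothetical helper name. The document object is
// passed in so the function can also be exercised outside a browser.
function registerElements(doc, names) {
  var registered = [];
  for (var i = 0; i < names.length; i++) {
    // Creating the element once is enough; it is never inserted into the page.
    doc.createElement(names[i]);
    registered.push(names[i]);
  }
  return registered;
}

// In a browser this would run from a script in the head of the page.
if (typeof document !== "undefined") {
  registerElements(document, ["section", "header", "footer", "nav", "aside", "article"]);
}
```

It only helps when JavaScript is enabled, which is exactly the limitation the argument here turns on.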
Quite a few of you suggested that we have the solution – XHTML with DTDs, or XML. The problem is that these have been around for coming on a decade, and are little if at all adopted by mainstream web developers (in fact, one of the reasons I focussed so specifically on backwards compatibility in HTML5 is that the lack of compatibility for most of the last 10 years with most browsers in common use is probably the single most important reason for the failure of these solutions to take off. Well, that and their complexity, in comparison with good old HTML.)
But keep in mind that the focus of this article is HTML5, and I’ve taken it as a given that the momentum for HTML5 to be the next major iteration of HTML more or less guarantees that will be the case. If nothing else, if it doesn’t make it, we will have wasted years and an enormous amount of energy and resources and have still not addressed significant shortcomings in HTML. So, my concern is to address what I consider to be a serious shortcoming in HTML5’s approach to an important aspect of the language – how it supports semantic markup.
A number of folks took the fsck IE6 approach ;-), or argued that it’s rapidly diminishing in use.
My argument is that we simply can’t ignore IE6 and backwards compatibility more generally, because as we’ve seen with many other very good technologies (SVG for instance), they simply won’t be adopted by the majority of developers because of that lack of widespread compatibility.
If we are concerned about the adoption of HTML5, then we really need to ensure that there are as few impediments to its adoption as possible. If the meme of “HTML5 is not compatible with IE6” (which will soon become simply “IE”, then “HTML5 is not supported in any browsers”) takes root, then it can take years (as CSS advocates will attest) for that meme to be eradicated.
Jeremy Jarratt puts it most strongly
Does anyone seriously believe that we can get by just on minor modifications to the current HTML spec for the next decade?? Of course not! Sooner or later, we’re going to wish we had a continuous supply of new elements, where such things make sense, universally speaking.
While in many ways a very attractive idea, I think XML is an object lesson in exactly why this is hard if not impossible. XML was in many ways the way to start afresh. We’ve seen how well that’s worked out (at least when it comes to the web).
If we do wish we had “a continuous supply of new elements, where such things make sense”, then the current HTML5 proposal doesn’t provide that at all.
Regarding RDFa, it definitely should have got a mention. As I developed these ideas over the last couple of years, and the article (which has been in gestation for just about 12 months now), RDFa was coming together. I see the proposal that I’ve put together as being able to work in conjunction with RDFa, but as more akin to providing a better framework for the existing widespread developer practice, exemplified by microformats, of using HTML class and id attributes to add pseudo-semantics to markup.
But RDFa is quite a radical departure from this existing common semantic practice. As such there’s no great guarantee that it will catch on, and in many cases it will be overkill for the purposes for which most developers mark up their content “semantically”.
data attribute, content attribute
A number of folks raised the HTML5 data attribute – but this is simply a bucket for applications to store their own data in. It’s expressly not for generalized uses –
“User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values”
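To make that scope concrete, here is a rough sketch of the kind of page-private use the data attribute is meant for: a site's own script reading a value it stored in its own markup. The attribute name `data-duration` and the function `readDuration` are invented for illustration.

```javascript
// Assumed markup, private to this one site:
//   <li class="track" data-duration="215">Song title</li>

// Read the site's own data-duration value from an element.
// Returns the number of seconds, or null if the attribute is missing or malformed.
function readDuration(el) {
  var raw = el.getAttribute("data-duration");
  var seconds = parseInt(raw, 10);
  return isNaN(seconds) ? null : seconds;
}
```

Nothing outside the site's own scripts is expected to know what `data-duration` means, which is why it cannot serve as a shared semantic vocabulary.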
Thanks for the reference to the content attribute, Mark Birbeck. As with role, I don’t see why it wouldn’t make sense to adopt it, since both are existing attributes from XHTML2.
As to why class, meta, id and rel alone don’t suffice – there’s a number of arguments.
Firstly, I’ve tried to make the case that we’ve simply pushed these rudimentary semantic extensibility features of HTML past breaking point – the BBC microformats saga is pretty strong evidence of this.
Class is simply a bucket for strings which can be used for “general processing” – which can mean just about anything. id is essentially a bucket for a page level GUID – any semantics we layer on top of these attributes is really by tenuous convention. In short, they haven’t worked. In fact they weren’t really designed for semantics in the way they are commonly used now at all.
A number of respondents asked (at times as advocatus diaboli) whether the proposal wasn’t a “solution in search of a problem?”. I always think that’s a very good question to ask.
I think the fact that so many developers use class and id as a mechanism for adding pseudo-semantics to their documents that you’d really have to call it a standard practice among professional web developers, the considerable success of microformats despite the technical limitations of HTML, and even the addition of new semantic elements to HTML5 are all indicators of the need for the semantics of HTML to be further enriched.
Craig Sharkey raised the issue of semantic libraries analogous to JS libraries – and it’s a good question as to why they haven’t really occurred to date (you could argue that this is to some extent what microformats are). I’d argue that the lack of any real mechanism for creating such libraries, other than simply using class and id, is one reason why we’ve yet to see the widespread development of such things.
Thanks again for the excellent conversation, and I do hope that it might lead to the reconsideration of aspects of HTML5.
Benjamin Hawkes-Lewis
As to why class, meta, id and rel alone don’t suffice — there’s a number of arguments.
Firstly, I’ve tried to make the case that we’ve simply pushed these rudimentary semantic extensibility features of HTML past breaking point – the BBC microformats saga is pretty strong evidence of this. Class is simply a bucket for strings which can be used for “general processing” – which can mean just about anything. id is essentially a bucket for a page level GUID – any semantics we layer on top of these attributes is really by tenuous convention. In short, they haven’t worked. In fact they weren’t really designed for semantics in the way they are commonly used now at all.
There are two arguments here.
1. The BBC dropping microformats is evidence that existing extension mechanisms are insufficient. This is true, but “the BBC were very clear that the only reason to drop microformats was their use of the ‘title’ attribute for human unfriendly data”:http://www.bbc.co.uk/blogs/radiolabs/2008/06/removing_microformats_from_bbc.shtml ; a problem solvable with a ‘content’/‘equivalent’ attribute. Additional attributes for different semantic modes don’t help towards solving this problem.
2. ‘class’ can be used for things other than semantic labeling. This is true, but this isn’t evidence that it doesn’t work for semantic labeling. That’s like saying JS can be used for form validation and therefore doesn’t work for dropdown menus. I’d say that microformats are actually strong evidence that ‘class’ works reasonably well for semantic labeling. I guess the underlying argument here is actually that a multipurpose attribute increases the chance of naming collisions? But introducing further attributes for different semantic modes would only reduce the chance of naming collisions, and they wouldn’t do so more than defensive class naming practice (e.g. ‘rhetoric-irony’ rather than ‘irony’). To actually prevent naming collisions you need a system like XML namespaces or a central registry of names.
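The collision point can be sketched with two made-up vocabularies. Everything here (the vocabularies, the prefixes, the helper names) is hypothetical, and it assumes the simplest form of defensive naming: a prefix joined to each name with a hyphen.

```javascript
// Two hypothetical vocabularies that both want a "title" class.
var bookVocab = ["title", "author", "isbn"];
var jobVocab = ["title", "employer"];

// Names that appear in both lists, i.e. potential collisions.
function collisions(a, b) {
  return a.filter(function (name) {
    return b.indexOf(name) !== -1;
  });
}

// Defensive naming: put a vocabulary prefix in front of each name.
function prefixed(prefix, names) {
  return names.map(function (name) {
    return prefix + "-" + name;
  });
}
```

Comparing `bookVocab` with `jobVocab` directly reports the shared name, while comparing `prefixed("book", bookVocab)` with `prefixed("job", jobVocab)` reports none. If both vocabularies happen to pick the same prefix the collision comes straight back, which is why prefixes only reduce the risk and a namespace system or registry is needed to rule it out.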
Richard Cotton
Part of the core of the problem is that we have become so inured to hacks to fix things that it has become almost legitimate to hijack elements of markup to do arbitrary things.
The BBC decision lays bare a clear example; it’s a collision between two (ab)uses of the same attribute, neither of which is actually the intended use.
This is why I consider HTML5 to be a mistake; XHTML was a step forward. It has issues; why aren’t we fixing them instead of taking 2 steps backward?
With Google now redirecting IE6 users to download Firefox or Chrome, I predict that IE6 will soon disappear.
I very much doubt it. A high proportion of IE6 users have stuck with it because they can’t upgrade to IE7 – because they are on a corporate network and/or are using an older version of Windows. As Chrome has the same system requirements as IE7, the majority of people using IE6 will be unable to install it.
Firefox has been around and highly publicised for years – and can be run on older versions of Windows. Anyone who is using IE6 on their own computer, and has not chosen to install Firefox (or upgrade to IE7, if on XP), is unlikely to install Chrome.
“My argument is that we simply can’t ignore IE6 and backwards compatibility more generally, because as we’ve seen with many other very good technologies (SVG for instance), they simply won’t be adopted by the majority of developers because of that lack of widespread compatibility.”
I don’t know how to say this politely, but this is pure bunk. Google has already taken the first step to eradicate IE6, and if others would be equally brave, we might actually finally get rid of this albatross.
Perhaps instead of a new specification, we need new attitudes, at least in the web development/design community. Where are the risk takers? The people who used to push and actively promote the best, rather than tenderly support the absolute worst of the web?
All I can say is bunk. If we all actively worked to finally put this old piece of “refuse” to its long-overdue sleep, we could eliminate it as a problem in less than a year.
Instead, we play it safe. We tippy toe. We clasp our hands to our breasts and wash and re-wash our fingers over and over again, in a tizzy of anxiety, as we murmur, in mortified terms, “Oh, we can’t ignore IE6.”
Yes, we can. Maybe change needs to find a home in places other than just politics.
Now is exactly when we can force this absolutely essential change. Corporations have other concerns than browser usage, and the people you’re worried about using IE6 are being laid off.
IE7 not supported in older Windows versions? Well, guess what—Firefox and Opera work on older operating systems. Corporations afraid to change? Oh good lord, no wonder they’re all failing. IE6 is, currently, one of the most insecure browsers in use today.
Best of all from a change perspective, minimalist design is very hip right now. So, let’s provide a minimalist design for the IE users, and use the nifty CSS3 tricks and SVG for the rest. Then the few IE6 corporate users still employed will still be able to access your site. And, if they want to get the best effect, they can access it, again, when they get home, where they’re using a decent browser.
But what you’re saying is most designers won’t even take the chance. Wow, must be safe to be them.
Benjamin Hawkes-Lewis
Perhaps instead of a new specification, we need new attitudes, at least in the web development/design community. Where are the risk takers? The people who used to push and actively promote the best, rather than tenderly support the absolute worst of the web?
Or maybe we need some hardheaded cost/benefit analysis to establish the “best”, by thinking about the effects of our technical decisions on users, customers, and friends.
It’s one thing to adopt practices that significantly improve the user experience of some users without significantly impairing the user experience of any large group of users.
It’s another thing to adopt practices that significantly impair the user experience of a large group of users, especially if there are other ways to achieve benefits for the smaller group of users.
For example, if developers can encode roughly the same semantics as ‘header’, ‘aside’, ‘footer’, and ‘section’ with ARIA attributes that might improve the experience for the same small group of users without breaking their layout in the most widely used browsers with JS disabled, then maybe it’s rational to use the ARIA attributes instead of the new elements?
So what if we have to write ‘<div class="header" role="banner">’ instead of ‘<header>’? Is the latter better language design? Absolutely. Is it worth the cost to end-users? Not necessarily. On its own, is it worth hassling Granny to switch to a “happy” browser? Perhaps not.
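As a rough illustration of that trade, here is a sketch that produces the fallback opening tag for a few of the new element names, assuming landmark roles are carried by the `role` attribute. The mapping is an approximation I've assumed for the example (a header is only a banner when it is the page-level header, for instance), and the function name is made up.

```javascript
// Assumed, approximate mapping from proposed HTML5 element names to
// ARIA landmark roles. Not an authoritative equivalence.
var landmarkRoles = {
  header: "banner",
  nav: "navigation",
  aside: "complementary",
  footer: "contentinfo",
  section: "region"
};

// Build the opening tag of a div-based fallback for a new element name.
// Names without a mapping are returned as an ordinary opening tag.
function fallbackOpenTag(elementName) {
  if (!Object.prototype.hasOwnProperty.call(landmarkRoles, elementName)) {
    return "<" + elementName + ">";
  }
  return '<div class="' + elementName + '" role="' + landmarkRoles[elementName] + '">';
}
```

The div still styles in every browser with or without JavaScript, and the role is there for any assistive technology that understands it.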
I feel that the test of a good article is the extent to which it makes me think. Your article, sir, has me thinking in spades.
I work for a sizable company, doing front-end web development, so the topic of HTML5, and all the subsequent topics of backwards-compatibility and forward thinking, have me pondering what could be on a day-to-day basis. The advances in JavaScript engines lately have me on the edge of my seat, as do advances in CSS specification implementation in the major browsers.
Here’s my thought. I feel that the current HTML spec has us backed into a corner. And until we get ourselves out of it, there really isn’t any hope for us. While I applaud HTML5’s addition of new elements, such additions won’t be enough in the end. Backwards compatibility is great but, in my personal opinion, takes a backseat to future possibilities.
Please understand, this idea is minutes old, but what’s preventing the W3C and browser vendors from implementing an extensible architecture as opposed to mere elements? Elements we have, but the ability to tailor them with custom attributes should be the future. For example, would it make more sense to be able to define custom attributes like “structure” and be able to add it to the div element (or any element for that matter), with the custom definition containing the information necessary to tell the browser how to interpret it?
Elements should be objects and attributes should be extended objects, to put the idea in perspective. Yes, there will be an initial hump while older browsers die off, but such a solution would eventually pay for itself. No matter what custom attribute we define, we also enclose the means necessary for the browser to handle the custom definition. HTML would be like any other object-oriented language: extensible and scalable. We would load attribute definitions like we load JavaScript. Styling with CSS would be as simple as your article suggested.
It’s just an idea and yes, it completely ignores the backwards-compatibility facet of the conversation. But we live in a dynamic world and this seems like the only way out of a future in which we’ll still be talking about this very same subject.
I’m glad someone is really talking about these new tags. Personally, I was stoked to see <nav> and <section> and <header> and <footer> being added. I’m glad you mentioned DocBook because it solved a problem for print publications that I think HTML needs to solve for digital books, but unfortunately, I don’t think trying to borrow from DocBook is the right approach. I do think your attribute solution is thoughtful and has a lot of merit. For now, the tags make sense to me. In your example of the problem of applying a font color to a section via CSS rule, more complex selectors could be used to apply that style to sub elements of the <section> element. Since <section> is structural, it seems more aimed at the non-human consumers of the page, such as e-book reading systems which otherwise have no way of telling what a section of a book is (the div tag was supposed to do it, but it’s now used for layout). These new tags should not be styled, although they can be part of a CSS selector, as ‘hooks’ in more selective selector strings. The nav is another good one, especially in the case of books, because in many cases a tertiary web app may want to access only the text of the book (or doc, or whatever is at the heart of the web page), and skip any ads, extraneous navigational elements, or other structural scaffolding unnecessary and problematic for rendering the book itself. There are plenty of cases outside of the e-book application scenario—I use that because it’s the area I primarily struggle with. Backwards compatibility is really important, and Ian Hickson is one of the strongest proponents I’ve seen on the WHAT-WG list on keeping things backwards compatible. That said, I do agree we can’t keep inventing new tags through the long involved processes of the W3C and WHAT WG every time the Web evolves on its own.
This was a thoughtful article and I really agree with its stance. Since John opens by taking such a long view on the web, and since I don’t think most people think about that enough, I thought this TED Talk would be interesting for anyone reading:
Imagine if HTML had been invented in Shakespeare’s day. Would we still be using tags like < shoppe type=“ye oldde” >? Clearly our existing language may change in hundreds of years (or less if texting has anything to do with it.) So there may be a need to update the words used for tags, even if just to improve them. (Personally I have always found < blockquote > ridiculously long, especially when there is < q >. Why not just have < quote > and use an attribute?)
Also why should HTML be written in English? Why not have African or Arabic tag names? Perhaps localised versions can be created?
I say keep improving the existing tags and attributes and of course add new things as the web moves forward. Backwards compatibility is simply a matter of browsers converting any changed tags. Anything new, well hey, one day tables were new and browsers had to cope back then. And CSS! Did we refuse to use it because of pre-CSS browsers? No, we all moved forward by downloading new versions of Netscape. This is the way it has been and should be in the future.
Having said that Zachary’s idea above is good. How about tags that didn’t refer to something specific, but the user then applied the relevant attribute? Eg:
< box use=“sidebar” >
< list use=“menu” >
< box use=“header” >
< text use=“paragraph” >
< text use=“quote” >
< text use=“email” >
It might make reading documents harder, but there’d be no need to battle over the names of the tags. A standard set would suffice for everything. All the browser needs to know is if the tags are block or inline, floated or not and so on. The stylesheet would provide that.
Surely the guys at the W3C HTML5 WG already have a vision for adding semantics to HTML, don’t they? And it can’t just be adding ad hoc elements every few years, can it? There are already too many solutions in use out there – microformats, RDFa, embedded RDF, XHTML etc – and clearly few people share a single vision for how this is all going to pan out. But more worryingly, we don’t know what the W3C’s vision is, so we have to make up solutions to prompt them into action. I’m looking forward to the day when they get their act together so we can just build really cool things. But I’m not holding my breath.
Shelley said, “Corporations afraid to change? Oh good lord, no wonder they’re all failing. IE6 is, currently, one of the most insecure browsers in use today.”
I was a corporate web developer until June last year, and I too hate IE 6. But to blame the economic downturn and corporate bankruptcies on it feels a little exaggerated.
Now I work for Opera, so have every reason to diss Microsoft, but it’s wrong to ignore IE 6. It’s tempting to take the “f**k IE 6” approach, but lots of companies have Windows 2000 machines as it’s supported until 2010. Will they upgrade those machines to Windows XP or Vista now, in a credit crunch, just to look at sexy web sites that don’t support IE 6?
Lots of people in the developing world use older machines out of economic necessity. Sure, they could install Linux and then Opera or Firefox, but are we really back to the era of requiring users install operating systems and certain browsers for the privilege of viewing our super-special sites?
I was shocked yesterday to find a major university in England has IE6 on its default drive image that all PCs have to have. So that’s hundreds of machines all stuck on an old browser. I feel this may be typical for IT at other campuses too as they are never bang up to date due to security concerns of upgrading. But I thought they’d at least have IE7 on there.
Rob Burns
I have to say the difference between IE6 and IE7 seems so minuscule (especially when one considers the 7 years spent in development) that this sub-thread about how horrible IE6 is looks like a marketing ploy for Internet Explorer and Windows sales :-). Probably just the conspiracy theorist in me, but it’s worth pointing that out for anyone feeling they need to upgrade.
Ben Rowe
I see html5 as an opportunity to help encourage people to update their browsers. While I agree with the article (semantics & all), and despite the shortcomings of html5, this is a browser-marketer’s dream: a way to encourage people to upgrade to a browser that supports the full html5 spec. Of course it will take the completion of ie8 (if they fully support the current html5 spec). However, what better excuse could you think of to give your visitors an incentive to upgrade their browsers than something like “This website utilizes ‘marketing term’ technology. To use the site to its full potential please upgrade your browser”? The whole industry could push this new ‘marketing term’, making it the next Web2.0 if you will…
Of course the only downside of this is that there isn’t much substance to this, from a user’s point of view (the canvas tag is one of the few tags that will give users an actual reason to upgrade). As developers we get html5, css3, etc. The user gets not a lot. This strategy needs a lot of refinement, but it’s certainly something that could work.
It will be a bumpy road, but we need to balls up as an industry and not take the chicken shit way out all the time.
I think we should be focusing on getting browsers to work more consistently and getting rid of old browsers like ie6 that have no place in today’s world. Tech moves fast and yet ie6 lingers on. No matter how you try and make things backwards compatible you will always be limited by decaying technology, there is only so far you can go before you have to stop and address the existence of obstacles like old browsers.
Ignoring them and creating new languages is great but don’t expect not to run into the same problems a few years later.
I understand your concerns about semantic limitations. I have already solved this problem in the language I created, mail markup language. You can download the schema in order to play with it or read the specification for documentation. I solve the problem through the use of the “role” attribute, which is compatible with XHTML and HTML 5. Since my language is inherently XML, RDF and OWL are expected to use the role attribute for semantic processing.
Find everything and more about mail markup language at http://mailmarkup.org/
Rob Burns
Since the topic keeps getting raised, I have to ask: what are the significant differences in standards support between IE6 and IE7? As far as I can tell they are minimal to non-existent. IE8 promises better support, such as CSS :before and :after pseudo-element support and the associated generated content properties. However, the big expectation for standards support in IE7, after 7 years of development, was that it would add XHTML support and CSS generated content support. Neither materialized. Nor did other features such as SVG or complete Ruby support. So what are the major problems with IE6 compared to IE7, in terms of standards support, that posters keep referring to?
The campus I was referring to have sent round an upgrade to IE7! Now everyone is complaining about the toolbar. (One guy thought the Refresh button had disappeared completely.) Still, at least they now have tabs – one big difference between IE6 and IE7. And better CSS and HTML support. And a heap of bug fixes. And you can zoom in graphics not just text (which also cures the long-standing fonts-set-in-pixels-can’t-be-resized problem). So there’s quite a lot of improvements if you ask me.
Rob Burns
And better CSS and HTML support.
This is the part that my question was about. The claims about IE6 being horrible have mostly related to standards support. Yet with all the things I expected to arrive in IE7 (after many years in development) I can’t really think of many things that IE7 improved. On the other hand, the IE8 beta does offer some CSS and HTML improvements, but what does IE7 offer over IE6 in this area?
Les Kobayashi
Sorry in advance for going slightly off topic…
@71
“But what you’re saying is most designers won’t even take the chance. Wow, must be safe to be them.”
I don’t know how to put this politely either, but that’s just arrogant. And elitist. Most designers are getting paid by clients who have a very real bottom line in TODAY’S reality where IE6 still represents 20% – 25% of the mainstream market. Front line web developers / designers need to deal with IE6. What they don’t need is to take the blame for its over-extended shelf life. That’s like blaming the road designer because your aunt’s old K-car still gets her to Walmart every Saturday.
Yes, of course there needs to be continued progression and risk taking. And there is. And, IE6 really will die a quiet little death one day. In the meantime, it’s not nearly the catastrophic issue some make it out to be.
@89
IE6 has multiple display issues with CSS borders, margins, floats, png transparency and more—most of those display issues were corrected in IE7. There’s this thing called google where you can dig up all kinds of clarification ;)
There were a lot of bug fixes and improvements made to IE with version 7. Even simple stuff like adding <abbr> was welcome. (I personally don’t think generated content, while useful, is an essential addition.)
The problem with IE7 is that it also introduced new bugs. And there were still plenty of unfixed bugs.
Anyone wanting to know more about the true horror of IE bugs might wish to peruse the following sites:
“Position Is Everything”:http://www.positioniseverything.net/
Matteo Cajani
I really appreciate your idea about attributes. I don’t think that we need more than one html attribute: “semantic”. Then anyone can define all the semantic classes he needs. Of course we need to define the properties such as “rhetoric”, “structure”, etc… and their values as we have “background-color”, “font-family”, etc… in CSS
Here an example of HTML and CSE (Cascading SEmantic sheet):
Is the browser or the structure (HTML) really the issue here? It would seem that the major hangup for new tags and features in HTML is backward compatibility.
What blows my mind is that we as a community continue to perpetuate the problem. Get off HTML and develop something new. Maintain a legacy object capable of rendering HTML but move something to a new open standard. Then, make that easy to upgrade.
Look at “Flash”. When something new comes out, what do people do? Upgrade. What do websites say? “Upgrade to the latest”.
People, once we stop living in the past and decide that we want robust browsers with true rendering capabilities, 3D models and the ability to take advantage of the other 99% of the hardware, only then will we make progress. We use HTML and CSS as a container for the whiz-bang we do with Flash and JavaScript. We want more than text from the browser, so let’s finally do what it takes to get there.
We need to develop an open standard that allows for upgrades and force everyone to abide by the upgrade path. If you don’t, then you can’t expect to get serviced.
Just imagine where your OS would be if we still had to support 8 bit executables!
Remember this before you respond with “What about the other devices”. OK, let’s talk about that: WAP, Mobile CSS, etc. Your argument is what? That they read the standard HTML you code? They don’t. The only standard is that there is not one. We need one and HTML sure is not it.
Sure, there will be pain, but once the pain is gone you’ll be much happier in a world where you can do more than just place a few lines of text in a document.
I found the article encouraged me to think more about HTML5 than I have so far. And while I sympathise with the author, I’m in agreement with those commenters that this is a solution in search of a problem.
It is perhaps a little ironic that the article is a critique of HTML5’s inflexibility when, at least as far as I know, HTML5 was proposed as a pragmatic solution to the seemingly intractable problem of “what comes after HTML4”. This form of critique misses the point behind HTML5: HTML5 is not supposed to be semantically extensible. There is XHTML for that. No, HTML5 is the version of HTML that contains those tags that many currently miss. This has important consequences primarily for those developing browsers so that HTML5 support is both robust and fast. Although this means that the semantics must be frozen, it in no way constrains attributes from being extensible. The difference is, however, one of scope: extensible attributes will have an application (website) specific purpose whereas the specified tags will always have the same purpose.
The supplied use cases illustrate this misunderstanding:
While <span role="2009-05-01">May Day next year</span> really doesn’t make sense, something along the lines of <span equivalent="2009-05-01">May Day next year</span> would.
Neither of the tags is semantically satisfying particularly when the <date> tag is available.
1) Ideally the datetime value should be the content of the tag and not an attribute of it. Confusing the two is a common mistake particularly in XML. Of course, being too restrictive here would prevent anyone from incorrectly using the tag and provoking exactly the kind of errors that HTML5 tries to avoid.
2) How the date is displayed is a matter of presentation and, therefore, something that may be controlled by meta-data: a format attribute or CSS declaration or browser option. This avoids all the problems of localisation, such as which day 12/1/2009 refers to. In the example supplied, this could be format="holiday" or format="short".
Making this definition part of the specification allows browsers to handle content intelligently – offer to add the date to a user’s calendar or do a search on the date – in a way the suggested “equivalent” simply could not.
In the same vein we have the new <video> and <audio> tags to handle the now well-established practice of including audio and video in websites.
Extensibility here is not the solution; it simply shifts the problem to the namespace.
Rob Burns
In comment #94, Charlie Clark says:
This form of critique misses the point behind HTML5: HTML5 is not supposed to be semantically extensible. There is XHTML for that. No, HTML5 is the version of HTML that contains those tags that many currently miss.
This comment reflects a confusion repeated in this discussion that ascribes magical properties to either the XML serialization or the traditional text/html serialization of HTML (I can’t tell which one has been ascribed magical powers, but neither serialization has any).
HTML5 is basically three things. 1) it is a parsing and serialization specification that attempts to codify the parsing performed by the major browsers with respect to traditional HTML serializations (and perhaps incrementally improve or at least select the best traits of existing browser parsing operations); 2) A specification of browser (and some other UA) behavior with the results of a parsed (however parsed) or DOM-created document; 3) a specification of a vocabulary of elements and attributes for authoring documents that are ostensibly semantic in nature.
The topic of discussion here is about #3. It is not about the parsing and serialization of traditional text/html serializations. HTML5 (in the #3 sense) can arise from the traditional serialization, as a solely DOM creation or from an XML serialization (which despite what we are told is not so drastically different with respect to the topic at hand). So I’m not sure why the “use XML” or “use XHTML” line keeps arising in this conversation. It has nothing to do with the topic of the conversation.
Also this is not about confusing whether data belongs in an attribute or in the contents of an element. There is not one right way to do this. The point of RDFa is that one can easily add properly parsed attributes to existing HTML elements (or any SGML or XML or otherwise elements) that add machine readable metadata about the natural language expression as the contents of the element. That means the presentation can be left as is (with the contents of the element appearing) or the UA could replace or augment that presentation with a localizable expression for the date. And whereas the HTML5 attempt to copy RDFa introduces a single purpose date element, RDFa provides a way to add precise machine readable equivalents to an element for any imaginable data type that can be expressed as the contents of an attribute (including anyURI values).
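To make the attribute approach concrete, here is a minimal sketch (in Python, using only the standard library) of how a consumer might read the machine-readable value next to the natural-language text. It assumes an RDFa-style `content` attribute on the element; the `dc:date` property name in the usage line is only illustrative, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MachineValueCollector(HTMLParser):
    """Pair each element's `content` attribute with its visible text."""

    def __init__(self):
        super().__init__()
        self.pairs = []   # finished (machine value, visible text) pairs
        self._open = []   # open elements as [tag, content value or None, text parts]

    def handle_starttag(self, tag, attrs):
        self._open.append([tag, dict(attrs).get("content"), []])

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for entry in self._open:
            entry[2].append(data)

    def handle_endtag(self, tag):
        if tag not in (entry[0] for entry in self._open):
            return  # stray end tag: ignore it
        # Pop back to the matching element, tolerating unclosed children.
        while self._open:
            name, value, parts = self._open.pop()
            if value is not None:
                self.pairs.append((value, "".join(parts).strip()))
            if name == tag:
                break


def machine_values(markup):
    """Return (machine value, visible text) pairs found in the markup."""
    collector = MachineValueCollector()
    collector.feed(markup)
    collector.close()
    return collector.pairs
```

For example, `machine_values('<span property="dc:date" content="2009-05-01">May Day next year</span>')` yields the ISO date paired with the phrase the author actually wrote, so the presentation can stay as is while a UA or script works from the attribute value.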
Get off HTML and develop something new. Maintain a legacy object capable of rendering HTML but move something to a new open standard. Then, make that easy to upgrade.
Look at “Flash”…
People, once we stop living in the past and decide that we want robust browsers with true rendering capabilities, 3D models and the ability to take advantage of the other 99% of the hardware, only then will we make progress.
I’ve been thinking for years that Flash should be the way forward. It makes perfect sense. A massive market penetration of users, smooth font rendering that can use any font, all the vector goodness you could want, Photoshop-style filters, what’s not to love?
And I don’t mean full-blown Flash sites with rotating objects. I mean a plain renderer that improves on what the poor browser has to render. And best of all…
Identical cross-platform rendering!
Think about it. No more broken layouts due to so many different browsers to test in. No more holding back on things that only work in 1 or 2 browsers. No more having to code to the bare minimum because of ancient browsers still in use today.
Unless something like this happens we will be stuck in browser hell forever.
Artem Ploujnikov
We already have a lot of tools that can provide extensible semantics. The whole point of XHTML (even version 1.0) was to allow users to introduce custom namespaces into xhtml documents and then handle them programmatically using scripts, plug-ins or XSLT. Unfortunately, these technologies are now being bullied out of existence by the simplicity-oriented majority who won’t touch them with a ten-foot pole…
This is a recurring issue in programming. It appears that, for most people, the introduction of a new concept creates a nearly insurmountable psychological barrier. They say things like “I can’t do it. I can’t understand it. It’s complicated. It’s too abstract”. To me, and a few others (like those who invented object-oriented programming and XML/XHTML/XSLT/XPath/XQuery), ease of use means fewer clicks and less hand-coding, even if it introduces more concepts I have to learn. To “normal” people, ease of use IS simplicity. New concepts lead to an immediate mental block.
Just take a look at the whole motivation behind the move to HTML5. The folks who invented the “html serialization” don’t want doctypes, schemas or even a version number. What they’re trying to do is to make “tag soup handling” part of the spec. Also, they explicitly objected to the use of namespaces.
The original spec even said
“Generally speaking, authors are discouraged from trying to use XML on the Web, because XML has much stricter syntax rules than the “HTML5” variant described above, and is relatively newer and therefore less mature”
Fortunately, the W3C didn’t take a stand on the issue, and they removed the above paragraph from their version of the spec.
We have other strict technologies with extensible semantics in mainstream use, which have a strict, clearly defined syntax (e.g. C#). Very few people ever complained about the whole app not compiling if you miss a semicolon, because they have tools that pick up the semicolons for them. I grew up on BASIC, Pascal and Delphi, and I also learned C/C++, so I don’t take error messages personally. But as long as people who don’t know how to escape an ampersand continue hand-coding their HTML in Notepad, fault tolerance will remain a requirement. The cost is that after a certain critical number of errors accumulate, tracing becomes nearly impossible and validators become completely useless, because there’s an “error” on every line even though the page displayed fine just an hour ago. For instance, in one of the apps I developed at work, I put up an XSLT post-processor with a syntax checker, and a lot of the other developers would swear at it because they had no clue how to make their code output well-formed XML.
I believe the Web community needs to split. Just like the desktop application world has Visual Basic for simplicity-oriented developers and C# for those who like structure, we need to have two separate stacks: one for tag soup hand-coders who think namespaces are evil, and another for those who prefer a stricter syntax. The split can be made on top of XHTML5 (the XML version of HTML5), with namespaced elements used to define semantics. Those who prefer the tag-soup-friendly version also happen to be the ones who adhere to the KISS principle and, therefore, don’t need extensible semantics anyway. They would rather copy & paste a block of HTML ten times over than introduce a new concept for a frequently used element. It doesn’t matter if the majority does tag soup as long as there’s enough community support for the “complex and extensible” version to keep it alive.
I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never “tag soup.” It is however, not necessarily XHTML, and if parsed as such, will be “tag soup.” The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street. No one should ever slight reading the specs and knowing the proper way to code either HTML or XML, but when it comes to parsing the Web, see the recent Opera study, with MAMA, or the Google study that Ian Hickson did, and it’s clear that less than 10% of what we deal with out there is XHTML, much less proper XHTML.
Rob Burns
@Aaron Miller
bq. I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never “tag soup.” It is however, not necessarily XHTML, and if parsed as such, will be “tag soup.” The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street.
Tag soup gets used in two different ways which you’re confusing in this comment. 1) tag soup sometimes refers to the serialized source content of a document where tags are potentially misnested, content models invented out of thin air and attribute values requiring quotations not quoted: in general vended content not conforming to any specification anywhere. 2) tag soup parsing refers to a parser that is capable of parsing tag soup (a Herculean task).
When you say that XHTML parsed as text/html will be tag soup you’re confusing these two definitions. The XHTML is certainly not tag soup as it adheres to the XHTML syntax and sometimes even other syntactic requirements on top of that (such as XHTML 1.0 appendix C), so there is no sense in which that content can be considered tag soup. However, if such an XHTML document is vended as text/html it will be parsed by the UA’s tag soup parsing just like any other conforming or non-conforming HTML 2 through 4.01. So in this sense either of two things, not using XHTML or not vending as application/xhtml+xml, means that the content is parsed by the tag soup parsing processor (just like any other HTML).
It’s important to keep these two meanings of tag soup separate to understand the conversation.
@Rob, looks like we’re talking about the same distinction. The only difference is that I’m saying the phrase “tag soup” makes it sound anomalous, when in fact from the content side it refers to over 95% of the web, and from the browser (parser) side, it’s SOP. See the Opera MAMA study and Ian Hickson’s Google report if you don’t know what I mean.
This is not directly related to John’s article, but it reminded me of the following problem I’ve been posing in my head for some time: How is it that the syntax of HTML or any other machine-readable grammar is constructed using English? More specifically US English? Has anyone ever tried to construct a language, or even a very light grammar, that allows multiple human languages to describe headers, footers, loops, lists etc? I appreciate that many of these machine languages were first composed in the US and thus US English has become the lingua franca of programming, but this is the 21st century and not everyone on the planet who wishes to write code knows how to speak or write English, never mind a specific sub-grammar of it.
blackdog
I agree with all the principal points in the article, and I have the same opinion about preferring attributes; but I think they’re breaking compatibility on purpose. We all agree it is stupid to still have concerns about IE6 in 2009 (and soon ’10). If they break the cordon, everybody will be happier. And in fact the big push on HTML5 came from browser makers, and it looks to me like MS wants to be in the game.
In the aftermath we will all have a common base for discussion.
After all, I think some new tags would come in handy.
For example, I think that for something as ubiquitous as a calendar there should be a tag. It would end the debate about which solution is more semantic (table vs. list, which oddly depends on the kind of visualization we want to give), it would save a lot of code, and it would give more artistic freedom to designers, who could target a parameter or class with a simple JavaScript to radically modify the visualization.
108 Reader Comments
Eric Fields
Great article and great discussion. I feel like I’m in a conference workshop with some of the most insightful web monkeys in the world.
I like the concept of creating mechanisms over widgets as a philosophy in general. They enable. They allow for creativity, individuality, and cleverness that might never have existed otherwise. It’s the difference between building a Lego city with roads for your Hot Wheels and cruising Need for Speed. It’s knowing how to sear a pork chop, cut and saute an onion and make a pan sauce with some leftover white wine vs. following a recipe step by step (and having to read it a few times over to make sure you’re doing it just right).
I do design and front-end development solely, working around CMSs and back-end logic to make things look pretty. So here is a perspective from someone who is less inclined to care much about DTDs, schemas and whatnots.
I’ve written enough sites where I have to ID divs as header, content, footer, nav, sidebar, etc, but I don’t always. I’ve done plenty of top-level, secondary, and tertiary navs, but I’ve also designed sites that were 10 pages that could have been 1 (if the client was hipper, it would have been), where the concept of navigation would be meaningless. I strive for as valid and accessible as possible.
So what good is it to add tags that I don’t need to use all the time? What problem are we solving? I see canned code being handed out instead of taking the time to learn how you can get a lot done just by tossing in some IDs and classes.
Let’s standardize the vocabulary, sorta how microformats is doing. It’s data about data and it’s ready to go. Build smarter parsers.
That dataset attribute in HTML5 is about the only thing I’m really excited for.
Oh, and the argument that new tags clean up divitis is a terrible one; you’ve cured one ailment and injected tagitis. At least a div is a div and when I think it’s special, I can add an ID to it.
Weston Ruter
Firefox 2 seems to be able to style HTML5 elements just fine when being served XHTML. That’s what we at “Shepherd Interactive”:http://shepherd-interactive.com/ have done on our own site and on client sites like “ReBath of Oregon”:http://rebathoregon.com/ and “CCAA”:http://theccaa.net/
Benjamin Hawkes-Lewis
“Eric Fields”:http://eric@ericdfields.com/ asks
It’s a good question. :) If you look at the “WHATWG process for adding new features to HTML5”:http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_adding_new_features_to_the_spec.3F , you’ll find that contributors are supposed to begin with a user problem not a technical solution. For example, a recent contributor suggested an ‘author’ element for use in citations without explaining what user problem it would solve, and so was told to go back and explain the actual problem. I believe the new structural elements do solve important user problems, but if you disagree, there’s also a “process for removing current features from HTML5”:http://wiki.whatwg.org/wiki/FAQ#Is_there_a_process_for_removing_bad_ideas_from_the_spec.3F .
Every site doesn’t need to feature a semantic for it to be in common use and benefit from explicit markup: not every site has a table, but I’d assume you wouldn’t suggest replacing table markup with class and ID names? These semantics for the new structural elements are in common use, as the “Google Web Authoring Statistics”:http://code.google.com/webstats/2005-12/classes.html show.
‘h’ solves the problem of expressing a seventh-level heading in HTML (I’ve run across this problem once or twice). It also makes it easier to copy and paste code from one place to another (since you don’t have to adjust the heading numbers) or to allow contributors to your site to include headings in posts, comments, or wiki articles without hard-coding particular heading levels.
One simple use-case for the other new structural elements (‘header’, ‘footer’, ‘nav’, ‘article’, ‘section’, ‘aside’, ‘dialog’) is for user-agents to provide keyboard or verbal shortcuts for moving around the page quickly. For example, marking up your navigation with a ‘nav’ element means that user agents can implement a more reliable “Skip to Content” command (at the moment they can only guess where content begins by looking at link density, looking at visual blocks, or looking for common class and ID names). Rather than remembering what heading level this site uses for article titles, you can simply use the ‘Next Article’ command. Conversely, when you have finished going through the site, you can use a ‘Jump to Navigation’ command to explore the rest of the site. These represent substantial improvements for people with mobility or visual disabilities.
Another simple use-case is user-agents being able to supply alternate presentation options or users being able to customize their user experience directly. Rather than styling an array of class and ID names, you can style ‘nav’ as a dropdown menu. You can rely on ‘title’ to tell you where you are, hide the main ‘header’ and main ‘footer’, and move the main ‘nav’ to the bottom of the screen to put content first.
Yet another simple use-case is making it easier to spider content. For example, if you want the articles from a site that doesn’t syndicate, you can simply extract each ‘article’.
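As a rough illustration of that spidering use-case, here is a minimal sketch in Python (standard library only). It assumes the site marks each entry up with an ‘article’ element; the class and function names are hypothetical, and a real spider would want a proper HTML5 parser rather than this tolerant tokenizer.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collect the text of each top-level 'article' element on a page."""

    def __init__(self):
        super().__init__()
        self.articles = []   # one whitespace-normalized string per article
        self._depth = 0      # how many 'article' elements are currently open
        self._parts = []     # text chunks of the article being read

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self._depth += 1

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "article" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the chunks and collapse runs of whitespace.
                self.articles.append(" ".join(" ".join(self._parts).split()))
                self._parts = []


def extract_articles(markup):
    """Return the text of every top-level article in the markup."""
    extractor = ArticleExtractor()
    extractor.feed(markup)
    extractor.close()
    return extractor.articles
```

Navigation, adverts and anything else outside an ‘article’ is skipped without the spider having to guess at class or ID names.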
Benjamin Hawkes-Lewis
“Eric Fields”:http://www.alistapart.com/comments/semanticsinhtml5?page=6#51 suggests instead:
Depends what you mean by “standardize”. HTML5 initially tried to standardize certain class names such as a ‘copyright’. However, it turned out that sites actually use the class in different ways; microformats tend to rely on opaque ancestor classes like ‘hcard’ or ‘hatom’ to distinguish them from similar class sets. The microformats community said they didn’t require standardization of class names. So predefinition of class names was dropped from the specification.
I’m all in favour of extensibility via microformats, but the microformats community is a spec-writing not a standards organization. Microformats are never going to be a requirement for writing conforming HTML; they are never going to be taught as an intrinsic part of the HTML standard. If we want a standard encoding of a semantic to be used as widely as possible – if it serves a common, important user need – it must be an HTML element or attribute in the HTML standard.
Benjamin Hawkes-Lewis
“Rob Burns”:http://www.alistapart.com/comments/semanticsinhtml5?page=5#49 says:
Hmm. Assuming that SVG and MathML are provided with text/html serializations, can you give a concrete example of the distributed extensibility you mean and how it would be more usefully, efficiently, and accessibly implemented using an XML element than a set of HTML ‘class’ and ‘content’/‘equivalent’ attributes?
(Incidentally, I don’t really agree that a ‘datagrid’ is simply another presentation of a list; I think it actually represents an editable dataset; but that’s not crucial to this discussion.)
Benjamin Hawkes-Lewis
I wrote:
Ahem. Some little birds have just reminded me that there is no ‘h’ element in HTML5 (that’s XHTML2). Instead the heading algorithm has been redefined to allow the combination of ‘section’ and ‘h1’ to solve the problems I mentioned. Sorry for the confusion. See the “draft spec”:http://www.whatwg.org/specs/web-apps/current-work/multipage/semantics.html#headings-and-sections for details.
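To give a feel for how ‘section’ plus ‘h1’ removes the hard-coded heading numbers, here is a deliberately simplified sketch in Python (standard library only). It is not the outline algorithm from the draft spec: it simply assumes that an ‘h1’ sits one level deeper for every sectioning element open around it, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements treated as opening a new section in this simplified sketch.
SECTIONING = {"section", "article", "aside", "nav"}


class OutlineSketch(HTMLParser):
    """Assign each 'h1' a level equal to 1 plus its sectioning depth."""

    def __init__(self):
        super().__init__()
        self.headings = []   # (level, heading text) in document order
        self._depth = 0      # number of open sectioning elements
        self._in_h1 = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self._depth += 1
        elif tag == "h1":
            self._in_h1 = True
            self._text = []

    def handle_data(self, data):
        if self._in_h1:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in SECTIONING and self._depth:
            self._depth -= 1
        elif tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.headings.append((self._depth + 1, "".join(self._text).strip()))


def outline(markup):
    """Return (level, text) for every 'h1' in the markup."""
    sketch = OutlineSketch()
    sketch.feed(markup)
    sketch.close()
    return sketch.headings
```

Because the level comes from nesting rather than from the tag name, a pasted comment or wiki fragment keeps its own ‘h1’ and still lands at the right depth, and a seventh level is just a seventh nested ‘section’.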
Shelley Powers
@41
“If we lay aside the considerations that HTML5 includes ‘canvas’ and may include a text/html serialization of SVG in the future, and pretend that it also included a ‘content’/‘equivalent’ attribute, I don’t see why it would be an order of magnitude harder to implement a language with the same meanings and functionality as SVG in HTML plus CSS plus JS than in (a) HTML plus attributes for different semantic types, or (b) a brand new XML language. I think all would be sort-of possible, but none would be performant (element per pixel anyone?). I think what made SVG practical was user agents implementing the necessary high-performance drawing code; it’s not an example of distributed extensibility.”
You’re conflating the canvas element and SVG and the two share nothing other than they are graphical, and can be used in web pages.
And when you say one can emulate SVG with HTML, CSS, and JavaScript, I have to assume you haven’t worked with SVG overmuch. No insult intended, but there is a large difference between a vector markup with support for declarative animation, and a script-based bitmap element. Just turn off JavaScript to see what I mean.
And when you say it’s not an example of distributed extensibility, again, I’m not sure where you’re coming from. The concept of distributed extensibility has nothing to do with semantics, or graphics for that matter, and everything to do with incorporating a capacity for change without having to modify the underlying parser, and mitigating the effects of naming collisions through the use of some sort of namespace mechanism.
With XHTML, I can incorporate SVG inline, RDFa, MathML, and whatever other extension I want to incorporate at some future time. Currently I use SVG for design, and RDFa for semantics.
The browser may process the data, as most do with SVG. Or it may not, as most don’t with RDFa. That’s not important. What is important is that when a new extension comes along that’s formatted in whatever format is valid for browser consumption, it doesn’t have to go to committee. Doesn’t have to be somewhat merged into the underlying data domain of the web page specification. Can be used immediately.
With this concept, not only is the underlying page markup kept clean, and as simple as possible, we don’t have to wait years in order to make use of the extension. Most browsers don’t do anything with RDFa, but there are Firefox and other extensions that can use it.
I use XHTML, but it has come with a cost in time, because it’s unforgiving. I’m willing to make the time, but a lot of people are not. Unfortunately some browsers like Firefox take the concept of returning an XML error literally, and provide the most awful error page in the world. Others, though, like Opera and Safari, return errors that are much more helpful. That’s why many of us were hoping that we could have the best of both worlds—the forgiveness of HTML, with the extensibility of XHTML.
(And it’s hard to write all of this in a dinky comment box, so apologies in advance for typos.)
Brian LePore
To answer Bruce’s question way back in comment #14:
Microformats are used in many more places than you think. ALA uses the copyright Microformat as part of its design.
Expect to see more usage of the hAtom/hfeed Microformats as they are being used in IE8 to accomplish their new Web Slices feature.
I might recommend the Tails Export extension for Firefox to alert you what Microformats are present on a page.
Benjamin Hawkes-Lewis
If all you’re saying is that vector graphics markup is a useful feature to have in any web technology stack, then I entirely agree as do the people who want a text/html serialization of SVG to be included in HTML5.
I was trying to use SVG as an example of the practical limitations of distributed extensibility. Just to recap, my argument in a nutshell is that:
1. Neither HTML nor XML (plus JS plus CSS plus ARIA) yet give you distributed extensibility that provides meaning or functionality to users of user agents not specifically programmed with an interface for those meanings or functions, except for such meanings and functions as can be derived additively.
2. Nobody’s identified any practical advantages – from a pure distributed extensibility perspective – that XML plus JS plus CSS plus ARIA has over HTML plus JS plus CSS plus ARIA.
SVG is an example of a technology that might have been possible, but would not have been very practical, for developer communities beyond the standards organizations and consuming software vendors to have designed and implemented in webpages in popular browsers using HTML, JS, and CSS, or using XML, JS, and CSS. You’ve pointed to just one of the problems that would have entailed: dependence on publisher JS.
Today, SVG has already been implemented – by browsers and plugins. Perhaps accessible 3D environments might be a more current example of a technology that is not very practical in an XML language without implementation by consuming software.
But you haven’t established that text/html is less extensible than XHTML! You’ve just pointed out that some XML languages have some functionality (vector graphics and mathematical typesetting) implemented in browsers that text/html doesn’t.
XML formats don’t have vector graphics or mathematical typesetting abilities because XML is better for distributed extensibility, but because W3C only defined XML serializations of MathML and SVG. W3C once drafted “mathematical extensions for text/html”:http://www.w3.org/MarkUp/html3/maths.html and Microsoft originally proposed and implemented a “vector graphics language using XML data islands within text/html”:http://www.w3.org/TR/NOTE-VML.html . If the text/html serialization of HTML5 included SVG and MathML, as some people want, that difference would disappear. Note that the current draft includes “embedded MathML”:http://www.whatwg.org/specs/web-apps/current-work/#mathml and “embedded SVG”:http://www.whatwg.org/specs/web-apps/current-work/#svg .
Depending on your point of view, draconian error handling of XML is either a major benefit (‘Yay it’s stricter’) or a major cost (‘but now my site is broke’). If you think it’s a major cost, you might be interested in the “XML5 project”:http://code.google.com/p/xml5/ .
text/html parsing will likely always be more complicated than XML parsing, although by the time HTML5 is finished it will hopefully be better specified. But this complication is something you don’t need to worry about if all you’re doing is adding class names and hidden values.
Formats serialized via HTML5’s existing extension points plus ‘content’/‘equivalent’ wouldn’t have to “go to committee” either.
My issue is not about whether all these new formats are “valid”; but whether they actually work for end-users. It seems to me they run a serious risk of making the accessibility problems with machine data in microformats look like small potatoes.
Rob Burns
I’m glad you asked. I meant to provide some examples in the earlier post and forgot. What I’m thinking of with distributed extensibility is various specialized disciplines providing extended semantics on top of HTML. For example a society of poets or musicians or astrophysicists might add new elements, new attributes and new attribute values to express the semantics particular to their respective specialties. In addition, the society of Einsteinian astrophysicists might develop a vocabulary of elements, attributes and attribute values that uses similar or identical names to the quantum physicists. Any of these vocabularies can make use of SVG, MathML, XLink, XInclude, XForms and other behaviors provided through other vocabularies.
The idea then is that we not only have a namespacing mechanism (which is very cumbersome when relying on class names), but that we have a rich vocabulary of abstracted behaviors for namespaced vocabularies to draw upon. That means any arbitrary vocabulary can include vector graphics, hyperlink activation, frames / split views, mathematics, embedded content, and user interface widgets. Moreover, with a better extensibility mechanism new extended abstracted behaviors could be added to the host language without spending years or decades waiting for a new HTML recommendation. None of this is possible with the HTML5 approach.
However, with a namespace mechanism (and one has already been implemented in IE for text/html), and abstracted and namespaced behaviors, new semantics can be added to the host vocabulary (and the host serialization) while using ARIA and CSS to provide any accessibility and presentational mappings. Again, this cannot be accomplished by simply adding MathML and SVG to text/html as the only two anointed behaviors in HTML.
Certainly datagrid is editable, but I don’t think we should start adding separate elements for editable and read-only semantics for lists or anything else. We already have an attribute to change any element to an editable element, and if lists need to fine-tune that editing it would be better handled through an attribute (another excellent example of the point this article makes). Another example is OUTPUT, which is borrowed from XForms and XHTML2. This is really a read-only version of the INPUT element that has a different presentation in its read-only state. Is there really a need to add more elements to the vocabulary for that subtle distinction? I think not. A read-only attribute could work in that case (perhaps on other UI controls too). The METER and PROGRESS elements are two more examples, already included in the article, where the UA provides a graphical presentation of a proportion or fraction. From a device-independent point of view, these elements are the same thing. This functionality really belongs at the presentation level in CSS or other such specifications.
With regard to Eric Fields’ remarks in comment @50/51, we should really be working towards the goal where Eric does all of his front-end work in CSS (supplemented by SVG, bitmap graphic editing and an occasional XSL). There may be some need to reorder the backend elements though this could be done using XSLT as another front-end tool. Eric doesn’t want to take the time to understand the semantics of the back-end vocabulary and that’s fine. It is an excellent division of labor. However, it’s important to understand that using a DIV and giving it an ID does not substitute for the diverse device-independent, accessible and abstracted capabilities of the specialized elements and attributes. But as a front-end author, Eric shouldn’t even need to deal with those issues.
Rob Burns
I don’t really think that’s the fair way to put this. The major benefit of XML error-handling is that errors are discovered immediately (a relief) rather than after deployment (embarrassing). Also, any error-free element can be made a child of any other error-free element without creating any new errors (at least well-formedness errors, which is what this discussion is referring to).
Benjamin Hawkes-Lewis
“Rob Burns”:http://www.alistapart.com/comments/semanticsinhtml5?page=6#60 writes:
The merits of XML namespaces are debatable; I don’t have a strong opinion about them. Personally, I don’t see what’s so bad about:
swiss-society-of-astrophysics-and-astronomy:particle-type
if (say) the Swiss society really cannot agree on a common vocabulary with the American Astronomical Society, who naturally prefer
american-astronomical-society:wave-variety
- but I don’t want to debate the aesthetics, only the technical potential. The key point here is that you agree that it is possible to ‘namespace’ with class names alone, just as JS libraries manage with simple formulations like ‘YAHOO.util.Dom’ to ‘namespace’ variables and functionality, even if you feel it is “cumbersome”.
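To make the class-name ‘namespacing’ idea concrete, here is a minimal sketch. It assumes, as in the two examples above, that a colon separates the vocabulary prefix from the term; the function names `parseClassToken` and `groupByVocabulary` are hypothetical and not part of any library or specification.

```javascript
// Hypothetical helper: split one class token such as
// "american-astronomical-society:wave-variety" into vocabulary and term.
// Assumption: the first colon separates the prefix from the term.
function parseClassToken(token) {
  const idx = token.indexOf(":");
  if (idx === -1) {
    return { vocabulary: null, term: token };
  }
  return { vocabulary: token.slice(0, idx), term: token.slice(idx + 1) };
}

// Group every token of a class attribute value by its vocabulary prefix.
// Unprefixed tokens are collected under the key "(none)".
function groupByVocabulary(classAttribute) {
  const groups = {};
  for (const token of classAttribute.trim().split(/\s+/)) {
    if (!token) continue; // empty attribute value
    const { vocabulary, term } = parseClassToken(token);
    const key = vocabulary === null ? "(none)" : vocabulary;
    (groups[key] = groups[key] || []).push(term);
  }
  return groups;
}
```

A parser that knows only one society’s prefix can then simply ignore the other groups, which is roughly the isolation that `YAHOO.util.Dom`-style naming gives JS libraries.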
This seems fundamentally the same as Shelley Powers’s argument in favour of distributed extensibility using XML: you want to make use of functionality only available in XML languages. However, if you can make use of the same functionality (vector graphics, mathematics, sophisticated forms) in text/html, then that difference disappears – especially if when this functionality finally appears as native features in IE (the browser most people use) they are available in text/html.
Who is going to specify these “new extended abstracted behaviors”? Who is going to implement them? Who is going to ensure they safeguard security and accessibility? How are pages using them going to degrade gracefully in user agents that have not implemented them? Why should “a new HTML recommendation” take significantly longer to produce than specifications for “new extended abstracted behaviors”? In the meantime, why couldn’t these “new extended abstracted behaviors” be attached using the HTML5 text/html extensibility mechanisms plus ‘content’/‘equivalent’, rather than attached to XML elements?
Rob Burns
That is certainly not what I am saying. I’m not sure about Shelley Powers. I want to see the same (or similar) namespace extensibility mechanism brought to text/html. It is the WhatWG that opposes this.
Yes, I do agree that class names can express new semantics. However, why are we sitting around trying to justify an increasingly cumbersome process (using class names and negotiating potential conflicts for authors who want to draw on two different vocabularies)? We already have an XML namespace solution that IE has largely implemented for text/html. And as you say, this is the browser used by the majority of users. So why aren’t we inviting the other browser makers to implement XML namespaces in text/html, so that authors can use them in whichever serialization they choose? What is gained by using class names instead of the much more elegant and much more flexible solution of namespaces (XML or otherwise)?
Incidentally this again speaks to the predatory monopolistic practices I spoke about before. Why would anyone in their right mind be quibbling over serializations? Who cares whether it is XML or text/html? Well, the reason these issues are up for quibbling is that some predatory monopolies want to make it difficult to develop these standard formats which they do not own (which also partly explains why it takes decades, by the HTML5 editor’s own estimation, to develop an incrementally updated standard).
The idea behind distributed extensibility is that any community of authors could implement a new vocabulary. Any author can then opt to join in that community and make use of that vocabulary and mix it with other vocabularies without any concern for conflicts. By improving text/html parsing, we will generally not have the same graceful degradation problems we have today (where elements will not even parse correctly). I’m not entering into a debate over which serialization an author should use. However, XML namespaces are now widely implemented (in every major browser, except that IE implemented them for the HTML namespace only in the text/html serialization and all of the other browsers support the HTML namespace only in the XML serialization). We need to make them available in either (or any) serialization and allow author communities to make use of them. Authors and authoring communities would still have the option to use class names and other mechanisms, but my guess is that, given the choice and widespread interoperability, they would choose to use XML or XML-like namespaces.
As for the abstracted vocabularies, my sense is that most of what we need has already been provided by the W3C. We just need broader implementation of those recommendations. I think more could be done with CSS so that we reach the goal I suggested before, where front-end work is done almost entirely with CSS, SVG, and bitmap images, and semantics are properly handled by the rest of the recommendations (HTML needs to be rounded out a bit for semantics too). So this means: 1) incrementally better CSS, 2) incrementally better HTML, and 3) better text/html parsing algorithms. With that, much of these debates over serializations, extensibility mechanisms, etc. would be moot (though we’d undoubtedly have something else to discuss).
Benjamin Hawkes-Lewis
I think you can solve most of the vocabulary-mixing problem without adding new features to HTML5 by using prefixes, just like JS libraries do, and sharing information about what class names you are using with the rest of the web community.
What I wanted to verify is that new features are not absolutely required for distributed extensibility of vocabulary, and from what you’re saying it seems they aren’t.
It’s perfectly reasonable to go on from that conclusion (as you do) to argue for XML-namespaces-in-HTML, but on the basis of trying to automate vocabulary isolation when vocabularies are mixed, rather than on the basis of actually enabling distributed extensibility of vocabulary.
I don’t have that much enthusiasm for that argument, partly because I’m not sure XML-namespaces-in-HTML are the simplest way to implement such ‘namespacing’ and partly because I think other issues are more urgent, like the omission of a generic machine-data attribute from HTML5.
Isn’t one gain that class names can be parsed, styled, and scripted in all current popular user agents, whereas XML-namespaces-in-HTML can only be parsed by one very popular current user agent? (The article proposed HTML5 should add features using attributes rather than elements for precisely this sort of backwards compatibility reasoning.)
But if we really don’t need to add new behaviors to the XML behavior set, and implementing vector graphics, mathematical typesetting, and more powerful forms in text/html removes the functionality gap between text/html and the XML world, then the subsequent speed of adding “new extended abstracted behaviors … to the host language” isn’t an important consideration when asking what we need to enable distributed extensibility – since all we really want to extend is vocabulary without changing interface.
John Allsopp
Firstly, thanks for the thoughtful, detailed responses, and apologies for being so slow to participate in the conversation. It’s incredibly gratifying after having put considerable effort into a piece to have such an intelligent and in-depth conversation emerge from it. Above all, the goal of the piece was not to prove the point I was making, but rather to start an important conversation that is not taking place and which I think should be.
Some responses to the various intelligent thoughts, observations and so on.
To Jeremy Keith – thanks for the JS workaround. I became aware of that toward the end of the process of putting the article together. I think the general position of my argument holds regardless - having to use JavaScript in this way is not a general solution to the problem.
Here to me is the key problem (and I clearly didn’t articulate this nearly well enough in the article, as it is the crux of my focus on the importance of backwards compatibility).
Technologies flourish when adopted by developers, and die when not. If you look at the “chasm” model of technology adoption, often technologies appear to take off like wildfire - among early adopters. Where technologies really struggle is with their adoption by mainstream users – and the way in which mainstream adopters decide whether to adopt a technology is very different from the early adopters – early adopters are experimenters, they like to try cool stuff, see what works, and so on. Mainstream adopters simply aren’t like that. They are far more pragmatic. The speed and even the extent to which a technology is taken up by early adopters doesn’t correlate with its “crossing the chasm” to mainstream adoption.
With web technologies (and here CSS is a very interesting and relevant prior example) a key determinant of their adoption among mainstream, pragmatic developers is that they work ubiquitously. After all, even as of late 2008, over 25% of early adopter profile web developers stated clearly that “Pages should look as near to identical as possible across browsers” – despite a decade or more of advocating for adaptive designs. Based on the experience of the slow adoption of CSS in the 1990s – time and time and time again, among developers, educators, writers, you would hear the phrase “but CSS doesn’t work”. This in my opinion undoubtedly held back the uptake of even the CSS that worked very well by years – and given the ongoing prevalence of the use of the font element [1], coupled with conversations I’ve had with professional developers in the last year or two, to this day, this belief is not entirely eradicated, and continues to have its effect.
Now, given CSS had effectively no competition (a good deal of what CSS provided was not possible with presentational HTML), whereas HTML5 does (it’s called HTML/XHTML) – if there are perceived or actual backwards compatibility issues for even the most simple aspects of the language (new elements like section) – I’d predict the chances of its widespread adoption happening anytime soon are pretty much non-existent. You only have to look to XHTML2 for a very near parallel example.
That there’s a JavaScript workaround that extremely well informed and skilled web developers might be aware of is simply not going to address that issue. I foresee everyday web developers trying to use the simplest aspects of HTML5, such as using the section element (which introduces the problem of a semantic mismatch between the meaning of H1 in HTML5 and in older versions of HTML, for what it is worth), try styling it, see that it doesn’t work in IE7, and then simply abandon any attempt to get up to speed with HTML5, as, a la CSS, “it doesn’t work”.
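For readers who haven’t seen it, the workaround usually meant here is to call `document.createElement` once for each new element name before the page is parsed, after which IE applies CSS to those elements. A minimal sketch, assuming that is the workaround in question; the helper name `registerElements` and the list of element names are illustrative only:

```javascript
// Illustrative subset of the new element names in the HTML5 draft.
const HTML5_ELEMENTS = ["section", "article", "header", "footer", "nav", "aside"];

// Call createElement once per name. The created nodes are discarded;
// the call itself is what makes older IE versions willing to style
// elements with that name. `doc` is passed in so the function can be
// exercised outside a browser.
function registerElements(doc, names) {
  for (const name of names) {
    doc.createElement(name);
  }
  return names.length;
}

// In a browser this would run from a script in the <head>:
// registerElements(document, HTML5_ELEMENTS);
```

Note that this only helps when scripting is enabled, which is part of why it is not a general solution.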
Quite a few of you suggested that we have the solution – XHTML with DTDs, or XML. The problem is that these have been around for coming on a decade, and are little if at all adopted by mainstream web developers (in fact, one of the reasons I focussed so specifically on backwards compatibility in HTML5 is that the lack of compatibility for most of the last 10 years with most browsers in common use is probably the single most important reason for the failure of these solutions to take off. Well, that and their complexity, in comparison with good old HTML.)
But keep in mind that the focus of this article is HTML5, and I’ve taken it as a given that the momentum for HTML5 to be the next major iteration of HTML more or less guarantees that will be the case. If nothing else, if it doesn’t make it, we will have wasted years and an enormous amount of energy and resources and have still not addressed significant shortcomings in HTML. So, my concern is to address what I consider to be a serious shortcoming in HTML5’s approach to an important aspect of the language – how it supports semantic markup.
A number of folks took the fsck IE6 approach ;-), or argued that it’s rapidly diminishing in use.
My argument is that we simply can’t ignore IE6 and backwards compatibility more generally, because as we’ve seen with many other very good technologies (SVG for instance), they simply won’t be adopted by the majority of developers because of that lack of widespread compatibility.
If we are concerned about the adoption of HTML5, then we really need to ensure that there are as few impediments to its adoption as possible. If the meme “HTML5 is not compatible with IE6” (which will soon become simply “IE”, then “HTML5 is not supported in any browsers”) takes root, then it can take years (as CSS advocates will attest) for that meme to be eradicated.
Jeremy Jarratt puts it most strongly
While in many ways a very attractive idea, I think XML is an object lesson in exactly why this is hard if not impossible. XML was in many ways the way to start afresh. We’ve seen how well that’s worked out (at least when it comes to the web).
If we do wish we had “a continuous supply of new elements, where such things make sense”, then the current HTML5 proposal doesn’t provide that at all.
Regarding RDFa, it definitely should have got a mention. As I developed these ideas over the last couple of years, and the article (which has been in gestation for just about 12 months now), RDFa was coming together. I see the proposal that I’ve put together as being able to work in conjunction with RDFa, but as more akin to providing a better framework for the existing widespread developer practice of using HTML class and id attributes to add pseudo-semantics to markup, as exemplified by microformats.
But RDFa is a quite radical departure from this existing common semantic practice. As such there’s no great guarantee that it will catch on, and in many cases it will be overkill for the purposes for which most developers mark up their content “semantically”.
data attribute, content attribute
A number of folks raised the HTML5 data attribute – but this is simply a bucket for applications to store their own data in. It’s expressly not for generalized uses –
“User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values”
Thanks for the reference to the content attribute, Mark Birbeck. As with role, I don’t see why it wouldn’t make sense to adopt these existing attributes from XHTML2.
As to why class, meta, id and rel alone don’t suffice – there are a number of arguments.
Firstly, I’ve tried to make the case that we’ve simply pushed these rudimentary semantic extensibility features of HTML past breaking point – the BBC microformats saga is pretty strong evidence of this.
Class is simply a bucket for strings which can be used for “general processing” – which can mean just about anything. id is essentially a bucket for a page level GUID – any semantics we layer on top of these attributes is really by tenuous convention. In short, they haven’t worked. In fact they weren’t really designed for semantics in the way they are commonly used now at all.
A number of respondents asked (at times as advocatus diaboli) whether the proposal wasn’t a “solution in search of a problem?”. I always think that’s a very good question to ask.
I think the fact that so many developers use class and id as a mechanism for adding pseudo-semantics to their documents that you’d really have to call it a standard practice among professional web developers, the considerable success of microformats despite the technical limitations of HTML, and even the addition of new semantic elements to HTML5, are all indicators of the need for the semantics of HTML to be further enriched.
Craig Sharkey raised the issue of semantic libraries analogous to JS libraries – and it’s a good question as to why they haven’t really occurred to date (you could argue that this is to some extent what microformats are). I’d argue that the lack of any real mechanism for creating such libraries, other than simply using class and id, is one reason why we’ve yet to see the widespread development of such things.
Thanks again for the excellent conversation, and I do hope that it might lead to the reconsideration of aspects of HTML5.
[1] http://dev.opera.com/articles/view/mama-key-findings/
Benjamin Hawkes-Lewis
There are two arguments here.
1. The BBC dropping microformats is evidence that existing extension mechanisms are insufficient. This is true, but “the BBC were very clear that the only reason to drop microformats was their use of the ‘title’ attribute for human unfriendly data”:http://www.bbc.co.uk/blogs/radiolabs/2008/06/removing_microformats_from_bbc.shtml ; a problem solvable with a ‘content’/‘equivalent’ attribute. Additional attributes for different semantic modes don’t help towards solving this problem.
2. ‘class’ can be used for things other than semantic labeling. This is true, but this isn’t evidence that it doesn’t work for semantic labeling. That’s like saying JS can be used for form validation and therefore doesn’t work for dropdown menus. I’d say that microformats are actually strong evidence that ‘class’ works reasonably well for semantic labeling. I guess the underlying argument here is actually that a multipurpose attribute increases the chance of naming collisions? But introducing further attributes for different semantic modes would only reduce the chance of naming collisions, and they wouldn’t do so more than defensive class naming practice (e.g. ‘rhetoric-irony’ rather than ‘irony’). To actually prevent naming collisions you need a system like XML namespaces or a central registry of names.
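The difference between reducing and preventing collisions can be shown with a small sketch. The two vocabularies below are invented for illustration (a rhetoric vocabulary and a metallurgy vocabulary that both happen to use ‘irony’), and `findCollisions` and `withPrefix` are hypothetical helper names:

```javascript
// Return the class names that two vocabularies both define.
function findCollisions(vocabA, vocabB) {
  const inB = new Set(vocabB);
  return vocabA.filter((name) => inB.has(name));
}

// Defensive naming: prepend a vocabulary-specific prefix to every term,
// e.g. "irony" becomes "rhetoric-irony".
function withPrefix(prefix, terms) {
  return terms.map((term) => `${prefix}-${term}`);
}

// Hypothetical vocabularies, invented for this example.
const rhetoric = ["irony", "metaphor"];
const metallurgy = ["irony", "alloy"];

const bare = findCollisions(rhetoric, metallurgy);
// bare is ["irony"]

const prefixed = findCollisions(
  withPrefix("rhetoric", rhetoric),
  withPrefix("metallurgy", metallurgy)
);
// prefixed is []
```

Prefixing removes the clash only as long as no two communities pick the same prefix, which is why actual prevention still needs something like XML namespaces or a central registry.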
Richard Cotton
Part of the core of the problem is that we have become so inured to hacks to fix things that it has become almost legitimate to hijack elements of markup to do arbitrary things.
The BBC decision lays bare a clear example; it’s a collision between two (ab)uses of the same attribute, neither of which is actually the intended use.
This is why I consider HTML5 to be a mistake; XHTML was a step forward. It has issues; why aren’t we fixing them instead of taking two steps backward?
Stephen Down
I very much doubt it. A high proportion of IE6 users have stuck with it because they can’t upgrade to IE7 – because they are on a corporate network and/or are using an older version of Windows. As Chrome has the same system requirements as IE7, the majority of people using IE6 will be unable to install it.
Firefox has been around and highly publicised for years — and can be run on older versions of Windows. Anyone who is using IE6 on their own computer, and has not chosen to install Firefox (or upgrade to IE7, if on XP), is unlikely to install Chrome.
Shelley Powers
@65
“My argument is that we simply can’t ignore IE6 and backwards compatibility more generally, because as we’ve seen with many other very good technologies (SVG for instance), they simply won’t be adopted by the majority of developers because of that lack of widespread compatibility.”
I don’t know how to say this politely, but this is pure bunk. Google has already taken the first step to eradicate IE6, and if others would be as equally brave, we might actually finally get rid of this albatross.
Perhaps instead of a new specification, we need new attitudes, at least in the web development/design community. Where are the risk takers? The people who used to push and actively promote the best, rather than tenderly support the absolute worst of the web?
All I can say is bunk. If we all actively worked to finally put this old piece of “refuse” to its long-overdue sleep, we could eliminate it as a problem in less than a year.
Instead, we play it safe. We tippy toe. We clasp our hands to our breasts and wash and re-wash our fingers over and over again, in a tizzy of anxiety, as we murmur, in mortified terms, “Oh, we can’t ignore IE6.”
Yes, we can. Maybe change needs to find a home in places other than just politics.
Now is exactly when we can force this absolutely essential change. Corporations have other concerns than browser usage, and the people you’re worried about using IE6 are being laid off.
IE7 not supported in older Windows versions? Well, guess what—Firefox and Opera work on older operating systems. Corporations afraid to change? Oh good lord, no wonder they’re all failing. IE6 is, currently, one of the most insecure browsers in use today.
Best of all from a change perspective, minimalist design is very hip right now. So, let’s provide a minimalist design for the IE users, and use the nifty CSS3 tricks and SVG for the rest. Then the few IE6 corporate users still employed will still be able to access your site. And, if they want to get the best effect, they can access it, again, when they get home, where they’re using a decent browser.
But what you’re saying is most designers won’t even take the chance. Wow, must be safe to be them.
Benjamin Hawkes-Lewis
Or maybe we need some hardheaded cost/benefit analysis to establish the “best”, by thinking about the effects of our technical decisions on users, customers, and friends.
It’s one thing to adopt practices that significantly improve the user experience of some users without significantly impairing the user experience of any large group of users.
It’s another thing to adopt practices that significantly impair the user experience of a large group of users, especially if there are other ways to achieve benefits for the smaller group of users.
For example, if developers can encode roughly the same semantics as ‘header’, ‘aside’, ‘footer’, and ‘section’ with ARIA attributes that might improve the experience for the same small group of users without breaking their layout in the most widely used browsers with JS disabled, then maybe it’s rational to use the ARIA attributes instead of the new elements?
So what if we have to write ‘<div class=“header” role=“banner”>’ instead of ‘<header>’? Is the latter better language design? Absolutely. Is it worth the cost to end-users? Not necessarily. On its own, is it worth hassling Granny to switch to a “happy” browser? Perhaps not.
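As a rough sketch of that trade-off, the function below generates the div-based opening tag for a given HTML5 element name. It assumes ARIA landmarks are expressed through the `role` attribute and that the mapping is only approximate (a ‘header’ is a ‘banner’ only when it is the page-level header, for instance); `fallbackOpenTag` is a hypothetical name.

```javascript
// Approximate mapping from HTML5 draft element names to ARIA landmark roles.
// Elements without an obvious landmark equivalent (e.g. section) get no role.
const LANDMARK_ROLES = new Map([
  ["header", "banner"],
  ["footer", "contentinfo"],
  ["aside", "complementary"],
  ["nav", "navigation"],
]);

// Build the opening tag of a backwards-compatible div that carries the
// element name as a class and, where one exists, the landmark role.
function fallbackOpenTag(elementName) {
  const role = LANDMARK_ROLES.get(elementName);
  const roleAttr = role ? ` role="${role}"` : "";
  return `<div class="${elementName}"${roleAttr}>`;
}

// fallbackOpenTag("header")  returns '<div class="header" role="banner">'
// fallbackOpenTag("section") returns '<div class="section">'
```

The output parses and styles in browsers that have never heard of the new elements, with or without JS, at the cost of more verbose markup.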
Zachary Forrest
I feel that the test of a good article is the extent to which it makes me think. Your article, sir, has me thinking in spades.
I work for a sizable company, doing front-end web development, so the topic of HTML5, and all the subsequent topics of backwards compatibility and forward thinking, has me pondering what could be on a day-to-day basis. The advances in JavaScript engines lately have me on the edge of my seat, as do advances in CSS specification implementation in the major browsers.
Here’s my thought. I feel that the current HTML spec has us backed into a corner. And until we get ourselves out of it, there really isn’t any hope for us. While I applaud HTML5’s addition of new elements, such additions won’t be enough in the end. Backwards compatibility is great but, in my personal opinion, takes a backseat to future possibilities.
Please understand, this idea is minutes old, but what’s preventing the W3C and browser vendors from implementing an extensible architecture as opposed to mere elements? Elements we have, but the ability to tailor them with custom attributes should be the future. For example, would it make more sense to be able to define custom attributes like “structure” and be able to add it to the div element (or any element for that matter), with the custom definition containing the information necessary to tell the browser how to interpret it?
Elements should be objects and attributes should be extended objects, to put the idea in perspective. Yes, there will be an initial hump while older browsers die off, but such a solution would eventually pay for itself. No matter what custom attribute we define, we would also enclose the means necessary for the browser to handle the custom definition. HTML would be like any other object-oriented language: extensible and scalable. We would load attribute definitions like we load JavaScript. Styling with CSS would be as simple as your article suggested.
It’s just an idea and yes, it completely ignores the backwards-compatibility facet of the conversation. But we live in a dynamic world and this seems like the only way out of a future circumstance where we’ll still be talking about this very same subject.
Aaron Miller
I’m glad someone is really talking about these new tags. Personally, I was stoked to see <nav> and <section> and <header> and <footer> being added. I’m glad you mentioned DocBook because it solved a problem for print publications that I think HTML needs to solve for digital books, but unfortunately, I don’t think trying to borrow from DocBook is the right approach. I do think your attribute solution is thoughtful and has a lot of merit. For now, the tags make sense to me. In your example of the problem of applying a font color to a section via CSS rule, more complex selectors could be used to apply that style to sub elements of the <section> element. Since <section> is structural, it seems more aimed at the non-human consumers of the page, such as e-book reading systems which otherwise have no way of telling what a section of a book is (the div tag was supposed to do it, but it’s now used for layout). These new tags should not be styled, although they can be part of a CSS selector, as ‘hooks’ in more selective selector strings. The nav is another good one, especially in the case of books, because in many cases a tertiary web app may want to access only the text of the book (or doc, or whatever is at the heart of the web page), and skip any ads, extraneous navigational elements, or other structural scaffolding unnecessary and problematic for rendering the book itself. There are plenty of cases outside of the e-book application scenario—I use that because it’s the area I primarily struggle with. Backwards compatibility is really important, and Ian Hickson is one of the strongest proponents I’ve seen on the WHAT-WG list on keeping things backwards compatible. That said, I do agree we can’t keep inventing new tags through the long involved processes of the W3C and WHAT WG every time the Web evolves on its own.
Linus Rachlis
This was a thoughtful article and I really agree with its stance. Since John opens by taking such a long view on the web, and since I don’t think most people think about that enough, I thought this TED Talk would be interesting for anyone reading:
http://www.ted.com/index.php/talks/kevin_kelly_on_the_next_5_000_days_of_the_web.html
Chris Hester
Imagine if HTML had been invented in Shakespeare’s day. Would we still be using tags like < shoppe type=“ye oldde” >? Clearly our existing language may change in hundreds of years (or less if texting has anything to do with it.) So there may be a need to update the words used for tags, even if just to improve them. (Personally I have always found < blockquote > ridiculously long, especially when there is < q >. Why not just have < quote > and use an attribute?)
Also why should HTML be written in English? Why not have African or Arabic tag names? Perhaps localised versions can be created?
I say keep improving the existing tags and attributes and of course add new things as the web moves forward. Backwards compatibility is simply a matter of browsers converting any changed tags. Anything new, well hey, one day tables were new and browsers had to cope back then. And CSS! Did we refuse to use it because of pre-CSS browsers? No, we all moved forward by downloading new versions of Netscape. This is the way it has been and should be in the future.
Having said that Zachary’s idea above is good. How about tags that didn’t refer to something specific, but the user then applied the relevant attribute? Eg:
< box use=“sidebar” >
< list use=“menu” >
< box use=“header” >
< text use=“paragraph” >
< text use=“quote” >
< text use=“email” >
It might make reading documents harder, but there’d be no need to battle over the names of the tags. A standard set would suffice for everything. All the browser needs to know is if the tags are block or inline, floated or not and so on. The stylesheet would provide that.
no bighair
Surely the guys at W3CHTML5WG already have a vision for adding semantics to HTML, don’t they? And it can’t just be adding ad hoc elements every few years, can it? There are already too many solutions in use out there – microformats, RDFa, embedded RDF, XHTML etc – and clearly few people share a single vision for how this is all going to pan out. But more worryingly, we don’t know what the W3C’s vision is, so we have to make up solutions to prompt them into action. I’m looking forward to the day when they get their act together so we can just build really cool things. But I’m not holding my breath.
Stephen Down
How so? I can still use all the aspects of Google that I need to with IE6, including search, maps and email.
bruce lawson
Shelley said, “Corporations afraid to change? Oh good lord, no wonder they’re all failing. IE6 is, currently, one of the most insecure browsers in use today.”
I was a corporate web developer until June last year, and I too hate IE 6. But to blame the economic downturn and corporate bankruptcies on it feels a little exaggerated.
Now I work for Opera, so have every reason to diss Microsoft, but it’s wrong to ignore IE 6. It’s tempting to take the “f**k IE 6” approach, but lots of companies have Windows 2000 machines, as it’s supported until 2010. Will they upgrade those machines to Windows XP or Vista now, in a credit crunch, just to look at sexy web sites that don’t support IE 6?
Lots of people in the developing world use older machines out of economic necessity. Sure, they could install Linux and then Opera or Firefox, but are we really back to the era of requiring users to install operating systems and certain browsers for the privilege of viewing our super-special sites?
Chris Hester
I was shocked yesterday to find a major university in England has IE6 on its default drive image that all PCs have to have. So that’s hundreds of machines all stuck on an old browser. I feel this may be typical for IT at other campuses too as they are never bang up to date due to security concerns of upgrading. But I thought they’d at least have IE7 on there.
Rob Burns
I have to say the differences between IE6 and IE7 seem so minuscule (especially when one considers the 7 years spent in development) that this sub-thread about how horrible IE6 is looks like a marketing ploy for Internet Explorer and Windows sales :-). Probably just the conspiracy theorist in me, but it’s worth pointing that out for anyone feeling they need to upgrade.
carla san gaspar
Impressive.. very well said… i really love the idea…
Ben Rowe
I see html5 as an opportunity to help encourage people to update their browsers. While I agree with the article (semantics & all), and despite the shortcomings of html5, this is a browser marketer’s dream to encourage people to upgrade to a browser that supports the full html5 spec. Of course it will take the completion of ie8 (if they fully support the current html5 spec). However, what better excuse could you think of to give your visitors an incentive to upgrade their browsers than something like “This website utilizes ‘marketing term’ technology. To use the site to its full potential please upgrade your browser”? The whole industry could push this new ‘marketing term’, making it the next Web2.0 if you will…
Of course the only downside of this is that there isn’t much substance to this, from a user’s point of view (the canvas tag is one of the few tags that will give users an actual reason to upgrade). As developers we get html5, css3, etc. The user gets not a lot. This strategy needs a lot of refinement, but it’s certainly something that could work.
It will be a bumpy road, but we need to balls up as an industry and not take the chicken shit way out all the time.
Ben Rowe
I think we’re missing the point of html5, and what it could achieve…
darren alawi
I think we should be focusing on getting browsers to work more consistently and getting rid of old browsers like ie6 that have no place in today’s world. Tech moves fast and yet ie6 lingers on. No matter how you try to make things backwards compatible, you will always be limited by decaying technology. There is only so far you can go before you have to stop and address the existence of obstacles like old browsers.
Ignoring them and creating new languages is great but don’t expect not to run into the same problems a few years later.
austin cheney
I understand your concerns about semantic limitations. I have already solved this problem in the language I created, mail markup language. You can download the schema in order to play with it or read the specification for documentation. I solve the problem through the use of the “role” attribute, which is compatible with XHTML and HTML 5. Since my language is inherently XML, RDF and OWL are expected to use the role attribute for semantic processing.
Find everything and more about mail markup language at http://mailmarkup.org/
Rob Burns
Since the topic keeps getting raised, I have to ask: what are the significant differences in standards support between IE6 and IE7? As far as I can tell they are minimal to non-existent. IE8 promises better support, such as CSS :before and :after pseudo-element support and the associated generated content properties. However, the big expectation for standards support in IE7, after 7 years of development, was that it would add XHTML support and CSS generated content support. Neither materialized. Nor did other features such as SVG or complete Ruby support. So what are the major problems with IE6 compared to IE7, in terms of standards support, that posters keep referring to?
Chris Hester
The campus I was referring to have sent round an upgrade to IE7! Now everyone is complaining about the toolbar. (One guy thought the Refresh button had disappeared completely.) Still, at least they now have tabs – one big difference between IE6 and IE7. And better CSS and HTML support. And a heap of bug fixes. And you can zoom in graphics not just text (which also cures the long-standing fonts-set-in-pixels-can’t-be-resized problem). So there’s quite a lot of improvements if you ask me.
Rob Burns
This is the part that my question was about. The claims about IE6 being horrible have mostly related to standards support. Yet with all the things I expected to arrive in IE7 (after many years in development) I can’t really think of many things that IE7 improved. On the other hand, the IE8 beta does offer some CSS and HTML improvements, but what does IE7 offer over IE6 in this area?
Les Kobayashi
Sorry in advance for going slightly off topic…
@71
“But what you’re saying is most designers won’t even take the chance. Wow, must be safe to be them.”
I don’t know how to put this politely either, but that’s just arrogant. And elitist. Most designers are getting paid by clients who have a very real bottom line in TODAY’S reality where IE6 still represents 20% – 25% of the mainstream market. Front line web developers / designers need to deal with IE6. What they don’t need is to take the blame for its over-extended shelf life. That’s like blaming the road designer because your aunt’s old K-car still gets her to Walmart every Saturday.
Yes, of course there needs to be continued progression and risk taking. And there is. And, IE6 really will die a quiet little death one day. In the meantime, it’s not nearly the catastrophic issue some make it out to be.
@89
IE6 has multiple display issues with CSS borders, margins, floats, png transparency and more—most of those display issues were corrected in IE7. There’s this thing called google where you can dig up all kinds of clarification ;)
Chris Hester
There were a lot of bug fixes and improvements made to IE with version 7. Even simple stuff like adding <abbr> was welcome. (I personally don’t think generated content, while useful, is an essential addition.)
The problem with IE7 is that it also introduced new bugs. And there were still plenty of unfixed bugs.
Anyone wanting to know more about the true horror of IE bugs might wish to peruse the following sites:
“Position Is Everything”:http://www.positioniseverything.net/
“Browser Bugs Section”:http://www.gtalbot.org/BrowserBugsSection/
Matteo Cajani
I really appreciate your idea about attributes. I don’t think that we need more than one html attribute: “semantic”. Then anyone can define all the semantic classes he needs. Of course we need to define the properties such as “rhetoric”, “structure”, etc… and their values as we have “background-color”, “font-family”, etc… in CSS
Here is an example of HTML and CSE (Cascading SEmantic sheet):
HTML:
….
An elderly lady
phoned…
….
CSE:
….
joke_of_the_day{
rhetoric:ironic;
structure:aside;
}
….
A cascading mechanism may also solve the problem of nested semantic annotations.
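To make the cascade idea concrete, here is a minimal Python sketch under heavy assumptions. The `joke_of_the_day` rule and its two properties are taken from the example above; the `punchline` class, the function names `parse_cse` and `resolve`, and the "innermost class wins" merge order are all invented for illustration.

```python
import re

# Matches "name { prop: value; ... }" blocks in the CSS-like syntax shown above.
RULE = re.compile(r"([\w-]+)\s*\{([^}]*)\}")


def parse_cse(sheet):
    """Parse a hypothetical semantic sheet into {class_name: {property: value}}."""
    rules = {}
    for name, body in RULE.findall(sheet):
        props = rules.setdefault(name, {})
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                props[prop.strip()] = value.strip()  # later declarations win
    return rules


def resolve(rules, nested_classes):
    """Merge properties from the outermost class to the innermost one."""
    merged = {}
    for name in nested_classes:
        merged.update(rules.get(name, {}))
    return merged


sheet = """
joke_of_the_day {
    rhetoric: ironic;
    structure: aside;
}
punchline {
    rhetoric: sarcastic;
}
"""
rules = parse_cse(sheet)
# An element marked "punchline" nested inside one marked "joke_of_the_day".
print(resolve(rules, ["joke_of_the_day", "punchline"]))
```

In this sketch the nested annotation inherits `structure` from its ancestor and overrides `rhetoric`, which is one possible reading of "cascading" here.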
What do you think about it?
Regards,
Matteo (matteo.cajani@alice.it)
Rob Mech
Is the browser or the structure (HTML) here really the issue? It would seem that the major hangup for new tags and features in HTML is the backward compatibility.
What blows my mind is that we as a community continue to perpetuate the problem. Get off HTML and develop something new. Maintain a legacy object capable of rendering HTML but move something to a new open standard. Then, make that easy to upgrade.
Look at “Flash”. When something new comes out, what do people do? Upgrade. What do websites say? “Upgrade to the latest”.
People, once we stop living in the past and decide that we want robust browsers with true rendering capabilities, 3D models and the ability to take advantage of the other 99% of the hardware, only then will we make progress. We use HTML and CSS as a container for the whiz-bang we do with Flash and JavaScript. We want more than text from the browser, so let’s finally do what it takes to get there.
We need to develop an open standard that allows for upgrades and force everyone to abide by the upgrade path. If you don’t, then you can’t expect to get serviced.
Just imagine where your OS would be if we still had to support 8 bit executables!
Remember this before you respond with the “What about the other devices”. Ok, let’s talk about that: WAP, Mobile CSS, etc. Your argument is what? That they read the standard HTML you code? They don’t. The only standard is that there is not one. We need one and HTML sure is not it.
Sure, there will be pain, but once the pain is gone you’ll be much happier in a world where you can do more than just place a few lines of text in a document.
Charlie Clark
I found the article encouraged me to think more about HTML5 than I have so far. And while I sympathise with the author, I’m in agreement with those commenters that this is a solution in search of a problem.
It is perhaps a little ironic that the article is a critique of HTML5’s inflexibility when, at least as far as I know, HTML5 was proposed as a pragmatic solution to the seemingly intractable problem of “what comes after HTML4”. This form of critique misses the point behind HTML5: HTML5 is not supposed to be semantically extensible. There is XHTML for that. No, HTML5 is the version of HTML that contains those tags that many currently miss. This has important consequences primarily for those developing browsers so that HTML5 support is both robust and fast. Although this means that the semantics must be frozen, it in no way constrains attributes from being extensible. The difference is, however, one of scope: extensible attributes will have an application (website) specific purpose whereas the specified tags will always have the same purpose.
The supplied use cases illustrate this misunderstanding:
Neither of the tags is semantically satisfying particularly when the <date> tag is available.
1) Ideally the datetime value should be the content of the tag and not an attribute of it. Confusing the two is a common mistake particularly in XML. Of course, being too restrictive here would prevent anyone from incorrectly using the tag and provoking exactly the kind of errors that HTML5 tries to avoid.
2) How the date is displayed is a matter of presentation and, therefore, something that may be controlled by meta-data: a format attribute or CSS declaration or browser option. This avoids all the problems of localisation, such as: when is 12/1/2009? In the example supplied, that would be format=“holiday” or format=“short”.
Making this definition part of the specification allows browsers to handle content intelligently – offer to add the date to a user’s calendar or do a search on the date – in a way the suggested “equivalent” simply could not.
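The 12/1/2009 problem is easy to demonstrate. The sketch below is Python, not browser behaviour: it shows the two readings of the slash-separated string, then renders one unambiguous ISO value in two ways. The format names “short” and “long” and the function names are assumptions for illustration only; what a “holiday” format would look like is not specified, so it is left out.

```python
from datetime import date, datetime

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]


def parse_ambiguous(text):
    """Return the day-first and month-first readings of a d/m/y style string."""
    day_first = datetime.strptime(text, "%d/%m/%Y").date()
    month_first = datetime.strptime(text, "%m/%d/%Y").date()
    return day_first, month_first


def render(iso_value, fmt="short"):
    """Render an ISO 8601 date using a hypothetical named format."""
    value = date.fromisoformat(iso_value)
    month = MONTHS[value.month - 1]
    if fmt == "short":
        return f"{value.day} {month[:3]} {value.year}"
    if fmt == "long":
        return f"{value.day} {month} {value.year}"
    raise ValueError(f"unknown format: {fmt}")


# The same string is a date in January or in December, depending on the reader.
print(parse_ambiguous("12/1/2009"))
# The machine-readable value has only one reading; display is a separate choice.
print(render("2009-01-12", "short"))
print(render("2009-01-12", "long"))
```

Keeping one machine-readable value and treating the visible form as presentation is what lets a browser localise the display or hand the date to a calendar.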
In the same vein we have the new <video> and <audio> tags to handle the now well-established practice of including audio and video in websites.
Extensibility here is not the solution; it simply shifts the problem to the namespace.
Rob Burns
In comment #94, Charlie Clark says:
This comment reflects a confusion repeated in this discussion that ascribes magical properties to either the XML serialization or the traditional text/html serialization of HTML (I can’t tell which one has been ascribed magical powers, but neither serialization has any).
HTML5 is basically three things: 1) a parsing and serialization specification that attempts to codify the parsing performed by the major browsers with respect to traditional HTML serializations (and perhaps incrementally improve or at least select the best traits of existing browser parsing operations); 2) a specification of browser (and some other UA) behavior with the results of a parsed (however parsed) or DOM-created document; 3) a specification of a vocabulary of elements and attributes for authoring documents that are ostensibly semantic in nature.
The topic of discussion here is about #3. It is not about the parsing and serialization of traditional text/html serializations. HTML5 (in the #3 sense) can arise from the traditional serialization, as a solely DOM creation or from an XML serialization (which despite what we are told is not so drastically different with respect to the topic at hand). So I’m not sure why the “use XML” or “use XHTML” line keeps arising in this conversation. It has nothing to do with the topic of the conversation.
Also this is not about confusing whether data belongs in an attribute or in the contents of an element. There is not one right way to do this. The point of RDFa is that one can easily add properly parsed attributes to existing HTML elements (or any SGML or XML or otherwise elements) that add machine readable metadata about the natural language expression as the contents of the element. That means the presentation can be left as is (with the contents of the element appearing) or the UA could replace or augment that presentation with a localizable expression for the date. And whereas the HTML5 attempt to copy RDFa introduces a single purpose date element, RDFa provides a way to add precise machine readable equivalents to an element for any imaginable data type that can be expressed as the contents of an attribute (including anyURI values).
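A simplified illustration of that point, in Python. The `property` and `content` attributes are real RDFa attributes; the sample markup, the `dc:date` property name used without a prefix declaration, and the `PropertyCollector` class are assumptions made for the sketch. This is not a conforming RDFa processor: it ignores nesting, subjects, and datatypes.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collects (property, machine_value, human_text) from annotated elements."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._open = None  # [property, content attribute or None, text so far]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs:
            self._open = [attrs["property"], attrs.get("content"), ""]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2] += data

    def handle_endtag(self, tag):
        # Simplification: the annotated element is assumed to have no children.
        if self._open is not None:
            prop, content, text = self._open
            machine_value = content if content is not None else text
            self.triples.append((prop, machine_value, text))
            self._open = None


def extract(markup):
    collector = PropertyCollector()
    collector.feed(markup)
    collector.close()
    return collector.triples


sample = (
    '<p>Posted on <span property="dc:date" content="2009-01-12">'
    "12th January</span>.</p>"
)
print(extract(sample))
```

The human-readable text stays in the element, the machine-readable value rides along in the attribute, and the same pattern would apply to any data type that fits in an attribute value.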
Chris Hester
Rob Mech above wrote:
I’ve been thinking for years that Flash should be the way forward. It makes perfect sense. A massive market penetration of users, smooth font rendering that can use any font, all the vector goodness you could want, Photoshop-style filters, what’s not to love?
And I don’t mean full-blown Flash sites with rotating objects. I mean a plain renderer that improves on what the poor browser has to render. And best of all…
Identical cross-platform rendering!
Think about it. No more broken layouts due to so many different browsers to test in. No more holding back on things that only work in 1 or 2 browsers. No more having to code to the bare minimum because of ancient browsers still in use today.
Unless something like this happens we will be stuck in browser hell forever.
John Allsopp
For those Russian speakers out there, there’s a “russian version”:http://habrahabr.ru/blogs/webdev/49734/
With quite a thriving discussion from what I can tell as well – not that I speak a word.
Thanks to the translator!
preved medved
http://d-o-b.ru/test/x-html/xhtml-dtd.htm
http://validator.w3.org/check?uri=http://d-o-b.ru/test/x-html/xhtml-dtd.htm;ss=1
http://browsershots.org/http://d-o-b.ru/test/x-html/xhtml-dtd.htm
preved medved
> custom DTDs (AFAIK) run in quirks mode
no. afaik.
> Identical cross-platform rendering! (flash)
no. afaik. identical only on windows platforms. it has many bugs on linux…
Artem Ploujnikov
We already have a lot of tools that can provide extensible semantics. The whole point of XHTML (even version 1.0) was to allow users to introduce custom namespaces into xhtml documents and then handle them programmatically using scripts, plug-ins or XSLT. Unfortunately, these technologies are now being bullied out of existence by the simplicity-oriented majority who won’t touch them with a ten-foot pole…
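For readers who have not seen this in practice, here is a small Python sketch of the idea. The XHTML namespace URI is the real one; the `http://example.org/ns/recipe` vocabulary, its `ingredient` element, and the `custom_elements` helper are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET

RECIPE_NS = "http://example.org/ns/recipe"  # hypothetical custom vocabulary

document = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:r="http://example.org/ns/recipe">
  <body>
    <p>Sear the <r:ingredient>pork chop</r:ingredient>, then saute the
       <r:ingredient>onion</r:ingredient>.</p>
  </body>
</html>"""


def custom_elements(xml_text, namespace, local_name):
    """Return the text of every element with the given namespace and local name."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace-uri}local-name".
    return [el.text for el in root.iter(f"{{{namespace}}}{local_name}")]


print(custom_elements(document, RECIPE_NS, "ingredient"))
```

A script, plug-in, or XSLT stylesheet would do the same selection by namespace; the XHTML around the custom elements is untouched.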
This is a recurring issue in programming. It appears that, for most people, the introduction of a new concept creates a nearly insurmountable psychological barrier. They say things like “I can’t do it. I can’t understand it. It’s complicated. It’s too abstract”. To me, and a few others (like those who invented object-oriented programming and XML/XHTML/XSLT/XPath/XQuery), ease of use means fewer clicks and less hand-coding, even if it introduces more concepts I have to learn. To “normal” people, ease of use IS simplicity. New concepts lead to an immediate mental block.
Just take a look at the whole motivation behind the move to HTML5. The folks who invented the “html serialization” don’t want doctypes, schemas or even a version number. What they’re trying to do is to make “tag soup handling” part of the spec. Also, they explicitly objected to the use of namespaces.
The original spec even said
“Generally speaking, authors are discouraged from trying to use XML on the Web, because XML has much stricter syntax rules than the “HTML5” variant described above, and is relatively newer and therefore less mature”
Fortunately, the W3C didn’t take a stand on the issue, and they removed the above paragraph from their version of the spec.
We have other strict technologies with extensible semantics in mainstream use, which have a strict, clearly defined syntax (e.g. C#). Very few people ever complained about the whole app not compiling if you miss a semicolon because they have tools that pick up the semicolons for them. I grew up on BASIC, Pascal and Delphi, and I also learned C/C++, so I don’t take error messages personally. But as long as people who don’t know how to escape an ampersand continue hand-coding their HTML in Notepad, fault tolerance will remain a requirement, even though, after a certain critical number of errors has accumulated, it makes tracing nearly impossible, and validators become completely useless because there’s an “error” on every line, but the page displayed fine just an hour ago. For instance, in one of the apps I developed at work, I put up an XSLT post-processor with a syntax checker, and a lot of the other developers would swear at it because they had no clue as to how they could make their code output well-formed XML.
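The ampersand case is small enough to show in full. This is a Python sketch of a well-formedness check, not the XSLT post-processor mentioned above; `check_well_formed` is a name made up for the example, while `ElementTree.fromstring` and `xml.sax.saxutils.escape` are standard library calls.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def check_well_formed(fragment):
    """Return None if the fragment is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError as error:
        return str(error)
    return None


raw = "Fish & chips"
broken = "<p>" + raw + "</p>"          # bare ampersand: not well-formed
fixed = "<p>" + escape(raw) + "</p>"   # escape() turns & into &amp;

print(check_well_formed(broken))  # a parser error message
print(check_well_formed(fixed))   # None
print(fixed)
```

Escaping at the point where text is inserted into markup, rather than by hand afterwards, is the kind of tooling that makes a strict syntax tolerable.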
I believe the Web community needs to split. Just like the desktop application world has Visual Basic for simplicity-oriented developers and C# for those who like structure, we need to have two separate stacks, one for tag soup hand-coders who think namespaces are evil, and another one for those who prefer a stricter syntax. The split can be made on top of XHTML5 (the XML version of HTML5), with namespaced elements used to define semantics. Those who prefer the tag-soup-friendly version also happen to be the ones who adhere to the KISS principle and, therefore, don’t need extensible semantics anyway. They would rather copy & paste a block of html ten times over than introduce a new concept for a frequently used element. It doesn’t matter if the majority does tag soup as long as there’s enough community support for the “complex and extensible” version to keep it alive.
Aaron Miller
I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never “tag soup.” It is, however, not necessarily XHTML, and if parsed as such, will be “tag soup.” The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street. No one should ever slight reading the specs and knowing the proper way to code either HTML or XML, but when it comes to parsing the Web, see the recent Opera study, with MAMA, or the Google study that Ian Hickson did, and it’s clear that less than 10% of what we deal with out there is XHTML, much less proper XHTML.
Rob Burns
@Aaron Miller
bq. I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never “tag soup.” It is however, not necessarily XHTML, and if parsed as such, will be “tag soup.” The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street.
Tag soup gets used in two different ways which you’re confusing in this comment. 1) tag soup sometimes refers to the serialized source content of a document where tags are potentially misnested, content models invented out of thin air and attribute values requiring quotations not quoted: in general vended content not conforming to any specification anywhere. 2) tag soup parsing refers to a parser that is capable of parsing tag soup (a Herculean task).
When you say that XHTML parsed as text/html will be tag soup you’re confusing these two definitions. The XHTML is certainly not tag soup as it adheres to the XHTML syntax and sometimes even other syntactic requirements on top of that (such as XHTML 1.0 appendix C), so there is no sense in which that content can be considered tag soup. However, if such an XHTML document is vended as text/html it will be parsed by the UA’s tag soup parser just like any other conforming or non-conforming HTML 2 through 4.01. So in this sense either of two things, not using XHTML or not vending as application/xhtml+xml, means that the content is parsed by the tag soup parsing processor (just like any other HTML).
It’s important to keep these two meanings of tag soup separate to understand the conversation.
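The two meanings can be shown side by side with a small Python sketch. Note the assumption: Python’s `html.parser` is only a forgiving tokenizer, standing in here for a browser’s tag soup parser, which does far more error recovery. The names `EventLog`, `tolerant_events` and `strict_ok` are invented for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Records start and end tags in the order a tolerant parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.events.append("</" + tag + ">")


def tolerant_events(markup):
    """Meaning 2: a tag soup style parser accepts whatever it is given."""
    log = EventLog()
    log.feed(markup)
    log.close()
    return log.events


def strict_ok(markup):
    """An XML parser accepts only well-formed content."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


soup = "<p><b><i>misnested</b></i></p>"   # meaning 1: the content is tag soup
clean = "<p><b><i>nested</i></b></p>"     # well-formed content

print(strict_ok(soup), strict_ok(clean))
# The tolerant parser processes both; being parsed this way says nothing
# about whether the content itself is soup.
print(tolerant_events(soup))
print(tolerant_events(clean))
```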
Aaron Miller
@Rob, looks like we’re talking about the same distinction. The only difference is that I’m saying the phrase “tag soup” makes it sound anomalous, when in fact from the content side it refers to over 95% of the web, and from the browser (parser) side, it’s SOP. See the Opera MAMA study and Ian Hickson’s Google report if you don’t know what I mean.
Russ Michell
This is not directly related to John’s article, but it reminded me of the following problem I’ve been posing in my head for some time: How is it that the syntax of HTML or any other machine-readable grammar is constructed using English? More specifically US English? Has anyone ever tried to construct a language, of even a very light grammar, that allows multiple [human] languages to describe headers, footers, loops, lists etc? I appreciate that many of these machine languages were first composed in the US and thus US English has become the lingua franca of programming – but this is the 21st century and not everyone on the planet who wishes to write code knows how to speak/write English, never mind a specific sub-grammar of it.
Montmorency
Seems that the first one wasn’t perfect.
Here it is: http://interpretor.ru/html5semantics
Tobias Otte
“German translation available”:http://tobias-otte.de/essays/semantik-in-html-5/
blackdog
I agree with all the principal points in the article, and I have the same opinion about a preferable use of attributes; but I think they’re breaking compatibility on purpose. We all agree it is stupid to still have concerns about IE6 in 2009 (and soon ‘10). If they break the cordon everybody will be happier. And in fact the big push on HTML5 came from browser makers, and it looks to me like MS wants to be in the game.
In the aftermath we will all have a common base to discuss upon.
After all, I think some new tags would come in handy.
For example, I think that for something as ubiquitous as a calendar there should exist a tag. It would end the debate over which solution is more semantic (table vs list, which oddly relies on the kind of visualization we want to give), it would spare a lot of code, and it would give more artistic freedom to designers, who could target a parameter/class with a simple javascript to radically modify the visualization.
Tchalvakspam
http://wiki.whatwg.org/wiki/FAQ#HTML5_should_support_a_way_for_anyone_to_invent_new_elements.21
Contains some of their responses to the extensibility problem.