Semantics in HTML 5

by John Allsopp

107 Reader Comments

  1. @65

    “My argument is that we simply can’t ignore IE6 and backwards compatibility more generally, because as we’ve seen with many other very good technologies (SVG for instance), they simply won’t be adopted by the majority of developers because of that lack of widespread compatibility.”

    I don’t know how to say this politely, but this is pure bunk. Google has already taken the first step to eradicate IE6, and if others were equally brave, we might actually finally get rid of this albatross.

    Perhaps instead of a new specification, we need new attitudes, at least in the web development/design community. Where are the risk takers? The people who used to push and actively promote the best, rather than tenderly support the absolute worst of the web?

    All I can say is bunk. If we all actively worked to finally put this old piece of “refuse” to its long-overdue sleep, we could eliminate it as a problem in less than a year.

    Instead, we play it safe. We tippy toe. We clasp our hands to our breasts and wash and re-wash our fingers over and over again, in a tizzy of anxiety, as we murmur, in mortified terms, “Oh, we can’t ignore IE6.”

    Yes, we can. Maybe change needs to find a home in places other than just politics.

    Now is exactly when we can force this absolutely essential change. Corporations have other concerns than browser usage, and the people you’re worried about using IE6 are being laid off.

    IE7 not supported in older Windows versions? Well, guess what—Firefox and Opera work on older operating systems. Corporations afraid to change? Oh good lord, no wonder they’re all failing. IE6 is, currently, one of the most insecure browsers in use today.

    Best of all from a change perspective, minimalist design is very hip right now. So, let’s provide a minimalist design for the IE users, and use the nifty CSS3 tricks and SVG for the rest. Then the few IE6 corporate users still employed will still be able to access your site. And, if they want to get the best effect, they can access it, again, when they get home, where they’re using a decent browser.

    But what you’re saying is most designers won’t even take the chance. Wow, must be safe to be them.

  2. Perhaps instead of a new specification, we need new attitudes, at least in the web development/design community. Where are the risk takers? The people who used to push and actively promote the best, rather than tenderly support the absolute worst of the web?

    Or maybe we need some hardheaded cost/benefit analysis to establish the “best”, by thinking about the effects of our technical decisions on users, customers, and friends.

    It’s one thing to adopt practices that significantly improve the user experience of some users without significantly impairing the user experience of any large group of users.

    It’s another thing to adopt practices that significantly impair the user experience of a large group of users, especially if there are other ways to achieve benefits for the smaller group of users.

    For example, if developers can encode roughly the same semantics as ‘header’, ‘aside’, ‘footer’, and ‘section’ with ARIA attributes that might improve the experience for the same small group of users without breaking their layout in the most widely used browsers with JS disabled, then maybe it’s rational to use the ARIA attributes instead of the new elements?

    So what if we have to write ‘<div class=“header” role=“banner”>’ instead of ‘<header>’? Is the latter better language design? Absolutely. Is it worth the cost to end-users? Not necessarily. On its own, is it worth hassling Granny to switch to a “happy” browser? Perhaps not.
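
    To make the equivalence argued above concrete, here is a minimal Python sketch. It assumes landmarks are written with the standard `role` attribute, and the element-to-role table is an illustrative assumption rather than something taken from this thread. It shows that a consumer can recover the same landmark list from either spelling.

```python
from html.parser import HTMLParser

# Assumed mapping from new HTML5 element names to ARIA landmark roles.
# Illustrative only; real mappings depend on context (e.g. nesting).
ELEMENT_TO_ROLE = {
    "header": "banner",
    "nav": "navigation",
    "aside": "complementary",
    "footer": "contentinfo",
}


class LandmarkCollector(HTMLParser):
    """Collect landmark roles, whether spelled as elements or role attributes."""

    def __init__(self):
        super().__init__()
        self.landmarks = []

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role")
        if role is not None:
            # An explicit role attribute wins over the element name.
            self.landmarks.append(role)
        elif tag in ELEMENT_TO_ROLE:
            self.landmarks.append(ELEMENT_TO_ROLE[tag])


def landmarks(markup):
    """Return the landmark roles found in markup, in document order."""
    collector = LandmarkCollector()
    collector.feed(markup)
    collector.close()
    return collector.landmarks


old_style = '<div class="header" role="banner">Site</div><div role="contentinfo">Legal</div>'
new_style = "<header>Site</header><footer>Legal</footer>"

print(landmarks(old_style))
print(landmarks(new_style))
```

    Both calls produce the same list, which is the commenter's point: the semantics can travel in attributes on elements that old browsers already style correctly.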

  3. I feel that the test of a good article is the extent to which it makes me think. Your article, sir, has me thinking in spades.

    I work for a sizable company, doing front-end web development, so the topic of HTML5, and all the subsequent topics of backwards compatibility and forward thinking, have me pondering what could be on a day-to-day basis. The recent advances in JavaScript engines have me on the edge of my seat, as do advances in CSS specification implementation in the major browsers.

    Here’s my thought. I feel that the current HTML spec has us backed into a corner. And until we get ourselves out of it, there really isn’t any hope for us. While I applaud the addition of new elements in HTML5, such additions won’t be enough in the end. Backwards compatibility is great but, in my personal opinion, it takes a back seat to future possibilities.

    Please understand, this idea is minutes old, but what’s preventing the W3C and browser vendors from implementing an extensible architecture as opposed to mere elements? Elements we have, but the ability to tailor them with custom attributes should be the future. For example, would it make more sense to be able to define custom attributes like “structure” and be able to add it to the div element (or any element for that matter), with the custom definition containing the information necessary to tell the browser how to interpret it?

    Elements should be objects and attributes should be extended objects, to put the idea in perspective. Yes, there would be an initial hump while older browsers die off, but such a solution would eventually pay for itself. No matter what custom attribute we define, we would also enclose the means necessary for the browser to handle the custom definition. HTML would be like any other object-oriented language: extensible and scalable. We would load attribute definitions like we load JavaScript. Styling with CSS would be as simple as your article suggested.

    It’s just an idea and yes, it completely ignores the backwards-compatibility facet of the conversation. But we live in a dynamic world, and this seems like the only way out of a future where we’ll still be talking about this very same subject.

  4. I’m glad someone is really talking about these new tags. Personally, I was stoked to see <nav> and <section> and <header> and <footer> being added. I’m glad you mentioned DocBook because it solved a problem for print publications that I think HTML needs to solve for digital books, but unfortunately, I don’t think trying to borrow from DocBook is the right approach. I do think your attribute solution is thoughtful and has a lot of merit. For now, the tags make sense to me.

    In your example of the problem of applying a font color to a section via a CSS rule, more complex selectors could be used to apply that style to sub-elements of the <section> element. Since <section> is structural, it seems more aimed at the non-human consumers of the page, such as e-book reading systems, which otherwise have no way of telling what a section of a book is (the div tag was supposed to do it, but it’s now used for layout). These new tags should not be styled, although they can be part of a CSS selector, as ‘hooks’ in more selective selector strings. The nav is another good one, especially in the case of books, because in many cases a tertiary web app may want to access only the text of the book (or doc, or whatever is at the heart of the web page), and skip any ads, extraneous navigational elements, or other structural scaffolding unnecessary and problematic for rendering the book itself. There are plenty of cases outside of the e-book application scenario; I use that because it’s the area I primarily struggle with.

    Backwards compatibility is really important, and Ian Hickson is one of the strongest proponents I’ve seen on the WHATWG list of keeping things backwards compatible. That said, I do agree we can’t keep inventing new tags through the long, involved processes of the W3C and WHATWG every time the Web evolves on its own.

  5. This was a thoughtful article and I really agree with its stance. Since John opens by taking such a long view on the web, and since I don’t think most people think about that enough, I thought this TED Talk would be interesting for anyone reading:

    http://www.ted.com/index.php/talks/kevin_kelly_on_the_next_5_000_days_of_the_web.html

  6. Imagine if HTML had been invented in Shakespeare’s day. Would we still be using tags like < shoppe type=“ye oldde” >? Clearly our existing language may change in hundreds of years (or less if texting has anything to do with it.) So there may be a need to update the words used for tags, even if just to improve them. (Personally I have always found < blockquote > ridiculously long, especially when there is < q >. Why not just have < quote > and use an attribute?)

    Also why should HTML be written in English? Why not have African or Arabic tag names? Perhaps localised versions can be created?

    I say keep improving the existing tags and attributes and of course add new things as the web moves forward. Backwards compatibility is simply a matter of browsers converting any changed tags. Anything new, well hey, one day tables were new and browsers had to cope back then. And CSS! Did we refuse to use it because of pre-CSS browsers? No, we all moved forward by downloading new versions of Netscape. This is the way it has been and should be in the future.

    Having said that, Zachary’s idea above is good. How about tags that didn’t refer to something specific, with the user then applying the relevant attribute? E.g.:

    < box use=“sidebar” >
    < list use=“menu” >
    < box use=“header” >
    < text use=“paragraph” >
    < text use=“quote” >
    < text use=“email” >

    It might make reading documents harder, but there’d be no need to battle over the names of the tags. A standard set would suffice for everything. All the browser needs to know is if the tags are block or inline, floated or not and so on. The stylesheet would provide that.

  7. Surely the guys at the W3C HTML5 WG already have a vision for adding semantics to HTML, don’t they? And it can’t just be adding ad hoc elements every few years, can it? There are already too many solutions in use out there – microformats, RDFa, embedded RDF, XHTML etc – and clearly few people share a single vision for how this is all going to pan out. But more worryingly, we don’t know what the W3C’s vision is, so we have to make up solutions to prompt them into action. I’m looking forward to the day when they get their act together so we can just build really cool things. But I’m not holding my breath.

  8. Google has already taken the first step to eradicate IE6

    How so? I can still use all the aspects of Google that I need to with IE6, including search, maps and email.

  9. Shelley said, “Corporations afraid to change? Oh good lord, no wonder they’re all failing. IE6 is, currently, one of the most insecure browsers in use today.”

    I was a corporate web developer until June last year, and I too hate IE 6. But to blame the economic downturn and corporate bankruptcies on it feels a little exaggerated.

    Now I work for Opera, so have every reason to diss Microsoft, but it’s wrong to ignore IE 6. It’s tempting to take the “f**k IE 6” approach, but lots of companies have Windows 2000 machines, as it’s supported until 2010. Will they upgrade those machines to Windows XP or Vista now, in a credit crunch, just to look at sexy web sites that don’t support IE 6?

    Lots of people in the developing world use older machines out of economic necessity. Sure, they could install Linux and then Opera or Firefox, but are we really back to the era of requiring users to install operating systems and certain browsers for the privilege of viewing our super-special sites?

  10. I was shocked yesterday to find a major university in England has IE6 on its default drive image that all PCs have to have. So that’s hundreds of machines all stuck on an old browser. I feel this may be typical for IT at other campuses too as they are never bang up to date due to security concerns of upgrading. But I thought they’d at least have IE7 on there.

  11. I have to say the differences between IE6 and IE7 seem so minuscule (especially when one considers the 7 years spent in development) that this sub-thread about how horrible IE6 is looks like a marketing ploy for Internet Explorer and Windows sales :-). Probably just the conspiracy theorist in me, but it’s worth pointing that out for anyone feeling they need to upgrade.

  12. I see html5 as an opportunity to help encourage people to update their browsers. While I agree with the article (semantics & all), and despite the shortcomings of html5, this is a browser marketer’s dream to encourage people to upgrade to a browser that supports the full html5 spec. Of course it will take the completion of ie8 (if they fully support the current html5 spec). However, what better excuse could you think of to give your visitors an incentive to upgrade their browsers than something like “This website utilizes ‘marketing term’ technology. To use the site to its full potential please upgrade your browser”? The whole industry could push this new ‘marketing term’, making it the next Web2.0 if you will…

    Of course the only downside of this is that there isn’t much substance to this, from a user’s point of view (the canvas tag is one of the few tags that will give users an actual reason to upgrade). As developers we get html5, css3, etc. The user gets not a lot. This strategy needs a lot of refinement, but it’s certainly something that could work.

    It will be a bumpy road, but we need to balls up as an industry and not take the chicken shit way out all the time.

  13. I think we’re missing the point of html5, and what it could achieve…

  14. I think we should be focusing on getting browsers to work more consistently and getting rid of old browsers like ie6 that have no place in today’s world. Tech moves fast and yet ie6 lingers on. No matter how you try and make things backwards compatible you will always be limited by decaying technology, there is only so far you can go before you have to stop and address the existence of obstacles like old browsers.

    Ignoring them and creating new languages is great but don’t expect not to run into the same problems a few years later.

  15. I understand your concerns about semantic limitations. I have already solved this problem in the language I created, mail markup language. You can download the schema in order to play with it or read the specification for documentation. I solve the problem through the use of the “role” attribute, which is compatible with XHTML and HTML 5. Since my language is inherently XML, RDF and OWL are expected to use the role attribute for semantic processing.

    Find everything and more about mail markup language at http://mailmarkup.org/

  16. Since the topic keeps getting raised, I have to ask: what are the significant differences in standards support between IE6 and IE7? As far as I can tell they are minimal to non-existent. IE8 promises better support, such as CSS :before and :after pseudo-element support and the associated generated content properties. However, the big expectation for standards support in IE7, after 7 years of development, was that it would add XHTML support and CSS generated content support. Neither materialized. Nor did other features such as SVG or complete Ruby support. So what are the major problems with IE6 compared to IE7, in terms of standards support, that posters keep referring to?

  17. The campus I was referring to have sent round an upgrade to IE7! Now everyone is complaining about the toolbar. (One guy thought the Refresh button had disappeared completely.) Still, at least they now have tabs – one big difference between IE6 and IE7. And better CSS and HTML support. And a heap of bug fixes. And you can zoom in graphics not just text (which also cures the long-standing fonts-set-in-pixels-can’t-be-resized problem). So there’s quite a lot of improvements if you ask me.

  18. And better CSS and HTML support.

    This is the part that my question was about. The claims about IE6 being horrible have mostly related to standards support. Yet with all the things I expected to arrive in IE7 (after many years in development) I can’t really think of many things that IE7 improved. On the other hand, the IE8 beta does offer some CSS and HTML improvements, but what does IE7 offer over IE6 in this area?

  19. Sorry in advance for going slightly off topic…

    @71
    “But what you’re saying is most designers won’t even take the chance. Wow, must be safe to be them.”

    I don’t know how to put this politely either, but that’s just arrogant. And elitist. Most designers are getting paid by clients who have a very real bottom line in TODAY’S reality where IE6 still represents 20% – 25% of the mainstream market. Front line web developers / designers need to deal with IE6. What they don’t need is to take the blame for its over-extended shelf life. That’s like blaming the road designer because your aunt’s old K-car still gets her to Walmart every Saturday.

    Yes, of course there needs to be continued progression and risk taking. And there is. And, IE6 really will die a quiet little death one day. In the meantime, it’s not nearly the catastrophic issue some make it out to be.

    @89
    IE6 has multiple display issues with CSS borders, margins, floats, PNG transparency and more; most of those display issues were corrected in IE7. There’s this thing called Google where you can dig up all kinds of clarification ;)

  20. There were a lot of bug fixes and improvements made to IE with version 7. Even simple stuff like adding <abbr> was welcome. (I personally don’t think generated content, while useful, is an essential addition.)

    The problem with IE7 is that it also introduced new bugs. And there were still plenty of unfixed bugs.

    Anyone wanting to know more about the true horror of IE bugs might wish to peruse the following sites:

    Position Is Everything: http://www.positioniseverything.net/

    Browser Bugs Section: http://www.gtalbot.org/BrowserBugsSection/

  21. I really appreciate your idea about attributes. I don’t think that we need more than one html attribute: “semantic”. Then anyone can define all the semantic classes he needs. Of course we need to define the properties such as “rhetoric”, “structure”, etc… and their values as we have “background-color”, “font-family”, etc… in CSS

    Here is an example of HTML and CSE (Cascading SEmantic sheet):
     
      HTML:
        ….
        An elderly lady phoned…
        ….

      CSE:
        ….
        joke_of_the_day{
          rhetoric:ironic;
          structure:aside;
        }
        ….

    The cascading mechanism may also solve the problem of nested semantic annotations.
    What do you think about it?
    Regards,
      Matteo (matteo.cajani@alice.it)

  22. Is the browser or the structure (HTML) here really the issue? It would seem that the major hang-up for new tags and features in HTML is backward compatibility.

    What blows my mind is that we as a community continue to perpetuate the problem.  Get off HTML and develop something new.  Maintain a legacy object capable of rendering HTML but move something to a new open standard.  Then, make that easy to upgrade.

    Look at “Flash”. When something new comes out, what do people do? Upgrade. What do websites say? “Upgrade to the latest”.

    People, once we stop living in the past and decide that we want robust browsers with true rendering capabilities, 3D models and the ability to take advantage of the other 99% of the hardware, only then will we make progress. We use HTML and CSS as a container for the whiz-bang we do with Flash and JavaScript. We want more than text from the browser, so let’s finally do what it takes to get there.

    We need to develop an open standard that allows for upgrades and force everyone to abide by the upgrade path. If you don’t, then you can’t expect to get serviced.

    Just imagine where your OS would be if we still had to support 8-bit executables!

    Remember this before you respond with the “What about the other devices”. OK, let’s talk about that: WAP, Mobile CSS, etc. Your argument is what? That they read the standard HTML you code? They don’t. The only standard is that there is not one. We need one, and HTML sure is not it.

    Sure, there will be pain, but once the pain is gone you’ll be much happier in a world where you can do more than just place a few lines of text in a document.

  23. I found the article encouraged me to think more about HTML5 than I have so far. And while I sympathise with the author, I’m in agreement with those commenters who say that this is a solution in search of a problem.

    It is perhaps a little ironic that the article is a critique of HTML5’s inflexibility when, at least as far as I know, HTML5 was proposed as a pragmatic solution to the seemingly intractable problem of “what comes after HTML4”. This form of critique misses the point behind HTML5: HTML5 is not supposed to be semantically extensible. There is XHTML for that. No, HTML5 is the version of HTML that contains those tags that many currently miss. This has important consequences, primarily for those developing browsers, so that HTML5 support is both robust and fast. Although this means that the semantics must be frozen, it in no way constrains attributes from being extensible. The difference is, however, one of scope: extensible attributes will have an application-specific (website-specific) purpose, whereas the specified tags will always have the same purpose.

    The supplied use cases illustrate this misunderstanding:

    While <span role=“2009-05-01”>May Day next year</span> really doesn’t make sense, something along the lines of <span equivalent=“2009-05-01”>May Day next year</span> would.

    Neither of the tags is semantically satisfying particularly when the <date> tag is available.
    1) Ideally the datetime value should be the content of the tag and not an attribute of it. Confusing the two is a common mistake particularly in XML. Of course, being too restrictive here would prevent anyone from incorrectly using the tag and provoking exactly the kind of errors that HTML5 tries to avoid.
    2) How the date is displayed is a matter of presentation and, therefore, something that may be controlled by metadata: a format attribute, a CSS declaration, or a browser option. This avoids all the problems of localisation, such as: when is 12/1/2009? In the example supplied, that could be format=“holiday” or format=“short”.

    Making this definition part of the specification allows browsers to handle content intelligently (offer to add the date to a user’s calendar or do a search on the date) in a way the suggested “equivalent” simply could not.
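
    The localisation problem raised in point 2 can be shown in a few lines of Python. This is only a sketch: the `render` helper and its “long” format name are hypothetical, standing in for whatever a format attribute, CSS declaration, or browser option would control.

```python
from datetime import date, datetime

ambiguous = "12/1/2009"

# The same string read under two common conventions.
day_first = datetime.strptime(ambiguous, "%d/%m/%Y").date()    # 12 January 2009
month_first = datetime.strptime(ambiguous, "%m/%d/%Y").date()  # 1 December 2009

# An ISO 8601 value, like the 2009-05-01 used in the examples above,
# has exactly one reading.
machine_value = date.fromisoformat("2009-05-01")


def render(value, fmt):
    """Hypothetical presentation step; the format names are illustrative."""
    if fmt == "short":
        return value.strftime("%Y-%m-%d")
    # "long": day, month name, year (month name assumes the default C locale)
    return f"{value.day} {value.strftime('%B')} {value.year}"


print(day_first, month_first)
print(render(machine_value, "short"))
print(render(machine_value, "long"))
```

    Keeping one unambiguous machine value and deriving every display form from it is what lets a browser offer calendar or search actions regardless of how the date is shown.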

    In the same vein we have the new <video> and <audio> tags to handle the now well-established practice of including audio and video in websites.

    Extensibility here is not the solution; it simply shifts the problem to the namespace.

  24. In comment #94,  Charlie Clark says:

    This form of critique misses the point behind HTML5: HTML5 is not supposed to be semantically extensible. There is XHTML for that. No, HTML5 is the version of HTML that contains those tags that many currently miss.

    This comment reflects a confusion repeated in this discussion that ascribes magical properties to either the XML serialization or the traditional text/html serialization of HTML (I can’t tell which one has been ascribed magical powers, but neither serialization has any).

    HTML5 is basically three things: 1) a parsing and serialization specification that attempts to codify the parsing performed by the major browsers with respect to traditional HTML serializations (and perhaps incrementally improve or at least select the best traits of existing browser parsing operations); 2) a specification of browser (and some other UA) behavior with the results of a parsed (however parsed) or DOM-created document; 3) a specification of a vocabulary of elements and attributes for authoring documents that are ostensibly semantic in nature.

    The topic of discussion here is #3. It is not about the parsing and serialization of traditional text/html serializations. HTML5 (in the #3 sense) can arise from the traditional serialization, from pure DOM creation, or from an XML serialization (which, despite what we are told, is not so drastically different with respect to the topic at hand). So I’m not sure why the “use XML” or “use XHTML” line keeps arising in this conversation. It has nothing to do with the topic of the conversation.

    Also this is not about confusing whether data belongs in an attribute or in the contents of an element. There is not one right way to do this. The point of RDFa is that one can easily add properly parsed attributes to existing HTML elements (or any SGML or XML or otherwise elements) that add machine readable metadata about the natural language expression as the contents of the element. That means the presentation can be left as is (with the contents of the element appearing) or the UA could replace or augment that presentation with a localizable expression for the date. And whereas the HTML5 attempt to copy RDFa introduces a single purpose date element, RDFa provides a way to add precise machine readable equivalents to an element for any imaginable data type that can be expressed as the contents of an attribute (including anyURI values).
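
    As a rough illustration of the attribute approach described above, the Python sketch below pairs a machine-readable attribute value with the natural-language contents of the same element. The `property` and `content` attribute names follow RDFa; the `dc:date` term and the sample sentence are assumptions made for the example, and the collector deliberately ignores nested annotated elements.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Pair an element's machine-readable value with its visible text.

    Simplification: nested annotated elements are not handled; the first
    end tag after an annotated start tag closes the annotation.
    """

    def __init__(self):
        super().__init__()
        self.found = []     # (property, content, visible text) tuples
        self._open = None   # (property, content) of the element being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs and "content" in attrs:
            self._open = (attrs["property"], attrs["content"])
            self._text = []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open is not None:
            self.found.append((*self._open, "".join(self._text)))
            self._open = None


def extract(markup):
    """Return every (property, content, text) triple found in markup."""
    collector = PropertyCollector()
    collector.feed(markup)
    collector.close()
    return collector.found


sample = (
    '<p>See you on <span property="dc:date" content="2009-05-01">'
    "May Day next year</span>.</p>"
)
print(extract(sample))
```

    The visible text is untouched, so presentation can stay as written while a UA or script works from the attribute value.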

  25. Rob Mech above wrote:

    Get off HTML and develop something new. Maintain a legacy object capable of rendering HTML but move something to a new open standard. Then, make that easy to upgrade.

    Look at “Flash”…

    People, once we stop living in the past and decide that we want robust browsers with true rendering capabilities, 3D models and the ability to take advantage of the other 99% of the hardware, only then will we make progress.

    I’ve been thinking for years that Flash should be the way forward. It makes perfect sense. A massive market penetration of users, smooth font rendering that can use any font, all the vector goodness you could want, Photoshop-style filters, what’s not to love?

    And I don’t mean full-blown Flash sites with rotating objects. I mean a plain renderer that improves on what the poor browser has to render. And best of all…

    Identical cross-platform rendering!

    Think about it. No more broken layouts due to so many different browsers to test in. No more holding back on things that only work in 1 or 2 browsers. No more having to code to the bare minimum because of ancient browsers still in use today.

    Unless something like this happens we will be stuck in browser hell forever.

  26. For those Russian speakers out there, there’s a Russian version: http://habrahabr.ru/blogs/webdev/49734/

    With quite a thriving discussion from what I can tell as well – not that I speak a word.

    Thanks to the translator!

  27. http://d-o-b.ru/test/x-html/xhtml-dtd.htm

    http://validator.w3.org/check?uri=http://d-o-b.ru/test/x-html/xhtml-dtd.htm;ss=1

    http://browsershots.org/http://d-o-b.ru/test/x-html/xhtml-dtd.htm

  28. > custom DTDs (AFAIK) run in quirks mode

    no. afaik.


    > Identical cross-platform rendering! (flash)

    no. afaik. identical only on windows platforms. it has many bugs on linux…

  29.   We already have a lot of tools that can provide extensible semantics. The whole point of XHTML (even version 1.0) was to allow users to introduce custom namespaces into xhtml documents and then handle them programmatically using scripts, plug-ins or XSLT. Unfortunately, these technologies are now being bullied out of existence by the simplicity-oriented majority who won’t touch them with a ten-foot pole…

    This is a recurring issue in programming. It appears that, for most people, the introduction of a new concept creates a nearly insurmountable psychological barrier. They say things like “I can’t do it. I can’t understand it. It’s complicated. It’s too abstract”. To me, and a few others (like those who invented object-oriented programming and XML/XHTML/XSLT/XPath/XQuery), ease of use means fewer clicks and less hand-coding, even if it introduces more concepts I have to learn. To “normal” people, ease of use IS simplicity. New concepts lead to an immediate mental block.

    Just take a look at the whole motivation behind the move to HTML5. The folks who invented the “html serialization” don’t want doctypes, schemas or even a version number. What they’re trying to do is to make “tag soup handling” part of the spec. Also, they explicitly objected to the use of namespaces.

    The original spec even said

    “Generally speaking, authors are discouraged from trying to use XML on the Web, because XML has much stricter syntax rules than the “HTML5” variant described above, and is relatively newer and therefore less mature”

    Fortunately, the W3C didn’t take a stand on the issue, and they removed the above paragraph from their version of the spec.

    We have other strict technologies with extensible semantics in mainstream use, which have a strict, clearly defined syntax (e.g. C#). Very few people ever complained about the whole app not compiling if you miss a semicolon, because they have tools that pick up the semicolons for them. I grew up on BASIC, Pascal and Delphi, and I also learned C/C++, so I don’t take error messages personally. But as long as people who don’t know how to escape an ampersand continue hand-coding their HTML in Notepad, fault tolerance will remain a requirement, even though, once a certain critical number of errors accumulates, it makes tracing nearly impossible, and validators become completely useless because there’s an “error” on every line, but the page displayed fine just an hour ago. For instance, in one of the apps I developed at work, I put up an XSLT post-processor with a syntax checker, and a lot of the other developers would swear at it because they had no clue as to how they could make their code output well-formed XML.
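
    The unescaped ampersand mentioned above makes a compact demonstration of the strict/tolerant split. The sketch below uses Python’s standard library as a stand-in for the two kinds of parser; it is not how any browser is implemented, and the helper names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

snippet = "<p>Fish & chips</p>"       # bare ampersand: not well-formed XML
escaped = "<p>Fish &amp; chips</p>"   # the well-formed spelling


class TextCollector(HTMLParser):
    """Gather the text content that a tolerant HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_strict(markup):
    """XML rules: a single syntax error rejects the whole document."""
    try:
        ET.fromstring(markup)
        return "ok"
    except ET.ParseError:
        return "rejected"


def parse_tolerant(markup):
    """HTML-style handling: keep going and return whatever text was found."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


print(parse_strict(snippet), parse_strict(escaped))
print(parse_tolerant(snippet))
```

    The strict parser refuses the first document outright, while the tolerant one recovers the text, which is exactly the trade-off between early error reporting and fault tolerance being argued here.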

    I believe the Web community needs to split. Just like the desktop application world has Visual Basic for simplicity-oriented developers and C# for those who like structure, we need to have two separate stacks: one for tag soup hand-coders who think namespaces are evil, and another for those who prefer a stricter syntax. The split can be made on top of XHTML5 (the XML version of HTML5), with namespaced elements used to define semantics. Those who prefer the tag-soup-friendly version also happen to be the ones who adhere to the KISS principle and, therefore, don’t need extensible semantics anyway. They would rather copy & paste a block of HTML ten times over than introduce a new concept for a frequently used element. It doesn’t matter if the majority does tag soup as long as there’s enough community support for the “complex and extensible” version to keep it alive.

  30. I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never “tag soup.” It is however, not necessarily XHTML, and if parsed as such, will be “tag soup.” The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street. No one should ever slight reading the specs and knowing the proper way to code either HTML or XML, but when it comes to parsing the Web, see the recent Opera study, with MAMA, or the Google study that Ian Hickson did, and it’s clear that less than 10% of what we deal with out there is XHTML, much less proper XHTML.

  31. @Aaron Miller
    bq. I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never “tag soup.” It is however, not necessarily XHTML, and if parsed as such, will be “tag soup.” The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street.

    Tag soup gets used in two different ways which you’re confusing in this comment. 1) tag soup sometimes refers to the serialized source content of a document where tags are potentially misnested, content models invented out of thin air and attribute values requiring quotations not quoted: in general vended content not conforming to any specification anywhere. 2) tag soup parsing refers to a parser that is capable of parsing tag soup (a Herculean task).

    When you say that XHTML parsed as text/html will be tag soup you’re confusing these two definitions. The XHTML is certainly not tag soup, as it adheres to the XHTML syntax and sometimes even other syntactic requirements on top of that (such as XHTML 1.0 appendix C), so there is no sense in which that content can be considered tag soup. However, if such an XHTML document is vended as text/html it will be parsed by the UA’s tag soup parser just like any other conforming or non-conforming HTML 2 through 4.01. So in this sense, either of two things (not using XHTML, or not vending it as application/xhtml+xml) means that the content is parsed by the tag soup parser, just like any other HTML.

    It’s important to keep these two meanings of tag soup separate to understand the conversation.
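
The two meanings can be told apart with a small Python sketch. This is only an illustration under stated assumptions: Python's `html.parser` is a lenient tokenizer standing in for a browser's tag soup parser (it is not the HTML5 parsing algorithm), `xml.etree.ElementTree` stands in for the strict XML parser a browser would use for application/xhtml+xml, and the names `TagCollector`, `lenient_tags` and `strict_parses` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Lenient tokenizer: records start tags without checking nesting."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup):
    """Feed markup to the lenient parser and return the start tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def strict_parses(markup):
    """Return True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Meaning 1: tag soup as content (misnested <b> and <i>).
soup = "<p><b>bold <i>both</b> italic</i></p>"
# Properly nested markup carrying the same text.
clean = "<p><b>bold <i>both</i></b><i> italic</i></p>"

# Meaning 2: a tag soup parser accepts either input without complaint.
print(lenient_tags(soup))
print(lenient_tags(clean))

# A strict XML parser rejects only the misnested content.
print(strict_parses(soup))   # False
print(strict_parses(clean))  # True
```

The well-formed fragment is not tag soup in the first sense, yet it still goes through the lenient parser whenever it is handled as text/html.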

  32. @Rob, looks like we’re talking about the same distinction. The only difference is that I’m saying the phrase “tag soup” makes it sound anomalous, when in fact from the content side it refers to over 95% of the web, and from the browser (parser) side, it’s SOP. See the Opera MAMA study and Ian Hickson’s Google report if you don’t know what I mean.

  33. This is not directly related to John’s article, but it reminded me of the following problem I’ve been posing in my head for some time: How is it that the syntax of HTML or any other machine-readable grammar is constructed using English? More specifically US English? Has anyone ever tried to construct a language, or even a very light grammar, that allows multiple human languages to describe headers, footers, loops, lists etc? I appreciate that many of these machine languages were first composed in the US and thus US English has become the lingua franca of programming, but this is the 21st century and not everyone on the planet who wishes to write code knows how to speak or write English, never mind a specific sub-grammar of it.

  34. Seems that the first one wasn’t perfect.
    Here it is: http://interpretor.ru/html5semantics

  35. German translation available: http://tobias-otte.de/essays/semantik-in-html-5/

  36. I agree with all the principal points in the article, and I have the same opinion about attributes being the preferable approach; but I think they’re breaking compatibility on purpose. We all agree it is stupid to still have concerns about IE6 in 2009 (and soon ‘10). If they break the cordon everybody will be happier. And in fact the big push on HTML5 came from browser makers, and it looks to me like MS wants to be in the game.
    In the aftermath we will all have a common base for discussion.

    After all, I think some new tags would come in handy.
    For example, I think that for something as ubiquitous as a calendar there should be a tag. It would end the debate over which solution is more semantic (table vs. list, which oddly depends on the kind of visualization we want to give), it would save a lot of code, and it would give more artistic freedom to designers, who could target a parameter/class with a simple JavaScript to radically modify the visualization.

  37. http://wiki.whatwg.org/wiki/FAQ#HTML5_should_support_a_way_for_anyone_to_invent_new_elements.21

    Contains some of their responses to the extensibility problem.
