Semantics in HTML 5

by John Allsopp

107 Reader Comments

  1. I have to say the difference between IE6 and IE7 seems so minuscule (especially when one considers the 7 years spent in development) that this sub-thread about how horrible IE6 is looks like a marketing ploy for Internet Explorer and Windows sales :-). Probably just the conspiracy theorist in me, but it’s worth pointing that out for anyone feeling they need to upgrade.

  2. I see html5 as an opportunity to help encourage people to update their browsers. While I agree with the article (semantics & all), and despite the shortcomings of html5, this is a browser marketer’s dream to encourage people to upgrade to a browser that supports the full html5 spec. Of course it will take the completion of ie8 (if they fully support the current html5 spec). However, what better excuse could you think of to give your visitors incentive to upgrade their browsers than something like “This website utilizes ‘marketing term’ technology. To use the site to its full potential please upgrade your browser”? The whole industry could push this new ‘marketing term’, making it the next Web2.0 if you will…

    Of course the only downside of this is that there isn’t much substance to this, from a user’s point of view (the canvas tag is one of the few tags that will give users an actual reason to upgrade). As developers we get html5, css3, etc. The user gets not a lot. This strategy needs a lot of refinement, but it’s certainly something that could work.

    It will be a bumpy road, but we need to balls up as an industry and not take the chicken shit way out all the time.

  3. I think we’re missing the point of html5, and what it could achieve…

  4. I think we should be focusing on getting browsers to work more consistently and getting rid of old browsers like ie6 that have no place in today’s world. Tech moves fast and yet ie6 lingers on. No matter how you try and make things backwards compatible you will always be limited by decaying technology, there is only so far you can go before you have to stop and address the existence of obstacles like old browsers.

    Ignoring them and creating new languages is great but don’t expect not to run into the same problems a few years later.

  5. I understand your concerns about semantic limitations. I have already solved this problem in the language I created, mail markup language. You can download the schema in order to play with it or read the specification for documentation. I solve the problem through the use of the “role” attribute, which is compatible with XHTML and HTML 5. Since my language is inherently XML, RDF and OWL are expected to use the role attribute for semantic processing.

    Find everything and more about mail markup language at http://mailmarkup.org/

  6. Since the topic keeps getting raised, I have to ask: what are the significant differences in standards support between IE6 and IE7? As far as I can tell they are minimal to non-existent. IE8 promises better support, such as CSS :before and :after pseudo-element support and the associated generated content properties. However, the big expectation for standards support in IE7, after 7 years of development, was that it would add XHTML support and CSS generated content support. Neither materialized. Nor did other features such as SVG or complete Ruby support. So what are the major problems with IE6 compared to IE7 in terms of standards support that posters keep referring to?

  7. The campus I was referring to has sent round an upgrade to IE7! Now everyone is complaining about the toolbar. (One guy thought the Refresh button had disappeared completely.) Still, at least they now have tabs – one big difference between IE6 and IE7. And better CSS and HTML support. And a heap of bug fixes. And you can zoom in on graphics, not just text (which also cures the long-standing fonts-set-in-pixels-can’t-be-resized problem). So there are quite a lot of improvements if you ask me.

  8. “And better CSS and HTML support.”

    This is the part that my question was about. The claims about IE6 being horrible have mostly related to standards support. Yet with all the things I expected to arrive in IE7 (after many years in development) I can’t really think of many things that IE7 improved. On the other hand, the IE8 beta does offer some CSS and HTML improvements, but what does IE7 offer over IE6 in this area?

  9. Sorry in advance for going slightly off topic…

    @71
    “But what you’re saying is most designers won’t even take the chance. Wow, must be safe to be them.”

    I don’t know how to put this politely either, but that’s just arrogant. And elitist. Most designers are getting paid by clients who have a very real bottom line in TODAY’S reality where IE6 still represents 20% – 25% of the mainstream market. Front line web developers / designers need to deal with IE6. What they don’t need is to take the blame for its over-extended shelf life. That’s like blaming the road designer because your aunt’s old K-car still gets her to Walmart every Saturday.

    Yes, of course there needs to be continued progression and risk taking. And there is. And, IE6 really will die a quiet little death one day. In the meantime, it’s not nearly the catastrophic issue some make it out to be.

    @89
    IE6 has multiple display issues with CSS borders, margins, floats, PNG transparency and more. Most of those display issues were corrected in IE7. There’s this thing called Google where you can dig up all kinds of clarification ;)

  10. There were a lot of bug fixes and improvements made to IE with version 7. Even simple stuff like adding <abbr> was welcome. (I personally don’t think generated content, while useful, is an essential addition.)

    The problem with IE7 is that it also introduced new bugs. And there were still plenty of unfixed bugs.

    Anyone wanting to know more about the true horror of IE bugs might wish to peruse the following sites:

    “Position Is Everything”:http://www.positioniseverything.net/

    “Browser Bugs Section”:http://www.gtalbot.org/BrowserBugsSection/

  11. I really appreciate your idea about attributes. I don’t think that we need more than one html attribute: “semantic”. Then anyone can define all the semantic classes he needs. Of course we need to define the properties such as “rhetoric”, “structure”, etc… and their values as we have “background-color”, “font-family”, etc… in CSS

    Here is an example of HTML and CSE (Cascading SEmantic sheet):
     
      HTML:
        ….
        An elderly lady phoned…
        ….

      CSE:
        ….
        joke_of_the_day{
          rhetoric:ironic;
          structure:aside;
        }
        ….

    A cascading mechanism may also solve the problem of nested semantic annotations.
    What do you think about it?
    Regards,
      Matteo (matteo.cajani@alice.it)
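
The CSE proposal above can be sketched as a small lookup, purely as an illustration. Everything here is an assumption layered on the comment: the `parse_cse` helper is hypothetical, the rule syntax is taken from the CSE fragment as posted, and the idea that an element would carry `semantic="joke_of_the_day"` follows the comment's suggestion of a single "semantic" attribute. No browser or specification defines any of this.

```python
import re


def parse_cse(sheet: str) -> dict[str, dict[str, str]]:
    """Parse 'name{prop:value;...}' rules into {name: {prop: value}}.

    Hypothetical helper: CSE is only a proposal from the comment above.
    """
    rules: dict[str, dict[str, str]] = {}
    for name, body in re.findall(r"([\w-]+)\s*\{([^}]*)\}", sheet):
        props: dict[str, str] = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                props[prop.strip()] = value.strip()
        rules[name] = props
    return rules


# The rule from the comment, verbatim.
CSE_SHEET = """
joke_of_the_day{
  rhetoric:ironic;
  structure:aside;
}
"""

rules = parse_cse(CSE_SHEET)

# Semantics that an element marked semantic="joke_of_the_day" would pick up.
print(rules["joke_of_the_day"])
```

A real cascade would also need rules for combining several semantic classes on nested elements, which this sketch does not attempt.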

  12. Is the browser or the structure (HTML) here really the issue? It would seem that the major hangup for new tags and features in HTML is the backward compatibility.

    What blows my mind is that we as a community continue to perpetuate the problem.  Get off HTML and develop something new.  Maintain a legacy object capable of rendering HTML but move something to a new open standard.  Then, make that easy to upgrade.

    Look at “Flash”. When something new comes out, what do people do? Upgrade. What do websites say? “Upgrade to the latest”.

    People, once we stop living in the past and decide that we want robust browsers with true rendering capabilities, 3D models and the ability to take advantage of the other 99% of the hardware, only then will we make progress. We use HTML and CSS as a container for the whiz-bang we do with Flash and JavaScript. We want more than text from the browser, so let’s finally do what it takes to get there.

    We need to develop an open standard that allows for upgrades and force everyone to abide by the upgrade path. If you don’t, then you can’t expect to get serviced.

    Just imagine where your OS would be if we still had to support 8 bit executables!

    Remember this before you respond with the “What about the other devices?” OK, let’s talk about that: WAP, Mobile CSS, etc. Your argument is what? That they read the standard HTML you code? They don’t. The only standard is that there is not one. We need one and HTML sure is not it.

    Sure, there will be pain, but once the pain is gone you’ll be much happier in a world where you can do more than just place a few lines of text in a document.

  13. I found the article encouraged me to think more about HTML5 than I have so far. And while I sympathise with the author, I’m in agreement with those commenters who say that this is a solution in search of a problem.

    It is perhaps a little ironic that the article is a critique of HTML5’s inflexibility when, at least as far as I know, HTML5 was proposed as a pragmatic solution to the seemingly intractable problem of “what comes after HTML4”. This form of critique misses the point behind HTML5: HTML5 is not supposed to be semantically extensible. There is XHTML for that. No, HTML5 is the version of HTML that contains those tags that many currently miss. This has important consequences, primarily for those developing browsers, so that HTML5 support is both robust and fast. Although this means that the semantics must be frozen, it in no way constrains attributes from being extensible. The difference is, however, one of scope: extensible attributes will have an application (website) specific purpose whereas the specified tags will always have the same purpose.

    The supplied use cases illustrate this misunderstanding:

    While <span role=“2009-05-01”>May Day next year</span> really doesn’t make sense, something along the lines of <span equivalent=“2009-05-01”>May Day next year</span> would.

    Neither of the tags is semantically satisfying particularly when the <date> tag is available.
    1) Ideally the datetime value should be the content of the tag and not an attribute of it. Confusing the two is a common mistake particularly in XML. Of course, being too restrictive here would prevent anyone from incorrectly using the tag and provoking exactly the kind of errors that HTML5 tries to avoid.
    2) How the date is displayed is a matter of presentation and, therefore, something that may be controlled by meta-data: a format attribute or CSS declaration or browser option. This avoids all the problems of localisation, like when is 12/1/2009? In the example supplied, that could be format=“holiday” or format=“short”.

    Making this definition part of the specification allows browsers to handle content intelligently – offer to add the date to a user’s calendar or do a search on the date – in a way the suggested “equivalent” simply could not.

    In the same vein we have the new <video> and <audio> tags to handle the now well-established practice of including audio and video in websites.

    Extensibility here is not the solution; it simply shifts the problem to the namespace.
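
The localisation point in item 2) of the comment above can be made concrete with a small sketch. It assumes the machine-readable value is an ISO 8601 date such as 2009-05-01 and that presentation is chosen separately; the `render_date` function and the format names `day-first` and `month-first` are hypothetical and are not the comment's `format` attribute values.

```python
from datetime import date


def render_date(iso_value: str, fmt: str) -> str:
    """Render an unambiguous ISO date in a chosen (hypothetical) display format."""
    d = date.fromisoformat(iso_value)
    if fmt == "iso":
        return d.isoformat()
    if fmt == "day-first":
        return f"{d.day}/{d.month}/{d.year}"
    if fmt == "month-first":
        return f"{d.month}/{d.day}/{d.year}"
    raise ValueError(f"unknown format: {fmt}")


# Two different dates produce the same display string, which is why the
# displayed text alone cannot serve as the machine-readable value.
january_12 = render_date("2009-01-12", "day-first")
december_1 = render_date("2009-12-01", "month-first")
print(january_12, december_1)
```

Keeping the ISO value as the stored data and treating the display string as presentation is what lets a user agent offer calendar or search features without guessing the locale.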

  14. In comment #94,  Charlie Clark says:

    “This form of critique misses the point behind HTML5: HTML5 is not supposed to be semantically extensible. There is XHTML for that. No, HTML5 is the version of HTML that contains those tags that many currently miss.”

    This comment reflects a confusion repeated in this discussion that ascribes magical properties to either the XML serialization or the traditional text/html serialization of HTML (I can’t tell which one has been ascribed magical powers, but neither serialization has any).

    HTML5 is basically three things: 1) a parsing and serialization specification that attempts to codify the parsing performed by the major browsers with respect to traditional HTML serializations (and perhaps incrementally improve or at least select the best traits of existing browser parsing operations); 2) a specification of browser (and some other UA) behavior with the results of a parsed (however parsed) or DOM-created document; 3) a specification of a vocabulary of elements and attributes for authoring documents that are ostensibly semantic in nature.

    The topic of discussion here is #3. It is not about the parsing and serialization of traditional text/html serializations. HTML5 (in the #3 sense) can arise from the traditional serialization, solely from DOM creation, or from an XML serialization (which, despite what we are told, is not so drastically different with respect to the topic at hand). So I’m not sure why the “use XML” or “use XHTML” line keeps arising in this conversation. It has nothing to do with the topic of the conversation.

    Also this is not about confusing whether data belongs in an attribute or in the contents of an element. There is not one right way to do this. The point of RDFa is that one can easily add properly parsed attributes to existing HTML elements (or any SGML or XML or otherwise elements) that add machine readable metadata about the natural language expression as the contents of the element. That means the presentation can be left as is (with the contents of the element appearing) or the UA could replace or augment that presentation with a localizable expression for the date. And whereas the HTML5 attempt to copy RDFa introduces a single purpose date element, RDFa provides a way to add precise machine readable equivalents to an element for any imaginable data type that can be expressed as the contents of an attribute (including anyURI values).
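
The attribute-based approach described in the comment above can be sketched with Python's standard `html.parser`. This is only an illustration: `property`, `content` and `datatype` are RDFa attribute names, but the sample markup, the `dc:date` and `xsd:date` values and the `PropertyCollector` class are assumptions for the example, and a real RDFa processor does far more (prefix resolution, subjects, nesting).

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, content, datatype) from elements that carry both
    a property and a content attribute. A sketch, not an RDFa processor."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[tuple[str, str, str | None]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "property" in attributes and "content" in attributes:
            self.found.append(
                (attributes["property"], attributes["content"], attributes.get("datatype"))
            )


# Hypothetical markup: the visible text stays natural language, while the
# attributes carry the machine-readable equivalent.
markup = (
    '<p>See you on <span property="dc:date" content="2009-05-01" '
    'datatype="xsd:date">May Day next year</span>.</p>'
)

collector = PropertyCollector()
collector.feed(markup)
collector.close()
print(collector.found)
```

The element content is left untouched, so a user agent could keep showing "May Day next year" or replace it with a localized date.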

  15. Rob Mech above wrote:

    “Get off HTML and develop something new. Maintain a legacy object capable of rendering HTML but move something to a new open standard. Then, make that easy to upgrade.

    “Look at ‘Flash’…

    “People, once we stop living in the past and decide that we want robust browsers with true rendering capabilities, 3D models and the ability to take advantage of the other 99% of the hardware, only then will we make progress.”

    I’ve been thinking for years that Flash should be the way forward. It makes perfect sense. A massive market penetration of users, smooth font rendering that can use any font, all the vector goodness you could want, Photoshop-style filters, what’s not to love?

    And I don’t mean full-blown Flash sites with rotating objects. I mean a plain renderer that improves on what the poor browser has to render. And best of all…

    Identical cross-platform rendering!

    Think about it. No more broken layouts due to so many different browsers to test in. No more holding back on things that only work in 1 or 2 browsers. No more having to code to the bare minimum because of ancient browsers still in use today.

    Unless something like this happens we will be stuck in browser hell forever.

  16. For those Russian speakers out there,  there’s a “russian version”:http://habrahabr.ru/blogs/webdev/49734/

    With quite a thriving discussion from what I can tell as well – not that I speak a word.

    Thanks to the translator!

  17. http://d-o-b.ru/test/x-html/xhtml-dtd.htm

    http://validator.w3.org/check?uri=http://d-o-b.ru/test/x-html/xhtml-dtd.htm;ss=1

    http://browsershots.org/http://d-o-b.ru/test/x-html/xhtml-dtd.htm

  18. > custom DTDs (AFAIK) run in quirks mode

    no. afaik.


    > Identical cross-platform rendering! (flash)

    no. afaik. identical only on windows platforms. it has many bugs on linux…

  19.   We already have a lot of tools that can provide extensible semantics. The whole point of XHTML (even version 1.0) was to allow users to introduce custom namespaces into xhtml documents and then handle them programmatically using scripts, plug-ins or XSLT. Unfortunately, these technologies are now being bullied out of existence by the simplicity-oriented majority who won’t touch them with a ten-foot pole…

    This is a recurring issue in programming. It appears that, for most people, the introduction of a new concept creates a nearly insurmountable psychological barrier. They say things like “I can’t do it. I can’t understand it. It’s complicated. It’s too abstract”. To me, and a few others (like those who invented object-oriented programming and XML/XHTML/XSLT/XPath/XQuery), ease of use means fewer clicks and less hand-coding, even if it introduces more concepts I have to learn. To “normal” people, ease of use IS simplicity. New concepts lead to an immediate mental block.

    Just take a look at the whole motivation behind the move to HTML5. The folks who invented the “html serialization” don’t want doctypes, schemas or even a version number. What they’re trying to do is to make “tag soup handling” part of the spec. Also, they explicitly objected to the use of namespaces.

    The original spec even said

    “Generally speaking, authors are discouraged from trying to use XML on the Web, because XML has much stricter syntax rules than the “HTML5” variant described above, and is relatively newer and therefore less mature”

    Fortunately, the W3C didn’t take a stand on the issue, and they removed the above paragraph from their version of the spec.

    We have other strict technologies with extensible semantics in mainstream use, which have a strict, clearly defined syntax (e.g. C#). Very few people ever complained about the whole app not compiling if you miss a semicolon because they have tools that pick up the semicolons for them. I grew up on BASIC, Pascal and Delphi, and I also learned C/C++, so I don’t take error messages personally. But as long as people who don’t know how to escape an ampersand continue hand-coding their HTML in Notepad, fault tolerance will remain a requirement, even though after accumulating a certain critical number of errors, it makes tracing nearly impossible, and validators become completely useless because there’s an “error” on every line, but the page displayed fine just an hour ago. For instance, in one of the apps I developed at work, I put up an XSLT post-processor with a syntax checker, and a lot of the other developers would swear at it because they had no clue as to how they could make their code output well-formed XML.

    I believe the Web community needs to split. Just like the desktop application world has Visual Basic for simplicity-oriented developers and C# for those who like structure, we need to have two separate stacks, one for tag soup hand-coders who think namespaces are evil, and another for those who prefer a stricter syntax. The split can be made on top of XHTML5 (the XML version of HTML5), with namespaced elements used to define semantics. Those who prefer the tag-soup-friendly version also happen to be the ones who adhere to the KISS principle and, therefore, don’t need extensible semantics anyway. They would rather copy & paste a block of html ten times over than introduce a new concept for a frequently used element. It doesn’t matter if the majority does tag soup as long as there’s enough community support for the “complex and extensible” version to keep it alive.
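
The ampersand example in the comment above is easy to demonstrate with a strict XML parser. This is a minimal sketch using Python's standard library; the sample text and the `is_well_formed` helper are assumptions made for illustration and are not the commenter's XSLT post-processor.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(document: str) -> bool:
    """Return True if a strict XML parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


raw_text = "Fish & Chips"

# Hand-coded markup with a bare ampersand: fine for a tolerant HTML parser,
# a fatal error for an XML parser.
unescaped = f"<p>{raw_text}</p>"

# The same text with the ampersand escaped as &amp;.
escaped = f"<p>{escape(raw_text)}</p>"

print(is_well_formed(unescaped), is_well_formed(escaped))
```

Generating markup through an escaping function rather than string concatenation is one way tools can "pick up the semicolons" for the author.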

  20. I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never “tag soup.” It is however, not necessarily XHTML, and if parsed as such, will be “tag soup.” The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street. No one should ever slight reading the specs and knowing the proper way to code either HTML or XML, but when it comes to parsing the Web, see the recent Opera study, with MAMA, or the Google study that Ian Hickson did, and it’s clear that less than 10% of what we deal with out there is XHTML, much less proper XHTML.

  21. @Aaron Miller
    “I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never ‘tag soup.’ It is however, not necessarily XHTML, and if parsed as such, will be ‘tag soup.’ The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street.”

    Tag soup gets used in two different ways which you’re confusing in this comment. 1) tag soup sometimes refers to the serialized source content of a document where tags are potentially misnested, content models invented out of thin air and attribute values requiring quotations not quoted: in general vended content not conforming to any specification anywhere. 2) tag soup parsing refers to a parser that is capable of parsing tag soup (a Herculean task).

    When you say that XHTML parsed as text/html will be tag soup you’re confusing these two definitions. The XHTML is certainly not tag soup as it adheres to the XHTML syntax and sometimes even other syntactic requirements on top of that (such as XHTML 1.0 appendix C), so there is no sense in which that content can be considered tag soup. However, if such an XHTML document is vended as text/html it will be parsed by the UA’s tag soup parsing just like any other conforming or non-conforming HTML 2 through 4.01. So in this sense, both not using XHTML and not vending it as application/xhtml+xml mean that the content is parsed by the tag soup parsing processor (just like any other HTML).

    It’s important to keep these two meanings of tag soup separate to understand the conversation.
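
The two senses of "tag soup" above can be seen by feeding the same misnested source to a tolerant HTML tokenizer and to a strict XML parser. This is a sketch under stated assumptions: Python's `html.parser` only reports tokens and does not implement the error recovery a browser performs, and the `TagLogger` class and sample markup are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record start and end tags in the order the tokenizer reports them."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# Sense 1: the source itself is tag soup (the <b> and <i> tags are misnested).
soup = "<p><b>bold <i>both</b> italic</i></p>"

# Sense 2: a tolerant parser consumes it without raising an error.
logger = TagLogger()
logger.feed(soup)
logger.close()

# A strict XML parser rejects the same input outright.
try:
    ET.fromstring(soup)
    xml_accepts = True
except ET.ParseError:
    xml_accepts = False

print(logger.events, xml_accepts)
```

Well-formed XHTML passes through either parser; which parser runs depends on how the content is served, not on how carefully it was written.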

  22. @Rob, looks like we’re talking about the same distinction. The only difference is that I’m saying the phrase “tag soup” makes it sound anomalous, when in fact from the content side it refers to over 95% of the web, and from the browser (parser) side, it’s SOP. See the Opera MAMA study and Ian Hickson’s Google report if you don’t know what I mean.

  23. This is not directly related to John’s article, but it reminded me of the following problem I’ve been posing in my head for some time: How is it that the syntax of HTML or any other machine-readable grammar is constructed using English? More specifically US English? Has anyone ever tried to construct a language, of even a very light grammar, that allows multiple human languages to describe headers, footers, loops, lists etc? I appreciate that many of these machine languages were first composed in the US and thus US English has become the lingua franca of programming – but this is the 21st century and not everyone on the planet who wishes to write code knows how to speak or write English, never mind a specific sub-grammar of it.

  24. Seems that the first one wasn’t perfect.
    Here it is: http://interpretor.ru/html5semantics

  25. “German translation available”:http://tobias-otte.de/essays/semantik-in-html-5/

  26. I agree with all the principal points in the article, and I share the opinion that using attributes is preferable; but I think they’re breaking compatibility on purpose. We all agree it is stupid to still have concerns about IE6 in 2009 (and soon ‘10). If they break the cordon everybody will be happier. And in fact the big push on HTML5 came from browser makers, and it looks to me like MS wants to be in the game.
    In the aftermath we will all have a common base to discuss upon.

    After all, I think some new tags would come in handy.
    For example, I think that for something as ubiquitous as a calendar there should be a tag. It would end the debate over which solution is more semantic (table vs list, which oddly depends on the kind of visualization we want to give), it would spare a lot of code, and it would give more artistic freedom to designers, who could target a parameter/class with a simple JavaScript to radically modify the visualization.

  27. http://wiki.whatwg.org/wiki/FAQ#HTML5_should_support_a_way_for_anyone_to_invent_new_elements.21

    Contains some of their responses to the extensibility problem.
