Semantics in HTML 5

by John Allsopp

107 Reader Comments

  1. I really appreciate your idea about attributes. I don’t think that we need more than one HTML attribute: “semantic”. Then anyone can define all the semantic classes he needs. Of course, we would need to define properties such as “rhetoric”, “structure”, etc. and their values, just as CSS has “background-color”, “font-family”, etc.

    Here is an example of HTML and CSE (Cascading SEmantic sheet):
     
      HTML:
        ….
        An elderly lady phoned…
        ….

      CSE:
        ….
        joke_of_the_day{
          rhetoric:ironic;
          structure:aside;
        }
        ….

    A cascading mechanism may also solve the problem of nested semantic annotations.
    What do you think about it?
    Regards,
      Matteo (matteo.cajani@alice.it)
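A minimal sketch of how such a cascade might resolve nested annotations, assuming (this is not part of Matteo's comment) that rules are keyed by the value of the proposed `semantic` attribute and that a more deeply nested annotation overrides its ancestors, as inheritance does in CSS. The second rule, `quotation`, and the function name `resolve_semantics` are hypothetical.

```python
# Sketch of the proposed "Cascading SEmantic sheet" (CSE).
# Assumptions: rules are keyed by a hypothetical `semantic` attribute value,
# and inner (nested) annotations override outer ones, as in CSS.
CSE_RULES: dict[str, dict[str, str]] = {
    "joke_of_the_day": {"rhetoric": "ironic", "structure": "aside"},
    "quotation": {"rhetoric": "citation"},  # hypothetical second rule
}


def resolve_semantics(chain: list[str], rules: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge semantic properties along a nesting chain, outermost first."""
    resolved: dict[str, str] = {}
    for semantic_class in chain:
        # Later (more deeply nested) classes override earlier declarations;
        # unknown classes contribute nothing.
        resolved.update(rules.get(semantic_class, {}))
    return resolved


if __name__ == "__main__":
    # A quotation nested inside the joke inherits "structure: aside"
    # but overrides "rhetoric".
    print(resolve_semantics(["joke_of_the_day", "quotation"], CSE_RULES))
    # {'rhetoric': 'citation', 'structure': 'aside'}
```

Merging outermost-first with `dict.update` keeps the rule simple: whatever is declared closest to the content wins.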

  2. Is the browser, or the structure (HTML), really the issue here? It would seem that the major hang-up for new tags and features in HTML is backward compatibility.

    What blows my mind is that we as a community continue to perpetuate the problem.  Get off HTML and develop something new.  Maintain a legacy object capable of rendering HTML but move something to a new open standard.  Then, make that easy to upgrade.

    Look at “Flash”. When something new comes out, what do people do? Upgrade. What do websites say? “Upgrade to the latest”.

    People, once we stop living in the past and decide that we want robust browsers with true rendering capabilities, 3D models and the ability to take advantage of the other 99% of the hardware, only then will we make progress. We use HTML and CSS as a container for the whiz-bang stuff we do with Flash and JavaScript. We want more than text from the browser, so let’s finally do what it takes to get there.

    We need to develop an open standard that allows for upgrades and force everyone to abide by the upgrade path. If you don’t, then you can’t expect to get serviced.

    Just imagine where your OS would be if we still had to support 8-bit executables!

    Remember this before you respond with “What about the other devices?”. OK, let’s talk about that: WAP, Mobile CSS, etc. Your argument is what? That they read the standard HTML you code? They don’t. The only standard is that there is not one. We need one, and HTML sure is not it.

    Sure, there will be pain, but once the pain is gone you’ll be much happier in a world where you can do more than just place a few lines of text in a document.

  3. I found the article encouraged me to think more about HTML5 than I have so far. And while I sympathise with the author, I agree with those commenters who say this is a solution in search of a problem.

    It is perhaps a little ironic that the article is a critique of HTML5’s inflexibility when, at least as far as I know, HTML5 was proposed as a pragmatic solution to the seemingly intractable problem of “what comes after HTML4”. This form of critique misses the point behind HTML5: HTML5 is not supposed to be semantically extensible. There is XHTML for that. No, HTML5 is the version of HTML that contains those tags that many currently miss. This has important consequences, primarily for browser developers, who must make HTML5 support both robust and fast. Although this means that the semantics must be frozen, it in no way prevents attributes from being extensible. The difference, however, is one of scope: extensible attributes will have an application-specific (website-specific) purpose, whereas the specified tags will always have the same purpose.

    The supplied use cases illustrate this misunderstanding:

    While <span role=“2009-05-01”>May Day next year</span> really doesn’t make sense, something along the lines of <span equivalent=“2009-05-01”>May Day next year</span> would.

    Neither of the tags is semantically satisfying particularly when the <date> tag is available.
    1) Ideally the datetime value should be the content of the tag and not an attribute of it. Confusing the two is a common mistake particularly in XML. Of course, being too restrictive here would prevent anyone from incorrectly using the tag and provoking exactly the kind of errors that HTML5 tries to avoid.
    2) How the date is displayed is a matter of presentation and, therefore, something that may be controlled by metadata: a format attribute, a CSS declaration or a browser option. This avoids all the problems of localisation, such as: when is 12/1/2009? In the example supplied, format=“holiday” or format=“short” could be used.
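To make the “when is 12/1/2009?” problem concrete, here is a small sketch (an illustration, not something from the comment) that reads the same string under the US month/day convention and the European day/month convention, and contrasts it with an ISO 8601 value such as the “2009-05-01” used in the examples above. The function name `readings` is hypothetical.

```python
from datetime import date, datetime


def readings(text: str) -> tuple[date, date]:
    """Return the US (month/day) and European (day/month) readings of a date string."""
    us = datetime.strptime(text, "%m/%d/%Y").date()
    eu = datetime.strptime(text, "%d/%m/%Y").date()
    return us, eu


us, eu = readings("12/1/2009")
print(us.isoformat(), eu.isoformat())  # 2009-12-01 2009-01-12

# An ISO 8601 machine value has exactly one reading; how it is shown to the
# reader can then be left to a separate presentation step.
print(date.fromisoformat("2009-05-01"))  # 2009-05-01
```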

    Making this definition part of the specification allows browsers to handle content intelligently (offering to add the date to a user’s calendar or to run a search on the date) in a way the suggested “equivalent” simply could not.

    In the same vein we have the new <video> and <audio> tags to handle the now well-established practice of including audio and video in websites.

    Extensibility here is not the solution; it simply shifts the problem to the namespace.

  4. In comment #94, Charlie Clark says:

    This form of critique misses the point behind HTML5: HTML5 is not supposed to be semantically extensible. There is XHTML for that. No, HTML5 is the version of HTML that contains those tags that many currently miss.

    This comment reflects a confusion repeated in this discussion that ascribes magical properties to either the XML serialization or the traditional text/html serialization of HTML (I can’t tell which one has been ascribed magical powers, but neither serialization has any).

    HTML5 is basically three things. 1) It is a parsing and serialization specification that attempts to codify the parsing performed by the major browsers with respect to traditional HTML serializations (and perhaps incrementally improve on, or at least select the best traits of, existing browser parsing operations); 2) a specification of browser (and some other UA) behavior with the results of a parsed (however parsed) or DOM-created document; 3) a specification of a vocabulary of elements and attributes for authoring documents that are ostensibly semantic in nature.

    The topic of discussion here is #3. It is not about the parsing and serialization of traditional text/html serializations. HTML5 (in the #3 sense) can arise from the traditional serialization, from pure DOM creation or from an XML serialization (which, despite what we are told, is not so drastically different with respect to the topic at hand). So I’m not sure why the “use XML” or “use XHTML” line keeps arising in this conversation. It has nothing to do with the topic of the conversation.

    Also, this is not about confusing whether data belongs in an attribute or in the contents of an element. There is not one right way to do this. The point of RDFa is that one can easily add properly parsed attributes to existing HTML elements (or any SGML, XML or other elements) that add machine-readable metadata about the natural-language expression in the contents of the element. That means the presentation can be left as is (with the contents of the element appearing) or the UA could replace or augment that presentation with a localizable expression for the date. And whereas the HTML5 attempt to copy RDFa introduces a single-purpose date element, RDFa provides a way to add precise machine-readable equivalents to an element for any imaginable data type that can be expressed as the contents of an attribute (including anyURI values).
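The attribute-plus-content pattern described above can be sketched with Python's standard `html.parser`. This is only an illustration under stated assumptions: it looks at RDFa-style `property` and `content` attributes, assumes a `dc:date` property (a real RDFa document would also declare the `dc:` prefix, omitted here), and is nowhere near a conforming RDFa processor. The class name `PropertyContentExtractor` is hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class PropertyContentExtractor(HTMLParser):
    """Collect (property, machine value, human-readable text) triples.

    Deliberately simplified: only elements carrying both `property` and
    `content` are recorded, and nested markup inside them is not handled
    (the first end tag after the start tag closes the record).
    """

    def __init__(self) -> None:
        super().__init__()
        self.triples: list[tuple[str, str, str]] = []
        self._pending: Optional[tuple[str, str]] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "property" in attributes and "content" in attributes:
            # The machine value lives in the attribute...
            self._pending = (attributes["property"], attributes["content"])
            self._text = []

    def handle_data(self, data):
        # ...while the human-readable text stays as element content.
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            prop, value = self._pending
            self.triples.append((prop, value, "".join(self._text)))
            self._pending = None


parser = PropertyContentExtractor()
parser.feed('<p>We meet on <span property="dc:date" content="2009-05-01">'
            'May Day next year</span>.</p>')
parser.close()
print(parser.triples)  # [('dc:date', '2009-05-01', 'May Day next year')]
```

The display text is left untouched, so a UA is free to show it as is or to substitute a localized rendering of the machine value.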

  5. Rob Mech above wrote:

    Get off HTML and develop something new. Maintain a legacy object capable of rendering HTML but move something to a new open standard. Then, make that easy to upgrade.

    Look at “Flash”…

    People, once we stop living in the past and decide that we want robust browsers with true rendering capabilities, 3D models and the ability to take advantage of the other 99% of the hardware, only then will we make progress.

    I’ve been thinking for years that Flash should be the way forward. It makes perfect sense. A massive market penetration of users, smooth font rendering that can use any font, all the vector goodness you could want, Photoshop-style filters, what’s not to love?

    And I don’t mean full-blown Flash sites with rotating objects. I mean a plain renderer that improves on what the poor browser has to render. And best of all…

    Identical cross-platform rendering!

    Think about it. No more broken layouts due to so many different browsers to test in. No more holding back on things that only work in 1 or 2 browsers. No more having to code to the bare minimum because of ancient browsers still in use today.

    Unless something like this happens we will be stuck in browser hell forever.

  6. For those Russian speakers out there, there’s a Russian version: http://habrahabr.ru/blogs/webdev/49734/

    With quite a thriving discussion from what I can tell as well – not that I speak a word.

    Thanks to the translator!

  7. http://d-o-b.ru/test/x-html/xhtml-dtd.htm

    http://validator.w3.org/check?uri=http://d-o-b.ru/test/x-html/xhtml-dtd.htm;ss=1

    http://browsershots.org/http://d-o-b.ru/test/x-html/xhtml-dtd.htm

  8. > custom DTDs (AFAIK) run in quirks mode

    No, AFAIK.


    > Identical cross-platform rendering! (flash)

    No, AFAIK. It’s identical only on Windows platforms; it has many bugs on Linux…

  9.   We already have a lot of tools that can provide extensible semantics. The whole point of XHTML (even version 1.0) was to allow users to introduce custom namespaces into xhtml documents and then handle them programmatically using scripts, plug-ins or XSLT. Unfortunately, these technologies are now being bullied out of existence by the simplicity-oriented majority who won’t touch them with a ten-foot pole…

    This is a recurring issue in programming. It appears that, for most people, the introduction of a new concept creates a nearly insurmountable psychological barrier. They say things like “I can’t do it. I can’t understand it. It’s complicated. It’s too abstract”. To me, and a few others (like those who invented object-oriented programming and XML/XHTML/XSLT/XPath/XQuery), ease of use means fewer clicks and less hand-coding, even if it introduces more concepts I have to learn. To “normal” people, ease of use IS simplicity. New concepts lead to an immediate mental block.

    Just take a look at the whole motivation behind the move to HTML5. The folks who invented the “html serialization” don’t want doctypes, schemas or even a version number. What they’re trying to do is make “tag soup handling” part of the spec. Also, they explicitly objected to the use of namespaces.

    The original spec even said:

    “Generally speaking, authors are discouraged from trying to use XML on the Web, because XML has much stricter syntax rules than the “HTML5” variant described above, and is relatively newer and therefore less mature”

    Fortunately, the W3C didn’t take a stand on the issue, and they removed the above paragraph from their version of the spec.

    We have other strict technologies with extensible semantics in mainstream use, which have a clearly defined syntax (e.g. C#). Very few people ever complain about the whole app not compiling if they miss a semicolon, because they have tools that pick up the semicolons for them. I grew up on BASIC, Pascal and Delphi, and I also learned C/C++, so I don’t take error messages personally. But as long as people who don’t know how to escape an ampersand continue hand-coding their HTML in Notepad, fault tolerance will remain a requirement, even though, once a certain critical number of errors accumulates, tracing becomes nearly impossible and validators become completely useless: there’s an “error” on every line, yet the page displayed fine just an hour ago. For instance, in one of the apps I developed at work, I put in an XSLT post-processor with a syntax checker, and a lot of the other developers would swear at it because they had no clue how to make their code output well-formed XML.

      I believe the Web community needs to split. Just as the desktop application world has Visual Basic for simplicity-oriented developers and C# for those who like structure, we need two separate stacks: one for tag-soup hand-coders who think namespaces are evil, and another for those who prefer a stricter syntax. The split can be made on top of XHTML5 (the XML version of HTML5), with namespaced elements used to define semantics. Those who prefer the tag-soup-friendly version also happen to be the ones who adhere to the KISS principle and, therefore, don’t need extensible semantics anyway. They would rather copy and paste a block of HTML ten times over than introduce a new concept for a frequently used element. It doesn’t matter if the majority writes tag soup as long as there’s enough community support for the “complex and extensible” version to keep it alive.
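The strict-versus-tolerant difference the comment describes can be seen with Python's standard library: an XML parser (the kind used for XHTML served as XML) rejects a bare ampersand outright, while a lenient HTML parser keeps going. This is a sketch for illustration only; Python's `html.parser` is not a browser's HTML5 parser, and the names `is_well_formed` and `TextCollector` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape
from html.parser import HTMLParser


def is_well_formed(markup: str) -> bool:
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """A lenient HTML parser that just gathers the text it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def handle_data(self, data):
        self.chunks.append(data)


soup = "<p>Fish & Chips</p>"                      # bare ampersand
fixed = "<p>" + escape("Fish & Chips") + "</p>"   # "<p>Fish &amp; Chips</p>"

print(is_well_formed(soup), is_well_formed(fixed))  # False True

collector = TextCollector()
collector.feed(soup)   # no exception: the bare "&" is kept as text
collector.close()
print("".join(collector.chunks))  # Fish & Chips
```

Escaping with `html.escape` is all it takes to make this particular fragment well-formed, which is the kind of step a tool can do for the author.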

  10. I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never “tag soup.” It is however, not necessarily XHTML, and if parsed as such, will be “tag soup.” The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street. No one should ever slight reading the specs and knowing the proper way to code either HTML or XML, but when it comes to parsing the Web, see the recent Opera study, with MAMA, or the Google study that Ian Hickson did, and it’s clear that less than 10% of what we deal with out there is XHTML, much less proper XHTML.

  11. @Aaron Miller
    I think the phrase tag soup should be retired. It’s one thing to talk about parsing it, another to talk about writing it. It’s important to remember that proper HTML is never “tag soup.” It is however, not necessarily XHTML, and if parsed as such, will be “tag soup.” The reverse is not true. This is precisely what pisses off so many XML purists, because it’s a one-way street.

    Tag soup gets used in two different ways which you’re confusing in this comment. 1) tag soup sometimes refers to the serialized source content of a document where tags are potentially misnested, content models invented out of thin air and attribute values requiring quotations not quoted: in general vended content not conforming to any specification anywhere. 2) tag soup parsing refers to a parser that is capable of parsing tag soup (a Herculean task).

    When you say that XHTML parsed as text/html will be tag soup, you’re confusing these two definitions. The XHTML is certainly not tag soup, as it adheres to the XHTML syntax and sometimes even other syntactic requirements on top of that (such as XHTML 1.0 Appendix C), so there is no sense in which that content can be considered tag soup. However, if such an XHTML document is vended as text/html, it will be parsed by the UA’s tag soup parser just like any other conforming or non-conforming HTML 2 to 4.0.1. So in this sense, both not using XHTML and not vending as application/xhtml+xml mean that the content is parsed by the tag soup parsing processor (just like any other HTML).

    It’s important to keep these two meanings of tag soup separate to understand the conversation.
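A rough sketch of the point that the media type, not the markup, decides which parser runs. The dispatcher below is a hypothetical simplification (real browsers do far more): `application/xhtml+xml` goes to a strict XML parser, `text/html` goes to a lenient parser, here Python's `html.parser` standing in for a browser's tag soup parser. The same well-formed XHTML is accepted by both; non-well-formed markup is only accepted by the lenient one.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def parse_as(markup: str, media_type: str) -> str:
    """Run the parser a UA would pick for media_type; return which one ran.

    Hypothetical simplification of browser behavior.
    """
    if media_type == "application/xhtml+xml":
        ET.fromstring(markup)        # any well-formedness error raises ET.ParseError
        return "xml"
    if media_type == "text/html":
        parser = HTMLParser()        # lenient: does not reject the document
        parser.feed(markup)
        parser.close()
        return "html"
    raise ValueError(f"unsupported media type: {media_type}")


xhtml = '<p xmlns="http://www.w3.org/1999/xhtml">Hello<br /></p>'
print(parse_as(xhtml, "application/xhtml+xml"))  # xml
print(parse_as(xhtml, "text/html"))              # html

soup = "<p>Hello<br></p>"                        # not well-formed XML
print(parse_as(soup, "text/html"))               # html
```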

  12. @Rob, looks like we’re talking about the same distinction. The only difference is that I’m saying the phrase “tag soup” makes it sound anomalous, when in fact from the content side it refers to over 95% of the web, and from the browser (parser) side, it’s SOP. See the Opera MAMA study and Ian Hickson’s Google report if you don’t know what I mean.

  13. This is not directly related to John’s article, but it reminded me of a question I’ve been turning over in my head for some time: how is it that the syntax of HTML, or of any other machine-readable grammar, is constructed using English? More specifically, US English? Has anyone ever tried to construct a language, or even a very light grammar, that allows multiple [human] languages to describe headers, footers, loops, lists etc.? I appreciate that many of these machine languages were first composed in the US and thus US English has become the lingua franca of programming, but this is the 21st century and not everyone on the planet who wishes to write code knows how to speak or write English, never mind a specific sub-grammar of it.

  14. Seems that the first one wasn’t perfect.
    Here it is: http://interpretor.ru/html5semantics

  15. German translation available: http://tobias-otte.de/essays/semantik-in-html-5/

  16. I agree with all the principal points in the article, and I share the view that attributes are preferable; but I think they’re breaking compatibility on purpose. We all agree it is stupid to still have concerns about IE6 in 2009 (and soon ’10). If they break the cordon, everybody will be happier. And in fact the big push on HTML5 came from browser makers, and it looks to me like MS wants to be in the game.
    In the aftermath we will all have a common base to discuss upon.

    After all, I think some new tags would come in handy.
    For example, I think that for something as ubiquitous as a calendar there should be a tag. It would end the debate over which solution is more semantic (table vs list, which oddly depends on the kind of visualization we want to give), it would spare a lot of code, and it would give more artistic freedom to designers, who could target a parameter/class with a simple JavaScript to radically change the visualization.

  17. http://wiki.whatwg.org/wiki/FAQ#HTML5_should_support_a_way_for_anyone_to_invent_new_elements.21

    Contains some of their responses to the extensibility problem.
