Issue № 305

A Brief History of Markup

A note from the editors: We are pleased to present Chapter 1 of HTML5 for Web Designers by Jeremy Keith (A Book Apart, 2010).

HTML is the unifying language of the World Wide Web. Using just the simple tags it contains, the human race has created an astoundingly diverse network of hyperlinked documents, from Amazon, eBay, and Wikipedia, to personal blogs and websites dedicated to cats that look like Hitler.

HTML5 is the latest iteration of this lingua franca. While it is the most ambitious change to our common tongue, this isn’t the first time that HTML has been updated. The language has been evolving from the start.

As with the web itself, the HyperText Markup Language was the brainchild of Sir Tim Berners-Lee. In 1991 he wrote a document called “HTML Tags” in which he proposed fewer than two dozen elements that could be used for writing web pages.

Sir Tim didn’t come up with the idea of using tags consisting of words between angle brackets; those kinds of tags already existed in the SGML (Standard Generalized Markup Language) format. Rather than inventing a new standard, Sir Tim saw the benefit of building on top of what already existed, a trend that can still be seen in the development of HTML5.

From IETF to W3C: The road to HTML 4

There was never any such thing as HTML 1. The first official specification was HTML 2.0, published by the IETF, the Internet Engineering Task Force. Many of the features in this specification were driven by existing implementations. For example, the market-leading Mosaic web browser of 1994 already provided a way for authors to embed images in their documents using an <img> tag. The img element later appeared in the HTML 2.0 specification.

The role of the IETF was superseded by the W3C, the World Wide Web Consortium, where subsequent iterations of the HTML standard have been published. The latter half of the nineties saw a flurry of revisions to the specification until HTML 4.01 was published in 1999.

At that time, HTML faced its first major turning point.

XHTML 1: HTML as XML

After HTML 4.01, the next revision to the language was called XHTML 1.0. The X stood for “eXtreme” and web developers were required to cross their arms in an X shape when speaking the letter.

No, not really. The X stood for “eXtensible” and arm crossing was entirely optional.

The content of the XHTML 1.0 specification was identical to that of HTML 4.01. No new elements or attributes were added. The only difference was in the syntax of the language. Whereas HTML allowed authors plenty of freedom in how they wrote their elements and attributes, XHTML required authors to follow the rules of XML, a stricter markup language upon which the W3C was basing most of their technologies.

Having stricter rules wasn’t such a bad thing. It encouraged authors to use a single writing style. Whereas previously tags and attributes could be written in uppercase, lowercase, or any combination thereof, a valid XHTML 1.0 document required all tags and attributes to be lowercase.
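
As a rough illustration of that difference (the paragraph and its class name are made up for this example), here is the same markup written the way HTML 4.01 tolerated it and the way XHTML 1.0 required it:

```html
<!-- Accepted by HTML 4.01: tag and attribute names in any mix of case -->
<P CLASS="intro">Hello, <EM>world</EM></P>
<p Class="intro">Hello, <Em>world</eM></p>

<!-- Required by XHTML 1.0: tag and attribute names in lowercase -->
<p class="intro">Hello, <em>world</em></p>
```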

The publication of XHTML 1.0 coincided with the rise of browser support for CSS. As web designers embraced the emergence of web standards, led by The Web Standards Project, the stricter syntax of XHTML was viewed as a “best practice” way of writing markup.

Then the W3C published XHTML 1.1.

While XHTML 1.0 was simply HTML reformulated as XML, XHTML 1.1 was real, honest-to-goodness XML. That meant it couldn’t be served with a mime-type of text/html. But if authors published a document with an XML mime-type, then the most popular web browser in the world at the time, Internet Explorer, couldn’t render the document.

It seemed as if the W3C were losing touch with the day-to-day reality of publishing on the web.

XHTML 2: Oh, we’re not gonna take it!

If Dustin Hoffman’s character in _The Graduate_ had been a web designer, the W3C would have said one word to him, just one word: XML.

As far as the W3C was concerned, HTML was finished as of version 4. They began working on XHTML 2, designed to lead the web to a bright new XML-based future.

Although the name XHTML 2 sounded very similar to XHTML 1, they couldn’t have been more different. Unlike XHTML 1, XHTML 2 wasn’t going to be backwards compatible with existing web content or even previous versions of HTML. Instead, it was going to be a pure language, unburdened by the sloppy history of previous specifications.

It was a disaster.

The schism: WHATWG TF?

A rebellion formed within the W3C. The consortium seemed to be formulating theoretically pure standards unrelated to the needs of web designers. Representatives from Opera, Apple, and Mozilla were unhappy with this direction. They wanted to see more emphasis placed on formats that allowed the creation of web applications.

Things came to a head in a workshop meeting in 2004. Ian Hickson, who was working for Opera Software at the time, proposed the idea of extending HTML to allow the creation of web applications. The proposal was rejected.

The disaffected rebels formed their own group: the Web Hypertext Application Technology Working Group, or WHATWG for short.

From Web Apps 1.0 to HTML5

From the start, the WHATWG operated quite differently than the W3C. The W3C uses a consensus-based approach: issues are raised, discussed, and voted on. At the WHATWG, issues are also raised and discussed, but the final decision on what goes into a specification rests with the editor. The editor is Ian Hickson.

On the face of it, the W3C process sounds more democratic and fair. In practice, politics and internal bickering can bog down progress. At the WHATWG, where anyone is free to contribute but the editor has the last word, things move at a faster pace. But the editor doesn’t quite have absolute power: an invitation-only steering committee can impeach him in the unlikely event of a Strangelove scenario.

Initially, the bulk of the work at the WHATWG was split into two specifications: Web Forms 2.0 and Web Apps 1.0. Both specifications were intended to extend HTML. Over time, they were merged into a single specification called simply HTML5.


While HTML5 was being developed at the WHATWG, the W3C continued working on XHTML 2. It would be inaccurate to say that it was going nowhere fast. It was going nowhere very, very slowly.

In October 2006, Sir Tim Berners-Lee wrote a blog post in which he admitted that the attempt to move the web from HTML to XML just wasn’t working. A few months later, the W3C issued a new charter for an HTML Working Group. Rather than start from scratch, they wisely decided that the work of the WHATWG should be used as the basis for any future version of HTML.

All of this stopping and starting led to a somewhat confusing situation. The W3C was simultaneously working on two different, incompatible types of markup: XHTML 2 and HTML 5 (note the space before the letter five). Meanwhile a separate organization, the WHATWG, was working on a specification called HTML5 (with no space) that would be used as a basis for one of the W3C specifications!

Any web designers trying to make sense of this situation would have had an easier time deciphering a movie marathon of _Memento_, _Primer_, and the complete works of David Lynch.

XHTML is dead: long live XHTML syntax

The fog of confusion began to clear in 2009. The W3C announced that the charter for XHTML 2 would not be renewed. The format had been as good as dead for several years; this announcement was little more than a death certificate.

Strangely, rather than passing unnoticed, the death of XHTML 2 was greeted with some mean-spirited gloating. XML naysayers used the announcement as an opportunity to deride anyone who had ever used XHTML 1—despite the fact that XHTML 1 and XHTML 2 have almost nothing in common.

Meanwhile, authors who had been writing XHTML 1 in order to enforce a stricter writing style became worried that HTML5 would herald a return to sloppy markup.

As you’ll soon see, that’s not necessarily the case. HTML5 is as sloppy or as strict as you want to make it.
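
As a small sketch of what that means in practice, both of the lists below are valid HTML5. The class name and list items are invented for this example.

```html
<!-- Loose style, still valid HTML5: unquoted attribute value, omitted </li> end tags -->
<ul class=nav>
  <li>Home
  <li>About
</ul>

<!-- XHTML-like style, also valid HTML5: quoted values, every element explicitly closed -->
<ul class="nav">
  <li>Home</li>
  <li>About</li>
</ul>
<br />
```

Which style to use is a matter of preference, not something the specification decides for you.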

The timeline of HTML5

The current state of HTML5 isn’t as confusing as it once was, but it still isn’t straightforward.

There are two groups working on HTML5. The WHATWG is creating an HTML5 specification using its process of “commit then review.” The W3C HTML Working Group is taking that specification and putting it through its process of “review then commit.” As you can imagine, it’s an uneasy alliance. Still, there seems to finally be some consensus about that pesky “space or no space?” question (it’s HTML5 with no space, just in case you were interested).

Perhaps the most confusing issue for web designers dipping their toes into the waters of HTML5 is getting an answer to the question, “when will it be ready?”

In an interview, Ian Hickson mentioned 2022 as the year he expected HTML5 to become a proposed recommendation. What followed was a wave of public outrage from some web designers. They didn’t understand what “proposed recommendation” meant, but they knew they didn’t have enough fingers to count off the years until 2022.

The outrage was unwarranted. In this case, reaching a status of “proposed recommendation” requires two complete implementations of HTML5. Considering the scope of the specification, this date is incredibly ambitious. After all, browsers don’t have the best track record of implementing existing standards. It took Internet Explorer over a decade just to add support for the abbr element.

The date that really matters for HTML5 is 2012. That’s when the specification is due to become a “candidate recommendation.” That’s standards-speak for “done and dusted.”

But even that date isn’t particularly relevant to web designers. What really matters is when browsers start supporting features. We began using parts of CSS 2.1 as soon as browsers started shipping with support for those parts. If we had waited for every browser to completely support CSS 2.1 before we started using any of it, we would still be waiting.

It’s no different with HTML5. There won’t be a single point in time at which we can declare that the language is ready to use. Instead, we can start using parts of the specification as web browsers support those features.
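
For instance, an author can use one of the new HTML5 input types today. This is only a sketch, and the field name is a placeholder:

```html
<!-- Browsers that support the HTML5 "search" type may render a specialized search field. -->
<!-- Browsers that do not recognize the value fall back to an ordinary text field. -->
<input type="search" name="q">
```

A browser that does not recognize the value search treats the field as a plain text input, so nothing breaks for anyone.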

Remember, HTML5 isn’t a completely new language created from scratch. It’s an evolutionary rather than revolutionary change in the ongoing story of markup. If you are currently creating websites with any version of HTML, you’re already using HTML5.

27 Reader Comments

  1. Delightful writing.
    One little thing: the ‘Sir’ title should be used only with the family name or full name. Using this title with the given name only will change its meaning and context.

  2. Actually, there was an “HTML1”, just not a formal standards spec. HTML1 was generally held to be the original standard that Sir Tim supported (I can’t remember if there was an official document or not, or just the libwww code). This is similar to the oft-referenced but never-written DOM level 0.

  3. “It’s no different with HTML5. There won’t be a single point in time at which we can declare that the language is ready to use. Instead, we can start using parts of the specification as web browsers support those features.”

    That’s exactly the problem. The failure of browsers to implement CSS 2.1 in a reasonable amount of time has been the bane of my existence as a web developer (*cough* IE *cough*).

    To put it simply, I’m preemptively cursing Microsoft for the acts of incompetence that they have yet to commit.

    For that one reason alone, don’t consider me a fan of HTML5.

  4. Chris, it’s true: there was a document called “HTML Tags” which is what Sir Tim supported in WorldWideWeb (aka Enquire) but as you say, there was never an official recommendation called HTML 1.

    I don’t even think there’s a URL for the HTML Tags document, although I’d love to be proven wrong on that. Has anybody seen it?

    Here’s an email from 1991 where Tim Berners-Lee is responding to Dan Connolly’s progress on X11 and I believe this is the first mention of HTML Tags but alas, the URL on CERN that’s referenced is now 404:

  5. Stargazer, I think you misunderstand how standards bodies and browsers work. They don’t produce a fully-written spec that’s then handed over to browsers. Neither do browsers wait until a specification is finished before implementing parts of it.

    So, to talk about CSS2.1 or HTML5 or any other specification as a monolithic thing that can’t be used until every single piece of the spec is implemented in all browsers …well, that’s not really how standards (or browsers) work. Surely you’d rather see _some_ incremental implementation in all browsers rather than despair that one browser hasn’t implemented _everything_?

  6. It doesn’t really matter if browsers implement incremental changes or not–we are always at the mercy of the worst widely used browser (not going to call out names like Internet Explorer).

    At best, we can use creative hacks to gain access to a fraction of the market. However what are the hacks for HTML5?

    Consider the tag. I think this is a useful tag that makes more sense than using tags everywhere, but what is the point if it is not supported by widely used browsers? The hack is to use tags instead; this makes the tag meaningless for now, because the hack is what is done already.

    I’m not saying that there won’t be a day _eventually_ when HTML5 will be useful, but it is a long wait for that day, and this fact must be recognized.

    Until that day comes, web developers will simply have to suffer when trying to use HTML5. Users will also have to suffer through websites that were poorly designed by those who were overeager to use HTML5. Lastly, we’ll have to listen to out-of-touch idealists who _actually believe_ that HTML5 will be a good replacement for Flash in the near future.

    I will eventually be fond of HTML5 and all it has to offer; in the meantime, I’m not a fan.

  7. Stargazer, there are actually a number of different strategies for using the new semantic elements in Internet Explorer. Using divs is just one of those strategies. In the final chapter of HTML5 For Web Designers, I outline some of those strategies e.g. using Remy Sharp’s html5shiv script:

  8. first i’d like to say thanks for the very informative and humorous article.

    it seems as if we will be mostly stuck using the parts of html and css that work the same in all browsers well into the future.

    i don’t subscribe to hacks, and i do what i can for IE, i prefer to code for the majority and peek at what IE did with my code last, lolol.

    personally i wait until new code becomes standard across the major browsers before i use it, if people don’t want to upgrade their browser that’s their problem, i’m not going to fill my head with a bunch of hacks and useless code that’s likely to be replaced or stop working when a new browser version is released. for the most part you can make sites look the same across all major browsers with just a few core html and css tidbits.

    the sad part is if you don’t use the new code the browser makers won’t have much pressure to standardize the new code – but we have plenty of adventurous web developers that like wading in code.

  9. Jeremy,

    I’m really interested in html5shiv. When using it, what happens if the user has no javascript?

    By the way, thank you, it was a pleasant reading.

  10. Very nice read; it makes me curious for more. Maybe there is no real need to use HTML5 today, but I think getting familiar with it is a must, and if you are familiar with something you will use it. Just my 2 cents.

  11. A bit disappointed by this. Even though it’s a nice read (for its intended audience I guess), the fact that ALA has taken a broader view makes it a waste of html-related talk. Articles on HTML are getting rarer so a simple book excerpt feels a little sloppy.

    Next time something edgy, revolutionary or at least useful?

  12. I certainly hope HTML5 developers stick to a strict standard of implementation. We’re just getting to the point now where “developers” (quotes for a reason) are learning not to rely on browsers to correct and display bad markup properly. In addition, as the web becomes less and less a series of sites and more a mash-up of web services, online applications and syndicated content, using strict markup is key to manipulating and maintaining the integrity of the DOM.

  13. The HTML5 movement looks to be a great leap forward in the right direction, can’t wait to get my hands on this book :-). That first chapter was a fun read.

    It’s nice to know there is a conscious group pushing standards, semantic HTML(5), and future technologies to bring down the idea that anyone who can run a WYSIWYG is a web designer.

    Looking forward to the era of compliant coders, not obtuse dreamweavers.

  14. This appears to be an article on “A Brief History of HTML” as opposed to markup. There are markup languages, that people actually commonly use, outside of HTML.

    I did enjoy your extremely brief mention of XHTML 1.1 apart from 1.0. I always seem to get pissed when I bring such things up or mention differences thereof.

  15. Interesting read.

    This is a good example of how a market-driven standard with a few key leaders (such as Ian Hickson) proved more suitable than the “pure” consensus approach taken by the W3C.

    Apple, Microsoft, etc are happy to conform and advance a standard like HTML5 *because* it’s driven by market demands rather than bureaucratic processes.

  16. Nicely written article but I couldn’t be more delighted to hear about the HTML 5 book, Jeremy. Your DOM scripting book is fantastic and I will definitely be ordering this latest work.

  17. Keith —

    In the book you recommend using the HTML5 doctype right now. Won’t that cause most browsers to fall back to “quirks” mode? I can’t imagine that IE 6 or 7 recognize that doctype. (Or am I misunderstanding how doctype works?)
