A List Apart

Issue № 69

Rated XHTML

by Published in HTML, Industry

Being a web developer is a tough job. Not only do you have to steer clear of the traps and pitfalls the popular browsers think up for you on a daily basis, you also have to keep at least half an eye on all kinds of developments that may (or may not) have an impact on your job. You have hardly mastered style sheets and DHTML, and already new techniques clamor for attention. Which ones are important right away? Which ones can you dismiss for now?

This article gives my view on the language the W3C has developed to succeed HTML: XHTML. Agree or disagree with me, at least you’ll have something to think about and to help you decide.

First I’ll explain what XHTML is, then I’ll give the four rules of writing correct XHTML and finally I’ll add some words about why you should use XHTML.

What is XHTML, Anyway?

XHTML is HTML written according to the XML rules of well-formedness. To understand XHTML, we therefore have to understand XML. Many articles have already been written on this subject, so a short summary should be enough:

XML is a general markup language. Unlike HTML, XML allows you to make up your own tags and thus impose your own structure on a document. Do you need a new tag? Add it to your document, make sure some program knows what to do when it encounters this tag, and you’re ready.

There are a few simple rules for XML documents (see below). As long as your tags are correctly formed, XML doesn’t care what the actual tags are. So XML is a generalized markup language that you can use in any way you like.
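To make the idea of made-up tags concrete, here is a minimal Python sketch. The recipe vocabulary (`recipe`, `title`, `ingredient`, `amount`, `unit`) and the function `list_ingredients` are invented for this illustration; XML assigns them no meaning. The small function plays the part of the "program that knows what to do" with the tag.

```python
import xml.etree.ElementTree as ET

# A made-up vocabulary: none of these tag names mean anything to XML itself.
DOCUMENT = """
<recipe>
  <title>Pancakes</title>
  <ingredient amount="250" unit="g">flour</ingredient>
  <ingredient amount="2" unit="pieces">eggs</ingredient>
</recipe>
"""


def list_ingredients(xml_text):
    """Return one readable line for every <ingredient> tag in the document."""
    root = ET.fromstring(xml_text.strip())
    lines = []
    for item in root.iter("ingredient"):
        # The program, not XML, decides what 'amount' and 'unit' mean.
        lines.append(f"{item.get('amount')} {item.get('unit')} {item.text}")
    return lines


if __name__ == "__main__":
    for line in list_ingredients(DOCUMENT):
        print(line)
```

The parser accepts the document only because it is well-formed. It never checks whether `ingredient` is a "real" tag.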

In contrast, HTML is a much more rigidly defined markup language where your tags have to adhere to a syntax to make sure browsers understand you. Nonetheless, the open character of XML allows us to treat HTML documents as XML documents with the specific purpose of being shown by a web browser. However, the old standards of HTML are not completely XML compatible. For instance, a </P> at the end of each paragraph is not required in HTML; it is optional. Web browsers don’t care whether it’s there because they’re programmed not to, but XML parsers are stricter and will tell you that your HTML document is not well-formed XML.
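The difference in strictness can be sketched with Python’s standard library. The lenient `html.parser` stands in for a forgiving browser and `xml.etree.ElementTree` for an XML parser. That pairing, the sample markup and the helper names are assumptions of this illustration, not a description of how any browser is built.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Old-style HTML without closing </P> tags, and its XHTML-style equivalent.
OLD_STYLE = "<body><P>First paragraph<P>Second paragraph</body>"
XHTML_STYLE = "<body><p>First paragraph</p><p>Second paragraph</p></body>"


class TagCollector(HTMLParser):
    """Lenient HTML tokenizer: records start tags and never complains."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_tags(markup):
    """Return the start tags a forgiving HTML tokenizer finds in the markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(lenient_tags(OLD_STYLE))      # the HTML tokenizer just carries on
    print(is_well_formed(OLD_STYLE))    # the XML parser rejects the same text
    print(is_well_formed(XHTML_STYLE))
```

The HTML tokenizer reports both paragraphs without complaint, while the XML parser refuses the old-style markup outright.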

To bridge the gap between the two, XHTML was developed. In essence it is simply HTML, but the XML rules of well-formedness have been added to the normal HTML syntax. Thus web pages would become XML-conforming and web developers would become acquainted with the rules and restrictions of XML.

Rules of the Game

In practice, the following rules have been added to HTML for writing XHTML:

  1. Make sure all your tags are lower case.
  2. Close all your tags. In the case of tags that don’t have a closing tag, like <IMG> or <BR>, add a slash to the end of the tag: <img />, <br />.
  3. Nest tags correctly. No more <B><P>text</B></P>, but <p><b>text</b></p>.
  4. Put quotes around all attribute values. No more <P ALIGN=center> but <p align="center">.
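The four rules can be checked mechanically. The sketch below is a rough illustration, not a validator. The function `check_fragment` and its regular expression are invented for this example. It treats rules 2, 3 and 4 as plain XML well-formedness and leaves them to a parser, and it checks rule 1 separately because XML itself accepts upper case names.

```python
import re
import xml.etree.ElementTree as ET

# Matches the name of every opening tag, e.g. 'P' in '<P ALIGN=center>'.
OPENING_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)")


def check_fragment(markup):
    """Return a list of problems; an empty list means the fragment passed."""
    problems = []

    # Rule 1: tag names must be lower case.
    for name in OPENING_TAG.findall(markup):
        if name != name.lower():
            problems.append(f"rule 1: <{name}> is not lower case")

    # Rules 2, 3 and 4 (closing, nesting, quoting) are all part of
    # XML well-formedness, so a strict parser can judge them together.
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        problems.append(f"rules 2-4: not well-formed ({error})")

    return problems


if __name__ == "__main__":
    samples = [
        '<p align="center"><b>text</b><br /></p>',  # follows all four rules
        "<P>text</P>",                              # breaks rule 1
        "<p>text<br></p>",                          # breaks rule 2
        "<p><b>text</p></b>",                       # breaks rule 3
        "<p align=center>text</p>",                 # breaks rule 4
    ]
    for sample in samples:
        print(sample, "->", check_fragment(sample) or "ok")
```

The parser reports only the first well-formedness error it meets, so a fragment that breaks several of rules 2 to 4 still produces a single message.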

The good news is that current browsers don’t have any problems with XHTML. After all, rules 1, 2 and 4 are already optional in HTML, while rule 3 is required (even though in most cases browsers ignore nesting errors). The only really new one is the second half of rule 2, the trailing slash. However, this rule only leads to problems when you write <br/> without the space. In that case the browser sees a br/ tag that it doesn’t know, so it doesn’t do anything. Inserting a space solves this problem: if you write <br /> the browser sees a br tag with an unknown attribute /. The br is executed, the unknown attribute is ignored.
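A toy model shows why the space matters. The code below is a deliberately naive tokenizer written for this illustration. `KNOWN_TAGS`, `read_tag` and `browser_reaction` are hypothetical names, and no real browser is implemented this way. It only mimics the behaviour described above: whitespace separates the tag name from its attributes.

```python
# Tiny stand-in for a browser's table of tags it knows how to render.
KNOWN_TAGS = {"br", "img", "p"}


def read_tag(tag_text):
    """Split '<name attr ...>' into (name, attributes) the naive way."""
    inner = tag_text.strip()[1:-1]   # drop the surrounding '<' and '>'
    parts = inner.split()            # whitespace separates name and attributes
    name = parts[0].lower()
    return name, parts[1:]


def browser_reaction(tag_text):
    """Describe what the toy browser does with a single tag."""
    name, attributes = read_tag(tag_text)
    if name not in KNOWN_TAGS:
        return f"unknown tag {name!r}: nothing happens"
    return f"tag {name!r} executed, unknown attributes {attributes} ignored"


if __name__ == "__main__":
    for tag in ["<br/>", "<br />", "<BR>"]:
        print(tag, "->", browser_reaction(tag))
```

Without the space the slash becomes part of the tag name. With the space it becomes a separate token that the toy browser can skip.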

The bad news is that you have to change your coding practices. Personally I dislike rule 1. First of all, I’ve never understood why XHTML tags can only be lower case; secondly, I always make my HTML tags upper case to make them stand out from the surrounding text. All of a sudden I can’t do this any more, even though I think it’s useful. Nonetheless, I don’t mind changing my coding practices, but only if there are good reasons to.

Why Use XHTML

So why use XHTML instead of good old HTML? W3C gives the following reasons:

Document developers and user agent designers are constantly discovering new ways to express their ideas through new markup. In XML, it is relatively easy to introduce new elements or additional element attributes. The XHTML family is designed to accommodate these extensions through XHTML modules and techniques for developing new XHTML-conforming modules (described in the forthcoming XHTML Modularization specification). These modules will permit the combination of existing and new feature sets when developing content and when designing new user agents.

Alternate ways of accessing the Internet are constantly being introduced. [...] The XHTML family is designed with general user agent interoperability in mind. Through a new user agent and document profiling mechanism, servers, proxies, and user agents will be able to perform best effort content transformation. Ultimately, it will be possible to develop XHTML-conforming content that is usable by any XHTML-conforming user agent.

So future, as yet unspecified, enhancements of XHTML will allow developers to use novel, as yet unwritten, modules to extend XHTML to include new, as yet undefined, things in their web pages. In addition, W3C expects new user agents to require XHTML instead of HTML in the future.

X It Off Your List

Frankly, I don’t think these two reasons are enough for us web developers to switch from HTML to XHTML.

The first reason is unimportant at the moment. Maybe the XHTML modules will dazzle our socks off, maybe they’ll never be good for anything. In any case it’ll take at least two or three years before the modules will appear on the scene. Since we don’t yet know how they will work or exactly what they will do or even if they will be worth the trouble, we cannot do anything with them or prepare for them.

The second reason is also unimportant at the moment. There are no pure XHTML-conforming user agents, no browsers that require XHTML. Besides, it’s uncertain whether they’ll ever appear. After all, if you write a browser that only works with XHTML, it will give errors when you try to view simple HTML pages. That’s not really what browser vendors want.

Suppose Ed End-User goes to his favourite web page with the newest, XHTML-requiring, Ultra Browser X7 only to see lots of incomprehensible error messages that complain about the lack of valid XHTML. Will he think “Naughty web developers, you should have used XHTML!” or will he think “Bloody browser’s buggy!”?

So when a new browser is released, the manufacturer will include support for good old HTML because end users will (rightly) demand it. New browsers on as yet unreleased platforms may require valid XHTML (though I don’t think so, see below), but Netscape and Explorer on personal computers won’t because they have to be conservative in their choice of languages.

Staying Power

I think that many people underestimate the staying power of HTML. It’s the standard at the moment; without it you can’t make a web page. Because of that, all web developers use HTML. Because of that, all future browsers that are intended to show traditional web pages must continue to support HTML as we now know it. Because of that, all web developers will continue to use HTML, so WWW pages will continue to be written in HTML, so browsers will have to continue to support it, etc.

But what about new browsers? What about new sections of the Internet, like WAP? What about learning XML by way of XHTML? Read on…

Just Say No

Of course, new browsers on new platforms may require XHTML. But then they’ll run into the same problem as the old browsers on the old platforms: they won’t correctly show existing web sites with HTML pages, which means that the end users will feel cheated. To avoid this, new browsers will also have to support HTML.

Of course, XHTML may become the standard language for a new section of the Internet, as WML has become the standard language for WAP pages. This is one of W3C’s reasons for developing it (see above). But frankly I don’t believe that. New sections of the Internet require truly new languages because they will be different from the WWW, while XHTML is only good to write traditional WWW pages in.

Of course XHTML can form a bridge between HTML and XML and make web developers acquainted with XML rules. But I wonder if XML is that important for pure web developers. I’m not convinced that every web developer should know XML, because I don’t think client side XML will be widely used. Server side XML is another case, of course.

Finally, to repeat the key phrase from the W3C quote above:

Ultimately, it will be possible to develop XHTML-conforming content that is usable by any XHTML-conforming user agent.

Doesn’t this sound familiar? Wasn’t HTML, too, supposed to be working on any user agent? We all know what happened to that plan…

So if HTML is here to stay, why bother to switch to a more difficult language that goes against your coding practices when the switching is not necessary? I don’t see any reason to start using XHTML. I’ll happily continue to write my tags uppercase to separate them from content and I’ll leave out the occasional </P> when I feel like it.

As are all of W3C’s specifications, XHTML is a theoretical construct that is interesting in its implications and may still grow to play an important role on the WWW, but right now it is worthless in practice. Software vendors should make the first move. They should start using (and requiring) XHTML in constructive ways without alienating the users of their products. Only then will the rest of the Web follow.

Those fanatics who think that everything W3C says has the power of a Divine Commandment, and who therefore treat anyone who doesn’t use XHTML as a heretic to be burned at the stake at the earliest opportunity, are simply wrong. XHTML isn’t about the present, it’s about the future.
