Better Living Through XHTML

XHTML is the standard markup language for web documents and the successor to HTML 4. A mixture of classic (HTML) and cutting–edge (XML), this hybrid language looks and works much like HTML but is based on XML, the web’s “super” markup language, and brings web pages many of XML’s benefits, as enumerated by the Online Style Guide of the Branch Libraries of The New York Public Library.


If you want your site to work well in today’s browsers and non–traditional devices, and to continue to work well in tomorrow’s, it’s a good idea to author new sites in XHTML, and to convert old pages to XHTML as your work schedule permits.

And the W3C has made it easy to do so. You can learn the rules of XHTML faster than Domino’s delivers a medium pizza with black olives and fresh mushrooms. These few, simple rules exemplify W3C’s practicality, for they bring consistency and XML well–formedness to the web without requiring busy designers and developers to learn entirely new markup techniques.
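To give a feel for what "XML well-formedness" means in practice, here is a minimal sketch in Python. It uses the standard library's XML parser as a stand-in for a real XHTML check; the two markup fragments and the file name logo.gif are invented for illustration. Well-formed is not the same as valid, so this is no substitute for the W3C validator.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

# Old HTML habits: unclosed <br> and <img>, unquoted attribute value.
html_style = '<p>Hello<br><img src=logo.gif></p>'

# The same content following XHTML's rules: every tag closed,
# every attribute value quoted.
xhtml_style = '<p>Hello<br /><img src="logo.gif" alt="logo" /></p>'

print(is_well_formed(html_style))   # False
print(is_well_formed(xhtml_style))  # True
```

The only differences between the two fragments are closed tags and quoted attributes, which is most of what the XHTML rules ask of you.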

But as with any transition, you’ll get better and more predictable results if you prepare ahead. This article will help you do that, by examining tools that can assist you in converting to XHTML and verify that you’ve done it correctly. The article will also discuss changes in the way some browsers display XHTML pages that might puzzle you if you’re not anticipating them, and help you prepare workarounds if needed.

KNOW BEFORE YOU GO

If you haven’t already done so, read the Online Style Guide of the Branch Libraries of The New York Public Library before you read this article.

The Style Guide illuminates XHTML without requiring you to muddle through the often arcane literature at W3C; includes valuable information on CSS (including Style Sheets you can grab and use on your own sites); explains how to work with the W3C validators; and offers updated tips for Dreamweaver users.

This author helped The New York Public Library create the Style Guide and is grateful to the Library and to web coordinator (and Style Guide co–creator) Carrie Bickner for choosing to make the Style Guide available to the entire community. The Style Guide is frequently updated to correct errors and provide new information.

TIDY TIME

By far the easiest method of creating valid XHTML pages is to write them from scratch. But much web design is really redesign, and you’ll often find yourself charged with updating old pages. Redesign assignments provide the perfect opportunity to migrate to XHTML.

The free tool HTML Tidy can quickly convert your HTML to XHTML. We recently used it to do just that with The Daily Report at zeldman.com. We likewise used Tidy for last year’s CSS/XHTML conversion of A List Apart, and we’ve employed it successfully on several client sites.

Tidy was created by the brilliant standards geek Dave Raggett and is now maintained as Open Source software by the community at Source Forge, though some versions are maintained by individuals as a labor of love.

For our conversion, we used MacTidy 1.0b13, the most recent version of Tidy for Mac OS, developed by Terry Teague.

There are online versions of Tidy as well as downloadable binaries for Windows, Unix, various Linux distributions, Mac OS X, and other platforms. Each version offers different capabilities and consequently includes quite different documentation.

READ THE MANUAL

Like many busy people we tend to avoid reading the manual, but in this case we urge you to read every word. Though some versions of Tidy look rudimentary and its capabilities may appear obvious, Tidy is a power tool. You must acquaint yourself with its settings and preferences to ensure the desired results.

For our Daily Report conversion, we refreshed our memory with a mere glance at MacTidy’s documentation, and this proved to be a mistake.

On our first pass, using settings we’d never used before, Tidy converted our encoded character entities to non–encoded, platform–specific keyboard characters; transformed Unicode entities that work in all browsers into named entities that should work in all browsers but don’t; and changed comment brackets (the < in <!-- for instance) to encoded characters, thereby triggering errors in an embedded JavaScript function.

The power was Tidy’s, the fault was ours. Misuse of Photoshop, Illustrator, or Flash can have similarly dire consequences, and Tidy cannot be blamed for the mistakes of its users. So do yourself three favors:

Read the manual.
Keep a backup copy of your document.
Read the manual.

Those who share our manual reading avoidance problem will want to know which preference setting was right. Alas, there is no single “right” setting. The proper setting depends on the type of character encoding you intend to specify in your page header, the type of encoding you’ve told your HTML editor to output (usually, but not always, Latin1), and other variables. Here’s a tip, though. Be sure to choose Convert HTML to XML if you want to generate XHTML. (Remember: XHTML is really XML.)

MAKE A DATE TO VALIDATE

The Style Guide explains how to work with the W3C’s (X)HTML and CSS validators. Validation takes just a few moments. If you don’t bother with this step, and if your XHTML or CSS contains errors, your site may not function properly. It may also look quite different than what you intended.

With valid markup and CSS, compliant browsers tend to render your site as you expect, with exceptions to be discussed below. With invalid XHTML or CSS, all bets are off, and you can’t blame the browsers. (Well, you can, but it wouldn’t be fair and it won’t do you a bit of good.)

If you write your markup by hand, unless you’re perfect, you’re likely to make mistakes every once in a while. If you use Macromedia Dreamweaver or Adobe GoLive straight out of the box, your site is certain to contain errors that the validators can help you fix.

We have every confidence that upcoming versions of Dreamweaver and GoLive will help you author more valid web content, but these versions are not yet on the market, and even when they become available, you may well need to go in and massage your markup by hand. (Meanwhile, Dreamweaver users should consult the Style Guide’s tips, updated 15 February 2002 to coincide with the present article.)

Regardless of the way you generate your markup, it pays to work with the validators. They’re like non–judgemental XHTML and CSS consultants that will point out your problems without thinking badly of you.

VALIDATOR GOTCHAS

Even the best consultants sometimes give bad advice. They may also eat too much garlic at lunch, or use your fax machine more than you’d like. The robotic consultants at validator.w3.org and jigsaw.w3.org/css-validator/ may also occasionally give you unexpected trouble.

Primarily, this has to do with the language the validators use to report errors. Written by and for standards geeks, the validators sometimes provide “help” that may confuse or even mislead average working stiffs. It’s not W3C’s job to dumb down web standards for the rest of us, but we sometimes wish the validators would say, “Hey, dummy, you forgot to close your <p> tag” instead of the cryptic stuff they occasionally spit out.

WHY SO CRYPTIC, COMRADE?

To be fair, cryptic validator messages are often the result of software limitations. The validator is not 2001’s HAL. If you forget to close one tag, the validator can’t possibly know that you intended to close it, and may thus report an error further down on the page instead of zeroing in on the real problem. The validator may point to an improperly nested tag that is, in fact, properly nested—but an earlier one is not, and that throws the validator for a loop.

As author, you are responsible for your own errors, whether generated by a (possibly misused) tool or marked up by hand. Knowing about the XHTML validator’s tendency to report nesting errors below where they actually occur can help you make sense of confusing error reports and get back on track quickly.
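You can watch this "error reported further down" behavior with any XML parser. The sketch below uses Python's built-in parser, not the W3C validator, and the sample page is made up; the assumption is only that both tools share the same basic limitation.

```python
import xml.etree.ElementTree as ET

# The real mistake is on line 2: the first <p> is never closed.
page = """<div>
<p>First paragraph, never closed
<p>Second paragraph</p>
<p>Third paragraph</p>
</div>"""

def first_error_line(markup):
    """Return the line where the parser gives up, or None if it parses."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, _column = err.position
        return line
    return None

print(first_error_line(page))  # 5
```

The parser complains about the closing div tag on line 5, which is perfectly fine on its own. It cannot know that the paragraph on line 2 was meant to be closed, so when an error report points at innocent markup, look upward.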

VALIDATOR BUGS

The validators are also the products of human engineering, and thus, like all software, contain a few bugs. You should report these bugs when you encounter them (we tell you how below), but may feel intimidated about doing so, since you’re likelier to think your markup is at fault than to suspect that a powerful computer programmed by standards experts could be wrong. But every once in a great while, it can be.

In our recent encounter with Tidy, using incorrect preference settings, we got a web page that wasn’t quite right, and decided to fix the errors by hand. Without realizing it, we missed one error.

Obeying our ill–advised preference settings, Tidy had converted our encoded © copyright symbol to the Macintosh keyboard character for copyright. This keyboard character is fine for Macintosh–to–Macintosh document transfer, but is not advisable for the web. We ran the resulting page through the W3C validator and it passed with flying colors.

We next attempted to validate our style sheet, but W3C’s CSS validator told us it could not do so because of an error in our XHTML: “An invalid XML character (Unicode: 0xa9) was found in the element content of the document.”

CATCH-22

Unfortunately, we could not search and replace “0xa9,” since 0xa9 was not a text string in our document. (It happens to be the copyright symbol in Unicode, but unless you’ve committed Unicode characters to memory, the validator’s message is not particularly helpful.)
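A few lines of script can do the hunting that a search for "0xa9" cannot. The following is a rough sketch in Python, assuming the page has already been read into a string with the correct encoding; the sample markup and the function name find_non_ascii are ours, invented for illustration.

```python
def find_non_ascii(text):
    """Yield (line, column, character, code point) for each non-ASCII character."""
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col_no, ch, hex(ord(ch))

# Hypothetical page containing a raw copyright character (U+00A9).
sample = "<p>Daily Report</p>\n<p>Copyright \u00a9 2002</p>\n"

hits = list(find_non_ascii(sample))
print(hits)  # [(2, 14, '©', '0xa9')]

# One possible repair: swap every non-ASCII character for a numeric
# character reference, which is safe regardless of platform.
fixed = sample.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(fixed.splitlines()[1])  # <p>Copyright &#169; 2002</p>
```

The line and column numbers here refer to your own source file, so unlike the CSS validator's references they map to something you can open and fix.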

The CSS validator provided Line and Column references for the error, and these could have proved useful in pinpointing the problem if they mapped to anything. But the references mapped to nothing since the CSS validator does not print out your markup.

The W3C markup validator does print out your XHTML markup, complete with Line references, but only if it thinks your markup is invalid. And as we’ve said, that W3C validator considered our markup kosher.

We thus found ourselves in a Catch–22. One validator said our page was good; the other choked on it and provided error reports we could not use.

Temporarily baffled, we uploaded the page anyway, and within an hour, Daily Report readers including Mark Howells, Zeke Runyon, and Dylan Foley had taken it upon themselves to proofread our source and find the error. We thanked them, corrected the error, and were back in business.

Had we been working on a client project instead of a personal site, we would not have uploaded the page until we had found and corrected the error. In most cases, it’s best to quit your HTML editor, go for a walk, and return later, with a clearer head.

REPORTING VALIDATOR BUGS

Such problems are quite rare (and in our case, they could have been prevented by consulting Tidy’s user manual in the first place), but they do crop up. One friend of ours, who has been called the “greatest living web designer,” routinely types Windows keyboard characters into his source. Markup errors happen; validator errors (very occasionally) happen.

If you think the W3C XHTML validator is in error, visit the feedback page. To report possible CSS validator errors, write to www-validator-css@w3.org. (The email address listed on the CSS validator ReadMe page does not work because it is incomplete.) If the validator is indeed at fault, or you have strong reasons to think it is, be kind and considerate in reporting the error.

The W3C validators are a free resource maintained by knowledgeable individuals as a labor of love. Displays of petulance, though sometimes tempting, will either offend these hard–working folks or (more likely) make them wonder why you’re behaving so rudely, and prompt them to toss your note into the trash.

XHTML & BROWSERS

So now you have a valid XHTML page. Will it look the way it used to? In some recent, standards–friendly browsers, it may not—but you can fix that quickly.

After converting the Daily Report from HTML 4.01 Transitional to XHTML 1.0 Transitional, our page was in no way different from the previous version except for the change in doctype and associated markup rules.

But IE6 and Mozilla/Netscape 6 decided it should look different than it used to. Here’s what IE6/Windows did to our menu bar:

IE6 mangles menu bar.

And here’s how Netscape 6.2/Mac felt about it:

NN6 mangles menu bar.

View a Daily Report of April 2002 to see how the menu is supposed to display.

MENDING THE GAPS

To fix these (to our mind) glitches in MSIE6 and Mozilla/Netscape 6, we added two rules to a style sheet embedded in the page header:

img {display: block;}
.inline {display: inline;}

The first rule fixed the menu bar. The second fixed layout problems caused elsewhere by the first rule. Where we wanted images to display inline, we added a class="inline" attribute to the img tag. Problem solved.

If markup (structure) and visual display (design) are two different animals per W3C thinking, why did these browsers change our display, and how did we come up with the CSS rules that solved the problem?

STRICT INTERPRETATIONS & RANDOM QUIRKS

For one thing, as noted in the Daily Report itself (26 January), experts disagree on how standards like CSS should be interpreted. In particular, they disagree on what styles (if any) should be applied by default to images that have not been styled by the page designer.

Eric Meyer’s Tables, Images, and Mysterious Gaps explains how the CSS experts at Mozilla interpret unstyled image tags in relation to the implied grid of every web page, and provides workarounds for those who don’t want extra space being added to their web layouts. The issue primarily affects “combination” layouts that use a mixture of ancient (tables) and modern (CSS) layout technologies. {Ed: Netscape may have moved the cited article since Better Living Through XHTML was first published.}

STRICT vs. STRICT

Meyer’s helpful article states that Mozilla and its child browser, Netscape 6, only do this to (X)HTML documents with strict doctypes, but that may be unintentionally misleading, as it seems to suggest that extra space is applied to images only in HTML Strict or XHTML Strict. A glance at many web design mailing lists will show you that this is the popular interpretation of the word “strict” in this context.

In fact, what Meyer and his colleagues mean by “strict doctypes” is “complete” (or valid) doctypes, i.e. any document—even HTML 4.01 Transitional—that includes a full URI. (Meyer is not misusing the word strict; it’s just that the word means different things in different contexts.)

In practice, we’ve found that Netscape 6.x applies its experts’ strict CSS interpretation to some HTML 4.01 Transitional documents with complete doctypes, and not to others. This may be because Mozilla is still in Beta, hence Netscape 6.x is still unfinished; or it may indicate an underlying principle that we’ve failed to discern.

More to the point for our present purposes, Mozilla/NN6 always applies this CSS interpretation—and thus, this extra space—to pages authored in XHTML (Strict or Transitional). The moment you convert to XHTML, images contained in table cells will do to your layout what Germany did to Poland.

DOCTYPES AND DISPLAY

Most CSS–compliant browsers use the presence or absence of a complete doctype to trigger standards–compliant or backward–compatible (“Quirks mode”) presentation, respectively, a practice first suggested (as far as we know) by Todd Fahrner in 1998, and first implemented by IE5/Mac in March 2000.

Mozilla/NN6 follows this pattern, as does IE6/Win. IE6 also includes a DOM property that tells whether standards–compliant mode is switched on for a given document.

When in “standards” mode, a compliant browser assumes that you know what you’re doing and displays your page per W3C specs. In “Quirks” mode, the browser surmises that you’ve crafted an old–fashioned, probably invalid page, and displays it as an older browser might. You control which tack the browser takes by including or excluding a complete (X)HTML doctype.

See Fix Your Site With the Right DOCTYPE! to learn which doctype you should use for your web project.

(There’s one exception to this rule: Mozilla/NN6, in common with MSIE, treats HTML 4.0 pages—even those with complete doctypes—in backward–compatible “Quirks” mode. So if you’re not quite ready for XHTML, but you’re writing valid HTML and CSS and want the browser to display your page correctly, choose an HTML 4.01 doctype. Of course, we encourage you to use XHTML instead.)
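If you have a pile of old pages and want a rough idea of which mode each will trigger, the rule of thumb above can be written down as a small script. This is only a sketch of the rule as this article describes it (complete doctype means standards mode, with the HTML 4.0 exception); it is not a model of any browser's actual doctype sniffing, and the function name expected_mode is our own invention.

```python
import re

# Matches a PUBLIC doctype; the second quoted string (the URI) is optional.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]+)"(?:\s+"([^"]+)")?\s*>',
    re.IGNORECASE,
)

def expected_mode(page: str) -> str:
    """Guess 'standards' or 'quirks' using the rule of thumb in this article."""
    match = DOCTYPE_RE.search(page)
    if match is None:
        return "quirks"            # no doctype at all
    public_id, uri = match.groups()
    if uri is None:
        return "quirks"            # incomplete doctype: no URI
    if re.search(r"DTD HTML 4\.0(?!1)", public_id):
        return "quirks"            # the HTML 4.0 exception (4.01 is fine)
    return "standards"

xhtml = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
         '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">')
incomplete = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'

print(expected_mode(xhtml))       # standards
print(expected_mode(incomplete))  # quirks
```

Treat the answer as a hint about what to test, not as a promise; as noted earlier, Netscape 6.x does not always behave consistently with HTML 4.01 Transitional documents.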

After converting to XHTML, if your images begin invading the borders of neighboring countries, you’ll have to take a few minutes to add compensatory rules to your style sheets. Each layout is different, so no single CSS rule or collection of rules will solve every problem, but Eric Meyer’s article and the style rules we used and have listed above should provide a starting point, and this extra work should not take you much time at all.

WHITE SPACE AND DISPLAY

Through Eric Meyer’s article, Mozilla/Netscape has documented why it acts as it does. We’re not sure why IE6/Win changed its display when we updated our page’s doctype to XHTML. (Both versions—the old HTML 4.01 Transitional and the new XHTML 1.0 Transitional—used complete doctypes.)

We think it may have to do with the way some browsers handle white space. The two tags below are functionally equivalent, but because they use white space differently, they might display differently in a browser that attempts to parse white space in markup. Thus:

… might display differently than:

The second example—the one with white space in its markup—might result in unwanted visual gaps on your web page. Likewise in the example below. The first tag (with no white space)…

… could look different in your browser than the functionally identical:

Why does this happen? The “whitespace bug” was a known problem in Netscape Navigator dating back to Version 3.0 (if not earlier). When Microsoft decided to build a competing browser, its engineers emulated much of Netscape’s behavior—including some of its bugs. Our guess is that MSIE6 continues to emulate this old Netscape bug.
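To make "functionally equivalent" concrete, here is a hypothetical pair of table cells of the kind described above, compared with a short Python sketch. The file name menu.gif and the helper name structure are invented for illustration; the script only shows that the two versions contain the same elements and attributes and differ in nothing but white space.

```python
import xml.etree.ElementTree as ET

# No white space between the table cell and the image...
compact = '<td><img src="menu.gif" alt="Menu" /></td>'

# ...versus the same markup with line breaks and indentation.
spaced = '<td>\n  <img src="menu.gif" alt="Menu" />\n</td>'

def structure(markup):
    """List each element with its attributes, ignoring text and white space."""
    root = ET.fromstring(markup)
    return [(el.tag, sorted(el.attrib.items())) for el in root.iter()]

print(structure(compact) == structure(spaced))  # True
print(compact == spaced)                        # False
```

A browser that treats the line breaks inside the cell as content may render a gap around the image in the second version, even though the markup means the same thing.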

Regardless of why IE6 behaved as it did, our additional rules (display: block) fixed the problem in that browser as well. Your mileage may vary, but some version of (display: block) will probably solve your design problem in both Mozilla/NN6 and IE6.

BETTER LIVING THROUGH XHTML

When properly used, W3C standards enhance accessibility and promise long–term durability (which we call “forward compatibility”) for any document published on the web. If you care to reach the largest audience for the longest time possible, you want to work with web standards, and where document structure is concerned, XHTML is the way to go.

While some W3C standards are intended to help experts accomplish sophisticated tasks, markup (XHTML) and style sheets are for everyone, and W3C has taken pains to pave the road to XHTML.

The rules of XHTML take minutes to learn and the benefits of XHTML are vast. It is easy to author in XHTML and equally easy to convert HTML to XHTML by hand. Tools like Tidy can help automate the process as long as you take a few minutes to read the documentation before pushing the button.

Free online validators help ensure that your XHTML and CSS are kosher, though error reporting may sometimes, momentarily, confuse you, and in very rare instances the validators can misbehave.

After converting to XHTML, you may need to adjust your style sheet to compensate for some browsers’ default presentation of images, particularly when they occur in table cells, but if you make this a part of your work routine it can become second nature. And as new browsers continue to gain market share, we’ll be doing less and less design work with tables, and more and more through CSS.

With a little care and feeding, XHTML will help your sites work better in more browsers and devices, thus reaching greater numbers of readers, now and for years to come. What more could you ask?
