Inline XML

As I sat at my desk a few days ago, I suddenly wondered why HTML includes a <code> tag, and a <var> tag, and yet it takes marking up code no further than that. It’d be understandable to have just the <code> tag, but if they’re going to
have a <var> tag, shouldn’t they have more programming tags?

Where this line of thought eventually led was that the W3C was
probably too busy to fit a proper code markup language into the HTML
specification, since the specification has to cover so much else. So
if they’re not going to add one, how should we mark up our script
examples? “Aha!” I thought to myself. “The perfect chance to use
XML.”

Why would we use XML here? The main reason that I initially wanted to
be able to mark up my example scripts was so that I could color-code
their components for easy reading. There are three paths we could
take in order to do that.

We could take the path that sites such as Zend.com have taken and mark it up with <font>s, but having ditched fonts way back on the highway, do we want to waste fuel and progress going back to get ’em?
Nah.

Alternatively, we could use <span>s and classes to mark up all the
code. However, this approach is almost as bad as the <font> one. Yes,
we’re using CSS instead of embedding presentation inside the
structure, but we’re also burying semantic meaning inside our class
attribute.

The only other way of marking up this content is with the use of
inline XML. By taking this approach, we’re:

  • Marking content up by what it means, and letting anything that
    browses the document, human or machine, know what that meaning is.
  • Not cluttering our code up with unneeded characters (<var> is both
    more concise than <span class="var"> and more meaningful and concise
    than <font color="blue">).
  • Allowing ourselves to expand a tag if we want to, via attributes
    (e.g. <var type="string">).

Namespaces

Which brings us to the point of this article — how to use multiple
XML languages together in a single document by using namespaces.
Let’s say that I wanted to mark up a fragment of
code I was explaining like so:

<script type="PHP">
function layEgg »
  (<var>$size</var>, »
   <var>$color</var>) {
     <comment>//No eggs of negative size</comment>
     if(<var>$size</var> &lt;= 0) {
         return »
  false;
     }
     ...
}
</script>

(Line wraps are marked ».  –Ed.)

Obviously we can’t do it just like that — if any of our script tags
have the same name as our HTML tags then how will we be able to tell
the difference between them?

The answer lies with XML namespaces. Do you remember the
xmlns="http://www.w3.org/1999/xhtml" attribute
you’ve had to add to your HTML tag? That’s an XML namespace
declaration. It identifies that you’re writing your markup in the
language whose unique corresponding URI is
http://www.w3.org/1999/xhtml. If you don’t know what I’m talking
about, the NYPL style guide will enlighten you.

Side note: xmlns="http://www.w3.org/1999/xhtml" makes the XHTML
namespace the default namespace for all unprefixed tags, that is,
tags which don’t declare a namespace of their own.

The reason that you have to add this attribute to your HTML tag is so
that you can include different languages within your document, such as
an XML-based “script markup language,” without having tag names clash
with each other. To add another language to your document you have to
declare an extra namespace on your HTML tag (you can declare it on any
element which is an ancestor of the tags that use the namespace,
but it’s easiest to put them all on the HTML tag). To do this we
first have to decide on a prefix for these tags that is unique within
the document; let’s use “s” for ours. The next thing we need is a
unique namespace name: a URI which doesn’t have to point to anything.
Let’s use http://www.alistapart.com/stories/inlinexml/.
Now, to add our new namespace to the document, we add
xmlns:s="http://www.alistapart.com/stories/inlinexml/" next to our
other namespace declaration in the HTML tag.
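
To see what these declarations mean to a machine, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document and the variable names are made up for illustration; only the two namespace URIs come from the text above. It shows that an XHTML var and an s-prefixed var end up with different full names, so they cannot clash.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SCRIPT_NS = "http://www.alistapart.com/stories/inlinexml/"

# Hypothetical sample document: one XHTML <var> and one <s:var>.
document = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:s="http://www.alistapart.com/stories/inlinexml/">
  <body>
    <p>Set <var>x</var> first, then pass <s:var>$size</s:var>.</p>
  </body>
</html>"""

root = ET.fromstring(document)

# ElementTree expands every tag name to "{namespace-uri}local-name",
# so the prefix itself ("s") is only shorthand for the URI.
xhtml_vars = root.findall(f".//{{{XHTML_NS}}}var")
script_vars = root.findall(f".//{{{SCRIPT_NS}}}var")

print(root.tag)
print([v.text for v in xhtml_vars])
print([v.text for v in script_vars])
```

The prefix could be renamed from "s" to anything else without changing the result, because the lookup is done on the URI.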

Putting them to work

Now we have the namespace declared in the document tag. The next
thing we need to do is prefix all the XML tags in our document with
their unique prefix followed by a colon. The previous example now
becomes:


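<s:script type="PHP">
function layEgg »
  (<s:var>$size</s:var>, »
   <s:var>$color</s:var>) {
     <s:comment>//No eggs of negative size</s:comment>
     if(<s:var>$size</s:var> &lt;= 0) {
         return »
  false;
     }
     ...
}
</s:script>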

Now that we’ve done this we can manipulate our XML to display in
various colors, by using the usual selectors. Note that the colon in
the name must be escaped with a backslash, so that it isn’t confused
with a pseudo-class (such as :link). An example styling of our code
could be as follows:

<style>
s\:script[type="PHP"] {
     color: black;
}
s\:var {
     color: blue;
}
s\:comment {
     color: green;
}
...
</style>

Unfortunately, styling these elements doesn’t work even in some very
recent browsers, such as Opera 6/Win. The good thing is of course that
all browsers will ignore unknown tags and just display the text
within them, so as long as you don’t need to format anything as a
block element, or otherwise out of the normal flow, your XML should
degrade perfectly.
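
The fallback can be modelled in a few lines of Python. This is only a sketch of the idea, not of how any browser works internally: it assumes a made-up fragment using the s:var and s:comment elements, throws away every tag, and keeps the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment marked up with the "s" namespace from the article.
marked_up = (
    '<p xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:s="http://www.alistapart.com/stories/inlinexml/">'
    '<s:comment>//No eggs of negative size</s:comment> '
    'if(<s:var>$size</s:var> &lt;= 0)'
    '</p>'
)

# Dropping every tag and keeping only the text mimics a browser
# that does not recognise the s: elements.
fallback_text = "".join(ET.fromstring(marked_up).itertext())
print(fallback_text)
```

The reader loses the color coding but none of the code itself.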

Caveat

There is a downside to everything, and the downside to including other XML languages in your XHTML documents is that they will no longer pass validation. This points more to a problem with validator services and DTDs than it does with including other XML languages inside your XHTML.

The W3C, being aware of the problem, have issued a few DTDs which allow you to combine several different XML languages together in one document, such as XHTML+MathML and XHTML+SVG+MathML. Presumably the authors of other XML languages which could benefit from being included in XHTML will follow suit with their own similar DTDs.

In the meantime, authors can still use the validator to check that their HTML is written correctly, ignoring errors about the elements from other namespaces they have included. The other path adventurous authors can take is attempting to create their own XHTML variant using the modularisation of XHTML, although proper XML validators rather than the HTML validator need to be used to validate these documents.
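
When reading validator output for a mixed document, it can help to know exactly which elements fall outside XHTML, since those are the errors you would expect to ignore. A minimal sketch, assuming Python's standard xml.etree.ElementTree and a made-up sample document; elements_by_namespace is a hypothetical helper, not part of any validator:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XHTML_NS = "http://www.w3.org/1999/xhtml"
SCRIPT_NS = "http://www.alistapart.com/stories/inlinexml/"


def elements_by_namespace(xml_text):
    """Group the element names used in a document by namespace URI."""
    groups = defaultdict(set)
    for elem in ET.fromstring(xml_text).iter():
        # Namespaced tags look like "{uri}local"; others have no braces.
        if elem.tag.startswith("{"):
            uri, local = elem.tag[1:].split("}", 1)
        else:
            uri, local = "", elem.tag
        groups[uri].add(local)
    return {uri: sorted(names) for uri, names in groups.items()}


# Hypothetical mixed document using both namespaces.
sample = (
    f'<html xmlns="{XHTML_NS}" xmlns:s="{SCRIPT_NS}"><body><p>'
    '<s:script type="PHP"><s:comment>//No eggs of negative size</s:comment> '
    'if(<s:var>$size</s:var> &lt;= 0)</s:script>'
    '</p></body></html>'
)

groups = elements_by_namespace(sample)
# Everything that is not XHTML is what an XHTML-only DTD will complain about.
foreign = {uri: names for uri, names in groups.items() if uri != XHTML_NS}
print(foreign)
```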

Wrapping up

So now you’ve seen how to declare an XML namespace, and how to label
its elements within your documents as well as how to style them. You
could also use JavaScript to manipulate the tags if you wanted to, of
course.

As for other applications? You’re limited only by your imagination
(and browsers). Why not make and use an inline recipe markup
language, or book markup language? What about a shopping cart markup
language? There are no limits!

About the Author

Lachlan Cannon

Lachlan is an aspiring programmer as well as an admin and occasional content writer at evolt.org. His personal site uses the W3C XHTML+MathML Doctype.

24 Reader Comments

  1. Any way you could toss in the referenced code marked up as specified by the author? Would be nice to be able to see how it looks in different browsers without having to recreate the snippet on my own.

  2. Interesting article, but you make one point I’m not sure I understand:

    “Alternatively, we could use <span>s and classes to mark up all the code. However, this approach is almost as bad as the <font> one. Yes, we’re using CSS instead of embedding presentation inside the structure, but we’re also burying semantic meaning inside our class attribute.”

    Burying semantic meaning inside our class attribute? You lost me there. A class attribute is just a name, it might as well be a gid or other code. It doesn’t have to have semantic meaning at all, but what it SHOULD do is tell you what kind of <span> our <span> is. Seems to me that that’s the whole point, isn’t it?

    I guess what I don’t understand is why it’s a problem, or something to be avoided. and seem to me like useful applications of the class attribute.

    Or are you saying you want to be able to set a “type” on a tag without attaching it to a presentation rule in a stylesheet? ?

    (hope the parser escapes the pseudo-tags in this post; if not, I’ll repost without them.)

  3. Mad City Man, the URL you provided to the HTML spec seems to back up the point that Steve brought up. According to the spec, the class attribute is appropriate “to achieve the desired structural and presentational effects” … “since HTML does not include elements that identify objects such as “client”, “telephone number”, “email address”, etc.”

    I’d say that the class attribute should *definitely* be used to embed semantic distinctions– that’s what it’s there for.

    Lachlan wrote: “Marking content up by what it means, and letting anything that browses the document, human or machine know what that meaning is.”

    Well… not exactly. If you just make up new dialects all the time, like the one in this article, then humans or machines can’t just automagically know what the markup “means”. For anything to be meaningful, there has to be some agreement beforehand. So, embedding terms from the Dublin Core might be useful, but the markup used in this article is no more useful– but precisely as “meaningful”– as using DIVs and SPANs with class attributes.

    (Except that the inline XML method won’t have the desired results in some major browsers. Reality is such a pain.)

  4. Ok, I’m not getting one bit of this article and I’m assuming it is because I am missing a basic understanding of XML. I do get XHTML and CSS, but never managed to figure out the purpose or use of XML (but I do see the power in it). I’m assuming I’m missing some basic general understanding, so could someone show me a site/tutorial, or maybe even help a newby out and explain this for me?

  5. without a reasonable example. I’ve done custom tags by namespaces but many who will read the article have not. They will need to see results. The author has come up with an excellent reason to apply this technique, so follow-thru can be esp. helpful here.

    Mozilla devotees may point out that browser as capable of styling custom tags directly. This is easier and more intuitive but the namespace route is better in the long run for the reasons the author gave.

    Pretty slick!

  6. I have now posted the example used in this article on my site at http://illuminosity.net/writing/articles/inlinexml/colon-escape.html for people to see in its full “glory”. With regards to the class versus namespaces issue, classes are no doubt the solution to be used for now, for reasons of browser support etc. However, namespaces allow easier styling and parsing of the extra elements once they are well supported. For example, to give the style class the equivalent of the php type attribute I use, one would have to create variants within the class, such as class “script php”. Once you start wanting a few different attributes this method becomes very messy to understand, and harder to parse if you want to draw specific things out of your example. If, for example, you wanted to collect all scripts on your site, it’s much easier to look through for <s:script> elements than for <span class="script php">. There are other reasons to get to know namespaces too, such as combining existing XML languages together, such as SVG with XHTML, or MathML with SVG.

    This article, rather than trying to trigger a holy war about the best methods to use was instead meant to be a simple introduction to how to use namespaces within xml documents.

  7. I was surprised but the example didn’t work in Mozilla (1.2b) for me. The CSS color coding was missing. Also the DOCTYPE declaration should be removed as it doesn’t conform to XHTML 1.1. And lastly, the document should have been served with text/xml, application/xml, application/xhtml+xml, not text/html.

  8. *And lastly, the document should have been served with text/xml, application/xml, or application/xhtml+xml, not text/html.

  9. I agree with Lach’s last point about namespaced elements being more appropriate vehicles for semantic meaning than class attributes; and the article highlights (X)HTML’s shotgun approach to semantic markup.

    I’ve done several academic publishing projects in HTML and been infuriated by the hit-and-miss approach of the default semantic elements: (X)HTML has , , and so obviously someone went overboard on programming semantics, but there's not a single element to mark up an author's name or the title of a publication. There's <cite> of course, but that's only really appropriate when you're actually citing a source. Hence I have to fill my markup with and . Obviously this isn't an editorial playground, but it's something that's bugged me since day one ;)

    Also, I believe IE as of version 5 or so has allowed you to apply style to unknown tags, just as Mozilla has, so the browser should work with custom tags with or without proper namespacing.

  10. Go back to ‘class’.

    As others have pointed out, using the ‘class’ attribute to specify semantic details is not only just fine, but expected. Think of the ‘class’ attribute as more of a “subclass” attribute – in that it allows you to “subclass” a particular element and provide a more specific instance thereof. In the case of <span> and <div>, since they are the most generic of tags, subclassing them results in an element that is semantically richer, but not much (if any) richer than any other semantic specific element (like ) without a class attribute.

    is semantically close enough to to be just fine for now (until 95% of folks are using browsers with sufficient native XML support (parsing, selectors, dom, schema etc.)).
    Until then, if you must use your own XML tags, consider transforming them server side to span/div tags with the appropriate class before sending them to the client.

    Would Namespaces by any other name smell so foul?

    Namespaces are bugly. They are perhaps the worst aesthetic corruption of markup to come out of W3C. Ick. They have enabled such ugliness as XLink, which has polluted the latest document formats (like SVG – explain to me again why SVG needed a different element instead of simply reusing what was in XHTML?).

    Modern selectors

    The “:” syntax was an experimental way to specify namespaces in CSS selectors that has since been abandoned. Go here:

    http://www.w3.org/TR/css3-selectors/#typenmsp

    For the right way to use selectors with namespaces (if you must).

    Tyler

  11. Tyler, if you read the comments which are posted here I did say that *for now* using classes is no doubt the better method. Hopefully sufficient browser support is not too far up the track, and by that time people will be ready to use namespaces.

    Secondly, I don’t think namespaces are ugly at all. The syntax works just right for how they should work, but if you have a problem with it, you should take it up on a W3C mailing list.

    Thirdly, not all languages could re-use xhtml semantics. Browsers can’t just assume that because an element has the same name it’ll mean the same thing, or xml would be pointless, you might as well just add tags to xhtml. Therefore, more generic technologies need to be developed for xml, to be able to apply for all xml languages if necessary.

    Lastly, thank you for the updated selector link. Do you know how support for that compares with support for the dropped “:”?

  12. Funny thing is that I was just finishing up a page of demo code for my new Web site. I’d been using classes to colour code them. Nice idea, but I like validating my docs plus I don’t think ditching Opera users is ok. Maybe Netscape/Internet Explorer 4 users, but not above that mark. In future I will use inline XML to colour code my work, but for now I’ll use what I know works.

    Thanks, short but sweet article. =)

  13. In the caveat it talks about how mixed-namespace documents don’t validate against a DTD, unless there is a DTD specifically for that collection of mixed namespaces (e.g. XHTML+MathML and XHTML+SVG+MathML).

    I know that this is because DTDs aren’t namespace-aware.
    My question is, if we ever get to the point of using some other validation technology/language (such as XML Schema or RELAX NG, etc.; now please don’t start a flame war about XML Schema, I’m only using it as an example of “some other form of validation”, and I’ll just generically call them all SuperSchema), will we be able to validate a specific namespace from a mixed-namespace document? For instance, say I have xhtml + mynamespace, could I validate it against the SuperSchema for “xhtml” and it will say “As far as the xhtml in this document is concerned, it validates. The other namespace is up to you.” I could then validate it against a SuperSchema for mynamespace. In other words I could validate my document against a SuperSchema for each language contained in the mix, and none of them would blow up because of the other namespaces.

    Also – Thanks for this article. It is very helpful.

  14. Just a heads up. Peter Janes has helped to rework my example using the @namespace selector method. It’s available at: http://illuminosity.net/writing/articles/inlinexml/at-namespace.html .

    Jeff, I would hope that this is where we are heading. One day I would like to be able to write up my document using mixed and matched vocabularies, tell my editor to validate, and it would validate against the main schema, saying that yes it’s fine for these documents from other vocabularies to be used at these points, and the fragments from all the different vocabularies validate against their own vocabulary schema. I’m not aware of whether or not there are any schema languages which can do this yet, but there are a few efforts going into producing different schemae, and hopefully they will gravitate towards working that way.

  15. what is the use of putting inlinefigure element in an XML document. once after putting the inlinefigure if i want to see the figure then what i have to do. please help me.
