Inline XML

by Lachlan Cannon, November 01, 2002

As I sat at my desk a few days ago, I suddenly wondered why HTML includes a <code> tag, and a <var> tag, and yet it takes marking up code no further than that. It’d be understandable to have just the <code> tag, but if they’re going to
have a <var> tag, shouldn’t they have more programming tags?

The eventual product of this line of thought was that the W3C was too
busy to produce a proper code markup inside the HTML specification,
since the specification has to include so much else. So if they’re
not going to add a proper code markup language, then how would we
mark up our script examples? “Aha!” I thought to myself. “The perfect
chance to use XML.”

Why would we use XML here? The main reason that I initially wanted to
be able to mark up my example scripts was so that I could color-code
their components for easy reading. There are three paths we could
take in order to do that.

We could take the path that sites such as Zend.com have taken and mark it up with <font>s, but having ditched fonts way back on the highway, do we want to waste fuel and progress going back to get ’em?
Nah.

Alternatively, we could use <span>s and classes to mark up all the
code. However, this approach is almost as bad as the <font> one. Yes,
we’re using CSS instead of embedding presentation inside the
structure, but we’re also burying semantic meaning inside our class
attribute.

The only other way of marking up this content is with the use of
inline XML. By taking this approach, we’re:

Marking content up by what it means, and letting anything that browses the document, human or machine, know what that meaning is.
Not cluttering our code up with unneeded characters (<var> is both more concise than <span class="var"> and more meaningful and concise than <font color="blue">).
Allowing ourselves to expand a tag if we want to, via attributes (e.g. <var type="string">).

Namespaces

Which brings us to the point of this article — how to use multiple
XML languages together in a single document by using namespaces.
Let’s say that I wanted to mark up a fragment of
code I was explaining like so:

<script type="PHP">
function layEgg »
  ($size, »
   $color) {
     //No eggs of negative size
     if($size <= 0) {
         return »
  false;
    }
     ...
}

</script>
(Line wraps are marked ».  –Ed.)

Obviously we can’t do it just like that: if any of our script tags
have the same name as our HTML tags, how will we be able to tell
the difference between them?


The answer lies with XML namespaces. Do you remember the
xmlns="http://www.w3.org/1999/xhtml" line you’ve had to add to your
HTML tag? That’s an XML namespace, which has been used to identify
that you’re writing your markup using the language whose unique
corresponding URL is http://www.w3.org/1999/xhtml. If you don’t know
what I’m talking about, the NYPL style guide will enlighten you.


Side note: The xmlns="http://www.w3.org/1999/xhtml" attribute
identifies the XHTML namespace as the default namespace for all tags
which don’t carry a namespace prefix of their own.


The reason that you have to add this line to your HTML tag is so that
you can now include different languages within your document, such as
an XML-based “script markup language,” without having tag names clash
with each other. To add another language to your document you have to
add extra namespaces to your HTML tag (note that you can add a
namespace to any element which is an ancestor of the tags which use
it, but it’s easiest to put them all on the HTML tag). To do this we
first have to decide on a unique prefix for these tags; let’s use “s”
for ours. The next thing we need is a unique namespace: a URL which
doesn’t have to point to anything. Let’s use
http://www.alistapart.com/stories/inlinexml/.

Now, to add our new namespace to the document we add the code
xmlns:s="http://www.alistapart.com/stories/inlinexml/" next to our
other namespace in the HTML tag.
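
To see what those two declarations buy us, here is a minimal sketch in Python (my choice for illustration; nothing in this article depends on it). It parses a tiny made-up document with the standard library's xml.etree.ElementTree and shows that a plain <var> and a prefixed <s:var> end up in different namespaces, so they can no longer be confused. The sample document and the split_name helper are hypothetical, not part of any real site.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SCRIPT_NS = "http://www.alistapart.com/stories/inlinexml/"

# Hypothetical document: both namespaces are declared on the root element.
document = f"""<html xmlns="{XHTML_NS}" xmlns:s="{SCRIPT_NS}">
  <body>
    <p>An HTML variable: <var>x</var></p>
    <p>A script variable: <s:var>$size</s:var></p>
  </body>
</html>"""


def split_name(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        namespace, _, local = tag[1:].partition("}")
        return namespace, local
    return "", tag


root = ET.fromstring(document)

# Collect every element whose local name is "var", whatever its namespace.
vars_found = [
    split_name(element.tag)
    for element in root.iter()
    if split_name(element.tag)[1] == "var"
]

for namespace, local in vars_found:
    print(local, "belongs to", namespace)
```

Both elements share the local name "var", but the parser reports a different namespace for each, which is exactly the distinction the prefix is there to make.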


Putting them to work



Now we have the namespace declared in the document tag. The next
thing we need to do is prefix all the XML tags in our document with
their unique prefix followed by a colon. The previous example now
becomes:


<s:script type="PHP">
function layEgg »
  ($size, »
  $color) {
     //No eggs of negative size
     if($size <= 0) {
         return »
  false;
    }
     ...
}
</s:script>
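
One way to check that prefixed markup like this hangs together is to feed it to an XML parser. The following Python sketch rests on a few assumptions: the s:var and s:comment element names are borrowed from the stylesheet in this article, the fragment is a trimmed-down version of the example, and the < in the PHP is written as &lt; because a raw < inside the element would not be well-formed XML.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace map used when searching; "s" matches the article's prefix.
NS = {"s": "http://www.alistapart.com/stories/inlinexml/"}

# Hypothetical markup: element names are assumed, and "<" is escaped as &lt;.
fragment = """<s:script xmlns:s="http://www.alistapart.com/stories/inlinexml/" type="PHP">
function layEgg (<s:var>$size</s:var>, <s:var>$color</s:var>) {
     <s:comment>//No eggs of negative size</s:comment>
     if(<s:var>$size</s:var> &lt;= 0) {
         return false;
     }
}
</s:script>"""

script = ET.fromstring(fragment)

# findall() with a prefix map looks up direct children by qualified name.
variables = [element.text for element in script.findall("s:var", NS)]
comments = [element.text for element in script.findall("s:comment", NS)]

print(script.get("type"), variables, comments)
```

Because the search goes by namespace rather than by class name, pulling every variable or comment out of an example takes one call each.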



Now that we’ve done this we can manipulate our XML to display in
various colors, by using the usual selectors. Note that the colon in
the name must be escaped with a backslash, so that it isn’t confused
with a pseudo-class (such as :link). An example styling of our code
could be as follows:

<style>
s\:script[type="PHP"] {
     color: black;
}
s\:var {
     color: blue;
}
s\:comment {
     color: green;
}
...
</style>


Unfortunately, styling these elements doesn’t work in even some very
recent browsers such as Opera 6/Win. The good thing is of course that
all browsers will ignore unknown tags and just display the text
within them, so as long as you don’t need to format anything as a
block element, or otherwise out of the normal flow, your XML should
degrade perfectly.
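
To get a rough feel for what such a browser would show, the Python sketch below throws away every tag in a made-up paragraph and keeps only the text. It is an approximation of the fallback behaviour described above, not a model of any particular browser, and the sample sentence is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph mixing XHTML with the article's "s" namespace.
fragment = """<p xmlns="http://www.w3.org/1999/xhtml"
   xmlns:s="http://www.alistapart.com/stories/inlinexml/">Call
<s:script type="PHP">layEgg(<s:var>$size</s:var>, <s:var>$color</s:var>);</s:script>
to lay an egg.</p>"""

paragraph = ET.fromstring(fragment)

# itertext() yields only character data, in document order, which roughly
# mirrors a browser that skips tags it does not recognise. The split/join
# collapses line breaks the way inline whitespace is collapsed on screen.
fallback_text = " ".join("".join(paragraph.itertext()).split())

print(fallback_text)
```

The code sample survives as ordinary inline text, which is why the technique is safe as long as nothing relies on block-level layout.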

Caveat
There is a downside to everything, and the downside to including other XML languages in your XHTML documents is that they will no longer pass validation. This points more to a problem with validator services and DTDs than it does with including other XML languages inside your XHTML.
The W3C, being aware of the problem, have issued a few DTDs which allow you to combine several different XML languages together in one document, such as XHTML+MathML and XHTML+SVG+MathML. Presumably the authors of other XML languages which could benefit from being included in XHTML will follow suit with their own similar DTDs.
In the meantime, authors can still use the validator to check that their HTML is written correctly, ignoring errors about the elements from other namespaces they have included. The other path adventurous authors can take is attempting to create their own XHTML variant using the modularisation of XHTML, although proper XML validators rather than the HTML validator need to be used to validate these documents.
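
If you do go down the road of ignoring validator errors, it helps to know in advance which elements will trigger them. The Python sketch below is a triage aid under stated assumptions, not a validator: it counts the elements in a made-up page that sit outside the XHTML namespace. The foreign_elements helper and the sample page are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XHTML_NS = "http://www.w3.org/1999/xhtml"
SCRIPT_NS = "http://www.alistapart.com/stories/inlinexml/"


def foreign_elements(source):
    """Count the elements in `source` that live outside the XHTML namespace."""
    counts = Counter()
    for element in ET.fromstring(source).iter():
        # ElementTree spells a namespaced tag as "{namespace}localname".
        tag = element.tag
        namespace = tag[1:].partition("}")[0] if tag.startswith("{") else ""
        if namespace != XHTML_NS:
            counts[namespace] += 1
    return counts


# Hypothetical page: one s:script element containing two s:var elements.
page = f"""<html xmlns="{XHTML_NS}" xmlns:s="{SCRIPT_NS}">
  <head><title>Eggs</title></head>
  <body>
    <p><s:script type="PHP">layEgg(<s:var>$size</s:var>, <s:var>$color</s:var>);</s:script></p>
  </body>
</html>"""

foreign = foreign_elements(page)
for namespace, count in foreign.items():
    print(f"{count} element(s) from {namespace}")
```

Any validator complaint about an element from a namespace listed here is one of the errors you were expecting; anything else still deserves a look.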


Wrapping up



So now you’ve seen how to declare an XML namespace, and how to label
its elements within your documents as well as how to style them. You
could also use JavaScript to manipulate the tags if you wanted to, of
course.

As for other applications? You’re limited only by your imagination
(and browsers). Why not make and use an inline recipe markup
language, or book markup language? What about a shopping cart markup
language? There are no limits!


Extra reading



 Scott Andrew
Namespaces Recommendation 


24 Reader Comments

Anonymous says:

November 1, 2002 at 7:00 am

test
Eric J says:

November 1, 2002 at 7:11 am

Heads up. The NYPL style guide referenced in the story is actually http://www.nypl.org/styleguide/xhtml/guidelines.html instead of http://www.nypl.org/styleguide/XHTML/guidelines.html. Apparently the caps make a difference. Great article by the way.
Tk says:

November 1, 2002 at 7:58 am

Any way you could toss in the referenced code marked up as specified by the author? Would be nice to be able to see how it looks in different browsers without having to recreate the snippet on my own.
Anonymous says:

November 1, 2002 at 8:13 am

Yeah, let’s see the example rendered in the article.
Steve Linberg says:

November 1, 2002 at 8:59 am

Interesting article, but you make one point I’m not sure I understand:

“Alternatively, we could use <span>s and classes to mark up all the code. However, this approach is almost as bad as the <font> one. Yes, we’re using CSS instead of embedding presentation inside the structure, but we’re also burying semantic meaning inside our class attribute.”

Burying semantic meaning inside our class attribute? You lost me there. A class attribute is just a name, it might as well be a gid or other code. It doesn’t have to have semantic meaning at all, but what it SHOULD do is tell you what kind of our is. Seems to me that that’s the whole point, isn’t it?

I guess what I don’t understand is why it’s a problem, or something to be avoided. and seem to me like useful applications of the class attribute.

Or are you saying you want to be able to set a “type” on a tag without attaching it to a presentation rule in a stylesheet? ?

(hope the parser escapes the pseudo-tags in this post; if not, I’ll repost without them.)
mad_city_man says:

November 1, 2002 at 10:38 am

What Lachlan meant was that the class attribute is often used on the generic span tag to denote some level of information. For example <span class="author">Lachlan</span> would denote the author to someone reading the code. “Author” is not a type of span, span being a generic container for styling.
http://www.w3.org/TR/html4/struct/global.html#h-7.5.4
sco says:

November 1, 2002 at 5:49 pm

Mad City Man, the URL you provided to the HTML spec seems to back up the point that Steve brought up. According to the spec, the class attribute is appropriate “to achieve the desired structural and presentational effects” … “since HTML does not include elements that identify objects such as “client”, “telephone number”, “email address”, etc.”

I’d say that the class attribute should *definitely* be used to embed semantic distinctions– that’s what it’s there for.

Lachlan wrote: “Marking content up by what it means, and letting anything that browses the document, human or machine know what that meaning is.”

Well… not exactly. If you just make up new dialects all the time, like the one in this article, then humans or machines can’t just automagically know what the markup “means”. For anything to be meaningful, there has to be some agreement beforehand. So, embedding terms from the Dublin Core might be useful, but the markup used in this article is no more useful– but precisely as “meaningful”– as using DIVs and SPANs with class attributes.

(Except that the inline XML method won’t have the desired results in some major browsers. Reality is such a pain.)
Stephen says:

November 1, 2002 at 8:56 pm

Ok, I’m not getting one bit of this article and I’m assuming it is because I am missing a basic understanding of XML. I do get XHTML and CSS, but never managed to figure out the purpose or use of XML (but I do see the power in it). I’m assuming I’m missing some basic general understanding, so could someone show me a site/tutorial, or maybe even help a newby out and explain this for me?
Brett says:

November 1, 2002 at 9:06 pm

without a reasonable example. I’ve done custom tags by namespaces but many who will read the article have not. They will need to see results. The author has come up with an excellent reason to apply this technique, so follow-thru can be esp. helpful here.

Mozilla devotees may point out that browser as capable of styling custom tags directly. This is easier and more intuitive but the namespace route is better in the long run for the reasons the author gave.

Pretty slick!
Lach says:

November 2, 2002 at 1:01 am

I have now posted the example used in this article on my site at http://illuminosity.net/writing/articles/inlinexml/colon-escape.html for people to see in its full “glory”. With regards to the class versus namespaces issue, classes are no doubt the solution to be used for now, for reasons of browser support etc.; however, namespaces allow easier styling and parsing of the extra elements once they are well supported. For example, to give the style class the equivalent of the php type attribute I use, one would have to create variants within the class, such as class “script php”. Once you start wanting a few different attributes this method becomes very messy to understand, and harder to parse if you want to draw specific things out of your example. If, for example, you wanted to collect all scripts on your site, it’s much easier to look through for elements , than for

. There are other reasons to get to know namespaces too, such as combining existing XML languages together, such as SVG with XHTML, or MathML with SVG.

This article, rather than trying to trigger a holy war about the best methods to use was instead meant to be a simple introduction to how to use namespaces within xml documents.
apartness says:

November 2, 2002 at 10:18 am

Hopefully those readers who were initially unhappy can now focus on the article’s subject. 🙂
Anonymous says:

November 2, 2002 at 4:45 pm

I was surprised but the example didn’t work in Mozilla (1.2b) for me. The CSS color coding was missing. Also the DOCTYPE declaration should be removed as it doesn’t conform to XHTML 1.1. And lastly, the document should have been served with text/xml, application/xml, application/xhtml+xml, not text/html.
Anonymous says:

November 2, 2002 at 4:47 pm

*And lastly, the document should have been served with text/xml, application/xml, or application/xhtml+xml, not text/html.
Alun David Bestor says:

November 2, 2002 at 8:19 pm

I agree with Lach’s last point about namespaced elements being more appropriate vehicles for semantic meaning than class attributes; and the article highlights (X)HTML’s shotgun approach to semantic markup.

I’ve done several academic publishing projects in HTML and been infuriated by the hit-and-miss approach of the default semantic elements: (X)HTML has , , and so obviously someone went overboard on programming semantics, but there's not a single element to mark up an author's name or the title of a publication. There's of course, but that's only really appropriate when you're actually citing a source. Hence I have to fill my markup with and . Obviously this isn't an editorial playground, but it's something that's bugged me since day one ;)
Also, I believe IE as of version 5 or so has allowed you to apply style to unknown tags, just as Mozilla has, so the browser should work with custom tags with or without proper namespacing.
Tyler says:

November 3, 2002 at 6:59 pm

Go back to ‘class’.

As others have pointed out, using the ‘class’ attribute to specify semantic details is not only just fine, but expected. Think of the ‘class’ attribute as more of a “subclass” attribute – in that it allows you to “subclass” a particular element and provide a more specific instance thereof. In the case of and

, since they are the most generic of tags, subclassing them results in an element that is semantically richer, but not much (if any) richer than any other semantic specific element (like

) without a class attribute.

is semantically close enough to to be just fine for now (until 95% of folks are using browsers with sufficient native XML support (parsing, selectors, dom, schema etc.)).
Until then, if you must use your own XML tags, consider transforming them server side to span/div tags with the appropriate class before sending them to the client.

Would Namespaces by any other name smell so foul?

Namespaces are bugly. They are perhaps the worst aesthetic corruption of markup to come out of W3C. Ick. They have enabled such ugliness as XLink, which has polluted the latest document formats (like SVG – explain to me again why SVG needed a different element instead of simply reusing what was in XHTML?).

Modern selectors

The “:” syntax was an experimental way to specify namespaces in CSS selectors that has since been abandoned. Go here:

http://www.w3.org/TR/css3-selectors/#typenmsp

For the right way to use selectors with namespaces (if you must).

Tyler
Lach says:

November 3, 2002 at 10:00 pm

Tyler, if you read the comments which are posted here I did say that *for now* using classes is no doubt the better method. Hopefully sufficient browser support is not too far up the track, and by that time people will be ready to use namespaces.

Secondly, I don’t think namespaces are ugly at all. The syntax works just right for how they should work, but if you have a problem with it, you should take it up on a W3C mailing list.

Thirdly, not all languages could re-use xhtml semantics. Browsers can’t just assume that because an element has the same name it’ll mean the same thing, or xml would be pointless, you might as well just add tags to xhtml. Therefore, more generic technologies need to be developed for xml, to be able to apply for all xml languages if necessary.

Lastly, thank you for the updated selector link. Do you know how support for that compares with support for the dropped “:”?
Anonymous says:

November 5, 2002 at 8:29 pm

: is a valid way to match elements “based on their fully qualified name”.

See: http://www.w3.org/TR/css3-selectors/#downlevel
TommyH says:

November 6, 2002 at 7:24 am

Funny thing is that I was just finishing up a page of demo code for my new Web site. I’d been using classes to colour code them. Nice idea, but I like validating my docs plus I don’t think ditching Opera users is ok. Maybe Netscape/Internet Explorer 4 users, but not above that mark. In future I will use inline XML to colour code my work, but for now I’ll use what I know works.

Thanks, short but sweet article. =)
Jeff says:

November 8, 2002 at 12:01 pm

In the caveat it talks about how mix-name-spaced documents don’t validate against a DTD, unless there is a DTD specifically for that collection of mix-name-spaces (e.g. XHTML+MathML and XHTML+SVG+MathML).

I know that this is because DTDs aren’t namespace aware.
My question is, if we ever get to the point of using some other validation technology/language (such as XML Schema or RELAX NG, etc. (now please don’t start a flame war about XML Schema. I’m only using it as an example of “some other form of validation” – I’ll just generically call them all SuperSchema)), will we be able to validate a specific namespace from a mixed namespace document? For instance, say I have xhtml + mynamespace, could I validate it against the SuperSchema for “xhtml” and it will say “As far as the xhtml in this document is concerned, it validates. The other namespace is up to you.” I could then validate it against a SuperSchema for mynamespace. In other words I could validate my document against a SuperSchema for each language contained in the mix, and none of them would blow up because of the other namespaces.

Also – Thanks for this article. It is very helpful.
Lach says:

November 9, 2002 at 6:19 am

Just a heads up. Peter Janes has helped to rework my example using the @namespace selector method. It’s available at: http://illuminosity.net/writing/articles/inlinexml/at-namespace.html .

Jeff, I would hope that this is where we are heading. One day I would like to be able to write up my document using mixed and matched vocabularies, tell my editor to validate, and it would validate against the main schema, saying that yes it’s fine for these documents from other vocabularies to be used at these points, and the fragments from all the different vocabularies validate against their own vocabulary schema. I’m not aware of whether or not there are any schema languages which can do this yet, but there are a few efforts going into producing different schemae, and hopefully they will gravitate towards working that way.
srini says:

July 9, 2003 at 2:57 am

what is the use of putting inlinefigure element in an XML document. once after putting the inlinefigure if i want to see the figure then what i have to do. please help me.
Marek Moehling says:

September 12, 2003 at 3:26 pm

as for color coding you can’t yet beat JavaScript:
http://www.byteshift.de/tips/css/get-rid-of-h1-formatting
(this demo is on a different subject, but uses JS for adding colors)
E Logo Design says:

October 9, 2003 at 4:42 pm

What about browsers compatibility?