Introduction to RDFa
Issue № 286

RDFa (“Resource Description Framework in attributes”) is having its five minutes of fame: Google is beginning to process RDFa and Microformats as it indexes websites, using the parsed data to enhance the display of search results with “rich snippets.” Yahoo!, meanwhile, has been processing RDFa for about a year. With these two giants of search on the same trajectory, a new kind of web is closer than ever before.

The web is designed to be consumed by humans, and much of the rich, useful information our websites contain is inaccessible to machines. People can cope with all sorts of variations in layout, spelling, capitalization, color, position, and so on, and still absorb the intended meaning from the page. Machines, on the other hand, need some help.

A new kind of web—a semantic web—would be made up of information marked up in such a way that software can also easily understand it. Before considering how we might achieve such a web, let’s look at what we might be able to do with it.

Improved search

Adding machine-friendly data to a web page improves our ability to search. Imagine a news story that says “today the prime minister flew to Australia,” in reference to Britain’s prime minister, Gordon Brown. The article might not call the prime minister by name, but it’s still pretty easy to ensure that this news story shows up when someone searches for “Gordon Brown.”

If the news story in question dates from 1940, however, we wouldn’t want this document to appear when users search for “Gordon Brown”—but we would want it to appear when they search for “Winston Churchill.”

To accomplish this using the same technique as the Gordon Brown example—i.e., by mapping one set of words to another—our search engine must know the start and end dates of the premierships of all British prime ministers, and then cross-reference those with the publication date of the newspaper article. This wouldn’t be completely impossible, but what if the article is a piece of fiction, or if it’s actually about the Australian prime minister? In these cases, a simple list of dates won’t help us.

The indexing algorithms that try to deduce necessary context from the text are sure to improve in the coming years, but extra markup that makes information unambiguous can only make search more accurate.

Improved user interfaces

Yahoo! and Google have both begun to use RDFa to improve user experience by enhancing the appearance of individual search results. Here’s Google’s approach:

[Image: a rich snippet on Google.]

…and here’s Yahoo!’s:

[Image: an enhanced result on Yahoo!]

There’s a commercial advantage to having a better “understanding” of the pages being indexed: more relevant, focused advertisements can be placed alongside search results.

Now that we know why we might want to put more machine-friendly data in our pages, we can ask how we might go about it.

HTML’s metadata features

You’ll no doubt already be familiar with the basic metadata features that HTML supports. The most commonly used are the meta and link elements, and some people will also be aware that the @rel attribute used on link can also be used with a. (Note: I’ll be using the term “HTML” to mean “the HTML family of languages,” since what I’m saying applies equally to both HTML and XHTML.)

We’ll look at these existing features first, because they provide the conceptual foundation upon which RDFa has been built.

The HTML use of meta and link

The meta and link elements live in the head of a document, and allow us to provide information that relates to that document. For example, I might want to say that I created my document on May 9th, 2009, that I am the author, and that I give other people the right to use the article however they want:

(Line wraps marked » —Ed.)


<html>
<head>
  <title>RDFa: Now everyone can have an API</title>
  <meta name="author" content="Mark Birbeck" />
  <meta name="created" content="2009-05-09" />
  <link rel="license" href="http://creativecommons.org/licenses/ » 
by-sa/3.0/" />
</head>
. . .
</html>

This example shows how HTML neatly packs the document’s metadata into a space distinct from the document’s text. HTML uses the head element for metadata and the body element for whatever content the web page contains.

HTML also allows us to blur these two areas: we can place the @rel attribute on a clickable link, yet retain the meaning that it contains in link.

Using @rel

Imagine I want to allow my site visitors to view my Creative Commons license. As things stand, the information about which license I’m referring to is hidden from readers because it’s in the head. But that’s easily addressed by adding an anchor in the body:


<a href="http://creativecommons.org/licenses/by-sa/3.0/">
CC Attribution-ShareAlike</a>

This is fine, and it allows us to achieve our goals: first, we have machine-ready metadata in the head that describes the relationship between the document and the license:


<link rel="license" href="http://creativecommons.org/licenses/ » 
by-sa/3.0/" />

…and second, we have a link in the body that allows a human to click through and read the license:


<a href="http://creativecommons.org/licenses/by-sa/3.0/">
CC Attribution-ShareAlike</a>

But HTML also allows us to use the @rel attribute of link on an anchor. In other words, it allows metadata that would normally go into the head of the document to appear in the body.

With this incredibly powerful technique, we can express both the metadata for machines, and the clickable link for humans, in one convenient package:


<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">
CC Attribution-ShareAlike</a>

This simple method of augmenting inline markup with metadata is not often used in web pages, but it’s right at the heart of RDFa. This leads to the first principle of RDFa:

Rule 1:

The link and a elements imply that there is a relationship between the current document and some other document; the @rel attribute allows us to provide a value that will better describe that relationship.

Don’t forget though: using @rel with a is merely taking advantage of an already existing HTML feature, which RDFa then draws attention to.
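To make Rule 1 concrete from the machine's side, here is a minimal sketch in Python of how a consumer might read these relationships. It is an illustration only, not a real RDFa processor: the class and function names are hypothetical, and it uses nothing beyond the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class RelCollector(HTMLParser):
    """Collect (rel, href) pairs from <link> and <a> elements.

    Hypothetical helper for illustration; a real RDFa processor does far more.
    """

    def __init__(self):
        super().__init__()
        self.relations = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            # @rel may hold several space-separated values.
            for value in rel.split():
                self.relations.append((value, href))


def collect_relations(html):
    """Return every (rel, href) pair stated on <link> or <a> in the markup."""
    parser = RelCollector()
    parser.feed(html)
    parser.close()
    return parser.relations


page = """
<html><head>
  <link rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />
</head><body>
  <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">
  CC Attribution-ShareAlike</a>
  <a href="http://example.org/">A plain link with no @rel</a>
</body></html>
"""

for relationship, target in collect_relations(page):
    print(relationship, "->", target)
```

The point of the sketch is that the link in the head and the anchor in the body yield exactly the same pair, while the plain anchor without @rel yields nothing a machine can act on.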

Applying distinct licenses to images

The previous example provides licensing information about the web page that contains it. But what if the page contains multiple items, each of which has a different license? It doesn’t take more than a moment to think up scenarios where this would apply, such as a page of search results on Flickr, YouTube, or SlideShare.

RDFa takes the simple idea behind @rel (that it expresses a relationship between two things) and builds on it by allowing @rel to be used on the img element, where it describes the image identified by @src.

So, for example, imagine a page of search results on Flickr:


<img src="image1.png" />
<img src="image2.png" />

Let’s say that the first image is licensed with the Creative Commons Attribution-ShareAlike license, but that the second uses CC’s Attribution-Noncommercial-No Derivative Works license.

How should we mark it up?

If you guessed that we simply place the @rel attribute on the img tag, then you are exactly right. To express two different licenses, one for each image, we simply do this:


<img src="image1.png"
  rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />
<img src="image2.png"
  rel="license" href="http://creativecommons.org/licenses/ » 
by-nc-nd/3.0/" />

Here, you can see the core principle in action: incrementally building on the metadata features that HTML already provides. Building on HTML concepts in this way makes it easier for people to orient themselves when using RDFa.

Rule 2:

The @rel and @href attributes are no longer confined to the a and link elements, but can also be used on img to indicate a relationship between the image and some other item.
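Rule 2 changes what the relationship is about: the image, rather than the page that contains it. The following Python sketch shows one way a consumer might read that, under the assumption that the value of @src names the thing being described. The class and function names are hypothetical, and this is a simplification of what a full RDFa processor would do.

```python
from html.parser import HTMLParser


class ImageRelationCollector(HTMLParser):
    """Collect (image, relationship, target) triples from <img> elements.

    Assumption for this sketch: the image named by @src is the item
    that @rel and @href describe.
    """

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        rel = attributes.get("rel")
        href = attributes.get("href")
        if src and rel and href:
            self.triples.append((src, rel, href))


def collect_image_relations(html):
    """Return one (src, rel, href) triple per fully annotated image."""
    parser = ImageRelationCollector()
    parser.feed(html)
    parser.close()
    return parser.triples


results_page = """
<img src="image1.png"
  rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />
<img src="image2.png"
  rel="license" href="http://creativecommons.org/licenses/by-nc-nd/3.0/" />
<img src="image3.png" />
"""

for image, relationship, target in collect_image_relations(results_page):
    print(image, relationship, target)
```

Each annotated image carries its own license statement, and the third image, which has no @rel, contributes nothing.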

Adding properties to the body

In our HTML illustration, we saw that we can also add textual properties about the document:


<meta name="author" content="Mark Birbeck" />
<meta name="created" content="2009-05-09" />

This tells us who created the document, and when, but it can only be used in the head of the document. RDFa takes this technique and embellishes it so that it can be used in body; @content is therefore no longer confined to the meta tag, but can appear on any element.

Rule 3:

In ordinary HTML, properties are set in the head of the document, using @content with meta. In HTML documents with RDFa, @content can be used to set properties on any element.

There is one minor change from the way @content is used in head, though. Since the @name attribute is already used for a different purpose in other parts of HTML, it would be confusing to also use it to represent the property name in the body. RDFa therefore provides a new attribute, called @property, to play this role.

Rule 4:

Although HTML uses the @name attribute to set the name of a property on meta, it can’t be used on other elements, so RDFa provides a new attribute called @property.

Suppose our document’s creation date and author name are in the head of the document, and that the same information is in human-readable form in the body of the document:


<html>
<head>
  <title>RDFa: Now everyone can have an API</title>
  <meta name="author" content="Mark Birbeck" />
  <meta name="created" content="2009-05-09" />
</head>
<body>
  <h1>RDFa: Now everyone can have an API</h1>
  Author: <em>Mark Birbeck</em>
  Created: <em>May 9th, 2009</em>
</body>
</html>

With RDFa we can coalesce these two sets of information, so that the metadata is located at the same point as the readable text:


<html>
<head>
  <title>RDFa: Now everyone can have an API</title>
</head>
<body>
  <h1>RDFa: Now everyone can have an API</h1>
  Author: <em property="author" content="Mark Birbeck">
    Mark Birbeck</em>
  Created: <em property="created" content="2009-05-09">
    May 9th, 2009</em>
</body>
</html>

We’ll see in a moment how we can improve on this example. For now we just need to recognize that whether the metadata appears in the body of the document or the head, it means the same thing—and that this is merely the text property equivalent of the @rel technique that HTML already has for expressing relationships in body.

Using vocabularies

We have to take a small diversion here. We can get away with using @name="author" in the document head because even though the property “author” is not defined in any specification, over the years people have come to expect it. But RDFa allows—and requires—much greater precision. When we use a term such as “author” or “created,” we need to indicate where that term comes from. If we don’t, we have no way to know if what you mean by “author” is the same thing I mean.

This may seem unnecessary. After all, how could anyone confuse an obvious term such as “author”? But imagine that the term is “country” on a holiday website; does that term define the country the holiday is in, or does it indicate that the holiday takes place in the country, rather than in the city? Many other words also have different meanings in different contexts, and if you then add to that the possibility of different languages, you’ll soon realize that if we want to make any headway with our data, we need to be precise. And that means indicating where our terms come from.

In RDFa, we do this by indicating that we want to use a certain collection of terms, or vocabulary. This is easily done—just specify the address of the vocabulary, in conjunction with a short-form map, like this:


xmlns:dc="http://purl.org/dc/terms/"

(If you understand XML, you’ll recognize this as the syntax for an XML namespace declaration.)

This example provides us access to the list of terms from the Dublin Core vocabulary, by way of the prefix “dc.” Dublin Core has many terms available to us, and the two we’ll use in our example are “creator” and “created.” To put them to work, we need to place the prefix in front of them, like so:


dc:creator
dc:created

Now it’s completely clear: “dc:creator” is not the same as “xyz:creator.”
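The reason the two terms differ is that a prefixed term is shorthand for a full address: the vocabulary's address with the term name appended. Here is a minimal Python sketch of that expansion. The function name is hypothetical, and the address given for the “xyz” prefix is invented purely for illustration.

```python
# Prefix mappings, as they would be declared with xmlns:dc="..." in the markup.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    # Assumption: a made-up vocabulary address, used only for contrast.
    "xyz": "http://example.org/xyz/",
}


def expand_term(term, prefixes):
    """Expand a prefixed term such as 'dc:creator' into its full address."""
    prefix, separator, name = term.partition(":")
    if not separator:
        raise KeyError(f"{term!r} has no prefix")
    if prefix not in prefixes:
        raise KeyError(f"no mapping declared for prefix {prefix!r}")
    return prefixes[prefix] + name


print(expand_term("dc:creator", PREFIXES))
print(expand_term("xyz:creator", PREFIXES))
```

Both terms end in “creator,” but they expand to different addresses, so software has no reason to treat them as the same property.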

Note that the prefix mapping needs to be placed in the document somewhere “above” the location where it will be used. In our example, it could be placed on the body element or the html element. The full example might look like this:


<html xmlns:dc="http://purl.org/dc/terms/">
 <head>
  <title>RDFa: Now everyone can have an API</title>
 </head>
 <body>
  <h1>RDFa: Now everyone can have an API</h1>
  Author: <em property="dc:creator" content="Mark Birbeck">
    Mark Birbeck</em>

  Created: <em property="dc:created" content="2009-05-09">
    May 9th, 2009</em>

 </body>
</html>
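The requirement that a prefix mapping sit “above” the place where it is used can be sketched as a simple scoping rule: each element inherits the mappings of its ancestors and may add its own. The Python below illustrates that idea under simplifying assumptions (well-formed, XHTML-style markup in which every element is closed; hypothetical class and function names). It is not a complete RDFa processor.

```python
from html.parser import HTMLParser


class ScopedPropertyParser(HTMLParser):
    """Expand @property values using only the prefix mappings in scope."""

    def __init__(self):
        super().__init__()
        self.scopes = [{}]      # one prefix map per open element
        self.properties = []    # expanded addresses, or None if unresolved

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        scope = dict(self.scopes[-1])  # inherit mappings from the parent
        for name, value in attributes.items():
            if name.startswith("xmlns:"):
                scope[name[len("xmlns:"):]] = value
        self.scopes.append(scope)

        term = attributes.get("property")
        if term:
            prefix, _, local = term.partition(":")
            # None marks a term whose prefix has no mapping in scope here.
            expanded = scope[prefix] + local if prefix in scope else None
            self.properties.append(expanded)

    def handle_endtag(self, tag):
        if len(self.scopes) > 1:
            self.scopes.pop()  # leaving the element ends its mappings


def expand_properties(html):
    """Return the expanded address (or None) for each @property, in order."""
    parser = ScopedPropertyParser()
    parser.feed(html)
    parser.close()
    return parser.properties


fragment = """
<div>
  <p xmlns:dc="http://purl.org/dc/terms/">
    <em property="dc:creator">Mark Birbeck</em>
  </p>
  <p><em property="dc:created">May 9th, 2009</em></p>
</div>
"""

print(expand_properties(fragment))
```

In this fragment the mapping is declared on the first paragraph only, so the term in the second paragraph cannot be resolved. Declaring the mapping on the html or body element, as in the full example, avoids the problem.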

There are plenty of other vocabularies to choose from, and I’ll list a few more in the next article in this series. Of course, there is nothing to stop you from inventing your own for use within your company, organization, or interest group. But note one thing that often surprises people: there is no central organization to police your work. There are best practices to follow, however, and with power comes responsibility, so try to find out as much as you can about the process before you start work on a new vocabulary.

Before we return to our example, I should add one last point about vocabularies. You will no doubt be wondering why @rel="license" didn’t get the same treatment as @property="author", and require a prefix. The answer is that HTML already has some built-in values used with @rel (such as “next” and “prev”), and RDFa adds a few more. One of those added by RDFa is “license.”

But once you want to go outside of this list of values (for example, to use a term from the Dublin Core vocabulary such as “replaces” or a term from FOAF such as “knows”), you must use the prefix mapping technique in exactly the same way as we have for @property.

For example, say our article not only has a CC license as we saw before, but it also replaces some other document—a relationship we can express using Dublin Core’s “replaces” term. We express these two relationships like this:


<html xmlns:dc="http://purl.org/dc/terms/">
 <head>
  <title>RDFa: Now everyone can have an API</title>
 </head>
 <body>
  <h1>RDFa: Now everyone can have an API</h1>
  Author: <em property="dc:creator" content="Mark Birbeck">
    Mark Birbeck</em>

  Created: <em property="dc:created" content="2009-05-09">
    May 9th, 2009</em>

  License: <a rel="license" href="http://creativecommons.org/licenses/ » 
by-sa/3.0/">
    CC Attribution-ShareAlike</a>

  Previous version: <a rel="dc:replaces" href="rdfa.0.8.html">
    version 0.8</a>

 </body>
</html>

Now that we understand vocabularies, let’s get back to our main example.

Using inline text to set the value of a property

In the previous example, the duplication of the text “Mark Birbeck” in both the @content attribute and the inline text may have jarred you. If it did, you’re certainly getting into the swing of RDFa. We can indeed remove the @content value if the inline text holds the value that we want to use for metadata:


Author: <em property="dc:creator">Mark Birbeck</em>

Rule 5:

If no @content attribute is present, then the value of a property will be set using the element’s inline text.

Although the @content technique is derived from HTML’s meta element, think of the preceding example as the “default” way to set a property. Providing a @content value can be a way to override the inline value, if it doesn’t quite say what you want. It also allows authors more leeway with the text that the user reads, since they can be more precise within the embedded data. The publication date illustrates this; all of the data in the following examples have the same meaning, yet give very different presentations to the reader:


<span property="dc:created" content="2009-05-14">May 14th, 2009</span>
<span property="dc:created" content="2009-05-14">May 14th</span>
<span property="dc:created" content="2009-05-14">14th May</span>
<span property="dc:created" content="2009-05-14">14/05/09</span>
<span property="dc:created" content="2009-05-14">tomorrow</span>
<span property="dc:created" content="2009-05-14">yesterday</span>
<span property="dc:created" content="2009-05-14">14 Mai, 2009</span>
<span property="dc:created" content="2009-05-14">14 maggio, 2009</span>

Rule 6:

If the @content attribute is present, it overrides the value in the element’s inline text to set the value of the property.
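Rules 5 and 6 together describe a small decision a consumer has to make for every @property it finds. The Python sketch below shows that decision under a few assumptions: the names are hypothetical, the markup is well formed with every element closed, and collapsing runs of whitespace in the inline text is a choice made for this sketch rather than something the rules above state.

```python
from html.parser import HTMLParser


class PropertyValueParser(HTMLParser):
    """Resolve each @property: @content wins, otherwise use the inline text."""

    def __init__(self):
        super().__init__()
        self.values = []   # finished (property, value) pairs
        self._open = []    # one record per element that is still open

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        self._open.append({
            "property": attributes.get("property"),
            "content": attributes.get("content"),
            "text": [],
        })

    def handle_data(self, data):
        # Inline text belongs to every element that is currently open.
        for element in self._open:
            element["text"].append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        element = self._open.pop()
        if element["property"] is None:
            return
        if element["content"] is not None:
            value = element["content"]                         # Rule 6
        else:
            # Rule 5; whitespace collapsing is this sketch's own choice.
            value = " ".join("".join(element["text"]).split())
        self.values.append((element["property"], value))


def resolve_properties(html):
    """Return (property, value) pairs in the order the elements close."""
    parser = PropertyValueParser()
    parser.feed(html)
    parser.close()
    return parser.values


snippet = """
<p>
  Author: <em property="dc:creator">Mark Birbeck</em>
  Created: <span property="dc:created" content="2009-05-14">yesterday</span>
</p>
"""

for name, value in resolve_properties(snippet):
    print(name, "=", value)
```

The author's name comes from the inline text because no @content is present, while the date comes from @content, so the reader can see “yesterday” and the machine still gets an unambiguous value.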

In the next issue of ALA, we’ll learn how to add properties to an image—and how to add metadata to any item.

About the Author

Mark Birbeck

Mark is managing director of Backplane Ltd., a London-based company involved in a number of RDFa/linked data projects for UK government departments. He is the original proposer of RDFa, and has spoken on the subject at various events.

24 Reader Comments

  1. You have not really made it clear what the real purpose of RDFa is… I don’t really understand what would be the advantage of defining metadata _inline_…

    It seems like we always tend to try and separate different categories of data; as web developers, we can separate data from markup (XSLT), we can separate content from styling (CSS) and we can separate content from action (JavaScript), for instance. Having inline metadata reminds me of the days of inline CSS… ~shudder

    Now I understand the need to target specific parts of the content with additional machine-readable information, but there should be a way to do so without having to clutter up the beautiful, clean, concise markup. Something like an external metadata file?

    I think RDFa might be suitable for a restricted number of projects such as mentioned in the article (Youtube, Yahoo…) but it still has this _experimental_ feel to it. I don’t think the technology is really up to par with the common coding practices of the semantic web.

    What do you think?

  2. hey there, thanks for the article!

    i have been coding with MFs for some time now, and am wondering, you mention that Yahoo! “has been processing RDFa for about a year”, but you don’t mention anything about MFs with them, are they not bothering with MFs, or have they specifically sided toward RDFa?

    other than possibly that, any real strength of one over the other?

    i would rather not have to code BOTH into my mark-up, and, while i have no allegiance to one or the other, i guess i would lean toward MFs, just because i already kind of know them…

    thanks,
    Atg

  3. RDFa/MFs are simply to provide better “semantic” value to your mark-up, meaning, to give users (be they human or bot) a better explanation of exactly “what” the data is, not just what it “says”…

    a date is, i think, the easiest example, and i can think of few people that do a better job of explaining MFs, specifically, than Jeremy Keith:
    http://adactio.com/journal/1579

    another great example of the power MFs/RDFa can bring to the web, check out these guys:
    http://visitmix.com/Lab/Oomph

    cheers,
    Atg

  4. @epgui: Most authors do not create separate, semantic views of data such as RDF, and RDFa allows them (and authoring tools) to add machine-readable attributes to their HTML documents. These attributes are part of the content, rather than presentation or behavior; this becomes more important as content (pieces of documents, such as images) gets shared and remixed across sites and pages.

    @Aaron: Microformats provide a similar approach, but they are design patterns (based on defined vocabularies such as dates, contact information, etc.) whereas RDFa enables authors to use an existing vocabulary _or_ create a new vocabulary to describe _anything_ in a semantic manner.

  5. I understand the value of microformats… However, I think the current syntax for highlighting semantic data is not appropriate for a world wide web standard.

    Maybe I am missing the point, it’s just that I don’t like the idea of having all that metadata _inline_, in the body of my markup. I can see how it would quickly become very unnecessarily cluttered on a very regular project, say, a blog.

    I could see that the benefits of the technology for internet bigshots (again, like Youtube, Google and Yahoo, for instance) would largely outweigh the inconvenience…

    I am not one to define web standards, but I am thinking that an external metadata file would be very nice. I would guess it could be easily done with an XPath-like syntax for selecting nodes for which to specify metadata. The whole thing could be just an external XML document.

    I am unaware whether or not something similar already exists.

    I don’t know if that was more clear of a thought =P

  6. Right! You have a very good point, didn’t think of that. A technology like RDFa would prove to be very useful for content that is to be shared across domains or services.

    I’m sure there are also other advantages!

  7. @epgui,

    Interesting question.

    I’d generally see the information added using RDFa as being another ‘version’ of the information you already have in your document. So in that sense it is not like CSS or JavaScript, which are completely different to your inline content.

    For example, if I write:

    _ This article was published today._

    it’s not clear when the article was published. But a tag and some RDFa can make this completely clear:

    _ This article was published today._

    Having more precise information could improve indexing and search, or be displayed as a tooltip in a browser, and so on. But note that I haven’t added anything ‘new’ to the document, in the way that you do when you say:

    _div { color: red; }_

    Instead, I’ve simply written June 23rd, 2009 in two different ways.

    Regards,

    Mark

  8. Microformats is supported by Safari, Firefox (with plugins). Is there any such support for existing RDFa vocabularies? Will these RDFa attributes be valid HTML 5?

  9. First off, great article!

    Definitely helpful in distinguishing RDFa from microformats. I’ve only recently started using microformats and have been a little fuzzy on the differences, pros/cons, etc. between the two.

    For those wanting more info on microformats, I highly suggest reading Emily Lewis’ series on microformats:
    http://www.ablognotlimited.com/articles/getting-semantic-with-microformats-introduction/

    I am looking forward to part 2 of this RDFa primer.

    Thanks again,

    Jason

  10. Following the discussion in post 8.

    From a usability perspective I like the concept of RDFa’s as it provides greater context to the information that you’re providing to users on the web.

    However, as an author creates content, they too need an easy way of adding the RDFa descriptors. I suppose it’s another piece of training that needs to be done, learning the vocabulary of Dublin Core. How does this integrate with CMS systems? Does the insertion of RDFa’s lie with the author or the developer?

  11. So if we’ve declared that we’re using a specific vocabulary up at the top, for example Dublin Core:

    _<html xmlns:dc="http://purl.org/dc/terms/">_

    Why is it that the property attribute also has to contain the vocabulary prefix (dc:) such as:

    __

    instead of only _”creator”_?

    Does that mean that it’s possible to reference multiple vocabularies at the top and therefore use multiple properties when identifying an author in inline text?

    _

  12. This style of meta data on the web is flawed in every sense and is a step backwards. Semantics require smarter computing, and smarter indexing, but allow for a more conventional information ideology, and less of a flawed system of ever-changing configuration standards.

    Semantic information processing is the way of the future. Remember where you read this cause I’m gonna quote myself later. Semantics vs RDFa is like Capitalism vs Communism. One allows for natural progression and adapts, setting new rules (ie, XHTML…Banking regulations), and the other tries to create the mother of all systems that accounts for everything… except for oh, we didn’t expect that, or that, wait… followed by years of web clean up again just like html4 left behind (the ‘ie’ for communism’s comparison would be RDFa…Kyoto Protocol which attempts to understand all things science here and now and makes rules for each attribute, all set in stone).

    RDFa even goes against Aristotle’s teachings. Those who study philosophy understand what I mean.

  13. For those who come and say “Micro Formats are semantic you fool.”. Keep in mind that my idea is only to avoid making exact science. Science should always be obscure so as to always remain open for modification. To create a sub-section of W3’s already semantic (X)HTML is not intelligent.

    HTML5 is the perfect example of this non-obscure science. Who’s to say that in 5 years we’ll even have footers and headers on websites? “Oh, we’ll just delete that element and add a new one when it exists”.

  14. This is a very detailed and informative post. Not only that you introduced me to a new term, RDFa, but you presented it in a very clear manner that even I can understand. Thanks!

  15. I’m not up on HTML 5 as of yet, but how does RDFa fit into a standards-compliant web page? HREF and REL cannot be used on an IMG tag under XHTML 1.0 and break standards compliance. Is RDFa just a proposal to have built into the standard or is it already supported in HTML 5? Most Micro Formats work within the existing spec. I apologize if this is a stupid question, but the article didn’t make it clear.

  16. Will RDFa have an impact on search engines and results? That was the question that came to my mind when this started. But it has had a big effect on search results. The search engines love sites that have a good number of reviews.

  17. @audienst : RDFa uses its own DTD based on XHTML 1.1, named XHTML+RDFa 1.0. It implies that the pages using it have to be declared as application/xhtml+xml document type.

    This is a really great text I surely will translate into French.

  18. At least for dates it would be better to be able to state something like:

    2009-06-23

    and then have a CSS or presentation schema that says:

    a[property=dc:issued] {
    date-template: ‘published on %M %d’;
    }

    or something of the likes.

    The goal of something like this would be to have only one version of the data in the HTML document, and let the way it’s spelled for humans be a presentational issue.

    For this to work, some basic data formats should be standardized, for instance:

    Date, Long, Float, String

    we could have something like:

    HTML:

    Richard Avalon, New York Times

    CSS:

    and a string template by explode

    a[spp:photo-attribution] {
    string-template-explode:’,’ ‘This photograph was taken by %1, and originally published on %2’;
    }

  19. epgui said: “Something like an external metadata file?”

    There are two ways of doing that at the moment – one is called GRDDL, which is a W3C standard that uses XSLT and can be used with HTML 4 and XHTML 1.x. It’s a bit complex though. I’m leaning more towards a non-W3C standard solution called RDF-EASE which uses a CSS-like syntax for basically overlaying metadata in the same way that a CSS file overlays styling information.

    Aaron Gregg: “you mention that Yahoo! “has been processing RDFa for about a year”, but you don’t mention anything about MFs with them, are they not bothering with MFs, or have they specifically sided toward RDFa?”

    Yahoo! are parsing both microformats and RDFa. Google are also parsing both microformats and RDFa, but Google’s RDFa parsing is a bit disappointing (for various long-winded reasons I won’t go into).

    A few comments make comparisons between RDFa and Microformats and ask the difference. The existence of RDFa is not a reason not to publish microformats (and vice versa). You can easily interleave microformats and RDFa into one document. I’ve put up an example document at http://gist.github.com/138115 – showing hCard and RDFa mixed together. It uses a variety of existing RDF vocabularies in some interesting ways.

    islandapart: What you seem to prefer is to wash your hands of all semantics and basically just let Google’s text-extracting sausage machine handle it all. That’s fine, except for – like capitalism, I guess – you then being completely reliant on Google’s whims for everything. If we as web developers all take semantics to hand – proper, deep semantics of what the actual objects being represented are there for rather than just the surface semantics of “this is a header, this is the body text” etc. – we can ensure vendor neutrality and make sure that the Web can be a place for everyone to enjoy, rather than semantics being something for those with clever programmers, lots of patents and the money to buy lots and lots of computing power.

    And, well, I’ve studied Aristotle (among many other philosophers) and nothing I’ve read in my day-job as a philosophy student convinces me that RDFa or microformats are bad ideas. In fact, the great philosophers have very little to say about W3C standards, semantic markup or CSS. They don’t tell me much about the Java virtual machine or whether to use Linux, Mac or Windows. To suggest they do is quite silly.

  20. I agree with @islandjumper. This should be the role of the machine to understand and parse documents but it is not possible yet (better, more sophisticated AI needed). I don’t like the whole MF idea – putting semantics into class attribute is messy and unnecessary. I think W3C should focus more on XHTML 2.0 and introduce some dict: namespace with global dictionaries read by all bots. But, because W3C decided to do this step back (I just read that MS don’t like HTML 5 spec and they won’t implement it) we have to wait another 10 years to make this possible.
