Discovering Magic
Issue № 293


Most of us create identities across the web without much conscious thought. We fill in profiles, upload photos, videos, reviews, and bookmarks. Although this information is often public, it’s fragmented into the silos of individual websites. Wouldn’t it be a little magical if, when you signed up for a new site, the site said something like, “We notice you have a profile photo on Flickr and Twitter, would you like to use one of those or upload a new one?” I built a JavaScript library that can help you do just that. Ident Engine discovers and retrieves distributed identities and user-generated content to help you build a little magic into your user interfaces.


Try it out! Enter your profile URLs into the lifestream and combined profile demos. Were you shocked by the level of detail it found out about you? Let me show you how it works.

Our footprints across the web

Social media sites encourage us to have more open and transparent conversations, and create opportunities for new people to participate in our lives. We often see these distributed identities in the “elsewhere on the web” design pattern, in which people list their social media site profiles like this:

   

Elsewhere on the web:

   

   

Places you can find me:

   

Most of these identities are tied into social media sites where we create content. Most web identities are distributed—built from a mesh of interlinked profiles, each of which contains a wealth of information. These identities are our digital footprints across the web. APIs that use the semantic web and open data formats are at the heart of the architecture that brings all our footprints together.

The semantic web and open format data

The semantic web attempts to make information that is currently only intelligible to humans machine readable. Microformats are the semantic web technology that currently shows the most practical promise. Microformats are small patterns of class names and attributes you can add to HTML to help define blocks of reusable semantic data. There is a lot more information on the web formatted with microformats than most people realize: Yahoo’s SearchMonkey recently found 1.35 billion profiles marked up with the microformat hCard. Many other microformats are used to mark up other kinds of user-generated content, from lengthy reviews to tiny Twitter updates.

Beyond microformats, other open data formats such as RDF, RSS, and Atom contain rich data we can use.

Most of our web footprints are encapsulated in open-standard data formats. To exploit the value of these fragments, all we need is a machine-readable method to link the fragments to their common owner. This is a job for XFN (XHTML Friends Network).

The power of rel=“me” and Google’s Social Graph API

XFN is a simple and powerful microformat. We can use XFN to define an interlinking set of web pages that, together, represent an individual. To achieve this, simply add the rel=“me” attribute to any link between two web pages that represent the same person. For example, I could use the markup below to define a relationship between any page on the web that represented me and my Twitter profile:
 

<a rel="me" href="http://twitter.com/glennjones">Twitter</a>
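
As a rough illustration of how a crawler might spot these links, the sketch below pulls rel=“me” targets out of a string of HTML. The function names are hypothetical, and the regular expressions only cope with simple double-quoted attributes; in a browser, a selector such as a[rel~="me"] would be the sturdier choice.

```javascript
// Split a rel attribute value into its lowercase, space-separated tokens.
function relTokens(rel) {
  return (rel || '').toLowerCase().split(/\s+/).filter(Boolean);
}

// Hypothetical helper: return the href of every <a> tag whose rel
// attribute contains the token "me". Regex-based, so a sketch only.
function findRelMeLinks(html) {
  var links = [];
  var tagPattern = /<a\b[^>]*>/gi;
  var tag;
  while ((tag = tagPattern.exec(html)) !== null) {
    var rel = /\srel\s*=\s*"([^"]*)"/i.exec(tag[0]);
    var href = /\shref\s*=\s*"([^"]*)"/i.exec(tag[0]);
    if (rel && href && relTokens(rel[1]).indexOf('me') !== -1) {
      links.push(href[1]);
    }
  }
  return links;
}

// The second link uses rel="friend met", which must not match "me".
var sample =
  '<a rel="me" href="http://twitter.com/glennjones">Twitter</a>' +
  '<a rel="friend met" href="http://example.com/someone">A friend</a>';

console.log(findRelMeLinks(sample));
```

Matching on whole tokens matters because rel can hold several space-separated values, and other XFN values such as “met” contain the letters “me”.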

Just over a year ago, Google released the Social Graph API, which allows anyone to query these relationships. After you provide a URL starting point, it returns a map (social graph) of all the related pages (edges) connected by rel=“me” links. Using this API, you can discover the numerous identities people have across the web. Try out the identity discovery demo. You’ll find that results vary depending on how many social media sites an individual uses and how well those sites are interlinked. Sometimes, changing the starting point returns different results.

The two ways your identities are found

There are two different ways to retrieve information from the Social Graph API using its rel=“me” relationships. The simplest way is otherme, which returns a list of sites based on rel=“me”. Try this example: http://socialgraph.apis.google.com/otherme?q=http://www.glennjones.net/&pretty=1

The second, lower-level Social Graph API call, lookup, allows you to control the inclusion of inward and outward linking. It also returns more complex link relationship views. The two main parameters of this call are edo (edges outward) and edi (edges inward).
 
Here’s a look at the output of a lookup call for glennjones.net: http://socialgraph.apis.google.com/lookup?q=http://www.glennjones.net/&fme=1&edi=0&edo=1&pretty=1&jme=1

If we executed an API call against the social graph in the diagram below to find only the outward relationships (i.e., edi=0&edo=1), it would return my blog, my Last.fm profile, my Google profile, and my Flickr pages. If we instead executed an API call to find both the outward and inward relationships (i.e., edi=1&edo=1), the Social Graph API would also find my Brightkite page.
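
To make the two calls easier to compare, here is a minimal sketch that builds the request URLs. The base URL and the q, fme, edo, edi, and pretty parameters come from the examples above; the helper names and the options object are my own invention for illustration.

```javascript
// Base URL taken from the example calls in this article.
var SGAPI_BASE = 'http://socialgraph.apis.google.com/';

// Hypothetical helper: turn a plain object into a query string.
function toQuery(params) {
  return Object.keys(params).map(function (key) {
    return key + '=' + encodeURIComponent(params[key]);
  }).join('&');
}

// otherme: the simple call based on rel="me".
function otherMeUrl(startUrl) {
  return SGAPI_BASE + 'otherme?' + toQuery({ q: startUrl, pretty: 1 });
}

// lookup: outward edges are on by default, inward edges are opt-in.
function lookupUrl(startUrl, options) {
  options = options || {};
  return SGAPI_BASE + 'lookup?' + toQuery({
    q: startUrl,
    fme: 1,
    edo: options.outward === false ? 0 : 1,
    edi: options.inward ? 1 : 0,
    pretty: 1
  });
}

console.log(otherMeUrl('http://www.glennjones.net/'));
console.log(lookupUrl('http://www.glennjones.net/'));
console.log(lookupUrl('http://www.glennjones.net/', { inward: true }));
```

Defaulting to outward edges only mirrors the safer choice; a caller has to ask for inward claims explicitly.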
 
Google’s testparse is a useful debugging method that displays the relationship links Google finds in a given piece of HTML.

Fig. 1. Node linking diagram

Of the two methods to query the social graph, the first is to start at point A (the first URL) and follow all the outward links to other pages (edges) using rel=“me”. In the diagram, A links to B and then B links to all the C pages. The links from A to B to C are called a chain.

Only search engines, such as Google, can use the second method, which is to search for any inward rel=“me” links from other web pages (edges). For example, point X (my Brightkite account) has a rel=“me” link to my blog, but there’s no way to find point X by following links from point A. You must search every page to find point X.
 
The otherme API call only makes use of outward relationships. It returns reliable verified linkages, but often, the results are limited.
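
The difference between the two methods can be sketched with a small breadth-first walk over outward links. The graph below is made up to resemble the diagram (A links to B, B links to the C pages, and X links inward to A); the function name and data shape are assumptions, not part of any API.

```javascript
// Follow outward rel="me" links breadth-first from a starting page.
// `outwardLinks` maps each page to the pages it links to with rel="me".
function followChain(start, outwardLinks) {
  var seen = Object.create(null);
  var queue = [start];
  var found = [];
  seen[start] = true;
  while (queue.length > 0) {
    var page = queue.shift();
    var hasLinks = Object.prototype.hasOwnProperty.call(outwardLinks, page);
    var links = hasLinks ? outwardLinks[page] : [];
    links.forEach(function (target) {
      if (!seen[target]) {
        seen[target] = true;
        found.push(target);
        queue.push(target);
      }
    });
  }
  return found;
}

// Made-up graph: X claims A, but nothing in the chain links to X.
var graph = {
  A: ['B'],
  B: ['C1', 'C2', 'A'],
  X: ['A']
};

console.log(followChain('A', graph));
```

Starting from A, the walk reaches B and the C pages but never X, which is why inward claims need an index of the whole web to discover.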

Imposters and rogue relationships

These methods of extrapolating relationship data are always open to errors and possible abuse. If you use only outward linking (i.e., the otherme API call), the results are usually solid. This is because the same individual should own each of the pages in the chain, so the process is hard to hijack. Unfortunately, the otherme method does not return as many results as it would in combination with inward claims.

But you can’t really trust inward claims. Any imposter can include a rel=“me” link on a page to hijack someone else’s social graph. People can copy and paste HTML containing semantic markup into the wrong context by mistake, creating rogue relationship links. With some careful post-processing, however, you can use inward-linking data if you’re willing to accept the odd error and rogue relationship in exchange for a fuller set of results.

The decision to use inward claims should be based on the type of interface you are building and your audience’s expectations. If you need to play it safe, use only the otherme method call. Over time, people will create stronger linkages between their identities and using inward claims will become unnecessary.

Beyond listing identities: social graph node mapping

The Social Graph API returns interlinked page URLs and can also provide more detailed information from some social media sites. Using a technique called “social graph node mapping,” it finds useful URLs related to the same individual on a site. For certain accounts, it can also bring back some small bits of data such as the user’s full name.
 
To extract this second tier of information, the API must create a canonical user account called SGN (Social Graph Normalized URL). For example, using an SGN, my account on Flickr is expressed as: sgn://flickr.com/?ident=glennjonesnet. This domain/username pairing can be used to find other URLs (service endpoints). Using the node mapping technique, the results for my account on Flickr look like this:
 

"sgn://flickr.com/?ident=glennjonesnet": {
  "attributes": {
    "url": "http://www.flickr.com/photos/glennjonesnet/",
    "profile": "http://www.flickr.com/people/glennjonesnet/",
    "rss": "http://api.flickr.com/services/...",
    "atom": "http://api.flickr.com/services/..."
  }
}
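
To show how the domain/username pairing can be recovered from an SGN, here is a small sketch. It only handles the ident= form shown above, and the function name and returned field names are hypothetical.

```javascript
// Split an SGN of the form sgn://domain/?ident=username into its parts.
// Returns null for anything that does not match that form.
function parseSgn(sgn) {
  var match = /^sgn:\/\/([^\/?]+)\/\?ident=([^&]+)$/.exec(sgn);
  if (!match) {
    return null;
  }
  return {
    domain: match[1],
    username: decodeURIComponent(match[2])
  };
}

console.log(parseSgn('sgn://flickr.com/?ident=glennjonesnet'));
```

With the domain and username separated, looking up other service endpoints for the same account becomes a simple keyed lookup.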

Enhancing the discovery process

To enhance SGN, I added my own custom data to the API’s output. This allows me to describe many more sites and service endpoints, and—more importantly—to programmatically retrieve content from these sites.

The custom data for a single Flickr service endpoint description

Property       Value
uri-template   http://www.flickr.com/people/murtaugh/
media-type     Html
schema         hCard
content-type   Profile

Using this data, we can cross-reference the accounts the Social Graph API finds to identify specific data we want to retrieve. With API endpoint discovery, we can target specific content types from across an individual’s accounts. To enhance the identity data we have so far, we collect all the available profiles from across the web. Try out the profiles demo to learn more.
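
The cross-referencing step might look something like the sketch below. The four descriptive fields follow the endpoint description above, but the object layout, the {username} placeholder syntax, the example.org entry, and the function name are all assumptions made for illustration rather than Ident Engine’s actual data format.

```javascript
// Hypothetical endpoint descriptions. Only the Flickr profile entry is
// modelled on the article; the example.org entry is invented.
var endpoints = [
  {
    domain: 'flickr.com',
    uriTemplate: 'http://www.flickr.com/people/{username}/',
    mediaType: 'Html',
    schema: 'hCard',
    contentType: 'Profile'
  },
  {
    domain: 'example.org',
    uriTemplate: 'http://example.org/{username}/feed',
    mediaType: 'Atom',
    schema: 'Atom',
    contentType: 'Status'
  }
];

// Match discovered accounts against the endpoint list and return the
// URLs that should hold the requested content type.
function findEndpoints(accounts, contentType) {
  var results = [];
  accounts.forEach(function (account) {
    endpoints.forEach(function (endpoint) {
      if (endpoint.domain === account.domain &&
          endpoint.contentType === contentType) {
        results.push({
          url: endpoint.uriTemplate.replace(
            '{username}', encodeURIComponent(account.username)),
          schema: endpoint.schema
        });
      }
    });
  });
  return results;
}

var accounts = [
  { domain: 'flickr.com', username: 'glennjonesnet' },
  { domain: 'example.org', username: 'glenn' }
];

console.log(findEndpoints(accounts, 'Profile'));
```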

Although the hCard format is comprehensive, it’s often used to hold only two pieces of information, such as name and URL.

It makes sense to aggregate our collection of profiles into a single, combined profile. The aggregation rules Ident Engine uses favor completeness, and thus favor business-related data sources over personal data sources. For example, the address is chosen by the number of data elements it contains. Some values such as name, username, and URL are based on the most commonly used value across all the profiles. Try out the combined profile demo to learn more.
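
The two rules described here, “most common value” and “most complete address,” can be sketched as follows. This is an approximation of the idea rather than Ident Engine’s own code, and the profile field names are hypothetical.

```javascript
// Return the value that occurs most often. On a tie, the first value
// to reach the highest count wins. Empty values are ignored.
function mostCommon(values) {
  var counts = Object.create(null);
  var best;
  values.forEach(function (value) {
    if (!value) {
      return;
    }
    counts[value] = (counts[value] || 0) + 1;
    if (best === undefined || counts[value] > counts[best]) {
      best = value;
    }
  });
  return best;
}

// Return the address with the most non-empty fields.
function mostComplete(addresses) {
  var best = null;
  var bestCount = -1;
  addresses.forEach(function (address) {
    var count = Object.keys(address).filter(function (key) {
      return Boolean(address[key]);
    }).length;
    if (count > bestCount) {
      best = address;
      bestCount = count;
    }
  });
  return best;
}

// Combine several profiles into one using the two rules above.
function combineProfiles(profiles) {
  return {
    name: mostCommon(profiles.map(function (p) { return p.name; })),
    address: mostComplete(
      profiles.map(function (p) { return p.address; }).filter(Boolean))
  };
}

var profiles = [
  { name: 'Glenn Jones', address: { locality: 'Brighton' } },
  { name: 'glennjones' },
  { name: 'Glenn Jones',
    address: { locality: 'Brighton', region: 'East Sussex', country: 'UK' } }
];

console.log(combineProfiles(profiles));
```

Counting fields is what tilts the result toward business-related sources, since those tend to publish fuller addresses.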

Blending profile and social graph data

You can use the profiles to help extend your social graph. Profiles marked up in hCard can contain multiple URLs. Most social media sites mark these URLs with rel=“me”, but some don’t. Sometimes you can discover new profiles by checking the URLs listed. Ident Engine also aggregates the data to find the top usernames and the primary URL defined across all the profiles. This data helps to resolve ambiguity issues, such as determining which hCard is the representative hCard on any given page.
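
Checking the URLs listed in a profile against those already in the social graph is essentially a set difference. A minimal sketch, with hypothetical function names and a deliberately crude URL comparison:

```javascript
// Crude normalisation for comparison only: lowercase, no trailing slash.
// Real URLs can be case-sensitive in the path, so treat this as a sketch.
function normalise(url) {
  return url.toLowerCase().replace(/\/+$/, '');
}

// Return the profile URLs that are not already in the social graph,
// dropping duplicates along the way.
function candidateUrls(profileUrls, knownUrls) {
  var known = Object.create(null);
  knownUrls.forEach(function (url) {
    known[normalise(url)] = true;
  });
  return profileUrls.filter(function (url) {
    var key = normalise(url);
    if (known[key]) {
      return false;
    }
    known[key] = true;
    return true;
  });
}

console.log(candidateUrls(
  ['http://www.glennjones.net/', 'http://example.com/glenn',
   'http://example.com/glenn/'],
  ['http://www.glennjones.net']
));
```

The candidates are unverified leads, so they deserve the same caution as inward claims until a rel=“me” link confirms them.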

Parsing profiles: you’ve got choices

There are several different microformat parsers you can use to parse hCard profiles. Currently, Ident Engine uses two: Yahoo’s YQL and my own .Net parser, UfXtract. YQL parses microformats directly from Yahoo’s search index, and provides a fast, reliable URL-based API. Its index does not include all the pages on the web, but when you need responsiveness and scalability, YQL is the best choice. It can return XML or JSON. Here’s an example query using YQL microformats:

select * from microformats 
 where url = 'http://www.glennjones.net/about/'

http://query.yahooapis.com/v1/public/yql?q=select * from microformats where url='http:%2F/www.glennjones.net/about/'&format=xml
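
Because the query contains spaces and quotes, it is safer to encode it before adding it to the request URL. The sketch below builds such a URL from the endpoint and format parameter shown above; the helper names are hypothetical.

```javascript
// Endpoint taken from the example URL in this article.
var YQL_BASE = 'http://query.yahooapis.com/v1/public/yql';

// Build the microformats query for a page. A page URL containing a
// single quote would break the query, so this is a sketch only.
function microformatsQuery(pageUrl) {
  return "select * from microformats where url='" + pageUrl + "'";
}

// Encode the query and append the output format ('xml' or 'json').
function yqlUrl(query, format) {
  return YQL_BASE +
    '?q=' + encodeURIComponent(query) +
    '&format=' + encodeURIComponent(format || 'xml');
}

var query = microformatsQuery('http://www.glennjones.net/about/');
console.log(yqlUrl(query, 'json'));
```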

 
The UfXtract parser is more compliant to microformat specifications and parses complex microformat markup. However, the UfXtract API is better suited to personal hacks and projects than to large-scale commercial use. It too can return XML or JSON. Here’s an example UfXtract API call:
 

http://ufxtract.com/api/?url=http://www.glennjones.net/about/&format=hCard&output=xml

 
Other good parsers include Optimus, hKit, and Swignition.
 

Retrieving user-generated content

Profiles are not the only type of content API endpoints can describe. We can also target other open standard data sources. In fact, it’s possible to build a fairly complete lifestreaming application from these sources. Try out the lifestream demo to see this idea in action.

The lifestream demo content is parsed from RSS, Atom, or microformats. There are many server-side libraries for parsing RSS and Atom feeds, but when you are working client-side, the YQL API can successfully parse most feeds and convert them into a single XML or JSON format. Here’s an example YQL feed query:

(Line wraps marked » —Ed.)

select * from feed 
 where url='http://api.flickr.com/services/feeds/ »
 photos_public.gne?id=77314934@N00'

http://query.yahooapis.com/v1/public/yql?q=select * from feed where url='http://api.flickr.com/services/feeds/photos_public.gne?id=77314934@N00'&format=xml

 

The API can return additional information about embedded rich media resources, such as images or audio. You can encapsulate all types of content in feeds, so Ident Engine’s overlay data has a “content-type” property for each API endpoint. This describes the content as an event, activity stream, status, or video, etc. With this information, you can present content in a reasonable context.
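
Once entries from several endpoints are in hand, a lifestream is mostly a matter of merging them and sorting by date, carrying the content type along so each entry can be presented in context. A minimal sketch, assuming a made-up entry shape with title and published fields:

```javascript
// Merge entries from several sources into one list, newest first.
// Each entry keeps the content type of the endpoint it came from.
function buildLifestream(sources) {
  var entries = [];
  sources.forEach(function (source) {
    source.entries.forEach(function (entry) {
      entries.push({
        title: entry.title,
        published: new Date(entry.published),
        contentType: source.contentType
      });
    });
  });
  entries.sort(function (a, b) {
    return b.published - a.published;
  });
  return entries;
}

// Invented sample data for illustration.
var sources = [
  { contentType: 'Status', entries: [
    { title: 'Writing an article', published: '2009-10-01T10:00:00Z' },
    { title: 'Article published', published: '2009-10-06T09:00:00Z' }
  ] },
  { contentType: 'Event', entries: [
    { title: 'Conference talk', published: '2009-10-03T14:00:00Z' }
  ] }
];

buildLifestream(sources).forEach(function (entry) {
  console.log(entry.contentType + ': ' + entry.title);
});
```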
 
Using feeds does not preclude using microformats as a method of collecting content. There are many circumstances in which microformats have advantages over feeds. For example, I can use the hAtom microformat to extract Twitter statuses, but their equivalent RSS feeds are not publicly accessible.

 
  http://ufxtract.com/api/?url=http://twitter.com/glennjones&format=hAtom&output=xml

 

Privacy

It can be very disconcerting for users to confront the data you’ve discovered using these methods. They may even be unaware that they’d shared the information publicly. It’s important to educate web users about how they present themselves, and about which content they choose to make public.

Tools such as the Social Graph API and data formats such as microformats don’t degrade privacy (on the web, there is no real privacy in obscurity), but as designers and developers, we need to find better ways to encourage users to make informed decisions about privacy. Most social media sites’ approaches to privacy (and persona projection) are inadequate and unsophisticated compared to the ways in which we deal with privacy in the real world.

To create trust when designing interfaces that collect personal profile data, we must be as transparent as possible, and provide links to the original sources. If users find data that is wrong or outdated, they should have access to an easy method of tracking down that information so they can correct it.
 

Consider your audience

The methods we’ve examined work best for certain audiences under specific circumstances—for example, users of sites or applications that provide services to other social media sites. In such cases, you’ll often ask for account details as part of the first interaction with new users, and your audience is by definition actively participating in self publishing.

Use progressive enhancement

Be sure to consider progressive enhancement when you design interface functionality around this type of information discovery. Never rely completely on the data quantity or quality returned. Instead, design your discovery features to supplement the main task flow. For example, if you design a practical photo picker, provide a file upload as a starting point, and offer enhanced discovery as an optional extra.
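
For the photo picker example, one way to keep discovery optional is to build the list of choices so that the upload option is always present and discovered photos are only ever added to it. The function and field names below are hypothetical.

```javascript
// Build the choices for a photo picker. The upload option is always
// first, so the picker still works when discovery finds nothing.
function photoPickerOptions(discoveredPhotos) {
  var options = [{ type: 'upload', label: 'Upload a new photo' }];
  (discoveredPhotos || []).forEach(function (photo) {
    if (photo && photo.url) {
      options.push({
        type: 'discovered',
        label: 'Use your photo from ' + photo.source,
        url: photo.url
      });
    }
  });
  return options;
}

// Discovery failed or has not finished: only the upload option appears.
console.log(photoPickerOptions(null));

// Discovery found a photo: it is offered alongside the upload option.
console.log(photoPickerOptions([
  { source: 'Flickr', url: 'http://example.com/photo.jpg' }
]));
```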

All wrapped up for you: Ident Engine

We’ve covered how to join social graph and profile data, but a few secondary problems remain. Rather than going into greater detail, I created a JavaScript library, and I invite the more technically curious among you to pull it to pieces to find out how all the cogs fit together. For those less technically curious among you, I hope it will be relatively easy to use.
 
The library uses jQuery, so initiating a search is simple. First, you need to bind a function to render your results to the library. In the code below I bound a call to the renderListing function into any update events fired by Ident Engine. The library will fire an update every time it finds new data during the search process.
 

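jQuery(document).ready(function () {
  // Assumption: the update events are bound on the document object
  var doc = jQuery(document);
  // Re-render the listing every time Ident Engine reports new data
  doc.bind('ident:update', renderListing);
  // Include inward edges as well as outward ones
  ident.useInwardEdges = true;
  var url = 'http://twitter.com/glennjones';
  ident.search(url);
});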

The render function first clears any previous content. Then, it loops around the array of objects, in this case, to render any profile URLs found.
 

function renderListing(e){
  // Clear the results from any previous update
  resetContent();
  var ul = jQuery('<ul></ul>').appendTo('#results');
  // Add one list item per identity found so far
  for (var x = 0; x < ident.identities.length; x++) {
    var profileUrl = ident.identities[x].profileUrl;
    jQuery('<li>' + profileUrl + '</li>').appendTo(ul);
  }
}
function resetContent(){
    jQuery('#results').html('');
}

Parsing user content with Ident Engine

There are several ways to extract user-generated content from the API endpoints Ident Engine discovers. The following two methods will load my status from Identica using the hAtom microformat. The findContent method can only be used after a search, while the loadContent method works independently of the search.

ident.findContent(
  'identi.ca', 
  'Status', 
  'hAtom' 
);
   
ident.loadContent( 
  'http://identi.ca/glennjones', 
  'identi.ca', 
  'Identica', 
  'Status', 
  'hAtom' 
);

The returned data is stored in seven collections: Identity, Profile, Resume, Entry, Event, XFN, and Tag. As with profiles, an event fires to tell you when new content has been collected from the APIs.

You can download the source code, along with documentation and examples, from http://identengine.com/. It’s released under an MIT open source licence.
 

Discover the possibilities

The amount of data available on the web is directly tied to social trends in openness and self publishing. With billions of microformats embedded in the web and RDF growing in strength, it’s now possible to build applications on this data. Now it’s your turn to use identity discovery to build a little magic into the user experiences you design.

About the Author

Glenn Jones

Glenn Jones is the Creative Director of Madgex. After 18 years in digital design, he is as passionate about coding as he is about interaction design. He is currently addicted to exploring the semantic web and data portability.

10 Reader Comments

  1. Great read, I’m a big believer in RDF and Microformatting. I was disappointed to see the Web moving away from XHTML back to HTML because of the flexibility RDFa provides in communicating and sharing data

  2. While I very much appreciate the technical aspects of this well-written article I do question the complete lack of reference to data protection and overindulgence in Web2.0-ness.

    As for microformats – CSS classes are not the place for semantic metadata. That just turns them into “keywords” and we all know what happened with meta-keywords.

    While there is definitely something to be gained by minor extensions and “namespaces are a honking great idea” and as a recent article showed RDF shows some of the possibilities, the semantic web is a dodo.

  3. Charlie I don’t agree that microformats are putting keywords into css classes. The HTML class attribute has many roles; it can be used to help add interaction to a page with JavaScript or presentation style with CSS. This is reflected in the “HTML 4 spec”:http://www.w3.org/TR/REC-html40/struct/global.html#h-7.5.2 which refers to the attribute being used “For general purpose processing by user agents”. The example given for general purpose processing: “identifying fields when extracting data from HTML pages” describes exactly how microformats use the class attribute. It is a common misconception that classes are just for CSS.

    I think I know where you are coming from with the meta-keywords comparison. The reason why the use of metadata in HTML can fail is that it is often hidden from view. The meta-keywords are out of sight and because of that errors go unnoticed. Microformats don’t have the same problem as most of the data values are in clear sight.

    If I am honest I would say that building the semantic web is taking a lot longer than most would hope. There are real points of traction starting to shine through; “Google’s parsing of Microformats/RDFa”:http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html in its search results being the most obvious example. I also hoped the demos with this article would go some way to showing that designers /developers can practically use semantic data today. It may not yet be the grandiose vision, but it’s not a dodo.

  4. Glenn – if anything like the semantic web is to take off then any data attributes have to be unambiguous. Namespaces are great for this as can be seen by the extensions to RSS and I can see some advantage in doing something similar in HTML with RDF for unambiguously handling copyright, data protection or similar common requirements. But overloading HTML namespace really isn’t going to fly – to overextend my dodo metaphor.

    There was a recent article on “Opera Dev”:http://dev.opera.com/articles/view/styling-and-extracting-hcalendar/ about the calendar microformat which really made me think about the problems with the approach. Why on earth isn’t the ICS available directly? Or, pack use a namespace so that the browser can decide what do with the calendar information directly as happens with the date-formatting. Apart from the technology there are very important data protection reasons for letting the browser and the user decide what happens with this kind of information.

  5. Most people don’t even realize their own spread on the internet and then there are some who seek to build on it. This is a nice way to consolidate all the data and measure the overall weight of web presence of an entity.

  6. As usual, _A List Apart_ seems to have read my mind again! ALA articles have a spooky habit of co-inciding with whatever I’m working on or thinking about at the moment…

    I was just working on turning my (currently rather bare) “personal home page(The personal website of Jordan Clark)”:http://www.jdclark.org/ into a collation of the various bits-and-bobs that are dispersed all over the web, then up pops this!

    @Glenn:

    Fair play, I must say that the “Ident Engine”:http://identengine.com/ is very impressive to say the least. It’s great for web designers to see the benefits of using semantic markup – along with microformats, RDF, RSS etc. – pay off with the emergence of tools like this. It is also practical benefits such as this that will encourage the further use of standards-compliant (X)HTML, as opposed to theoretical pipe-dreams.

    What we need now is an “inverse Ident” – some sort of centralized web service that does the same thing for online profiles as what “OpenId”:http://openid.net/ is trying to do for the log-in dilemma. (Please excuse me if something like this already exists!)

    I’ve always found it slightly annoying to have to create a profile on “Facebook(Jordan Clark’s profile on Facebook)”:http://www.facebook.com/clarky.y2k then an almost identical one on “Elance(Jordan Clark’s profile on Elance)”:http://xclarky.elance.com/ – and yet another for “LinkedIn(Jordan Clark’s profile on LinkedIn)”:http://www.linkedin.com/in/xclarky and… (I’m sure you get the idea!)

  7. I’m impressed, this is a great tool—a great tool many people should be frightened of. It shows how your own identity has become public online, what is especially interesting for people who don’t know what can happen to their personal data once they publish it.

  8. I love this tag, I think it’s a great tool and have used it in the past. Every day more and more people are realizing how public the information they put onto the web is. I have never really been afraid of what I’ve put online because none of it could ever come back to hurt me.

  9. I’ve been thinking about this stuff for a while, first playing around with Google Social Graph/Profile search API and now with Ident’s. It’s very interesting stuff, and fills in the gaps to questions I’ve often had when following hashtags for conferences on Twitter. I’ve often wondered: who are all these people? Leveraging these new APIs I’ve put together HashParty, a Twitter hashtag explorer that does just that.

    http://hashparty.com/

    Give it a look see and tell me what you think.

    The team and I are working on allowing people to claim and clarify their IDs as well. This is a rapid concept that came together in about 2 weeks after digesting all I could find online to help us connect the dots. In testing the concept, Google Reader/Profile IDs appear to mess up Ident the most; some users get incorrectly reported as having numerous IDs for some strange reason. Overall cool tech though, and we’re always trying to improve our results.
