Beyond Goals: Site Search Analytics from the Bottom Up

Avinash Kaushik demonstrated that site search analytics (SSA) is a powerful tool you can use to assess customer intent quantitatively. In SSA, as with all flavors of web analytics (WA), you can work from the top down: by starting with clear, measurable metrics based on your organization’s goals, you can benchmark and continually optimize the performance of your content and designs. While goal-driven analysis is wonderfully useful, we’ll explore a different, “bottom-up” approach that relies on pattern analysis and failure analysis to help you understand your users’ intent in qualitative ways that complement the top-down approach.

User behavior: it’s yours to discover

Rather than measuring performance by key performance indicators (KPI), in bottom-up analysis you “play” with the data to uncover the unexpected: Interesting patterns in the ways people search your site and strange “outliers” that teach you something new about your customers. For example, if you manufacture printers, you might be surprised to learn that your most common search queries are actually for printer drivers, and not product information. Once you realize that existing customers are searching much more frequently than potential customers, you might drastically alter the content you invest in the most.

To understand customer intent, bottom-up analysis is as important as top-down analysis for two reasons:

Analysis always benefits from different perspectives. No one perspective is complete and authoritative; bottom-up analysis is another lens with which to observe and draw conclusions from your data.
Top-down analysis means measuring only known goals. Top-down analysis doesn’t anticipate the unknowns that arise as your site, your business, your customers, and the world itself change over time. Without bottom-up analysis, you’ll miss out on important discoveries that aren’t goal-driven.

Additionally, there are occasions where you’ll need to rely on bottom-up analytics because top-down analysis won’t work. For example:

A site may not have clear and obvious goals. For example, management may not be able to clearly articulate your organization’s goals.
A site’s goals may not be measurable. It might be difficult to generate useful KPI for your personal website, your daughter’s elementary school site, or the local YMCA’s site.
The act of measurement may not be feasible. You may not be able to perform measurement because you lack analytics software, time, or expertise.

This is where bottom-up analysis can help: Pattern analysis uncovers trends in the types of information users want. Failure analysis helps you identify the screw-ups that you’d better fix as soon as possible. Let’s get started.

Querying your search queries

Bottom-up analysis is simpler than you think: You really do just sift through your data in a variety of ways and wait for interesting patterns and outliers to emerge. For example, examine the top fifty most common search queries. Can you categorize them by topic, or by the types of documents that searchers request? Can you categorize them in some other way?

This really is informal. You don’t need to master Excel’s most inscrutable formulae, nor do you need a statistics degree. Just dive in and have fun. You can start with your analytics software’s basic reports, or by parsing your raw data into a format that you can drop into Excel. As you play with the information, “ask your data” some generic questions:

What are the most frequent unique queries?
Are frequent queries retrieving quality results?
What are the click-through rates per frequent query?
What are the most frequently clicked results per query?
Which frequent queries retrieve zero results?
What are the referrer pages for frequent queries?
Which queries retrieve popular documents?
What interesting patterns emerge in general?

These basic questions are relevant to just about any site, and the answers will often lead you to follow-up questions specific to your site and its users. They’re the ideal guide as you confront—and jump into—megabyte after megabyte of search data. And they’ll help you with the next steps: pattern analysis and failure analysis.
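
If your raw data is easier to reach than a canned report, the first and fifth questions above can be answered with a few lines of Python. This is only a sketch: the log layout (a query string plus the number of results it retrieved) and the sample rows are assumptions made for illustration, so adapt the parsing to whatever your search engine actually records.

```python
from collections import Counter

# Hypothetical log rows: (query text, number of results returned).
# Real logs will have different fields; adapt the parsing to yours.
SAMPLE_LOG = [
    ("campus map", 12),
    ("Campus Map", 12),
    ("cse 101", 4),
    ("parking pass", 0),
    ("cse 101", 4),
    ("  campus map ", 12),
    ("parking pass", 0),
    ("libary hours", 0),
]


def normalize(query):
    """Lowercase and collapse whitespace so trivial variants count together."""
    return " ".join(query.lower().split())


def top_queries(log, n=50):
    """Return the n most frequent normalized queries with their counts."""
    counts = Counter(normalize(query) for query, _ in log)
    return counts.most_common(n)


def zero_result_queries(log):
    """Return normalized queries that retrieved nothing, most frequent first."""
    counts = Counter(normalize(query) for query, hits in log if hits == 0)
    return counts.most_common()


print(top_queries(SAMPLE_LOG, n=3))
print(zero_result_queries(SAMPLE_LOG))
```

Normalizing case and whitespace before counting is a deliberate choice: otherwise “Campus Map” and “campus map” would be tallied as different queries and the real favorites would be harder to spot.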

Pattern analysis

Here’s a data sample from the Michigan State University site. Stored in Excel, it includes one week of search queries from October, sorted from most to least common:

Fig. 1. Michigan State University search results from October, 2006. Figure courtesy of Rich Wiggins.

On just a quick review, some interesting questions emerge:

Why is the course “CSE 101” the most common query? No other courses crack the top 35. What exactly do users want to know about this course?
Why are “campus map” and “map” such common queries when the campus map is so clearly displayed on the site’s main page?
Is there a problem with the site’s navigation? Or with how the map is displayed? Or maybe there’s no problem at all—maybe a lot of users just like to search?
Why would “housing” rank so highly even though the semester is already underway? Is “housing” queried as frequently at other times of the year? Let’s see if the data reveals which documents those who searched for “housing” visited in October when compared with, say, May. What will the differences reveal?

Note that while none of these questions have to do with KPI, each is important nonetheless. After all, 2.5% of all searchers sought information about the “lon capa” system. (Lon capa is a course management system.) During this particular week in October, 2.1% of all searchers searched for a variant of CSE 101. Another 1.2% searched for maps. These three queries (and their variants) account for over 5% of that week’s search activity. If you’re the Michigan State webmaster, you should look into how well your search engine supports those searchers, and whether you have content that serves those searchers.
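
Percentages like these come from adding up a query and its variants, then dividing by the week’s total searches. Here is a rough sketch of that calculation. The counts, the variant lists, and the function name are all made up for illustration; they are not the MSU figures.

```python
def group_shares(query_counts, groups, total_searches):
    """Percentage of all searches captured by each group of query variants."""
    shares = {}
    for name, variants in groups.items():
        hits = sum(query_counts.get(variant, 0) for variant in variants)
        shares[name] = round(100 * hits / total_searches, 1)
    return shares


# Made-up counts for illustration only; not the MSU data.
query_counts = {
    "cse 101": 30,
    "cse101": 12,
    "cse 101 syllabus": 8,
    "campus map": 15,
    "map": 10,
}

# Which variants belong together is a judgment call you make by hand.
groups = {
    "CSE 101": ["cse 101", "cse101", "cse 101 syllabus"],
    "maps": ["campus map", "map"],
}

print(group_shares(query_counts, groups, total_searches=1000))
```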

Categorizing these queries will really help you understand the data’s patterns, and you don’t have to be a librarian to do it. All sorts of categorization approaches could be applied; it depends on the patterns that emerge most clearly for you. The following chart shows queries color-coded by category, and mapped out over time. It took about an hour to create:

Fig. 2. Michigan State University queries color-coded by category, and mapped out over time. Figure courtesy of Rich Wiggins.

Examining query frequency over time introduces another interesting facet: seasonality. Queries that represent systems (coded yellow) decline over the course of the semester, perhaps as students become more familiar with those systems. (In this context, systems are applications that take you away from the web.) Maps (black) are more useful at the start of the semester, the library (orange) as finals approach, while football (gray) queries decrease as the MSU team spirals downward in another dismal season.
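
The chart was built by hand in Excel, but if you would rather script the same idea, a sketch might look like the following. The keyword rules, the week labels, and the counts are assumptions made up for this example, and simple keyword matching is a stand-in for the human judgment that real categorization needs.

```python
from collections import Counter, defaultdict

# Hypothetical keyword rules; the category names echo the chart,
# but the keywords are guesses you would tune by hand.
CATEGORY_KEYWORDS = {
    "maps": ["map"],
    "library": ["library"],
    "football": ["football"],
}


def categorize(query):
    """Return the first category whose keyword appears in the query."""
    text = query.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorized"


def category_counts_by_week(rows):
    """rows: iterable of (week_label, query, count) -> {week: Counter}."""
    by_week = defaultdict(Counter)
    for week, query, count in rows:
        by_week[week][categorize(query)] += count
    return by_week


# Made-up rows for illustration only.
rows = [
    ("week 1", "campus map", 120),
    ("week 1", "library hours", 30),
    ("week 10", "campus map", 40),
    ("week 10", "library hours", 90),
    ("week 10", "cse 101", 25),
]

for week, counts in category_counts_by_week(rows).items():
    print(week, dict(counts))
```

Anything that lands in “uncategorized” is worth a second look: it is either a category you have not named yet or an outlier that might teach you something.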

Or, at least, that’s how it seems. Ultimately, analytics tell us what is happening, not why. After detecting data patterns, we might guess what’s going on with reasonable accuracy. But we can’t know for sure unless we conduct qualitative analysis, such as actual user testing, where we can ask people why they do what they do.

When you’ve got your own search data in front of you, start by asking the following questions. Interesting patterns, trends, and outliers will quickly begin to emerge:

What are users’ most frequent queries?
How might we categorize queries (e.g., by task, topic, audience type)?

What do those categories tell us about our users and what kind of information they need?
How do timing and seasons affect users’ information needs?

Once you start to see some of your site’s search patterns, you can use failure analysis to find immediate opportunities to improve the information you provide to your site visitors.

Failure analysis: learn what you need to fix now

Where does search go wrong on your site? If you can analyze your data to find major screw-ups, you’ll be able to fix them. To start, simply identify the searches that fail to retrieve any results whatsoever. After all, it’s generally a safe assumption that searchers want to retrieve at least one result. Here’s an example from a biking products retailer, courtesy of BehaviorTracking.com:

Fig. 3. Top search terms not found.

Here are a few things I observed from playing with this simple report for a half hour:

“Price” was the top query with zero results between January 17 and April 16. Wow! It’s hard to believe that pricing information wouldn’t be included on the site, but perhaps product prices are buried within each product page. If pricing information is already there, perhaps it’s time to redesign the page to make pricing information more prominent.
Perhaps the retailer, not realizing the potential of the daredevil couples’ market, doesn’t sell “mountain tandem bikes.” If that’s the case, it’s time to call the manufacturer to place an order. Or, perhaps these bikes are stocked, but the site calls them “tandem trekker bikes.” If that’s the case, it’s time to tweak the product labeling.
Even though it’s not something a bike retailer typically sells, “insurance” is a steadily frequent query. Perhaps there’s an opportunity to develop a referral program with a speciality insurer?
Lots of users misspell “mountain” as “montain.” In fact, typos feature regularly in most site-search query logs. Maybe the retailer should turn on their search engine’s spell-check feature (or acquire a search engine that supports spell-checking).
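
To get a feel for how many failed queries are really just typos, you can compare each unfamiliar word against terms you know appear on the site. The sketch below uses Python’s standard `difflib` module for fuzzy matching. The vocabulary list and the 0.8 similarity cutoff are assumptions chosen for this example, and this is not how any particular search engine’s spell-checker works.

```python
import difflib

# Hypothetical vocabulary drawn from product names and labels on the site.
SITE_TERMS = ["mountain", "tandem", "bike", "bikes", "helmet", "price", "insurance"]


def suggest_corrections(query, vocabulary=SITE_TERMS, cutoff=0.8):
    """Map each unknown word in a query to its closest known term, if any."""
    suggestions = {}
    for word in query.lower().split():
        if word in vocabulary:
            continue  # already a known term, nothing to fix
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            suggestions[word] = matches[0]
    return suggestions


print(suggest_corrections("montain tandem bikes"))
```

Running this over your zero-result queries and counting how many get a suggestion gives you a rough typo rate to take into the conversation about spell-checking.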

Failures take different forms in different contexts. For example, Netflix looks at titles that are in demand: The ones that are most searched and most clicked-through (which they learn about from SSA’s sibling, clickstream analysis). Of those, Netflix then examines the titles that are failing: The ones that are least likely to be added to customer queues. They can then examine why—is there enough stock, are there movie genres that they don’t carry, or something else?
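
You can apply the same “in demand but failing” idea to your own data. The following is only a sketch of the general approach, not Netflix’s actual method: the field names, thresholds, and sample titles are all hypothetical.

```python
def in_demand_but_failing(titles, min_searches=100, max_conversion=0.05):
    """Flag items that are searched and clicked a lot but rarely chosen.

    titles: list of dicts with "title", "searches", "clicks", and "adds".
    Returns (title, conversion) pairs, worst conversion first.
    """
    flagged = []
    for item in titles:
        if item["searches"] < min_searches or item["clicks"] == 0:
            continue  # not enough demand to judge
        conversion = item["adds"] / item["clicks"]
        if conversion <= max_conversion:
            flagged.append((item["title"], round(conversion, 3)))
    return sorted(flagged, key=lambda pair: pair[1])


# Made-up data for illustration only.
titles = [
    {"title": "Title A", "searches": 500, "clicks": 300, "adds": 6},
    {"title": "Title B", "searches": 400, "clicks": 200, "adds": 80},
    {"title": "Title C", "searches": 50, "clicks": 30, "adds": 0},
    {"title": "Title D", "searches": 150, "clicks": 100, "adds": 5},
]

print(in_demand_but_failing(titles))
```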

Failure analysis demonstrates what’s going wrong with your site and by extension illustrates SSA’s value as a diagnostic tool. For example, let’s say you estimate, based on your data analysis, that 8% of your user queries include typos. If you also estimate that your users perform searches on half of their visits, these numbers reveal a compelling conclusion: Installing a spell-check feature to fix typos could improve the overall user experience of your site by 4%: [8% (searches gone wrong) × 50% (portion of visits that include a search) = 4%].
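
The same back-of-the-envelope estimate is easy to rerun with your own numbers. This tiny sketch simply multiplies the two rates; the function name is made up, and it rests on the simplifying assumptions spelled out in the comment.

```python
def share_of_visits_affected(problem_query_rate, visits_with_search_rate):
    """Rough share of all visits touched by a search problem.

    Assumes (simplistically) that the two rates are independent and that
    each searching visit runs about one query.
    """
    return problem_query_rate * visits_with_search_rate


estimate = share_of_visits_affected(0.08, 0.50)
print(f"{estimate:.0%}")  # prints 4%
```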

Four percent might not sound like much, and we can certainly challenge that number. But it may be enough to impress your organization’s decision-makers—who may be considering far more expensive and less effective alternatives to installing a spell-checker—such as a redesign. Besides, if you can improve the site four percent here and three percent there, those little numbers start to add up.

Meet in the middle

We’ve discussed two basic types of bottom-up analytics. The value you derive from each type depends on the data you start with. And that, of course, depends: Your data might be from a text file in a search log, a search engine or analytics tool report, or in some wonderfully flexible database that supports ad-hoc queries and custom reporting. Whatever the case, you should find these Excel-based examples useful—they’re low tech, low cost, and most importantly, they expose you to the actual analysis process that your favorite analytics tool may hide.

In fact, be wary of the standard reports that come with your analytics application. They certainly have value, but these reports also provide a false sense of security—as if they were designed with your needs in mind. Nothing could be farther from the truth: Top-down, goal-driven analytics should be centered on your KPI, and your organization’s goals aren’t the same as everyone else’s. Similarly, your search query data—and the users, content, and actions they represent—are unique to your context. Top-down and bottom-up analytics will benefit you in different ways, and if you can find a happy middle ground, you’ll have an unequaled understanding of your customers’ intent.

Learn more about site search analytics

Hurol Inan’s Search Analytics: A Guide to Analyzing and Optimizing Website Search Engines (BookSurge, 2006) is an excellent SSA resource. My own book, Search Analytics: Conversations with your Customers (Rosenfeld Media, 2009), co-authored with Marko Hurst, will be published in the coming months; our book site contains many useful SSA links and resources.

Happily, many of the best experts practicing SSA share their wisdom on their blogs: Avi Rappoport, Gary Angel, Rich Wiggins, and Lee Romero’s sites are all well worth bookmarking. For SSA-related research, check the work of Yahoo!’s Ricardo Baeza-Yates, Amanda Spink, CMS Watch’s Phil Kemelor, and Jim Jansen of Pennsylvania State University.

8 Reader Comments

willwill says:

September 23, 2009 at 1:54 pm

I really appreciate the approach you are describing to getting a better understanding of your visitors’ intent. Internal Search is one of the few things customers do on many sites and it is a direct indicator of how a site is not serving a certain segment.

I believe getting a feeling for what visitors are missing is often the first step in creating changes to a site’s design, that can then be validated by measuring outcomes. It’s very much along the lines of what any scientist is doing when he is working with lots of data: coming up with a hypothesis during the discovery phase, where you immerse yourself in data and formulate what might actually be happening, before you then use a quantitative method to measure if you’re right.

A lot of that first phase is about looking at available numbers, but it is also about having hunches and developing ideas. I wish we had more tools that helped us along those lines.
Lou Rosenfeld says:

September 24, 2009 at 1:25 pm

Willwill, I agree; as I mentioned in the article, you’ve got to play with data to develop hunches. Otherwise we risk missing out on the unexpected. Better tools would be nice, but I’ve found Excel to be good enough to handle most aspects of exploratory data analysis.
patrick_l says:

September 29, 2009 at 1:12 pm

It’s a nice and useful way to use Excel for site search analysis, and quite different from the other methods presented today. I did not believe before that there were so many different techniques for that issue.
bugsyrocker says:

November 1, 2009 at 5:06 pm

…I was pleasantly surprised to come across this article. Great to see the MSU community taking part in A List Apart.

Keep up the great work.
Dena Tasarım says:

March 9, 2010 at 9:26 pm

Nice results. As a person who is also acquainted with the MSU community, it’s good to see them taking roles on A List Apart.
seoasia says:

September 12, 2010 at 8:48 am

Hi there

Thanks very much for your insights. As someone who often gets lost in data, I appreciate your tips on how to wade through it in a way that makes sense.

Using the bottom-up methodology that you describe has already given me a wider perspective on conversion optimization.
websitebuilder says:

February 18, 2011 at 6:35 am

Every good site manager or designer almost certainly engages in bottom-up analysis at some point, formally or informally. I wonder how many “community managers” do … ? I can think of more than a couple examples of website community managers who’d really benefit from a rethinking of their strategy, to shift their focus from “brand building” to “meeting user expectations and needs” by identifying what people are actually seeking.
Jrtayloriv says:

March 7, 2011 at 6:15 pm

I had never given much thought to using site search statistics to help me determine how to best arrange the layout and images (as in your campus map example).

Thanks.