Comments on Testing Search for Relevancy and Precision

10 Reader Comments

  1. John,

    Shouldn’t a good search engine provide disambiguation help? In your example of “parking”, Google would suggest the top relevant related searches to automagically refine the user’s query.


  2. Good article, thanks. I wish you could expand on “[…] we used these metrics to identify weaknesses in the configuration of our search engine, and as a yardstick to track improvement as we implemented optimization, best bets, and a thesaurus.”

  3. Hi Jim,

    Thanks for the question.  Disambiguation or “narrow your results” is a technology extension that supplements the core function of the search engine.  But the methods I discuss here focus specifically on the quality of the core relevancy calculation, because it’s the basis of a quality search experience.

    Functional add-ons (narrower, broader, filters, similar, etc.) are often great additions to a search engine.  But not always.  They’re sometimes implemented as technology crutches without regard to whether they solve any actual problems.  Sometimes they’re only helpful in a few circumstances, and otherwise only clutter up the results page.  Furthermore, people often won’t use a function that requires additional work from them.  Finally, not all designers are working with products that have those kinds of capabilities.

    At its heart, a search has to be good at judging the relevance of documents to a user’s query.  That’s really what you bought it to do.  So to measure that accurately, we need to temporarily set aside the effect of functions that might (or might not!) be used to improve the search once it’s already been submitted.


  4. Thanks so much for the question. 

    One of the great things about working with quantitative measures is that they allow you to summarize a complex problem in a very concise way.  While search is a qualitative experience, the methods I describe here express it in clear, simple numbers.

    So for example, after completing the evaluation you can present the results like this:

    - Our mean relevancy score is currently 5.7, and we want to bring that down to 2.5.
    - Currently, 11% of the best matches fall below the 10th position.  We want to reduce that to 5%.
    - By the loose standard, our precision score is currently 63%.  We want to bring that up to 75%.
    - By the permissive standard, our precision score is currently 89%.  We want to bring that up to 98%.
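As a rough illustration of how summary figures like these could be derived from raw evaluation data, here is a minimal sketch. Everything in it is an assumption made for the example: the rank list, the grade letters ("R", "N", "M", "I"), and which grades count as hits under the loose and permissive standards are hypothetical, not taken from the article's spreadsheet.

```python
from statistics import mean

# Hypothetical evaluation data: for each test query, the rank at which the
# best match appeared in the results (1 = top of the list).
best_match_ranks = [1, 1, 2, 3, 1, 14, 2, 7, 1, 25]

# Hypothetical hand-assigned grades for the top results of one query.
# The letters and their meanings are assumptions for this sketch:
# "R" relevant, "N" near, "M" misplaced, "I" irrelevant.
grades = ["R", "N", "R", "M", "I", "R", "N", "I", "M", "R"]


def mean_relevancy_score(ranks):
    """Average rank of the best match; lower is better."""
    return mean(ranks)


def share_below_position(ranks, cutoff=10):
    """Fraction of queries whose best match ranks worse than `cutoff`."""
    return sum(1 for r in ranks if r > cutoff) / len(ranks)


def precision(grades, accepted):
    """Fraction of results whose grade counts as a hit under a standard."""
    return sum(1 for g in grades if g in accepted) / len(grades)


print(f"Mean relevancy score: {mean_relevancy_score(best_match_ranks):.1f}")
print(f"Best matches below position 10: {share_below_position(best_match_ranks):.0%}")
print(f"Precision (loose): {precision(grades, {'R', 'N'}):.0%}")
print(f"Precision (permissive): {precision(grades, {'R', 'N', 'M'}):.0%}")
```

Keeping each standard as a set of accepted grades means the same `precision` function can report any of them; only the set passed in changes.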

    These metrics create a compelling case for further work to improve the quality of the search experience, and suggest the type of work that needs to be done.  For example, solutions like engine tuning, thesaurus, and spellcheck improve the quality of all searches, while optimization and best bets fix stubborn outliers that remain problematic.

    In the past, I’ve used these methods to set objectives and create improvement plans in just this way.  Not only has it been effective, but we’ve often significantly overshot the improvement target, resulting in a screamingly great search experience.

  5. Hi, that was a clear and helpful article! My site has very few visits, but still it’s good to know a good way to analyze site search experience.

    I would just like to point out that the first spreadsheet link appears to be broken, as it points to an invalid destination (http://d/).

  6. Bersimon,

    Glad you found it helpful!  Sorry you couldn’t download the spreadsheet; I’m not having the same problem.  Try following this link:

    Thanks for the comment,


  7. Thank you for this article! Those were things I had not thought about until now, and I have the impression that I learned something ;). I am sure it will be helpful for me when I have to maintain bigger websites.

  8. I’ve run into the issue of scoring a result set for usability evaluation before (using different interfaces for complex queries, but the problem is the same). One of the things I relied on is the typical hit-and-run behavior that we know Google users show: if the right result is not within the first 10 results, users would rather re-query than go to the next page. So results after the 10th are typically unimportant. For these reasons, the good old “precision” used to tweak engines is less important and less useful in this kind of evaluation.

    To get a user-based score for the search results of a given query, one could count the relevant results within the top 10 and use the position of each of those results to create an aggregate score for the result set. Say the 1st and 3rd results are relevant out of 100 results: you could give a score of (1/1 + 2/3)/10 = 0.17. If the second relevant result had been at position 2 in the result deck, the score would have been (1/1 + 2/2)/10 = 0.2, and so on.

    It would be even better if you have a couple of people evaluate the results for relevancy instead of the single ambiguous you.
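The position-based score described in this comment can be sketched as follows. This is one reading of the commenter's formula, not a confirmed implementation: for the k-th relevant result found at position p within the top n, add k/p, then divide the total by n. The function name `top_n_score` is made up for the example.

```python
def top_n_score(relevant_positions, n=10):
    """Score a result set from the 1-based positions of its relevant results.

    Only relevant results within the top `n` positions count. The k-th
    relevant result at position p contributes k / p, and the sum is
    divided by n.
    """
    hits_in_top_n = sorted(p for p in set(relevant_positions) if 1 <= p <= n)
    total = sum(
        hit_count / position
        for hit_count, position in enumerate(hits_in_top_n, start=1)
    )
    return total / n


# The two cases worked through in the comment.
print(round(top_n_score([1, 3]), 2))  # 0.17
print(round(top_n_score([1, 2]), 2))  # 0.2
```

With several evaluators, each person's relevance judgments could be scored this way and the scores averaged per query.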

  9. Thanks for the interesting article.
    Sometimes it is also a good idea to split the search results into two sections: one that is manually predetermined and shown when a certain keyword appears in the query, and one that is generated entirely automatically by the search engine. That way, for keywords that are searched for often, you can at least present the most relevant results.
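A minimal sketch of this two-section idea, assuming a hand-maintained keyword table. The table contents, the URLs, and the `engine_search` stand-in are all hypothetical; a real site would call its own search engine there.

```python
# Hypothetical curated table: query keyword -> hand-picked results.
BEST_BETS = {
    "parking": ["/visitors/parking-permits"],
    "football": ["/athletics/football"],
}


def engine_search(query):
    """Stand-in for the real search engine call; returns ranked URLs."""
    return [f"/search-result/{word}" for word in query.lower().split()]


def search_with_best_bets(query):
    """Return (curated, automatic) result sections for a query."""
    curated = []
    for word in query.lower().split():
        for url in BEST_BETS.get(word, []):
            if url not in curated:
                curated.append(url)
    # Drop automatic results already shown in the curated section.
    automatic = [url for url in engine_search(query) if url not in curated]
    return curated, automatic


curated, automatic = search_with_best_bets("Football tickets")
print("Curated:", curated)
print("Automatic:", automatic)
```

Queries with no matching keyword simply get an empty curated section, so the automatic results are shown on their own.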

  10. I think this is an excellent step-by-step explanation of how to evaluate recall, precision, and relevance.  It’s not just metrics; it explains how to use the research, which is very helpful.

    I believe Ledderman above is referring to Search Suggestions / Best Bets.  These are particularly useful in cases like your Football example, where a static link to the sports department would serve the users.
