Comments on The Myth of Usability Testing

29 Reader Comments

  1. Thanks for a thought-provoking article! I wasn’t aware of the Molich experiments, but am glad I am now, and enjoyed the tips on simpler ways to do user testing.

    I’m a data-driven design guy, so I appreciate the more evidence-based approaches to what we do in design and usability, but I worry here we’re replacing one flawed system with another.

    Firstly, as for the tools you recommend (most of which I would echo), aren’t they just as prone to the problems Molich found? I can’t imagine the five second test being particularly meaningful, for instance.

    Secondly, hasn’t this desire for more usability testing come about after recognizing the limitations of our intuition? Sure, it’s a nice ego boost to think that I, Joe or Jane Professional, can simply know, Gladwell-style, in the blink of an eye what the problems are (though I wouldn’t cite “Blink” as evidence for anything, really), but I don’t think it holds up in reality. If it did, I would spot problems costing businesses $millions, charge them $1 million for my blink-of-an-eye fix, they could fire their internal team, make millions, and everyone wins. Well, everyone except their internal team.

    What evidence is there that a well trained intuition isn’t just as faulty as the usability teams? Ask a dozen designers, get a dozen different answers.

    It’s like saying an experienced stock trader can just “know” which stocks are going up. If that were true, we’d all give them our money, and we’d all be rich.
    I think our intuition is brilliant for coming up with ideas and _possible_ solutions (our design is only ever going to be as good as our best ideas, after all), but we, as a profession, still need a formal way of measuring, testing, and publishing our results, in my opinion.
    I’m not for flawed usability testing as described (pseudo-science doesn’t help anyone), but nor am I for returning to expert intuition, whether informed by testing or otherwise. I do think we need a new way of thinking about design on the web.

  2. Great article, Robert. User experience engineering is an art just like writing and graphic design. You can follow all the rules, but it takes a subjective element to get it right for human consumption. Metrics help by making the subjective objective. They validate or invalidate subjective feelings and decisions.

    Silverback is a tool worth mentioning because it aids in observing user experiences, not just reporting metrics.

  3. I don’t agree with your conclusion of the Molich tests. It’s not usability testing that you should blame, but the usability companies that participated. Maybe they are not that good.

    Or maybe the method itself. Maybe usability as a science is not as exact as some people hope.
    Imagine you have a huge garden and you invite 10 companies to maintain it. Will the result be the same?

    Definitely not. Some differences will be the result of poor knowledge of gardening. Or lazy employees. Or gardeners not knowing the business. I think most of the ‘mistakes’ made would be the result of one of those elements.

    Other differences will be the result of a different approach. And who is really capable of telling that those differences ‘are’ mistakes, things that the gardeners missed?

    Gardening is not an exact science.

    Neither is usability. (Whatever method you use.) There will always be differences.

  4. If you’re going to do usability testing, do it thoughtfully, thoroughly, and by collaborating with the team. There is a lot of art involved in user research—it isn’t science. But a usability test well done can reveal amazing insights that you can’t get any other way, even if you have observed hundreds of people using designs.

    I have a lot to say about usability testing. Some of what I have to say I said in this periodical, just a couple of weeks ago:

    (If that’s not enough for you, check out my blog. If *that’s* not enough, come to my session at UI 14, ‘Mastering The Art of User Research’.)

  5. i’m pretty skeptical about the 5-second test. what could you hope to discover from a test like this? i think it could only be useful as a very narrow test, i.e. what do users notice first. anyone found it genuinely useful?

  6. Thanks for the article. While I agree with most of what you’ve said, couldn’t you pick a better title? I’m getting linked to this article by people who know me (since I specialize in this field) who think that it’s against usability testing.

    Anyway, the results of usability tests can vary greatly depending on a few factors such as the participants, the tasks given to participants, the experience of the moderator, etc., so of course you’ll get different results. That doesn’t mean that usability testing is ineffective. It just means that you need to be careful and understand what you’re doing.

  7. Robert, from some of your comments I understand that the “myth” you are referring to in the article title is that usability research is good for “determining what to focus on next.” But I didn’t really get that from the article itself. Are you saying that usability testing and evaluation (BTW, are you equating the two for the purposes of your thesis?) are bad methods for deriving high-level strategic guidance? I would definitely agree with that, but mainly because the method focuses on too-granular issues and is limited in terms of research participants. But then what is the significance of the Molich story and the other examples you cite of, quite frankly, faulty thinking / planning? These stories are worrisome, for sure, and worthy of further study / discussion in and of themselves. But I’m not sure that they say anything positive at all about usability methodologies - regardless of the purpose or intent. In fact, I hope none of my clients read your title and the first few paragraphs alone…and leave thinking that usability research is unreliable!

    Some other questions: Doesn’t your Blink reference support the idea that a good usability expert can provide value? Does that mean that the Molich evaluators were just incompetent? Also, what about just using usability research for its intended purpose - to identify specific design problems that impede user success and / or fail to encourage behaviors that the site wants to encourage (engagement, exploration, interaction, etc.)? Is this a “good” use or a “bad” use of the methodologies?

    I know there must be more to this than “use the right tool for the job” and use it properly… but I confess that I’m not seeing it. Help?

  8. The biggest question mark for me is how valid is the usability data when it was produced using subjects who are not the real users of the application.
    For instance, you can tell me (as a usability test subject) to buy something from a website, but even if the directions inform me I’m buying a product, if I’m not the real customer I may not really understand what I’m looking for. There are things that a true customer considers before making a purchase that a usability subject may bypass, in effect inventing the truth about the flow of a real-life user. Spending additional funds and changing the direction of a site altogether based on the ‘data’ of non-user test subjects: how valuable has that proven to be?

  9. Thanks for a great article, Robert - highly interesting reading!

    I get the feeling that conclusions from tests often are drawn too early in order to find some kind of business short-cut. This should certainly be good and educative reading for them… :-)

  10. But in this day of USA Today attention spans, and especially given our discipline’s struggle for respectability and acceptance, there is danger in the titillating but misleading article title or the carelessly arrived at, but well written, conclusion.

    I write in reaction to “The Myth of Usability Testing” by Robert Hoekman Jr.

    Misleading title # 1:  The article title implies either that a) all of usability testing is a myth, or b) there is only one myth associated with usability testing.  As “Mashhoor” asked in his/her post to the discussion about the article, “. . . couldn’t you pick a better title? I’m getting linked to this article by people who know me (since I specialize in this field) who think that it’s against usability testing.”  Indeed, Hoekman Jr. says, in discussion item #10, “I knew [the title] would probably get exactly this type of reaction, and . . . I decided the potential controversy could only draw attention to [usability testing]. . . . I wholeheartedly support running usability studies.”  I hope all the readers who think the title might suggest otherwise choose to read this far.

    Sloppy or misleading conclusion # 1:  In discussing Molich’s CUE-2 test, Hoekman Jr. says, “Collectively, the teams reported 340 usability problems. However, only nine of these problems were reported by more than half of the teams. And a total of 205 problems—60% of all the findings reported—were identified only once. Of the 340 usability problems identified, 61 problems were classified as ‘serious’ or ‘critical’ problems.”
    “Think about that for a moment.”
    “For the Hotmail team to have identified all of the ‘serious’ usability problems discovered in the evaluation process, it would have to have hired all nine usability teams.”
    In stark contrast to Hoekman Jr.’s conclusion that usability testing can’t possibly be cost-effective is Molich’s own conclusion:  “Realize that single tests aren’t comprehensive. They’re still useful, however, and any problems detected in a single professionally conducted test should be corrected.”  Also, in summarizing CUE-4, Molich says:  “Many of the teams obtained results that could effectively drive an iterative process in less than 25 person-hours. Teams A and L used 18 and 21 hours, respectively, to find more than half of the key problem issues, but with limited reporting requirements.”

    Misleading title # 2:  First major header — “Why usability evaluation is unreliable.”  Even if some usability evaluation is unreliable — and given the low barriers to entry for the field of usability engineering, who would be surprised?—that doesn’t mean all usability evaluation is unreliable.  Indeed, Hoekman Jr. goes on in this section to describe BAD usability evaluation (e.g., “Right Questions, Wrong People, and Vice Versa”).  With this I agree totally — bad usability evaluations are unreliable, and are just generally, um, bad.  I wonder if a better header for this section might have been “Some things that lead to unreliability of usability evaluations”?  Or maybe “Good methods gone bad”?

    Sloppy or misleading conclusion # 2:  “Usability evaluations are good for a lot of things, but determining what a team’s priorities should be is not one of them.”

    Allow me to observe that usability evaluations are also poor for Julienning fries — for that I’d recommend a Veg-o-Matic.  For establishing your team’s priorities, I’d recommend, oh, some sorta business process.  But if your goal is to identify and prioritize potential problems your users may have with your product or site design — well then, usability evaluation can kick Veg-o-Matic ass.  Which brings me to the best part of the Hoekman Jr. article . . .

    Great, representative illustration # 1 — the graphic at the head of the article, drawn by Kevin Cornell, showing a hammer resting against a bent and undriven screw.  EXACTLY.  Here, a hammer is the wrong tool for the job.  There are many jobs for which usability evaluation is the wrong tool, but, as with the hammer, many for which it is the right tool.

    Sloppy or misleading conclusion # 3:  “It’s only natural that existing users perform tasks capably and comfortably despite poor task design. After all, the most usable application is the one you already know. But this doesn’t mean poor designs should not be revamped. Rather, to adapt to and harness the power of usability testing, current users should be brought in to test new ideas—ideas that surface from expert evaluation and collaboration with designers to create new solutions.”  Yes, and non-current-but-still-representative users may be brought in, at any time, to evaluate old and new interfaces.  Why the focus on only current users?  If one tested only current users, it would be another example of “the wrong people for the right question.”

    Wheel Rediscovery # 1:  “To identify problems on which to focus, these teams, and yours, can take a variety of approaches. Consider a revised workflow that begins with an expert-level heuristic evaluation used in conjunction with informal testing methods, followed by informal and formal testing. More specifically, consider using online tools and paid services to investigate hunches, then use more formal methods to test and validate revised solutions that involve a designer’s input.”  Yes, this sounds like a fairly thorough course of User-Centered Design (UCD) (see Vredenburg, Isensee, and Righi, 2002), though there are earlier steps of user-based requirements gathering that are also important.  (Though it seems odd to parry “Usability evaluation may be too costly” with “Go with a heuristic evaluation and informal methods, plus some more informal and formal testing.”)  Molich, in his CUE-2 summary, offers “Use an appropriate mix of methods.”

    Odd, unsubstantiated claim # 1:  “Here are several tools that can be used with a heuristic evaluation to identify trouble spots:  Five-second tests: . . . Click stats: . . . Usability testing services: . . .Click stats on screenshots: . . . .  In handling usability projects in this way, teams will identify priorities and achieve better outcomes, and can still gain all the benefits of being actively involved with usability tests.”  So, heuristic evaluation plus these remote, unmoderated testing tools yield the same benefits as usability testing?  I wonder.  It’s an empirical question, and in my opinion it’s the next big question for our field — the empirical comparison of the value of usability engineering methods; which methods at which points in the development cycle of which types of user interfaces?  (Alas, so far the National Science Foundation doesn’t agree with me, that answering this question is worthy of funding.)

    Odd, but widely-shared misconception #1:  “Obviously, not every team or organization can bear the expense of usability testing.”  Which teams would that be?  For which teams is it OK to “just get something out there and let our first users be our first test participants”?  (I am NOT quoting Hoekman Jr., here — rather, it’s a snarky but too-often-deserved characterization of development teams’ approach.)  Which teams are OK with the potential costs of a post-ship rework of the product, PLUS the alienating of those users who struggled to learn how to interact with that first design, given that “After all, the most usable application is the one you already know”?  Which teams (and ya’ gotta be able to identify ’em in advance, right?) are going to be those teams that happen to get the design right the first time?

    So, to summarize, in my should-be-humbler opinion:
    -  yes, usability evaluations can be pursued at the wrong time, and can be performed poorly even when the timing is good;
    -  but that is true of any method or tool in software (or any) engineering, and no reason for criticism of the method itself;
    -  usability evaluation, applied and conducted well, IS a tried-and-true technique for identifying potential usability problems;
    -  but maybe not all the problems; and so
    -  yes, we need to get better at choosing and applying usability engineering methods.

    I’m workin’ on that.

  11. Though the discussant gets no feedback on this (and the title does not appear in the preview when it is cut-and-pasted from the comment itself), there’s a limit to the length of a message title.  For my previous post the intended title was:  “Fish gotta swim, birds gotta fly.  And bloggers gotta blog.”

  12. But actually, I do agree that “bloggers go…” :-)

  13. As a certified human factors engineering professional (CHFP) with over 30 years’ experience with complex usability issues, I find that your piece grossly misrepresents the intent and structure of the form of usability analysis known as “heuristics”. To those with a serious background in usability, the studies you mention are known to be grossly misleading and poorly executed. Finally, for the record, Jakob Nielsen DID NOT invent heuristics. The process was well understood and used successfully in many military applications before JN was born.

    Charles L. Mauro CHFP

  14. Thanks for writing such a provocative article. While I agree that usability testing isn’t the right tool to identify the answer to every question about an interface, I’m not really sure I follow your logic here.

    In the examples you cite, the research teams made some pretty huge recruitment gaffes. Clearly, if you test with the wrong audience, and ask them the wrong questions, your findings aren’t going to be worth shit. But that doesn’t mean the method is lacking; the implementation is.

    Do you have any anecdotes illustrating the method’s shortcomings that DON’T involve research teams that made some serious newbie failures?

    I also don’t understand your assertion that:

    “... usability testing fails wholly to do what many people think is its most pertinent and relevant purpose—to identify problems and point a team in the right direction ...”

    Setting aside your examples of poorly run testing scenarios, I really don’t see how you can make this kind of assertion. It’s been my experience that usability testing is a terrific method to identify problems in an existing interface. Am I misunderstanding your point? Could you clarify what you mean?

    After reading this article multiple times now, the main point I am left with is this: Don’t hire usability testers who don’t know what they’re doing. I wholeheartedly agree with this sentiment. But as far as I can tell, there’s no clear evidence given here to justify using inflammatory phrases like “the myth of usability testing” or “why usability evaluation is unreliable”.

    I just don’t buy it.

  15. Nice article. Having studied psychology, I know a lot about field studies, experiments and what to look for in them - for example, these usability tests may not have been run in the right conditions, so whilst they _may_ be correct to a degree, some of these usability problems may not be _real_ problems when the product is used by a normal person.

    I’m not saying that the job these people do is unnecessary, I’m saying that it should always be taken with a pinch of salt and analysed further.

  16. I am not sure what you mean by usability testing cannot drive team priorities? Your explanation is all about why BAD usability testing cannot drive priorities. I think I failed to see where you spoke of GOOD usability testing to drive priorities.

    But I do agree with you, usability testing should be in context. If your usability test cases capture the business needs properly, then it can direct the development efforts in making that test pass (TDD).

  17. The article ‘Why You Only Need to Test with 5 Users’ speaks of that.

    And I guess that could also be an explanation for the numbers in “User Interface Engineering”.
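
    For readers who haven’t seen it, the five-user article rests on a simple diminishing-returns model: if each test participant independently uncovers a fixed share of the problems, the share found by n participants is 1 - (1 - L)^n. The sketch below is only an illustration of that model; the function name is made up, and the default of 0.31 is the average per-user rate that article reports, not a figure from the CUE studies.

```python
def share_found(n_users: int, per_user_rate: float = 0.31) -> float:
    """Expected share of problems seen at least once by n_users participants.

    Assumes every participant independently finds the same fixed share of
    problems (per_user_rate). That assumption is the weak point of the model.
    """
    return 1.0 - (1.0 - per_user_rate) ** n_users


if __name__ == "__main__":
    for n in (1, 3, 5, 15):
        print(f"{n:>2} participants: {share_found(n):.1%}")
```

    The curve flattens quickly only because the model treats every problem as equally easy to hit. When a site has many problems that each surface rarely, the effective per-user rate is much lower and the same formula predicts far less coverage from a handful of participants.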

  18. Usability tests are great at telling a team what direction they should not pursue, but probably not much else.  Unfortunately that is perhaps the most important information regarding creative direction that “experts” may ever hope to receive that is not often appreciated strictly with this regard.  Web developers are, by the way, experts at knowing what the customer wants, which is why they are so good at telling the customer what they want.

    My employer subscribes to usability tests that provide incredible feedback.  How scientific is that information and how wonderful are those picky details?  I don’t know.  The evidence does not suggest a decisive direction to pursue, but it is quick to tell you when you are wrong in comparison to nearly identical expectations from competitor websites.  If you are wrong over and over… eventually a pattern should emerge of what you should NOT be doing.  I find that information to be of profound value, although it is commonly in direct conflict with expectations of what usability should be.

  19. I agree with this article that Usability Testing isn’t the ultimate standard in catching issues and solving problems on the web. However, I’ve found in my own experience that testing can provide some insights into how your website is perceived, as well as being able to let you separate yourself from your own company jargon.

    We used testing on a redesign for a university in Philadelphia, and some of the best insight we gained was from what users wanted to get to first, second, etc. We also gained insight into the application process we had built, and were able to rewrite instructions in layman’s terms, instead of the “advertising” jargon we had been using.

  20. You can’t just sit there listening to everyone’s comments. Many users have a terrible sense of aesthetics or want the application to be tailor-made for them. _Every_ site and application needs to undergo usability testing of some sort, but more important is having the services of a designer (and hopefully developers too) who has usability patterns and standards down to a T. A designer with the right eye can pick the true usability flaws out of the sea of personal preferences expressed during usability testing.

  21. Aside from the fact that I would absolutely believe that a typical Microsoft application like hotmail has AT LEAST 300 usability problems, there are some other oddities about this study.

    The reviewers of the study mention that there were reporting problems from the teams, as well as being pretty fine grained about whether ‘problems’ were in fact the same or not. It’s almost difficult to trust the outcomes of these studies.

    Also I find it hard to believe that if you continue to expand usability studies you won’t be able to find a correlation with major problems.  Most studies are able to zoom right in to 1-4 major problems right away.  This study did in fact find that there were 9(?) problems that were reported by more than a few teams.  This makes perfect sense.  You need to prioritize and fix major problems.

    I agree that you need to understand what your expectations are from these kinds of studies.  You’re looking for ‘usability’ problems, not people’s opinions. If you want opinions go have a code review.  If people are unable to complete a well written task, well, then you have a problem - which is why these studies are run.

    It’s also my opinion that bringing in current users of a system is a problem.  Even sites that require a lot of domain knowledge are able to be tested with first time users and a well written script.  Bringing people back to test again is usually not a good idea because (like you mentioned) they’ve become familiar already with your screwed up navigation.

  22. Nice article and I do agree with many points. We find that the best method is to have a group of 10-15 external users all with different tasks to perform, e.g. buy a pair of jeans, sign up to the newsletter, find a course, etc. This way you get a good cross section and then you sit down with your creative and development teams and analyse the data. Then we would make any recommendations for design/functionality changes going forward.

  23. If you’re testing an interface for a product that is not strongly influenced by the user’s personal context or emotional state, then testing in a lab will yield decent results.  If you’re trying to measure the persuasive power or conversion potential of your site (which should be pretty high on your list of research objectives if you’re running an ecommerce site), then lab-based testing is a complete waste of time.  There are a bunch of cost-effective alternatives that can help you identify roadblocks to conversion in real-time.

  24. “Page views and time-spent-per-page metrics, while often foolishly considered standard measures of site effectiveness, are meaningless until they are considered in context of the goals of the pages being visited.

    Is a user who visits a series of pages doing so because the task flow is effective, or because he can’t find the content he seeks? Are users spending a lot of time on a page because they’re engaged, or because they’re stuck? While surely hopes readers will stay on a page long enough to read an article in full or scan all its headlines, Google’s goal is for users to find what they need and leave a search results page as quickly as possible. A lengthy time-spent metric on could indicate a high-quality or high-value article. For Google’s search workflow, it could indicate a team’s utter failure.”

    > You seem to be confusing analytics with usability.  The whole purpose behind a think-aloud test is to uncover what’s behind this sort of thing.  BTW, that’s all that Crazy Egg, Chalkmark, and five second test tell you too.  User Testing is a real usability test, but without any ability to follow up.  In that sense, it’s inferior to a moderated test.

    “And interestingly, many of the most compelling usability test insights come not from the elements that are evaluated, but rather those not evaluated. They come from the almost unnoticeable moments when a user frowns at a button label, or obviously rates a task flow as easier than it appeared during completion, or claims to understand a concept while simultaneously misdefining it. The unintended conclusions—the peripheral insights—are often what feed a designer’s instincts most.”

    > Most experienced facilitators ignore these in favor of verbalizations.  And if users exhibit some sort of body language and don’t verbalize, experienced facilitators prompt them to do so (“What are you thinking?”)

    “Finally, while testing alone is not a good indicator of where a team’s priorities should lie, it is most certainly part of the triangulation process. When put in context of other data, such as project goals, user goals, user feedback, and usage metrics, testing helps establish a complete picture.”

    > User feedback and usage metrics cannot be used if the system hasn’t been put into production yet.  Usability testing is usually done pre-release, to get feedback on something without its being exposed to the whole world.  Also, evals and tests, if done properly, consider business and user goals in the tasks they test and the things they look for.

    “Without this context, however, testing can be misleading or misunderstood at best, and outright damaging at worst. This is also true for non-testing-based evaluation methods, such as heuristic reviews.”

    > Actually, it’s not the context so much as the things you covered earlier — basically, inexperienced usability practitioners.

    “There is a catch to all of the preceding arguments, however: They revolve around the notion that testing should be used primarily to identify problems with existing designs. This is where teams get into trouble—they assume testing is worth more than it truly is, resolve to address problems based purely on testing data, and revise strategies based entirely on comments made by test participants.”

    > Usability testing doesn’t prove anything.  It’s meant to inform your judgment.  You still have to make a decision.  Hopefully, there is now evidence (numbers, quotes, etc.) that you can use to make an *informed* decision.

    “As we’ve seen, test results and research can point teams toward solutions that are not only ill-advised, but in direct conflict with their goals.”

    “usability evaluation is unreliable”

    “While usability testing fails wholly to do what many people think is its most pertinent and relevant purpose—to identify problems and point a team in the right direction.”

    “Test for the right reasons and you stand a good chance of achieving a positive outcome. Test for the wrong ones, however, and you may not only produce misleading results, but also put your entire business at risk.”

    > These are pretty strong claims.  Given that, you really need to back them up.  You cite the CUE studies, but there’s a lot to these studies.  It’s actually rather complicated what he’s really saying, and trying to present it all in this way is (yes) very attention-getting, but also very wrong.

    “Asked how development teams could be confident they are addressing the right problems on their websites, Molich concluded, ‘It’s very simple: They can’t be sure!’”

    > Well, here’s what Molich has also said, from the CUE site:

    >> Six - or even 15 - test participants are nowhere near enough to find 80% of the usability problems. Six test participants will, however, provide sufficient information to drive a useful iterative development process.

    >> The limited overlap may be a result of the large number of usability problems in [the system being tested]. It could also be due to the different approaches to usability testing that the participating teams took - in particular, the selection of different usability test scenarios.

    >> Realize that there is no foolproof way to identify usability flaws. Usability testing by itself can’t develop a comprehensive list of defects. Use an appropriate mix of methods.

    >> Place less focus on finding “all” problems. Realize that the number of usability problems is much larger than you can hope to find in one or even a few tests. Choose smaller sets of features to test iteratively and concentrate on the most important ones.

    >> Realize that single tests aren’t comprehensive. They’re still useful, however, and any problems detected in a single professionally conducted test should be corrected.

    >> Increase focus on quality and quality assurance. Prevent methodological mistakes in usability testing such as skipping high-priority features, giving hidden clues, or writing usability test reports that aren’t fully usable.
