It’s hard to quantify the customer experience. “Simpler and faster for users” is a tough sell when the value of our work doesn’t make sense to management. We have to prove we’re delivering real value—increased the success rate, or reduced time-on-task, for example—to get their attention. Management understands metrics that link with other organizational metrics, such as lost revenue, support calls, or repeat visits. So, we need to describe our environment with metrics of our own.
For the team I work with, that meant developing a remote testing method that would measure the impact of changes on customer experience—assessing alterations to an app or website in relation to a defined set of customer “top tasks.” The resulting metric is stable, reliable, and repeatable over time. We call it the Task Performance Indicator (TPI).
For example, if a task has a TPI score of 40 (out of 100), it has major issues. If you measure again in 6 months’ time but nothing has been done to address the issues, the testing score will again result in a TPI of 40.
In traditional usability testing, it has long been established that if you test with between three and eight people, you’ll find out if significant problems exist. Unfortunately, that’s not enough to reveal precise success rates or time-on-task measurements. What we’ve discovered from hundreds of tests over many years is that reliable and stable patterns aren’t apparent until you’re testing with between 13 and 18 people. Why is that?
When the number of participants ranges anywhere from 13–18 people, testing results begin to stabilize and you’re left with a reliable baseline TPI metric.
The following chart shows why we can do this (Fig. 1).
How TPI scores are calculated
We’ve spent years developing a single score that we believe is a true reflection of the customer experience when completing a task.
For each task, we present the user with a “task question” via live chat. Once they understand what they have to do, the user indicates that they are starting the task. At the end of the task, they must provide an answer to the question. We then ask people how confident they are in their answer.
A number of factors affect the resulting TPI score.
Time: We establish what we call the “Target Time”—how long it should take to complete the task under best practice conditions. The more they exceed the target time, the more it affects the TPI.
Time out: The person takes longer than the maximum time allocated. We set it at 5 minutes.
Confidence: At the end of each task, people are asked how confident they are. For example, low confidence in a correct answer would have a slight negative impact on the TPI score.
Minor wrong: The person is unsure; their answer is almost correct.
Disaster: The person has high confidence, but the wrong result; acting on this wrong answer could have serious consequences.
Gives up: The person gives up on the task.
A TPI of 100 means that the user has successfully completed the task within the agreed target times.
In the following chart, the TPI score is 61 (Fig. 2).
Developing task questions
Questions are the greatest source of potential noise in TPI testing. If a question is not worded correctly, it will invalidate the results. To get an overall TPI for a particular website or app, we typically test 10-12 task questions. In choosing a question, keep in mind the following:
Based on customer top tasks. You must choose task questions that are examples of top tasks. If you measure and then seek to improve the performance of tiny tasks (low demand tasks) you may be contributing to a decline in the overall customer experience.
Repeatable. Create task questions that you can test again in 6 to 12 months.
Representative and typical. Don’t make the task questions particularly difficult. Start off with reasonably basic, typical questions.
Universal, everyone can do it. Every one of your test participants must be able to do each task. If you’re going to be testing a mixture of technical, marketing, and sales people, don’t choose a task question that only a salesperson can do.
One task, one unique answer. Limit each task question to only one actual thing you want people to do, and one unique answer.
Does not contain clues. The participant will examine the task question like Sherlock Holmes would hunt for a clue. Make sure it doesn’t contain any obvious keywords that could be answered by conducting a search.
Short—30 words or less. Remember, the participant is seeing each task question for the first time, so aim to keep its length at less than 20 words (and definitely less than 30).
No change within testing period. Choose questions where the website or app is not likely to change during the testing period. Otherwise, you’re not going to be testing like with like.
Case Study: Task questions for OECD
Let’s look at some top tasks for the customers of Organisation for Economic Co-operation and Development (OECD), an economic and policy advice organization.
- Access and submit country surveys, reviews, and reports.
- Compare country statistical data.
- Retrieve statistics on a particular topic.
- Browse a publication online for free.
- Access, submit, and review working papers.
Based on that list, these task questions were developed:
- What are OECD’s latest recommendations regarding Japan’s healthcare system?
- In 2008, was Vietnam on the list of countries that received official development assistance?
- Did more males per capita die of heart attacks in Canada than in France in 2004?
- What is the latest average starting salary, in US dollars, of a primary school teacher across OECD countries?
- What is the title of Box 1.2 on page 73 of OECD Employment Outlook 2009?
- Find the title of the latest working paper about improvements to New Zealand’s tax system.
Running the test
To test 10-12 task questions usually takes about one hour, and you’ll need between 13 and 18 participants (we average 15). Make sure that they’re representative of your typical customers.
We’ve found that remote testing is better, faster, and cheaper than traditional lab-based measurement for TPI testing. With remote testing, people are more likely to behave in a natural way because they are in their normal environment—at home or in the office—and using their own computer. That makes it much easier for someone to give you an hour of their time, rather than spend the morning at your lab. And since the cost is much lower than lab-based tests, we can set them up more quickly and more often. It’s even convenient to schedule them using Webex, GoToMeeting, Skype, etc.
The key to a successful test is that you are confident, calm, and quiet. You’re there to facilitate the test—not to guide it or give opinions. Aim to become as invisible as possible.
Prior to beginning the test, introduce yourself and make sure the participant gives you permission to record the session. Next, ask that they share their screen. Remember to stress that you are only testing the website or app—not them. Ask them to go to an agreed start point where all the tasks will originate. (We typically choose the homepage for the site/app, or a blank tab in the browser.)
Explain that for each task, you will paste a question into the chat box found on their screen. Test the chat box to confirm that the participant can read it, and tell them that you will also read the task aloud a couple of times. Once they understand what they have to do, ask them to indicate when they start the task, and that they must give an answer once they’ve finished. After they’ve completed the task, ask the participant how confident they are in their answer.
Analyzing the results
As you observe the tests, you’re looking for patterns. In particular, look for the major reasons people give for selecting the wrong answer or exceeding the target time.
Video recordings of your customers as they try—and often fail—to complete their tasks have powerful potential. They are the raw material of empathy. When we identify a major problem area during a particular test, we compile a video containing three to six participants who were affected. For each participant, we select less than a minute’s worth of video showing them while affected by this problem. We then edit these participant snippets into a combined video (that we try to keep under three minutes). We then get as many stakeholders as possible to watch it. You should seek to distribute these videos as widely, and as often as possible.
How Cisco uses the Task Performance Indicator
Every six months or so, we measure several tasks for Cisco, including the following:
Task: Download the latest firmware for the RV042 router.
The top task of Cisco customers is downloading software. When we started the Task Performance Indicator for software downloads in 2010, a typical customer might take 15 steps and more than 300 seconds to download a piece of software. It was a very frustrating and annoying experience. The Cisco team implemented a continuous improvement process based on the TPI results. Every six months, the Task Performance Indicator was carried out again to see what had been improved and what still needed fixing. By 2012—for a significant percentage of software—the number of steps to download software had been reduced from 15 to 4, and the time on task had dropped from 300 seconds to 40 seconds. Customers were getting a much faster and better experience.
According to Bill Skeet, Senior Manager of Customer Experience for Cisco Digital Support, implementing the TPI has had a dramatic impact on how people think about their jobs:
Troubleshooting and bug fixing are also top tasks for Cisco customers. Since 2012, we’ve tested the following.
Task: Ports 2 and 3 on your ASR 9001 router, running v4.3.0 software, intermittently stop functioning for no apparent reason. Find the Cisco recommended fix or workaround for this issue.
For a variety of reasons, it was difficult to solve the underlying problems connected with finding the right bug fix information on the Cisco website. Thus, the scores from February 2012 to February 2013 did not improve in any significant way.
For the May 2013 measurement, the team ran a pilot to show how (with the proper investment) it could be much easier to find bug fix information. As we can see in the preceding image, the success rate jumped. However, it was only a pilot and by the next measurement it had been removed and the score dropped again. The evidence was there, though, and the team soon obtained resources to work on a permanent fix. The initial implementation was for the July 2014 measurement, where we see a significant improvement. More refinements were made, then we see a major turnaround by December 2014.
Task: Create a new guest account to access the Cisco.com website and log in with this new account.
This task was initially measured in 2014; the results were not good.
In fact, nobody succeeded in completing the task during the March 2014 measurements, resulting in three specific design improvements to the sign-up form. These involved:
- Clearly labelling mandatory fields
- Improving password guidance
- Eliminating address mismatch errors.
A shorter pilot form was also launched as a proof of concept. Success jumped by 50% in the July 2014 measurements, but dropped 21% by December 2014 because the pilot form was no longer there. By June 2015, a shorter, simpler form was fully implemented, and the success again reached 50%.
The team was able to show that because of their work:
- The three design improvements improved the success rate by 29%.
- The shorter form improved the success rate by 21%.
That’s very powerful. You can isolate a piece of work and link it to a specific increase in the TPI. You can start predicting that if a company invests X it will get a Y TPI increase. This is control and the route to power and respect within your organization, or to trust and credibility with your client.
If you can link it with other key performance indicators, that’s even more powerful.
The following table shows that improvements to the registration form halved the support requests connected with guest account registration (Fig. 5).
A more simplified guest registration process resulted in:
- A reduction in support requests—from 1,500 a quarter, to less than 700
- Three fewer people were required to support customer registration
- 80% productivity improvement
- Registration time down to 2 minutes from 3:25.
Task: Pretend you have forgotten the password for the Cisco account and take whatever actions are required to log in.
When we measured the change passwords task, we found that there was a 37% failure rate.
A process of improvement was undertaken, as can be seen by the following chart, and by December 2013, we had a 100% success rate (Fig. 6).
100% success rate is a fantastic result. Job done, right? Wrong. In digital, the job is never done. It is always an evolving environment. You must keep measuring the top tasks because the digital environment that they exist within is constantly changing. Stuff is getting added, stuff is getting removed, and stuff just breaks (Fig. 7).
When we measured again in March 2014, the success rate had dropped to 59% because of a technical glitch. It was quickly dealt with, so the rate shot back up to 100% by July.
At every step of the way, the TPI gave us evidence about how well we were doing our job. It’s really helped us fight against some of the “bright shiny object” disease and the tendency for everyone to have an opinion on what we put on our webpages ... because we have data to back it up. It gave us more insight into how content organization played a role in our work for Cisco, something that Jeanne Quinn (senior manager responsible for the Cisco Partner) told us kept things clear and simple while working with the client.
The TPI allows you to express the value of your work in ways that makes sense to management. If it makes sense to management—and if you can prove you’re delivering value—then you get more resources and more respect.