Testing Content
Issue № 320


Nobody needs to convince you that it’s important to test your website’s design and interaction with the people who will use it, right? But if that’s all you do, you’re missing out on feedback about the most important part of your site: the content.


Whether the purpose of your site is to convince people to do something, to buy something, or simply to inform, testing only whether they can find information or complete transactions is a missed opportunity: Is the content appropriate for the audience? Can they read and understand what you’ve written?

A tale of two audiences

Consider a health information site with two sets of fact sheets: A simplified version for the lay audience and a technical version for physicians. During testing, a physician participant reading the technical version stopped to say, “Look. I have five minutes in between patients to get the gist of this information. I’m not conducting research on the topic, I just want to learn enough to talk to my patients about it. If I can’t figure it out quickly, I can’t use it.” We’d made some incorrect assumptions about each audience’s needs and we would have missed this important revelation had we not tested the content.

You’re doing it wrong

Have you ever asked a user the following questions about your content?

How did you like that information?

Did you understand what you read?

It’s tempting to ask these questions, but they won’t help you assess whether your content is appropriate for your audience. The “like” question is popular—particularly in market research—but it’s irrelevant in design research because whether you like something has little to do with whether you understand it or will use it. Dan Formosa provides a great explanation about why you should avoid asking people what they like during user research. For what’s wrong with the “understand” question, it helps to know a little bit about how people read.

The reading process

Reading is a product of two simultaneous cognitive elements: decoding and comprehension.

When we first begin to read, we learn that certain symbols stand for concepts. We start by recognizing letters and associating the forms with the sounds they represent. Then we move to recognizing entire words and what they mean. Once we’ve processed those individual words, we can move on to comprehension: Figuring out what the writer meant by stringing those words together. It’s difficult work, particularly if you’re just learning to read or you’re one of the nearly 50% of the population who have low literacy skills.

While it’s tempting to have someone read your text and ask them if they understood it, you shouldn’t rely on a simple “yes” answer. It’s possible to recognize every word (decode), yet misunderstand the intended meaning (comprehend). You’ve probably experienced this yourself: Ever read something only to reach the end and realize you don’t understand what you just read? You recognize every word, but because the writing isn’t clear, or you’re tired, the meaning of the passage escapes you. Remember, too, that if someone misinterpreted what they read, there’s no way to know unless you ask questions to assess their comprehension.

So how do you find out whether your content will work for your users? Let’s look at how to predict whether it will work (without users) and test whether it does work (with users).

Estimate it

Readability formulas measure the elements of writing that can be quantified, such as the length of words and sentences, to predict the skill level required to understand them. They can be a quick, easy, and cheap way to estimate whether a text will be too difficult for the intended audience. The results are easy to understand: many state the approximate U.S. grade level of the text.

You can buy readability software. There are also free online tools from Added Bytes, Juicy Studio, and Edit Central; and there’s always the Flesch-Kincaid Grade Level formula in Microsoft Word.

But there is a big problem with readability formulas: Most features that make text easy to understand—like content, organization, and layout—can’t be measured mathematically. Using short words and simple sentences doesn’t guarantee that your text will be readable. Nor do readability formulas assess meaning. Not at all. For example, take the following sentence from A List Apart’s About page and plug it into a readability formula. The SMOG Index estimates that you need a third grade education to understand it:

We get more mail in a day than we could read in a week.

Now, rearrange the words into something nonsensical. The result: still third grade.

In day we mail than a week get more in a could we read.

Readability formulas can help you predict the difficulty level of text and help you argue for funding to test it with users. But don’t rely on them as your only evaluation method. And don’t rewrite just to satisfy a formula. Remember, readability formulas estimate how difficult a piece of writing is. They can’t teach you how to write understandable copy.
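To see just how blind these formulas are to meaning, here’s a minimal sketch of the published Flesch-Kincaid Grade Level calculation. The crude syllable-counting heuristic is my own simplification — real tools use dictionaries and better rules — but the point holds regardless: because the formula only counts words, sentences, and syllables, the scrambled sentence scores exactly the same as the original:

```python
import re

def count_syllables(word):
    # Crude vowel-group heuristic (my simplification, not a real tool's rule):
    # count runs of vowels, then drop a likely-silent trailing "e".
    groups = re.findall(r"[aeiouy]+", word.lower())
    count = len(groups)
    if word.lower().endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text):
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / sentences)
            + 11.8 * (syllables / len(words)) - 15.59)

original = "We get more mail in a day than we could read in a week."
scrambled = "In day we mail than a week get more in a could we read."

# Same words, same counts, so the formula returns the same grade level
# for sense and nonsense alike.
print(flesch_kincaid_grade(original) == flesch_kincaid_grade(scrambled))
```

(This particular formula lands the sentence around second grade rather than the SMOG Index’s third — different formulas, different estimates — but both agree the scrambled version is no “harder” than the real one.)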

Do a moderated usability test

To find out whether people understand your content, have them read it and apply their new knowledge. In other words, do a usability test! Here’s how to create task scenarios where participants interpret and use what they read:

  • Identify the issues that are critical to users and the business.
  • Create tasks that test user knowledge of these issues.
  • Tell participants that they’re not being tested; the content is.

Let’s say you’re testing SEPTA, a mass transit website. It offers several types of monthly passes that vary based on the mode of transportation used and distance traveled: For example, a TransPass lets you ride on the subway, bus, or trolley. A TrailPass also lets you ride the train, etc. If you only wanted to test the interface, you might phrase the task like this:

Buy a monthly TrailPass.

But you want to test how well the content explains the difference between each pass so that people can choose the one that’s right for them. So phrase your task like this:

Buy the cheapest pass that suits your needs.

See the difference? The first version doesn’t require participants to consider the content at all. It just tells them what to choose. The second version asks them to use the content to determine which option is the best choice for them. Just make sure to get your participants to articulate what their needs are so you can judge whether they chose the right one.

Ask participants to think aloud while they read the content. You’ll get some good insight on what they find confusing and why. Ideally, you want readers to understand the text after a single reading. If they have to re-read anything, you must clarify the text. Also, ask them to paraphrase some sections; if they don’t get the gist, you’d better rewrite it.

To successfully test content with task scenarios and paraphrasing, you’ve got to know what the correct answer looks like. If you need to, work with a subject matter expert to create an answer key before you conduct the sessions. You can conduct live moderated usability tests either in person or remotely. But, there are also asynchronous methods you can use.

Do an unmoderated usability test

If you need a larger sample size, you’re on a small budget, or you’re squeezed for time, try a remote unmoderated study. Send people to the unmoderated user testing tool of your choice, such as Loop11 or OpenHallway, give them tasks, and record their feedback. You can even use something like SurveyMonkey and set up your study as a multiple-choice test: It takes more work up front than open-ended questions because you must define the possible answers beforehand, but it will take less time for you to score.

The key to a successful multiple-choice test is creating strong multiple choice questions.

  • State the question in a positive, not negative, form.
  • Include only one correct or clearly best answer.
  • Come up with two to four incorrect answers (distractors) that would be plausible if you didn’t understand the text.
  • Keep the alternatives mutually exclusive.
  • Avoid giving clues in any of the answers.
  • Avoid “all of the above” and “none of the above” as choices.
  • Avoid using “never,” “always,” and “only.”

You may also want to add an option for “I don’t know” to reduce guessing. This isn’t the SAT after all. A lucky guess won’t help you assess your content.

Task scenario:

You want to buy traveler’s checks with your credit card. Which percentage rate applies to the purchase?

Possible answers:

  • The Standard APR of 10.99%
  • The Cash Advance APR of 24.24%*
  • The Penalty APR of 29.99%
  • I don’t know

(*This is the correct answer, based on my own credit card company’s cardmember agreement.)

As with moderated testing, make it clear to participants that they’re not being tested, the content is.
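A minimal sketch of how such a multiple-choice test could be represented and scored. The task and answer options mirror the example above; the data structure, field names, and the `score` function are my own invention, not a real testing tool’s API. Note that “I don’t know” responses are counted separately rather than simply marked wrong, since they tell you something different from a confident wrong answer:

```python
# Hypothetical representation of an unmoderated multiple-choice content test.
QUESTIONS = [
    {
        "task": "You want to buy traveler's checks with your credit card. "
                "Which percentage rate applies to the purchase?",
        "options": [
            "The Standard APR of 10.99%",
            "The Cash Advance APR of 24.24%",
            "The Penalty APR of 29.99%",
            "I don't know",
        ],
        "correct": "The Cash Advance APR of 24.24%",
    },
]

def score(questions, responses):
    """Return (percent correct, count of "I don't know" responses)."""
    correct = sum(1 for q, r in zip(questions, responses)
                  if r == q["correct"])
    dont_know = sum(1 for r in responses if r == "I don't know")
    return 100.0 * correct / len(questions), dont_know

pct, unsure = score(QUESTIONS, ["The Cash Advance APR of 24.24%"])
print(pct, unsure)  # 100.0 0 — one question, answered correctly
```

Keeping “I don’t know” out of the correct/incorrect tally lets you distinguish content that actively misleads readers from content that simply leaves them unsure.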

Use a Cloze test

A Cloze test removes certain words from a sample of your text and asks users to fill in the missing words. Your test participants must rely on the context as well as their prior knowledge of the subject to identify the deleted words. It’s based on the Gestalt theory of closure—where the brain tries to fill in missing pieces—and applies it to written text.

It looks something like this:

If you want to __________ out whether your site __________ understand your
  content, you __________ test it with them.

It looks a lot like a Mad Lib, doesn’t it? Instead of coming up with a sentence that sounds funny or strange or interesting, participants must guess the exact word the author used. While Cloze tests are uncommon in the user experience field, educators have used them for decades to assess whether a text is appropriate for their students, particularly in English-as-an-additional-language instruction.

Here’s how to do it:

  • Take a sample of text—about 125-250 words or so.
  • Remove every fifth word, replacing it with a blank space.
  • Ask participants to fill in each space with the word they think was removed.
  • Score the answers by counting the number of correct answers and dividing that by the total number of blanks.

A score of 60% or better indicates the text is appropriate for the audience. Participants who score 40-60% will have some difficulty understanding the original text. It’s not a deal breaker, but it does mean that the audience may need some additional help to understand your content. A score of less than 40% means that the text will frustrate readers and should be rewritten.
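The steps above can be sketched in a few lines of Python. The sample sentence echoes the fill-in-the-blank example earlier in this section; the function names are mine, and the exact-match scoring (ignoring case and punctuation) is one common convention — some scorers also accept synonyms:

```python
def make_cloze(text, nth=5):
    """Blank out every nth word; return the gapped text and the answer key."""
    gapped, answers = [], []
    for i, word in enumerate(text.split(), start=1):
        if i % nth == 0:
            answers.append(word)
            gapped.append("__________")
        else:
            gapped.append(word)
    return " ".join(gapped), answers

def score_cloze(answers, responses):
    """Percent of blanks filled with the exact word that was removed."""
    hits = sum(1 for a, r in zip(answers, responses)
               if a.strip(".,;").lower() == r.strip(".,;").lower())
    return 100.0 * hits / len(answers)

SAMPLE = ("If you want to find out whether your site visitors "
          "understand your content, you should test it with them.")

gapped, key = make_cloze(SAMPLE)
print(gapped)  # every fifth word ("find", "visitors", "should") is blanked
# Interpret against the thresholds above: 60%+ means the text suits the
# audience; 40-60% some difficulty; below 40% calls for a rewrite.
print(score_cloze(key, ["find", "readers", "should"]))  # 2 of 3 correct
```

For a real study you’d run this over a 125-250 word sample, which yields the 25+ blanks recommended below.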

It might sound far-fetched, but give this method a try before you dismiss it. In a government study on healthcare information readability, an expert panel categorized health articles as either easy or difficult. We ran a Cloze test using those articles with participants—who had low to average literacy skills—and found that the results reflected the expert panel’s findings. The average score for the “easy” version was 60, indicating the article was written at an appropriate level for these readers. The average score for the “difficult” version was 39: too hard for this audience.

Cloze tests are simple to create, administer, and score. They give you a good idea as to whether the content is right for the intended audience. If you use Cloze tests—either on their own or with more traditional usability testing methods—know that it takes a lot of cognitive effort to figure out those missing words. Aim for at least 25 blanks to get good feedback on your text; more than 50 can be very tiring.

When to test

Test your content at any point in your site development process. As long as you have content to test, you can test it. Need to convince your boss to budget for content testing? Run it through a readability formula. Got content but no wireframes or visual design? Run a Cloze test to evaluate content appropriateness. Is understanding the content key to a task or workflow? Display it in context during usability testing.

What to test

You can’t test every sentence on your site, nor do you need to. Focus on tasks that are critical to your users and your business. For example, does your help desk get calls about things the site should communicate? Test the content to find out if and where the site falls short.

So get to it

While usability testing watches what users do, not what they say they do, content testing determines what users understand, not what they say they understand.

Whatever your budget, timeline, and access to users, there’s a method to test whether your content is appropriate for the people reading it. So test! And then, either rest assured that your content works, or get cracking on that rewrite.

About the Author

Angela Colter

Angela Colter has been evaluating the usability of web sites for the better part of a decade. She’s a Principal of Design Research at Electronic Ink in Philadelphia, tweets frequently, and blogs occasionally.

22 Reader Comments

  1. Thanks for that pretty good article about testing web content. Actually, what I am missing a bit – the big advantage that the web gave us in communications is interactivity. Don’t communicate like a book, take responses and act with interactive content when you understand your audience.
    Let the user choose her detail, give dynamic hints with bubbles, automatic scaling areas …
    That’s a marvelous feature of the web, but it will make testing quite a bit more complicated, I think. How can we handle this non-deterministic user experience in content (communication)?

  2. Thanks for writing such an informative article. As you outlined, testing content can be challenging. Your example of the mass transit website passes really hits home. I will definitely apply this to future testing!

  3. Thank you Angela. I still come across clients who don’t understand that benefits motivate consumers (even b2b) more than features do. Testing benefit content versus feature content looks like a great way to help such clients understand what works and what doesn’t. In closing, content testing should be added to any comprehensive site inventory.

  4. @kanzlei
    You’ve hit on what’s beautiful about usability testing: It often reveals that people do behave in unique ways that we may not have predicted when we built the interface or wrote the content. What you’re calling “non-deterministic UX.”

    With so many moving parts, I maintain that it’s possible–necessary, even–to test how well the content supports what you’re trying to do. How you do this will in part depend on what you’re testing, but even with complex interactions I’d encourage you to start by identifying what task people are trying to use your interface and content to accomplish. If they fail, is it clear why they failed and at what point in the process it happened? Getting users to think aloud and paraphrase what they’re reading can help you zero in on the issue, regardless of whether the interaction is simple or complex.

  5. @Jessica Ivins
    Glad you found the article useful!

    @Robert Moss
    I like the distinction you draw between “benefit content” and “feature content” because you could ask different questions to assess whether people get each type. For features, you might ask “what” questions (What is it? What does it do?) where for benefits you’d ask “so what” questions (How would this help you?) I’d expect the benefit content questions might trigger more personal, colorful responses. Great idea.

    I love how you summarized this: moving usability “from the level of objects to words.” Beautiful!

  6. Angela, this is a great article raising some key points on user testing for content. I especially like the ideas to do “poor man’s tests” for the cases where a moderated usability test is simply not possible.

    Thanks a lot for the info.

  7. You’re right, nobody really tests content. My unofficial test for content is whether my average time on the site is going up or down.

    The other two tests you mentioned would be quite difficult to pull off on my site.

    Great post. Makes you think.

  8. I think this is a great entry on the readability of website content. Also if you’re creating text to try to drive up your SEO and it sounds like a robot created the paragraph, then you’ve pretty much lost your reader and thus credibility. The section about the physician and lay person viewing a site was particularly relevant; we all look and read websites relatively the same…with a 15 second attention span!

  9. I too rely on the Flesch-Kincaid Grade Level formula in Microsoft Word. I can’t get over how simple and effective the Cloze test is. The best content article I’ve read in a while. (And the last best one was on A List Apart too.)

  10. Thanks for the excellent article. I’ll be using this as a resource to plan for testing my company’s content.

    I do have a question. You said:

    “Need to convince your boss to budget for content testing? Run it through a readability formula.”

    Forgive me if I’m being dense, but can you clarify how the readability formula reinforces the business case for testing? Do you mean by showing when the content is too complex for the average user to understand?

  11. Melanie, that’s exactly what I mean: showing stakeholders that the content is too complex for the average user to understand.

    The first question is, what does the average user understand? There is an oft-cited statistic that the average adult in the U.S. reads at or below an 8th-grade level. (The study that’s usually cited as the source, the National Adult Literacy Survey, never actually specifies a grade level. It states that nearly half of adults have literacy skills in the lowest two (out of 5) levels, meaning they have inadequate skills for coping with everyday tasks. But I digress.)

    If your organization has already specified a target reading level for its audience, use that.

    So the conversation might go something like this: “I’ve run some samples of this text through a readability formula, and it’s estimating that you’d need a 16th-grade education to understand it. That’s much higher than the 8th-grade reading level of the average adult. Maybe we should test this text with our users to see whether it really is a problem.”

  12. A very interesting article. Readability is something that seems to fall right into a deep hole with too many clients who want ‘sexy graphics’ and not much else. Rant over.

    My main point though is that using tools such as Kampyle can be very useful too for getting user feedback. It certainly helped us – though we don’t keep it on site all the time.

  13. I will be trying out Cloze testing on some current projects. Thanks for the tip on this. It’s simple, easy to set up and clear to score. Love it!

  14. Thanks, Angela, your response to comments is almost as helpful as the article. I had just finished reading the comment “benefits motivate consumers (even b2b) more than features do” and thinking, gosh, that sounds really imp, what does that mean?

    Then you said “I like the distinction you draw between “benefit content” and “feature content” because you could ask different questions to assess whether people get each type. For features, you might ask “what” questions (What is it? What does it do?) where for benefits you’d ask “so what” questions (How would this help you?) I’d expect the benefit content questions might trigger more personal, colorful responses.” Ooooh!

    I often work with health websites for mainstream audiences that are written much too formally and to college readers, when the audience is largely low-literacy adults. Thank you for your language, and your concrete tests, for supervisors. REALLY helpful.

    Your writing is so understandable and actionable (and kind). I went to your blog to sign up and saw you have a post about low-literacy audiences. I can’t wait to read it. Thanks!

  15. Jakob Nielsen’s Alertbox has a nice, short overview of the Cloze test. Check it out at http://www.useit.com/alertbox/cloze-test.html

    This is a sidebar to a larger article about Mobile Content Comprehension at http://www.useit.com/alertbox/mobile-content-comprehension.html

    I’ve always maintained that the Cloze test can be done independent of how the text appears in an interface, but Nielsen’s article suggests that the interface can have a profound effect on comprehension as indicated by the difference between Cloze test results using a desktop format and those using a mobile interface format.

  16. Thanks for some great ideas and suggestions, especially some testing tools and protocols I hadn’t heard of! Three add-on comments/suggestions:

    1) Use an online survey to identify major content comprehension problem-areas you can then focus on (prioritize) during testing. As Content Strategist for a Fortune 500 telecomm company, I ran a survey that first segmented users by primary task purpose (Shop? Order? Learn?) and product interest (Internet? Phone? TV? etc.), then asked (among about a dozen other questions) “What information did you find confusing or hard to understand?”… followed by multiple choice answers. Results were a huge help designing a subsequent usability test that included comprehension of selected pages.

    2) During testing, either have an IA or UX designer join you as an observer or take notes on interaction issues that inevitably come up during content testing – users don’t differentiate between the two disciplines (obviously) – for them it’s all the same experience. So even if asked to read something for comprehension, they’ll see link labels they don’t understand, or comment on colors that are hard to see, etc. – nice tidbits for IA/UX enhancements.

    3) Maybe goes without saying, but when you’re testing comprehension of existing content, you’ll probably also identify *missing* content – info your test subjects say they need but can’t find. Specifically ask subjects about that if they don’t bring it up themselves – quick ‘n dirty “content gap analysis”!

  17. Recently you mentioned this article in a tweet. I’m pretty sure I read it when it was new, but it was worth the time to read it again.

    I especially appreciate the advice for when to use what kind of testing. Working in a setting in which my audience varies among ditch diggers, petrochemical engineers, legislators, “the general public,” and others, it’s hard to know when well-structured copy is readable enough.

    As you’ve noted, usability testing is one way to assess that, but sometimes it’s hard to convince observers that the problem is the copy, not the lack of attractive images. The Cloze test, which I set aside as a novelty when I learned of it years ago, seems well suited to just that situation.

    Next time I run into that, I’ll be sure to give it a try. But I’ll also not wait for those circumstances to arise before I do.

Got something to say?

We have turned off comments, but you can see what folks had to say before we did so.
