I recently conducted a study into the helpfulness (or lack thereof) of zebra striping—the shading of alternate rows in a table or form. The study measured performance as users completed a series of tasks and found no statistically significant improvement in accuracy—and very little statistically significant improvement in speed when zebra stripes were implemented.
These results were a surprise to many readers of the corresponding article, published in A List Apart in May 2008. I think it’s fair to say that the vast majority of us, myself included, expected to see that zebra striping helped a lot.
Because of these surprising results, I decided to conduct two more studies into the value of zebra striping. These studies aimed to test design elements deliberately excluded from the first study, and to address some of the issues raised by the results of the first study.
This article represents a summary of my research, presenting the findings of these two further studies and providing a recommendation on future use of zebra striping.
Is zebra striping research really necessary?
Before jumping into the findings of the new studies, I would like to take a moment to reflect on whether research on this topic is actually a “waste of time,” because the usefulness of zebra striping is “obvious.”
Many years of conducting user research have taught me that what seems like an obvious choice to one person is disliked by another, and often for good reason. In the case of zebra striping, the benefit (i.e., of leading the eye) seemed likely, but I had a niggling concern about possible downsides (e.g., additional visual noise interfering with cognitive processing).
Interestingly, since writing the first article, many people have told me that they often debate the use of zebra striping in their work teams. These debates actually provided my original motivation—I expected the first study to find a clear benefit from zebra striping and therefore provide a statistic that could support such discussions. Instead, these original results made the picture less, rather than more, clear! To me, this suggests that more research was warranted.
Directing research energies
The experimental data produced by the initial study didn’t support a strong argument for zebra striping. Yet a single experiment is not sufficient to disprove a theory: maybe zebra striping had little effect because of that experiment’s particular design. Sure, zebra stripes may not help much when you are presenting a table like the one from that study—which is in fact something worth knowing, all by itself—but what if the table was longer, the stripes were a different color, or the task was harder? This was a concern that my readers and I shared.
In an ideal world, I would conduct a study for every possible design attribute that could influence whether zebra striping helped or not. However, these attributes are likely to include at least the following:
- number of rows and/or columns,
- density of the text,
- amount of white space between rows and/or columns,
- how high the rows/wide the columns are,
- how many colors are used for the striping,
- which color(s) is/are used for the striping,
- the degree of contrast between the striped and non-striped colors,
- whether rows are shaded in groups or singly, and if the former, how many rows are in a group,
- what other styles of tables the user has been exposed to in the past,
- how much time the user spends with the table,
- how much the user has to switch between using the table and other tasks,
- the degree of pressure on the user when they are working with the table,
- the type of data that the table contains (e.g., text or figures),
- whether the user has a pointing device (e.g., mouse or electronic pen), and
- how important the task is to the user.
For a small business owner such as myself, it’s just not feasible to conduct all this research. But I did want to follow up on one aspect of the initial study design that particularly bothered me: the lack of real pressure experienced by people when participating. To my mind, if you give someone an unlimited amount of time to perform a task, they will use that time to perform the task well, especially if they believe accuracy matters. To put it another way, the initial study may have had a low number of errors because participants were being careful to get the answer right, so the benefit of this added attention outshone the benefit of zebra striping.
The second study: an alternate examination of performance
As such, I felt it was important to conduct another study: one that limited the amount of time participants had to answer and created a bit of pressure around the task. I also felt it was important to make the task itself harder. These principles guided the design of the follow-up study that was linked to at the bottom of the original ALA article (and elsewhere).
Like the first study, the follow-up study asked participants to answer eight questions using a table of unfamiliar information (see figure below). Unlike in the first study, users were presented with the questions in a random order, and the style of table (plain, lined, or striped) that accompanied each question was also randomly chosen.
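The randomization described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `build_session_plan` is hypothetical, and it assumes each question's style was drawn independently and with equal probability, which the article does not state.

```python
import random

# The three table styles named in the article.
STYLES = ("plain", "lined", "striped")

def build_session_plan(question_ids, rng=None):
    """Return a list of (question_id, style) pairs for one participant.

    Assumption: questions are shuffled uniformly and each style is
    chosen independently per question with equal probability.
    """
    rng = rng or random.Random()
    order = list(question_ids)
    rng.shuffle(order)  # random question order
    return [(q, rng.choice(STYLES)) for q in order]

# Example: a plan for the eight questions, seeded so it can be reproduced.
plan = build_session_plan(range(1, 9), random.Random(1))
for question, style in plan:
    print(question, style)
```

Because styles are drawn independently under this assumption, each style ends up with a different number of responses per question, which is why the later analysis has to account for unequal sample sizes.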
A timer in the top right-hand corner of the screen helped create a sense of pressure. Participants had 15 seconds to answer each question: when the time elapsed, a message was displayed and the participant was taken to the next question. Any answer that the participant had entered by the time the 15 seconds elapsed, even if partial, was captured; participants could also commit their answer before the time elapsed, by using a “submit” button.
To add to the difficulty of the task, the table included blank cells and a greater number of rows, so that vertical scrolling was likely to be needed. In this sense, the table was similar to an online banking statement (a common use of tables online).
A whopping 3,674 browser sessions were begun for this follow-up study, which was conducted April 29–June 15, 2008. From these, I removed sessions in which more or fewer than eight questions had been answered. I also removed all but one session (chosen at random) when multiple sessions were submitted from the same IP address. (This sledgehammer approach was the only way to eliminate duplicates. It will have also meant the loss of some valid data, but we were concerned to prevent any bias from repeat participation.) These measures left 2,276 clean survey sessions that could be used for analysis.
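The two cleaning rules above could look something like the following Python sketch. It is an illustration under stated assumptions rather than the procedure actually used: the record layout (dicts with `"ip"` and `"answers"` keys) and the function name `clean_sessions` are hypothetical, and the sketch assumes incomplete sessions were dropped before duplicates were resolved.

```python
import random
from collections import defaultdict

def clean_sessions(sessions, expected_answers=8, seed=None):
    """Keep complete sessions and at most one session per IP address.

    `sessions` is a list of dicts with the hypothetical keys
    "ip" (str) and "answers" (list), one dict per browser session.
    """
    rng = random.Random(seed)

    # Rule 1: drop sessions that do not have exactly eight answers.
    complete = [s for s in sessions if len(s["answers"]) == expected_answers]

    # Rule 2: group what is left by IP address...
    by_ip = defaultdict(list)
    for session in complete:
        by_ip[session["ip"]].append(session)

    # ...and keep one randomly chosen session from each group.
    return [rng.choice(group) for group in by_ip.values()]
```

Choosing the survivor at random, rather than always keeping the first or last session from an address, avoids systematically favoring either a participant's first attempt or a practiced later one.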
Answers were classified as either correct or incorrect, so that accuracy could be analyzed. An answer was considered correct if it was at least partially correct (e.g., “philli” or “Philipins” for “Philippines”). This was done so that people were not as disadvantaged by their typing speed, if they had found the right answer but not entered it fully when the time expired.
This table of information was used in the second study on the benefits of zebra striping. Participants were asked to answer eight questions using this information, with 15 seconds given for each response. The table was presented either plain, lined, or zebra striped; the striped version is shown.
Despite there being only eight questions and three styles, this follow-up study has yielded an incredibly large amount of data.
I haven’t analyzed it all—and probably will never have time to—so if you’re interested in getting a copy of the data set and running your own analysis, feel free to contact me.
The key results, however, are shown below. Yellow highlighting indicates cases in which the better performance with the striped version of the table is statistically significant based on Pearson’s Chi-Squared Goodness of Fit tests. (These tests adjust for the different sample sizes each table style had.) Orange highlighting indicates the case that is extremely close to being statistically significant (p=0.0545).
The results of the second study showed that zebra striping improved accuracy on three of the eight questions asked.
The table shows that for three of the eight questions, the striped version yielded a more accurate response than did the plain and lined versions. A fourth question comes very close to being statistically significant. For the remaining four questions, the difference in accuracy between all three styles is so small that it cannot be statistically separated from random noise. In these cases, performance with zebra striping is just as good as—and certainly no worse than—the plain or lined version.
This means that, in this study at least, zebra striping doesn’t harm performance—and in many cases, it actually leads to an improvement.
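For readers who want to run this kind of significance test on their own data, here is a minimal Python sketch of a Pearson chi-squared goodness-of-fit test. It reflects one plausible reading of the approach described above: the expected number of correct answers for each style is made proportional to how many participants saw that style, which is how the unequal sample sizes are taken into account. The counts are hypothetical and are not the study's data, and the function names are made up for illustration.

```python
import math

def chi_squared_goodness_of_fit(observed, expected):
    """Return Pearson's chi-squared statistic for matched lists of counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_two_df(statistic):
    """Upper-tail p-value for a chi-squared statistic with 2 degrees of freedom.

    Three table styles give 3 - 1 = 2 degrees of freedom, and for exactly
    2 degrees of freedom the upper-tail probability is exp(-x / 2).
    """
    return math.exp(-statistic / 2)

# Hypothetical counts for a single question (NOT the study's data).
shown = {"plain": 750, "lined": 760, "striped": 766}    # participants per style
correct = {"plain": 300, "lined": 310, "striped": 380}  # correct answers per style

total_shown = sum(shown.values())
total_correct = sum(correct.values())

# If style made no difference, correct answers would be spread across
# the styles in proportion to how often each style was shown.
styles = list(shown)
expected = [total_correct * shown[s] / total_shown for s in styles]
observed = [correct[s] for s in styles]

statistic = chi_squared_goodness_of_fit(observed, expected)
p = p_value_two_df(statistic)
print(f"chi-squared = {statistic:.2f}, p = {p:.4f}")
```

A p-value below 0.05 is the conventional threshold for calling a difference statistically significant; a value such as 0.0545 falls just outside it.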
The third study: user preference
A second issue raised by a number of readers was that the value of zebra striping is related to aesthetics and/or subjective preference as much as actual improvement in performance. This is a solid argument and one that I made at the end of the original article: if users like zebra striping, then it’s almost a moot point whether it actually helps them read tables more easily (provided it doesn’t make reading tables harder).
In order to use preference as an argument for implementing zebra striping, we need a statistic that shows that the majority of the user audience prefers it. For this third study, I chose the general public as the user audience, so that the results were as broadly applicable as possible (i.e., not just derived predominantly from the web developer community). With the generous support of Newspoll, I was able to ask a question about preference on one of their National Online Omnibus studies. The National Online Omnibus is a web-based survey of just over 1,200 Australians aged 18-64. The survey contains different questions every fortnight—organizations buy one or more questions for inclusion. The survey participants are members of Newspoll’s rigorously maintained market research panel, and the results are weighted to produce estimates for the Australian population. Therefore, we can be confident in the representativeness of these results, for the Australian population at least (and probably also for the populations of other similar countries, such as the United States, Canada, New Zealand, and Britain).
Participants were shown the following image (which presented the same table six times, each using a different style of formatting) and text:
The tables of information used to measure user preference for plain, lined, and variations of zebra striped tables.
The tables were accompanied by the following instructions:
Obviously, there are many other styles (e.g., three color or mix of lined and striped) that could be tested, but a technical limitation meant the number of styles could be no more than six. In choosing these six styles out of all the styles possible, we took into account the following factors:
- The desire to definitively answer the question: “Do people prefer zebra striped tables when compared explicitly to plain and lined tables?”
- Certain alternatives to plain, lined, or single-color, single-row seemed to be popular amongst readers of the original ALA article (e.g., two color, single row).
- That gray is neutral and therefore should be able to be used for zebra striping in most cases (e.g., regardless of corporate branding constraints and monitor type).
- That green is easy on the eye (we have more green receptors than red and blue ones) and is considered in many cases to be calming.
- The need to avoid styles that are likely to create too much visual noise (and thus make readability worse rather than better).
To minimize order effects, it would have been preferable for the order in which the tables appeared in the image to be random, but unfortunately this was not possible. As a “next best thing,” similar styles were physically separated, to make the order appear random. Also, the style that was thought to be least favored (plain) was put in the position that eye tracking studies tell us receives the most attention (the top left-hand corner). This way, if there was a preference away from this style, this finding couldn’t be discounted as a result of the poor location of the table.
The results of this third study are shown in the graph below. The typical zebra striping approach (single-color, single-row) is the most preferred: 31% of participants rated it as the table that helps the most and only 4% rated it as the table that helps the least. (Note that the maximum margin of error on these estimates is 2.8%.)
The third study showed participants preferred single-row, single-color, zebra striped tables.
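The 2.8% maximum margin of error quoted above is consistent with the standard worst-case formula for a proportion. The following Python sketch shows the arithmetic; it assumes a 95% confidence level (z = 1.96), simple random sampling, and a round sample size of 1,200, none of which the survey documentation quoted here confirms. The function name is hypothetical.

```python
import math

def max_margin_of_error(sample_size, z=1.96):
    """Worst-case margin of error for an estimated proportion.

    The margin z * sqrt(p * (1 - p) / n) is largest at p = 0.5,
    where p * (1 - p) = 0.25.
    """
    return z * math.sqrt(0.25 / sample_size)

# Assumed sample size: the survey covers "just over 1,200" people.
print(f"{max_margin_of_error(1200):.1%}")  # prints 2.8%
```

In practical terms, an estimate such as 31% should be read as roughly 28% to 34%, so differences of only a few percentage points between styles should be treated with caution.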
The two-color, single-row approach seems to divide the population, with 23% of the population considering it the best approach but 15% considering it the worst approach. These results make sense: we can imagine the use of two different colors could constitute extra visual cues for some but extra visual noise for others.
Interestingly, at 20%, the lined version is preferred almost as much as the two-color, single-row version. It also has—like the single-color, single-row version—only a small proportion of people who dislike it, at 4%. The other interesting result is the poor outcome for double and triple-striped, each being preferred by less than 10%. Triple-striped is also the least preferred option for almost a third of the population (28%). While this may be a reflection of a true dislike for these approaches, the size of the tables used in the survey could be a factor. Because of space limitations, the tables could only have seven rows, potentially making it less clear that the double and triple striping was a pattern that would be repeated throughout the table. I personally think the rationale behind it—that it provides more visual information than single row striping, via “chunking” of data—makes a lot of sense. However, it is possible that people just don’t like it as much as the other styles.
The results of the three studies conducted to date suggest that the safest option is to shade the alternating, individual rows of your table with a single color. Taking this approach is likely to ensure that:
- task performance is better, or at least no worse, than with other table styles, and
- the aesthetic sensibilities and subjective preferences of the majority of your users are catered for.
If zebra striping of this type cannot be done easily, then ruling a line between each row may be the next best option.
One door closes and another one opens
It feels as if we are getting somewhere in terms of having reliable and valid statistical data to back up our choice of zebra striping for tabular data online. However, it continues to be possible to punch holes in the studies yielding this data. For example, maybe people have a subjective preference for single-color, single-row striping because it’s what they’re used to (like using blue underlining for hyperlinks). Familiarity could also explain the lack of support for double and triple striping, approaches that may well improve task performance significantly.
What are we to do? I think the answer is twofold.
Firstly, if, in your particular circumstance, the cost associated with implementing single-color, single-row zebra striping is acceptable—and I have been told of several cases where this is not the case—then this should be done. Otherwise, stick with a plain or lined design.
Secondly, if you are designing an application or website that contains data tables, don’t let personal preference, habit, or the (untested) status quo drive your design decisions: go out there and get some user data. Run some tests using your preferred approach and one or more of the alternatives described here. And if you can, share your results with us, so that our knowledge of the efficacy, or otherwise, of different styles of tabular data can grow.
 Note that the image was shown at a size of 749 pixels wide by 403 pixels high.