Illustration by

Why We Should All Be Data Literate

Recently, I was lucky enough to see the great Jared Spool talk (spoiler: all Spool talks are great Spool talks). In this instance, the user interface icon warned of the perils of blindly letting data drive design.

Article Continues Below

I am in total agreement with 90 percent of his premise. Collecting and analyzing quantitative data can indeed inform your design decisions, and smart use of metrics can fix critical issues or simply improve the user experience. However, this doesn’t preclude a serious problem with data, or more specifically, with data users. Spool makes this clear: When you don’t understand what data can and can’t tell you and your work is being dictated by decisions based on that lack of understanding—well, your work and product might end up being rubbish. (Who hasn’t heard a manager fixate on some arbitrary metric, such as, “Jane, increase time on page” or “Get the bounce rate down, whatever it takes”?) Designing to blindly satisfy a number almost always leads to a poorer experience, a poorer product, and ultimately the company getting poorer.

Where Spool and I disagree is in his conclusion that all design teams need to include a data scientist. Or, better yet, that all designers should become data scientists. In a perfect world, that would be terrific. In the less-perfect world that most of us inhabit, I feel there’s a more viable way. Simply put: all designers can and should learn to be data literate. Come to think of it, it’d be nice if all citizens learned to be data literate, but that’s a different think piece.

For now, let’s walk through what data literacy is, how to go about getting it for less effort and cost than a certificate from Trump University, and how we can all build some healthy data habits that will serve our designs for the better.

What Data Literacy Is and Isn’t#section1

Okay, data literacy is a broad term—unlike, say, “design.” In the education field, researchers juggle the terms “quantitative literacy,” “mathematical literacy,” and “quantitative reasoning,” but parsing out fine differences is beyond the scope of this article and, probably, your patience. To keep it simple, let’s think about data literacy as healthy skepticism or even bullshit detection. It’s the kind of skepticism you might adopt when faced with statements from politicians or advertisers. If a cookie box is splashed with a “20% more tasty!” banner, your rightful reaction might be “tastier than what, exactly, and who says?” Yes. Remember that response.

Data literacy does require—sorry, phobics—some math. But it’s not so bad. As a designer, you already use math: figuring pixels, or calculating the square footage of a space, or converting ems to percent and back. The basics of what you already do should give you a good handle on concepts like percentages, probability, scale, and change over time, all of which sometimes can hide the real meaning of a statistic or data set. But if you keep asking questions and know how multiplication and division work, you’ll be 92 percent of the way there. (If you’re wondering where I got that percentage from, well—I made it up. Congratulations, you’re already on the road to data literacy.)

Neil Lutsky writes about data literacy in terms of the “construction, communication, and evaluation of arguments.” Why is this relevant to you as a designer? As Spool notes, many design decisions are increasingly driven by data. Data literacy enables you to evaluate the arguments presented by managers, clients, and even analytics packages, as well as craft your own arguments. (After all, a key part of design is being able to explain why you made specific design decisions.) If someone emails you a spreadsheet and says, “These numbers say why this design has to be 5 percent more blue,” you need to be able to check the data and evaluate whether this is a good decision or just plain bonkers.

Yes, this is part of the job.

It’s So Easy#section2

Look, journalists can get pretty good at being data literate. Not all journalists, of course, but there’s a high correlation between the ability to question data and the quality of the journalism—and it’s not high-level or arcane learning. One Poynter Institute data course was even taught (in slightly modified form) to grade schoolers. You’re a smart cookie, so you can do this. Not to mention the fact that data courses are often self-directed, online, and free (see “Resources” listed below).

Unlike data scientists who face complex questions, large data sets, and need to master concepts like regressions and Fourier transforms, you’re probably going to deal with less complex data. If you regularly need to map out complex edge-node relationships in a huge social graph or tackle big data, then yes, get that master’s degree in the subject or consult a pro. But if you’re up against Google Analytics? You can easily learn how to ask questions and look for answers. Seriously, ask questions and look for answers.

Designers need to be better at data literacy for many of the same reasons we need to work on technical literacy, as Sarah Doody explains. We need to understand what developers can and can’t do, and we need to understand what the data can and can’t do. For example, an A/B test of two different designs can tell you one thing about one thing, but if you don’t understand how data works, you probably didn’t set up the experiment conditions in a way that leads to informative results. (Pro tip: if you want to see how a change affects click-through, don’t test two designs where multiple items differ, and don’t expect the numbers to tell you why that happened.) Again: We need to question the data.

So we’ve defined a need, researched our users, and identified and defined a feature called data literacy. What remains is prototyping. Let’s get into it, shall we?

How to Build Data Literacy by Building Habits#section3

Teaching data literacy is an ongoing topic of academic research and debate, so I’ll leave comprehensive course-building to more capable hands than mine. But together, we can cheaply and easily outline simple habits of critical thought and mathematical practice, and this will get us to, let’s say, 89 percent data literacy. At the least, you’ll be better able to evaluate which data could make your work better, which data should be questioned more thoroughly, and how to talk to metric-happy stakeholders or bosses. (Optional homework: this week, take one metric you track or have been told to track at work, walk through the habits below, and report back.)

Habit one: Check source and context#section4

This is the least you should do when presented with a metric as a fait accompli, whether that metric is from a single study, a politician, or an analytics package.

First, ask about the source of the data (in journalism, this is reflex—“Did the study about the health benefits of smoking come from the National Tobacco Profiteering Association?”). Knowing the source, you can then investigate the second question.

The second question concerns how the data was collected, and what that can tell you—and what it can’t. Let’s say your boss comes in with some numbers about time-on-page, saying “Some pages are more sticky than others. Let’s redesign the others to keep customers on all the other pages longer.” Should you jump to redesign the less-sticky pages, or is there a different problem at play?

It’s simple, and not undermining, to ask how time-on-page was measured and what it means. It could mean a number of things, things that that single metric will never reveal. Things that could be real problems, real advantages, or a combination of the two. Maybe the pages with higher time-on-page numbers simply took a lot longer to load, so potential customers were sitting there as a complex script or crappy CDN was slooooowly drawing things on the not-a-customer-any-more’s screen. Or it could mean some pages had more content. Or it could mean some were designed poorly and users had to figure out what to do next.

How can you find this out? How can you communicate that it’s important to find out? A quick talk with the dev team or running a few observations with real users could lead you to discover what the real problem is and how you can redesign to improve your product.

What you find out could be the difference between good and bad design. And that comes from knowing how a metric is measured, and what it doesn’t measure. The metric itself won’t tell you.

For your third question, ask the size of the sample. See how many users were hitting that site, whether the time-on-page stat was measured for all or some of these users, and whether that’s representative of the usual load. Your design fix could go in different directions depending on the answer. Maybe the metric was from just one user! This is a thing that sometimes happens.

Fourth, think and talk about context. Does this metric depend on something else? For example, might this metric change over time? Then you have to ask over what time period the metric was measured, if that period is sufficient, and whether the time of year when measured might make a difference.

Remember when I said change over time can be a red flag? Let’s say your boss is in a panic, perusing a chart that shows sales from one product page dropping precipitously last month. Design mandates flood your inbox: “We’ve got to promote this item more! Add some eye-catching design, promote it on our home page!”

What can you do to make the right design decisions? Pick a brighter blue for a starburst graphic on that product page?

Maybe it would be more useful to look at a calendar. Could the drop relate to something seasonal that should be expected? Jack o’lantern sales do tend to drop after November 1. Was there relevant news? Apple’s sales always drop before their annual events, as people expect new products to be announced. A plethora of common-sense questions could be asked.

The other key point about data literacy and change is that being data literate can immunize against common errors when looking at change over time. This gets to numeracy.

Habit two: Be numerate#section5

I first learned about numeracy through John Allen Paulos’ book Innumeracy: Mathematical Illiteracy and its Consequences, though the term “innumeracy” was originated by Pulitzer Prize-winning scientist Douglas Hofstadter. Innumeracy is a parallel to illiteracy; it means the inability to reason with numbers. That is, the innumerate can do math but are more likely to trip up when mathematical reasoning is critical. This often happens when dealing with probability and coincidence, with statistics, and with things like percentages, averages, and changes. It’s not just you—these can be hard to sort out sort out! We’re presented with these metrics a lot, but usually given little time to think about them, so brushing up on that bit of math can really help put out (or avoid) a trash fire of bad design decisions.

Consider this: A founder comes in with the news that an app has doubled its market base in the two weeks it’s been available. It’s literally gone up 100 percent in that time. That’s pretty awesome, right? Time to break out the bubbly, right? But what if you asked a few questions and found that this really meant the founder was the first user, then eventually her mom got onto it. That is literally doubling the user base exactly 100 percent.

Of course that’s obvious and simple. You see right off why this startup probably shouldn’t make the capital outlay to acquire a bottle or two juuuust yet. But exactly this kind of error gets overlooked easily and often when the math gets a bit more complex.

Any time you see a percentage, such as “23% more” or “we lost 17%,” don’t act until you’ve put on your math hat. You don’t even need to assume malice; this stuff simply gets confusing fast, and it’s part of your job not to misread the data and then make design decisions based on an erroneous understanding.

Here’s an example from Nicolas Kayser-Bril, who looks into the headline, “Risk of Multiple Sclerosis Doubles When Working at Night“:

“Take 1,000 Germans. A single one will develop MS over his lifetime. Now, if every one of these 1,000 Germans worked night shifts, the number of MS sufferers would jump to two. The additional risk of developing MS when working in shifts is one in 1,000, not 100%. Surely this information is more useful when pondering whether to take the job.”

This is a known issue in science journalism that isn’t discussed enough, and often leads to misleading headlines. Whenever there’s a number suggesting something that affects people, or a number suggesting change, look not just at the percentage but at what this would mean in the real world; do the math and see if the result matches the headline’s intimation. Also ask how the percentage was calculated. How was the sausage made? Lynn Arthur Steen explains how percentages presented to you may not just be the difference of two numbers divided by a number. Base lesson: always learn what your analytics application measures and how it calculates things. Four out of five dentists agree…so that’s, what, 80 percent true?

Averages are another potentially deceptive metric that simple math can help; sometimes it’s barely relevant, if at all. “The average length of a book purchased on Amazon is 234.23 pages” may not actually tell you anything. Sometimes you need to look into what’s being averaged. Given the example “One in every 15 Europeans is illiterate,” Kayser-Bril points out that maybe close to one in 15 Europeans is under the age of seven. It’s good advice to learn the terms “mode,” “median,” and “standard deviation.” (It doesn’t hurt (much), and can make you a more interesting conversationalist at dinner parties!)

Habit three: Check your biases#section6

I know, that sounds horrible. But in this context, we’re talking about cognitive biases, which everyone has (this is why I encourage designers to study psychology, cognition studies, and sociology as much as they can). Though we have biases, it’s how aware we are of these issues and how we deal with them that counts.

It’s out of scope to list and describe them all (just thinking I know them all is probably an example of Dunning-Kruger). We’ll focus on two that are most immediately relevant when you’re handed supposedly-objective metrics and told to design to them. At least, these are two that I most often see, but that may be selection bias.

Selection bias#section7

Any metric or statistical analysis is only as good as (in part) what you choose to measure. Selection bias is when your choice of what to measure isn’t really random or representative. This can come from a conscious attempt to skew the result, from carelessly overlooking context, or due to some hidden process.

One example might be if you’re trying to determine the average height of the adult male in the United States and find it to be 6'4"—oops, we only collected the heights of basketball players. Online opinion polls are basically embodied examples of selection bias, as the readers of a partisan site are there because they already share the site operator’s opinion. Or you may be given a survey that shows 95 percent of users of your startup’s app say they love it, but when you dig in to the numbers, the people surveyed were all grandmothers of the startup team employees (“Oh, you made this, dear? I love it!”). This holds in usability testing, too: if you only select, say, high-level programmers, you may be convinced that a “to install this app, recompile your OS kernel” is a totally usable feature. Or end up with Pied Piper’s UI.

Now, these all seem like “sure, obvs” examples. But selection bias can show up in much more subtle forms, and in things like clinical studies. Dr. Madhukar Pai’s slides here give some great examples — especially check out Slide 47, which shows how telephone surveys have almost built-in selection biases.

So, what’s a designer to do? As you can see from Dr. Pai’s lecture slides, you can quickly get into some pretty “mathy” work, but the main point is that when you’re faced with a metric, after you’ve checked out the context, look at the sample. You can think about the claim on the cookie box in this way. It’s “20% more tasty”? What was the sample, 19 servings of chopped liver and one cookie?

Confirmation bias#section8

Storytelling is a powerful tool. Again, it’s how our brains are wired. But as with all tools, it can be used for good or for evil, and can be intentional or accidental. As designers, we’re told we have to be storytellers: how do people act, how do they meet-cute our product, how do they feel, what’s the character arc? This is how we build our knowledge of the world, by building stories about it. But, as Alberto Cairo explains in The Truthful Art this is closely linked to confirmation bias, where we unconsciously (or consciously) search for, select, shape, remember, interpret, or otherwise torture basic information so that it matches what we already think we know, the stories we have. We want to believe.

Confirmation bias can drive selection bias, certainly. If you only test your design with users who already know how your product works (say, power users, stakeholders, and the people who built the product), you will get distorted numbers and a distorted sense of how usable your product is. Don’t laugh: I know of a very large and popular internet company that only does user re-search with power users and stakeholders.

But even if the discovery process is clean, confirmation bias can screw up the interpretation. As Cairo writes, “Even if we are presented with information that renders our beliefs worthless, we’ll try to avoid looking at it, or we’ll twist it in a way that confirms them. We humans try to reduce dissonance no matter what.” What could this mean for your design practice? What could this mean for your designs when stakeholders want you to design to specific data?

Reading (Numbers) is Fundamental#section9

So, yes. If you can work with a data scientist in your design team, definitely do so. Try to work with her and learn alongside her. But if you don’t have this luxury, or the luxury of studying statistics in depth, think of data literacy as a vital part of your design practice. Mike Monteiro is passionate that designers need to know math, and he’s of course correct, but we don’t need to know math just to calculate visual design. We need to know math enough to know how to question and analyze any metric we’re given.

This is something you can practice in everyday life, especially in an election season. When you see someone citing a study, or quoting a number, ask: What was measured? How was it measured? What was the context? What wasn’t measured? Does that work out in real life? Keep looking up terms like selection bias, confirmation bias, Dunning-Kruger, sample size effect, until you remember them and their application. That is how you build habits, and how you’ll build your data literacy muscles.

I’ve long loved the Richard Feynman quote (that Cairo cites in The Truthful Art): “The first principle is that you must not fool yourself — and you are the easiest person to fool.” Consider always that you might be fooling yourself by blindly accepting any metric handed to you. And remember, the second-easiest person to fool is the person who likely handed you the metric, and is motivated to believe a particular outcome. Data literacy requires honesty, mastering numeracy, and stepping through the habits we’ve discussed. Practice every day with news from politics: does a statistic in the news give you that “of course, that’s how things are” feeling? Take a deep breath, and dig in; do you agree with a policy or action because it’s your political party proposing it? What’s the context, the sample size, the bias?

It’s tough to query yourself this way. But that’s the job. It’s tougher to query someone else this way, whether it’s your boss or your significant other. I can’t help you figure out the politics and social minefield of those. But do try. The quality of your work (and life) may depend on it.


About the Author

Dan Turner

Dan Turner is a semi-former journalist working now in the field of interaction and product design, most recently for PARC. Generally a generalist. He is a UC Berkeley masters graduate of the I School and has taught at SFSU. He’s also mentored at Code for America and many bike races.

7 Reader Comments

  1. If you want to get a better understanding of Dispersion and Standard Deviation, have a look at this animation which I created.

    Dispersion and Standard Deviation aren’t easy topics, many students struggle with them, but this should help you get your head around it.

  2. @SumitB, yes, knowing what the data represent and how everything was collected is key.

    @Barry — thanks so much! That’s great work.

    Admins, can we delete the spam from All Rounder?

  3. @Dan — Thanks for your kind comment. I’m glad to be able to share something helpful. It took me a bit of effort to get my head around the subject before I could create the animation.

  4. Ha ha Trump University….I love it. Unfortunately, some of the biggest data debacles of all time came down a single input error on an excel spreadsheet. Just ask Ken Rogoff and Carmen Reinhart the Harvard economics gurus who published the “excel error that changed history”. Proofread proofread proofread!

Got something to say?

We have turned off comments, but you can see what folks had to say before we did so.

More from ALA

Nothing Fails Like Success

Our own @zeldman paints the complicated catch-22 that our free, democratized web has with our money-making capitalist roots. As creators, how do we untangle this web? #LetsFixThis