Big Data Visualization with Meaning

The web is not the traditional home of data visualization. You might come across a bar chart here or there in your online journey on any given day, but charts have never been a fixture of the web. That seems to be changing.

With the world becoming increasingly data-driven, we’re seeing more and more visualizations make their way onto our web pages and into our design briefs. They help us tell stories that better engage our users, and can even get them to take some kind of meaningful action.

The problem is that these datasets—sometimes so large they’re literally called “big data”—can make visualization with meaning difficult. But that’s something we as designers are equipped to tackle. We just have to know what our users are hoping to gain from viewing and interacting with visualizations, and what we have to do to make their effort worthwhile.

Data has a very strong power to persuade—powerful enough to change users’ everyday behavior, especially when data is informative, clear, and actionable. We should be putting data visualizations to work on our sites, enhancing our designs to show users how data is in service to the story they’ve come to learn about.

Data visualization on the web becomes meaningful when it lets people discover the smaller stories that resonate with them, customizing their experience instead of putting them on a predetermined path.

Users attempting to interact with large and generally disconnected sets of data while navigating a site or trying to access relevant information end up facing a difficult, if not impossible, task. Our sites lose a certain measure of usability if they aren’t well-designed, even though the web is a natural medium for delivering truly interactive data.

As with all design, the approach we take when creating a user-minded visualization is based on the context and the constraints we have to work with. Good data visualizations—those with meaning—need to be accessible and human even though data is rarely described with those words.

Telling a story

The key to designing visualizations is to focus on something in the dataset that is relatable to and resonates with your users. I stumbled upon this while creating a visualization from the publicly available Open Food Facts dataset, which contains crowd-sourced information on food products from all over the world.

Although the dataset covers an extensive range of information (even down to packaging materials and number of additives), I chose to focus on comparing average sugar consumption among different countries (Fig. 1) because I was personally concerned about that topic. It turned out to be a concern for others as well and became the most popular project for the dataset on Kaggle.

Bar graph depicting quantities of sugar consumption by country
Fig. 1: Average national sugar consumption
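
To make the idea of narrowing a wide dataset down to one relatable measure concrete, here is a minimal sketch in Python. It is not the code behind Fig. 1. The field names (`country`, `sugars_100g`, `additives_n`) and the sample rows are assumptions invented for illustration, and a real export would need its own cleaning. The sketch ignores every column except the one the story is about, skips rows with no sugar figure, and averages what remains per country.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows, invented purely for illustration (not real dataset values).
# The field names are hypothetical as well.
products = [
    {"country": "A", "sugars_100g": 10.0, "additives_n": 2},
    {"country": "A", "sugars_100g": 20.0, "additives_n": 0},
    {"country": "B", "sugars_100g": 5.0, "additives_n": 1},
    {"country": "B", "sugars_100g": None, "additives_n": 3},  # missing value
]


def average_sugar_by_country(rows):
    """Return {country: mean sugar value}, skipping rows with no sugar figure."""
    grouped = defaultdict(list)
    for row in rows:
        sugar = row.get("sugars_100g")
        if sugar is not None:
            grouped[row["country"]].append(sugar)
    return {country: mean(values) for country, values in grouped.items()}


averages = average_sugar_by_country(products)

# Sort descending so a bar chart would read from highest to lowest.
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Everything else in each row, such as the additive count, is deliberately left out of the result, because it does not further the story being told.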

Even though I didn’t make extensive use of the dataset in my rough and ugly visualization, what I chose to focus on told a story that resonated with people because most were from the countries listed or had a growing general awareness of high sugar consumption and its effect on health. In retrospect, what’s more personal and important than your health?

Selecting data points that strengthen a story with a positive result (whether that’s eating less sugar or reducing large-scale chemical emissions) can be great, but it’s important to present a story that is as unbiased as possible and to make ethical decisions about which parts of the data we want to use while telling the story.

But what exactly is a story in the context of a data visualization? We can’t kick it off with “once upon a time,” so we have to approach the idea in a different way.

Whitney Quesenbery and Kevin Brooks provide these definitions of a story in their book Storytelling for User Experience:

  • Stories describe the context of a situation
  • Stories can illustrate problems
  • Stories can be used to help people remember
  • Stories can be used to persuade and entertain.

And I would add to the list:

  • Stories can make you question the state of a situation.

Addressing some or all of these attributes is a particular challenge for big datasets because the sheer amount of information can make finding a narrative difficult. But big or not, the principles remain the same. Visualizing any kind of data-driven story that resonates can have a powerful influence on users’ decisions.

It also stirs other questions the user might ask.

For instance, why do certain countries consume higher quantities of sugar? Are they the ones we expected? The information could challenge an assumption or two someone may have had prior to seeing the results. Just remember that visualization can be a stepping stone to further discovery, increasing the user’s knowledge and possibly affecting their everyday choices going forward.

If you’re trying to embed meaning into a large visualization through the story of a dataset’s subsection, it’s important to:

  • Discover what your users care about in the dataset. Make it relevant to their personal needs, desires, and interests.
  • Focus on that subsection ruthlessly. Get rid of anything that doesn’t further the story your visualization is telling.
  • Take care to make ethical, unbiased decisions about which data points you use to create visualizations that might influence your users.
  • Be careful not to give people all the answers; allow them to ask their own questions and make their own discoveries about the data.

This approach allows you to create something that not only resonates at a personal level, but also presents meaning in a way that encourages and allows users to take action.

But we already have a story

Though large, some big datasets already revolve around a single story. An interesting way of dealing with this particular issue is to simultaneously display different aspects of such a dataset, allowing the user to discover that meaning. This is called the “small multiples” technique. (Fig. 2)

Assorted graphs visually illustrating data as seen from memory view, code view, and process view
Fig. 2: Memory stall visualizations from the Rivet project at Stanford.

The cluster of visualizations above, for example, deals with the “story” of memory stall issues on a computer. What I find interesting about the cluster is that the heading of every visualization starts with some variation of “memory stall time.” Despite being separate visualizations, they are linked by the single story they tell and they’re presenting it from simultaneous, distinct perspectives.

It’s possible for perspectives to look completely different from one another if they visualize different kinds of data. For instance, bar charts and area charts can harmoniously coexist if the representations are appropriate for the data they’re showing. The Australian Census Explorer illustrates how this might work (Fig. 3). It allows the user to establish their own narrative through choice of topic, such as language or place.

Screenshot of the native languages list in the Australian Census Explorer
Fig. 3: Given freedom to explore, users inherently craft a personal narrative.

Framing visualizations around a personal topic (like someone’s native language) affects all associated small multiples appropriately; reframing serves to personalize the data. (Fig. 4)

Screenshot comparing two horizontal bar graphs depicting gender and age data for Australians and English-speaking Australians
Fig. 4: Comparison of gender and age data for Australians and English-speaking Australians
Screenshot depicting Country of Birth data, listing percentages for various countries and a global map with lines linking from Australia to each country
Fig. 5: Country of Birth breakdown for Australian citizens
Screenshot comparing two vertical bar graphs depicting income ranges by gender for Australians and for English-speaking Australians
Fig. 6: Income ranges by gender for Australians and English-speaking Australians

Storytelling through interaction

It can be very useful with this approach to include an interaction in one design that is capable of affecting the others—something to help the user see relationships between data points they might not have considered before. This example from essay site Polygraph shows all Kickstarter projects across space, organized here by category and American city. (Fig. 7)

Dot graphs depicting categories, number of projects, and size of projects in a selection of American cities
Fig. 7: Well-designed data visualizations can convey multiple concepts and information in parallel

The visualization is particularly interesting because it allows users to view the relationship of one variable (in this case, the project category) to others, such as American cities or project sizes. (Notice the prevalence of music projects in Nashville and game projects in Austin and Seattle.)

The Lens does something similar in its visualization of the human genome (Fig. 8) by allowing users to change views by way of various filters.

Horizontal bars aligned with their respective segments of a human genome sequence
Fig. 8: Data can be filtered to display different views of the human genome.

This can be even more effective for small multiples shown across time. Fig. 9 shows how this approach is used on a fund manager’s website. Changing the time period of an investment fund’s performance also shows how risk rating and the growth of an investment change during that period. By leveraging intuitive web animation, we can view snapshots of the data at precise moments in time.

A stack of overlaid line graphs illustrating different tracking items, plus a vertical slider that reveals changes in the tracking items when moved to different points along the line
Fig. 9: Interacting with one “small multiple” affects others, revealing relationships at distinct points of time

If the dataset is already centered around some kind of overarching story, it can be a good idea to:

  • Display different parts of the dataset in separate visualizations simultaneously
  • Treat these separate visualizations as individuals tailored to the data they’re presenting. (Bar charts and area charts can live together in harmony if the data makes it appropriate.)
  • If there is interaction, ensure that it affects the entirety of your visualization approach so that the relationships between data points are more apparent
  • Apply well-considered web animation techniques to ensure that the interaction is intuitive.
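
One way to picture an interaction that affects every visualization at once is a shared selection that each chart listens to. The sketch below is a hedged illustration in Python, not how any of the sites above are built. `SharedSelection`, `ChartView`, and the sample rows are hypothetical names and values made up for this example, and the "charts" only record which rows they would draw.

```python
class SharedSelection:
    """Holds the current filter and tells every registered view when it changes."""

    def __init__(self):
        self._listeners = []
        self.value = None

    def subscribe(self, listener):
        self._listeners.append(listener)

    def select(self, value):
        self.value = value
        for listener in self._listeners:
            listener(value)


class ChartView:
    """Stand-in for one chart; it only records which rows it would draw."""

    def __init__(self, name, rows, selection):
        self.name = name
        self.rows = rows
        self.visible = list(rows)  # show everything until something is selected
        selection.subscribe(self.on_select)

    def on_select(self, category):
        self.visible = [r for r in self.rows if r["category"] == category]


# Placeholder rows invented for illustration.
rows = [
    {"category": "music", "city": "X", "size": 3},
    {"category": "games", "city": "Y", "size": 5},
    {"category": "music", "city": "Y", "size": 1},
]

selection = SharedSelection()
by_city = ChartView("by city", rows, selection)
by_size = ChartView("by size", rows, selection)

selection.select("music")  # one interaction updates both views
```

Because both views subscribe to the same selection, neither needs to know the other exists, which keeps each chart free to use whatever representation suits its data.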

There are too many stories

What do we do when a dataset doesn’t have a single, big story to tell, yet we still need to visualize everything in it?

Although some datasets lack a specific focus (e.g., “memory stall time,” “fund performance,” or “all-Kickstarter-projects-ever”), data points may have internal relationships that reveal bite-sized stories. How do we create actionable meaning for those visualizations?

Simply showing data as-is, even in a visualization that seems to fit, rarely works well. In Fig. 10 we see relationships between Python code packages, but in a way that’s just as messy and incoherent as the data in its natural state. The lack of focus and narrative is notable. (That said, the dataset is extremely large, so a single narrative isn’t actually possible.)

A chart illustrating numerous types of data, with lines from type to type to show relationships
Fig. 10: This visualization presents nothing actionable, despite the tremendous amount of data

Since a single story isn’t possible in this situation, a better approach is to allow users to discover their own story. Your job is to facilitate that via the interaction design of the visualization.

This browser-based design in Fig. 11 visualizes code package relationships, too (in this case, JavaScript), but gives users what they need to explore the data in a meaningful way.

A computer-generated 3-D depiction of data relationships
Fig. 11: Use well-designed interactions to help users work with large, multi-narrative datasets.

Again, at first glance the visualization seems to be messy and incoherent—but look closer. Users can investigate any individual package of code, including its personal relationships (listed in the bottom left). A handy search bar has also been incorporated in the top left corner.

What makes this particular visualization more meaningful is that the user can explore it in 3D space via keyboard and mouse. Leveraging this uniquely digital capability in the browser allows users to start discovering their own story in the enormous swarm of data, “moving” toward areas in the visualization that they find more relevant to their interests or needs. (Fig. 12)

Detail view of specific data nodes in a computer-generated 3D depiction of data relationships
Fig. 12: “Moving” intuitively through the data allows users to find meaning that’s personally relevant to them

Once the user finds a package or groups of packages they’re interested in exploring, they can click on one for a specific and focused view of the package in isolation, including its relationships with other packages. A full breakdown of these relationships is posted on the left of the screen, including visual nodes linking directly to the GitHub page for that code package. (Fig. 13)

Isolated close-up of a specific data node and its associated information in a computer-generated 3-D depiction of data relationships
Fig. 13: Isolation view of a specific package.

This visualization, like the one shown before it, uses the idea of a network in order to display the immensity of the data, but it also uses intuitive interaction and lets the user explore in order to extract personally relevant meaning. It uses the modern advantages of the web to deal with the modern problems of big datasets, much like the following visualization from OpenCorporates. (Fig. 14)

Computer-generated display of country silhouettes sized according to the provided data and depicting lines of relationship from country to country and city to city
Fig. 14: Look for ways to “translate” data into simple and relatable concepts and simple explanations.

This design allows users to zero in on data they care about, choosing where they go and which breadcrumbs offer meaningful insight.

If a dataset needs to be fully visualized but has smaller stories within it, it may be useful to:

  • Show all data, but give users the ability to create chunks or segments they wish to explore
  • Leverage the advantages of being digital. For example, explore how input devices (e.g., keyboard and mouse) can facilitate how users interact with the data.
  • Use visual metaphors that support extensive and intricate relationship associations, such as a tree or network.
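
As a rough sketch of letting users carve a chunk out of a full network, the Python below models a dependency graph as a dictionary, offers a simple name search, and collects everything within a few hops of a chosen package. It is an illustration under stated assumptions, not the implementation of the visualizations shown above. The graph contents and the `search` and `neighborhood` helpers are hypothetical, and the walk follows only outgoing "depends on" links.

```python
from collections import deque

# Placeholder dependency graph, invented for illustration:
# each package maps to the packages it depends on.
dependencies = {
    "app": ["router", "store"],
    "router": ["utils"],
    "store": ["utils"],
    "utils": [],
    "unrelated": ["other"],
    "other": [],
}


def search(graph, text):
    """Return package names containing the search text, in alphabetical order."""
    return sorted(name for name in graph if text in name)


def neighborhood(graph, start, max_depth):
    """Breadth-first walk returning {package: distance} within max_depth hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_depth:
            continue  # do not expand past the requested depth
        for dep in graph.get(current, []):
            if dep not in seen:
                seen[dep] = seen[current] + 1
                queue.append(dep)
    return seen


matches = search(dependencies, "rout")
focused = neighborhood(dependencies, "app", 1)
```

The full graph stays available, but the user only has to look at the segment around the package they picked, and the depth limit decides how wide that segment is.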

Visualization with meaning

Data is powerful in the right hands, and something we’re skilled at presenting in our websites. But toss in words like “big data” or “data visualization” and we second-guess ourselves instead of owning it as part of our workflow. The web is actually a great place for data visualization.

Leveraging the benefits of “digital” environments and tools, we can help users get what they need from large, complicated datasets. They are looking for insights, for meaningful information presented simply, for stories that resonate—for data stories they care about. We can help them find those stories by blending in a few new techniques on our end, such as sub-selections of data, use of small multiples to show relationships between data points, or even allowing user-driven focus on the full dataset.

About the Author

Byron Houwens

Byron Houwens is a designer, developer, writer, and speaker enjoying the sun in Cape Town, South Africa. He’s generally either trying to be fancy in the kitchen, retweeting things at @BHouwens, or thinking about the future relationship between data science and design.
