A List Apart

Illustration of a hand pointing with many symbolic noises surrounding it.

Illustration by Justin Mezzell

Multimodal Perception: When Multitasking Works

Word on the street is that multitasking is impossible. The negative press may have started with HCI pioneer Clifford Nass, who published studies showing that people who identify as multitaskers are worse at context switching, worse at filtering out extraneous information, worse at remembering things over the short term, and have worse emotional development than unitaskers.

Article Continues Below

With so much critical attention given to multitasking, it’s easy to forget that there are things our brains can do simultaneously. We’re quite good at multimodal communication: communication that engages multiple senses, such as visual-tactile or audio-visual. Understanding how we process mixed input can influence the design of multimedia presentations, tutorials, and games.

When I began researching multimodal communication, I discovered a field brimming with theories. The discipline is still too new for much standardization to have evolved, but many studies of multimodality begin with Wickens’s multiple resource theory (MRT). And it’s that theory that will serve as a launch point for bringing multimodality into our work.

Wickens’s multiple resource theory

Luckily, Wickens saved us some heavy lifting by writing a paper summarizing the decades of research (PDF) he spent developing MRT. Its philosophical roots, he explains, are in the 1940s through 1960s, when psychologists theorized that time is a bottleneck; according to this view, people can’t process two things simultaneously. But, Wickens explains, such theories don’t hold up when considering “mindless” tasks, like walking or humming, that occupy all of a person’s time but nevertheless leave the person free to think about other things.

Several works from the late 1960s and early 1970s redefine the bottleneck theory, proposing that what is limited is, in fact, cognitive processing power. Following this train of thought, humans are like computers with a CPU that can only deal with a finite amount of information at once. This is the “resource” part of MRT: the limitation of cognitive resources to deal with incoming streams of information. (MRT thus gives credence to the “mobile first” approach; it’s often best to present only key information up front because of people’s limited processing power.)

The “multiple” part of the theory deals with how processing is shared between somewhat separate cognitive resources. I say somewhat separate because even for tasks using seemingly separate resources, there is still a cost of executive control over the concurrent tasks. This is again similar to computer multiprocessing, where running a program on two processors is not twice as efficient as running it on one, because some processing capacity must be allocated to dividing the work and combining the results.

To date, Wickens and others have examined four cognitive resource divisions.

Processing stage

Perception and cognition share a structure separate from the structure used for responding. Someone can listen while formulating a response, but cannot listen very well while thinking very hard. Thus, time-based presentations need ample pauses to let listeners process the message. Video players should have prominent pause buttons; content should be structured to include breaks after key parts of a message.

Visual channel

Focal and ambient visual signals do not drain the same pool of cognitive resources. This difference may result from ambient vision seemingly requiring no processing at all. Timed puzzle games such as Tetris use flashing in peripheral vision to let people know that their previous action was successful—the row was cleared!—even while they’re focusing on the next piece falling.

Processing code

Spatial and verbal processing codes use resources based in separate hemispheres of the brain. This may account for the popularity of grid-based word games, which use both pools of resources simultaneously.

Perceptual modes

It’s easier to process two simultaneous streams of information if they are presented in two different modes—one visual and one auditory, for example. Wickens notes that this relative ease may result from the difficulties of scanning (between two visual stimuli) and masking (of one auditory stimulus by another) rather than from us actually having separate mental structures. Tower defense games are faster paced (and presumably more engaging) when accompanied by an audio component; players can look forward to the next wave of attackers while listening for warning signals near their tower base. Perceptual modes is the cognitive division most applicable to designing multimedia, so it’s the one we’ll look at further.

A million and one other theories

Now that we’ve covered Wickens’s multiple resource theory, let’s look at some of the other theories vying for dominance to explain how people understand multimodal information.

The modality effect (PDF) focuses on the mode (visual, auditory, or tactile) of incoming information and states that we process incoming information in different modes using separate sensory systems. Information is not only perceived in different modes, but is also stored separately; the contiguity effect states that the simultaneous presentation of information in multiple modes supports learning by helping to construct connections between the modes’ different storage areas. An educational technology video, for instance, will be more effective if it includes an audio track to reinforce the visual information.

This effect corresponds with the integration step of Richard Mayer’s generative theory of multimedia learning (PDF), which states that we learn by selecting relevant information, organizing it, and then integrating it. Mayer’s theory in turn depends upon other theories. (If you’re hungry for more background, you can explore Baddeley’s theory of working memory, Sweller’s cognitive load theory, Paivo’s dual-coding theory, and Penney’s separate stream hypothesis.) Dizzy yet? I remember saying something about how this field has too many theories…

What all these theories point to is that people generally understand better, remember better, and suffer less cognitive strain if information is presented in multiple perceptual modes simultaneously. The theories provide academic support for incorporating video into your content, for example, rather than providing only text or text with supporting images (says, ahem, the guy writing only text).

Visual-tactile vs. visual-auditory communication

Theories are all well and good, but application is even better. You may well be wondering how to put the research on multimodal communication to use. The key is to recognize that certain combinations of modes are better suited to some tasks than to others.


Use visual-tactile presentation to support quick responses. It will:

  • reduce reaction time
  • increase performance (measured by completion time)
  • capture attention effectively (for an alert or notification)
  • support physical navigation (by vibrating more when you near a target, for example)


Use visual-auditory presentation to prevent errors and support communication. “Wait, visual-auditory?” you may be thinking. “I don’t want to annoy my users with sound!” It’s worth noting, though, that one of the studies (PDF) found that as long as sounds are useful, they are not perceived as annoying. Visual-auditory presentation will:

Mode combination

You might also select a combination of modes depending on how busy your users are:

  • Visual-tactile presentation is more effective with a high workload or when multitasking.
  • Visual-auditory presentation is more effective with a single task and with a normal workload.

Multimodal tension

A multimodal tug-of-war goes on between the split-attention effect and the redundancy effect. Understanding these effects can help us walk the line between baffling novices with split attention and boring experts with redundancy:

  • The split-attention effect states that sequential presentation in multiple modes is bad for memory, while simultaneous presentation is good. Simultaneity helps memorization because it is necessary to encode information in two modes simultaneously in order to store cross-references between the two in memory.
  • In contrast, presenting redundant information through multiple channels simultaneously can hinder learning by increasing cognitive load without increasing the amount of information presented. Ever try reading a long quote on a slide while a presenter reads the same thing aloud? The two streams of information undermine each other because of the redundancy effect.

Which effect occurs is partially determined by whether users are novices or experts (PDF). Information that is necessary to a novice (suggesting that it should be presented simultaneously to avoid a split-attention effect) could appear redundant to an expert (suggesting that it should be removed to avoid a redundancy effect).

Additionally, modality effects appear only when limiting visual presentation time. When people are allowed to set their own time (examining visual information after the end of the auditory presentation), studied differences disappear. It is thus particularly important to add a secondary modality to your presentation if your users are likely to be in a hurry.

Go forth, multiprocessing human, and prosper

So the next time you hear someone talking about how multitasking is impossible, pause. Consider how multitasking is defined. Consider how multiprocessing may be defined separately. And recognize that sometimes we can make something simpler to learn, understand, or notice by making it more complex to perceive. Sometimes the key to simplifying presentation isn’t to remove information—it’s to add more.

And occasionally, some things are better done one after another. The time has come for you to move on to the next processing stage. Now that you’ve finished reading this article, you have the mental resources to think about it.

No Comments (yet)