As Alexa, Cortana, Siri, and even customer support chat bots become the norm, we have to start carefully considering not only how our content looks but how it could sound. We can—and should—use HTML and ARIA to make our content structured, sensible, and most importantly, meaningful.
Most bots and digital assistants work from specially-coded data sets, APIs, and models, but there are more than 4.5 billion pages of content on the web, trapped, in many cases, within our websites. Articles, stories, blog posts, educational materials, books, and marketing messages—all on the web, but in many cases unusable in a non-visual context. A few projects—search spiders most notably—are working to turn our messy, unstructured web pages into something usable. But we can do more—a lot more—to facilitate that and enable our web pages to be more usable by both real people and the computers that power voice-based user experiences.
Let’s release our content from the screen and empower it to go anywhere and everywhere. We can help it find its way into virtual assistants and other voice-response technologies—and even voiceless chat bots—without having to code and re-code that content over and over into multiple, redundant formats. We can even enable our users to actively engage with our content by filling in forms and manipulating widgets on the web purely via voice. It’s all possible, but we need to start by taking a long, hard look at our markup.
I’m <em>really</em> happy to see you.
Sure, the em element is visually rendered as italics, but it also adds emphasis to the content within. HTML is chock full of elements that are useful for conveying meaning, nuance, and relationships. Being aware of them enables us to author more expressive documents. Ignoring them can undermine the usability of the content we’re marking up. When we create a web page, we need to be mindful of the conversation we are creating with our customers in the process, and choose elements with intent and care.
One of the best indicators for how HTML will make it into our virtual assistants is another assistive technology: screen readers. Not only do screen readers do as their name implies, they also enable users to rapidly navigate a page in various ways, and provide mechanisms that translate visual design constructs—proximity, proportion, etc.—into useful information. At least they do when documents are authored thoughtfully.
So, let’s jump in and look at some solid examples of how we can both create more meaningful documents and empower them to be more usable in “headless” UIs.
We’ll start by looking at what are called “phrasing” elements. The emphasis you saw earlier is an example of this element type. We used to call them “inline” elements because, by default, they are visibly displayed as inline text. But “phrasing” is a much more accurate description of the role they play in our web pages, because, well, they mark up phrases.
We saw this example earlier:
I’m <em>really</em> happy to see you.
Here, the word “really” is marked for emphasis. I’m unaware of any current speech synthesizer that audibly emphasizes text like we do, but it’s still early days in the grand scheme of things. I’m sure it’ll happen, given how much focus there’s been on building more human-sounding voices.
Sometimes emphasis is not enough. When we want to indicate that content is vital for our customers to pay attention to, the strong element is the right way to go. “Strong” means “of strong importance.”
Please fill out the form below to contact us. <strong>All fields are required.</strong>
Visually, em and strong are displayed as italics (as mentioned previously) and bold, respectively.
Now we also have the i and b elements, which are rendered exactly the same as em and strong, respectively. In the early days of the web, that led many of us (myself included) to believe they were interchangeable. And with b and i being shorter to write, they proliferated on the web. Semantically, however, the i and b elements are quite different from their doppelgängers.
The i element is similar to the emphasis element, but more generic. It is used to indicate an alternate voice or mood. It could be used to indicate sarcasm, idiomatic remarks, and shifts in language.
It's a terrible movie and it made $200 million. <i>Go figure!</i> She is admired for her energy and <i lang="fr">joie de vivre</i>.
In the latter example, you might also notice that I’ve indicated that the phrase “joie de vivre” is in another language (French) using the lang attribute. This attribute lets the digital assistant know it may want to shift its pronunciation.
Admittedly, replicating this using the speechSynthesis API is still a little rough, but with time, this too will no doubt improve.
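To make that language shift concrete, here is a rough sketch of how a script might honor the lang attribute with the speechSynthesis API. The "sample" and "speak" ids and the speakWithLang function are hypothetical names invented for this illustration, and whether a French voice is actually available depends on the browser and operating system.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Language shift sketch</title>
</head>
<body>
  <p id="sample">She is admired for her energy and <i lang="fr">joie de vivre</i>.</p>
  <button type="button" id="speak">Read it aloud</button>
  <script>
    // Hypothetical helper: queue one utterance per child node of the
    // paragraph, tagging each with the nearest lang attribute.
    function speakWithLang(paragraph) {
      paragraph.childNodes.forEach(function (node) {
        var text = node.textContent.trim();
        // Skip chunks with nothing pronounceable (e.g. the trailing period).
        if (!/\w/.test(text)) { return; }
        var utterance = new SpeechSynthesisUtterance(text);
        // Text nodes have no closest(), so fall back to the paragraph.
        var source = node.nodeType === Node.ELEMENT_NODE ? node : paragraph;
        var langHolder = source.closest('[lang]');
        utterance.lang = langHolder ? langHolder.lang : 'en';
        window.speechSynthesis.speak(utterance);
      });
    }

    // Start speech from a click, since browsers may block it otherwise.
    document.getElementById('speak').addEventListener('click', function () {
      if ('speechSynthesis' in window) {
        speakWithLang(document.getElementById('sample'));
      }
    });
  </script>
</body>
</html>
```

Splitting the sentence into separate utterances is what makes the result sound rough: each chunk is spoken on its own, so the pause and intonation around the French phrase will not match natural speech.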
The b element is used for content that should be set apart, or “stylistically offset,” from the surrounding text. It does not indicate that the phrase is of any greater importance, though. I like to use it for names of people and products. Keywords would be another option. Books, films, and other media have their own element, which I’ll get to in a moment.
For 12 years and running, over 100,000 companies have adopted the <b>Basecamp</b> way of working. Not just tried, but signed up, said “ah-ha!”, and never looked back. There’s nothing else like <b>Basecamp</b>.
The b element is a lot like a span: generic phrasing content, albeit with a shorter tag.
Since I mentioned movies and books, I’ll quickly bring up the cite element, which is for the title of cited or referenced works.
I wrote the book <cite>Adaptive Web Design</cite>. If you like this article, you’ll find in-depth information about semantics (and a whole lot more) in there.
HTML has other specialized phrasing constructs, such as abbr for abbreviations and acronyms. Traditionally, we recommended using the title attribute to provide an expansion:
<abbr title="Hypertext Markup Language">HTML</abbr> is the standard markup language for creating web pages and web applications.
Sadly, as with many things on the web, black hat SEO practices involving title spurred screen readers to ignore the attribute altogether. Visual browsers do still provide tooltips, so the attribute is not completely useless, but given that screen readers don’t currently pay attention to title, it’s pretty unlikely its contents will be surfaced by a virtual assistant.
To be honest, it’s best to avoid title altogether. For the purposes of absolute clarity, you should introduce and explain important abbreviations and acronyms the first time they are used. There’s even an element, dfn, that signals a defining context:
<dfn id="dfn-html">Hypertext Markup Language (HTML)</dfn> is the standard markup language for creating web pages and web applications.
For more technical writing, the kbd and code elements can be quite useful. They indicate keys a user might need to press and words and phrases that are used in writing software or coding documents, respectively:
Press <kbd>Tab</kbd> to move from link to link within a document. The <code>kbd</code> element is used to indicate keyboard key names.
Then there’s the span element, which is used for generic phrases, as I noted earlier. It’s a meaningless element, so it will not be spoken any differently by default.
There is <span>nothing particularly interesting</span> in this sentence.
There are more phrasing elements, but these are the ones you’re most likely to want in most projects.
Links are also phrasing elements, but I want to call them out specifically because they provide a much richer set of options for fine-tuning how our users interact with our pages.
The primary way we use links is to connect related content. It’s incredibly important to choose meaningful words and phrases as link text. Links that read generically like “click here” and “read more” are not terribly useful, especially when the text of every link is being read out to you—which is a key way headless UI users skim web pages. Make it clear where you are linking. Restructure sentences if you need to in order to provide good link text.
If you are drawn to “read more” style links for their brevity, you can have your cake and eat it too by including non-visible text within a link. This gives you brief, uniform links from a visual standpoint, but also lets you provide context in headless scenarios. Here’s an example from my site’s navigation. I’ve broken it up across a few lines to make it a little easier to follow:
<a href="/speaking-engagements/">
  <b class="hidden">A List of My</b>
  Speaking
  <b class="hidden">Engagements</b>
</a>
Within the link, I have two b elements classified as “hidden.” In my CSS, I hide the content within them from sighted users, but I hide them in a way that they remain available to assistive technology. So a sighted user will only see “speaking,” but a screen reader or digital assistant will read “a list of my speaking engagements.”
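The CSS behind that “hidden” class isn’t shown here, so what follows is only one common way to hide text visually while leaving it available to assistive technology. Treat the specific declarations as an assumption rather than the rules the site actually uses.

```html
<style>
  /* Assumed rule set: shrink and clip the text so it takes up no
     visible space but stays in the accessibility tree. */
  .hidden {
    position: absolute;
    width: 1px;
    height: 1px;
    margin: -1px;
    padding: 0;
    border: 0;
    overflow: hidden;
    clip: rect(0 0 0 0);
    white-space: nowrap;
  }
</style>

<a href="/speaking-engagements/">
  <b class="hidden">A List of My</b>
  Speaking
  <b class="hidden">Engagements</b>
</a>
```

Reaching for display: none or visibility: hidden instead would defeat the purpose, because both remove the text from assistive technology as well as from the screen.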
You could also offer an expansion with aria-label on the anchor element. If that “aria-” bit in aria-label looks weird to you, it comes from the Accessible Rich Internet Applications (ARIA) spec, an ongoing effort to map complex operating-system-like UI constructs into accessible ones. I chose the hidden text route to give myself the flexibility to display the hidden content in certain scenarios.
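For comparison, the aria-label route might look like the sketch below. The label wording is an assumption that simply mirrors the hidden-text version of the same link.

```html
<!-- The aria-label value replaces the visible text as the link's
     accessible name, so it should still contain the word "Speaking". -->
<a href="/speaking-engagements/"
   aria-label="A list of my speaking engagements">
  Speaking
</a>
```

The trade-off is that the expanded text now lives in an attribute, so it can’t be revealed with CSS the way hidden elements can.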
Some of you may be wondering why I didn’t bring up aria-label when I mentioned the abbr element. It seems like a good fit, and the aria-label spec currently allows the attribute on abbr elements. The issue isn’t the spec, but rather the reality that the info in aria-label isn’t always exposed by browsers or sought out by assistive technology on elements like abbr. With good reason, they’ve been much more focused on exposing aria-label (and its kin) on interactive elements, landmarks, and widgets.
It’s worth noting that hidden text in links can cause issues for folks who rely on a combination of screens and dictation software to interact with their computers. If the link text that’s displayed does not match the actual link text in the markup, a user saying the visible link text (like the word “Speaking” in the case of my site’s navigation) won’t actually activate the link. It’s also worth reiterating the importance of quality link text; don’t use aria-label to paper over poorly worded links or unnecessary redundancy like “read more.”
We can also use links to reference content within the current document or even at a specifically-identified position in another document:
To illustrate the concept of layering styles, perhaps it’s best to start at the beginning: with no style applied. <a href="#figure-3-3">Figure 3.3</a> shows the lodging article in Safari with only the default browser styles applied. … <figure id="figure-3-3"> … </figure>
At the tail end of this code sample, we have a figure element that is referenced elsewhere in the document. Rather than leaving it up to the reader to find “Figure 3.3,” we can use a fragment identifier to jump the reader directly to the reference. Adding a unique id attribute to each important element in your design makes it easy for you (or others) to link directly to them.
As with the i element example I shared earlier, you can inform your readers about the language of a linked page using the hreflang attribute:
<a href="…" hreflang="es"><i lang="es"> <b class="hidden">Lea esta página en</b> español </i></a>
That’s Spanish for “read this page in Spanish,” and the link points to a Spanish-language translation of the page. The hidden content approach is in use here, too, with sighted users only seeing “español.”
You can indicate the kind of content being linked to, using the type attribute:
<a href="giant.mp4" type="video/mp4">Download this movie</a>
And we also have the download attribute, which informs the browser that the file in question should be downloaded rather than presented. Again, a simple attribute that makes a simple HTML document capable of doing so much more:
<a href="giant.mp4" type="video/mp4" download>Download this movie</a>
When encountering this type of link in a voice context, your digital assistant could prompt you to save the file to a connected storage account, like Dropbox. That’s pretty cool, but it’s worth noting that browsers will ignore the download attribute on cross-origin links for security purposes. Unfortunately, that means you can’t use this approach to download files from a Content Delivery Network (CDN) served from a different origin.
Anchor elements also support non-web “pseudo” protocols. Two of the most common examples are “mailto:” for email links and “tel:” for phone numbers, but “sms:” and “webcal:” are also common.
<a href="mailto:email@example.com">Send me an email</a> <a href="tel:18009346489">Call Comcast Customer Service</a>
Some operating systems (and browsers) allow installed apps to register custom protocols that can provide access to in-app functionality. A word of caution though: an unrecognized protocol may prompt the user to search for an application that can handle it.
All of this phrasing content is great, but I’ve spent a good deal of time in the weeds. Let’s pull back a bit and look at documents themselves.
As you’re no doubt aware, headless UIs place a greater cognitive load on our users. It’s hard to keep track of where you are in an interface when you can’t see it. It can also be challenging to move around when you can’t gather information about the interface based on visual cues. The more complex an interface is, the more challenging this becomes.
The same is true in visual interfaces, which is why “mobile first” thinking encourages us to focus each page on a single task. This reduces the noise and raises the signal. But most web pages are the antithesis of clear and straightforward. As our screen sizes enlarged, we found more stuff to fill that space. Sharing links, related content, cross-promotions, and so on. Sometimes it’s easy to lose sight of the actual content.
To combat this, screen readers provide numerous mechanisms that enable users to gather information about the UI and move through it efficiently. One of the most common involves moving the focus caret from one interactive element to another. Traditionally that movement is done via the keyboard Tab key, but it’s also possible via voice using keywords like “next” and “previous.” In most documents, users are moving from link to link. This is why it’s so important to offer informative link text.
<p>This twist is what <a href="https://en.wikipedia.org/wiki/John_Harsanyi">John Harsanyi</a>—an early game theorist—refers to as the “<a href="https://en.wikipedia.org/wiki/Veil_of_ignorance">Veil of Ignorance</a>,” and what Rawls found, time and time again, was that individuals participating in the experiment would gravitate toward creating the most egalitarian societies.</p>
It’s worth noting that form elements—buttons, inputs, etc.—are also part of the default tab order of a web page.
Elements that would not traditionally be focusable can be included in the tab order by adding a tabindex attribute with a value of “0” (zero) to them. This ensures critical interface components are not accidentally bypassed by users who are skimming an interface by tabbing. Incidentally, it can also give sighted users keyboard control over scrollable elements.
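As a rough illustration, a scrollable region that would normally be skipped can be added to the tab order like this. The class name, the label, and the inline sizing are all hypothetical.

```html
<!-- tabindex="0" places the container in the natural tab order, so
     keyboard users can focus it and scroll with the arrow keys.
     The role and aria-label give that focus stop a name when it is
     announced. -->
<div class="transcript" tabindex="0" role="region"
     aria-label="Interview transcript"
     style="max-height: 10em; overflow: auto;">
  <p>…</p>
</div>
```

A value of zero keeps the element in source order, which is usually what someone tabbing through the page expects.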
Another mode of document traversal is browsing by heading. The various heading levels in HTML create a natural document outline, and assistive technologies can enable users to skim content using these headings:
<h1>This is the title of the page</h1> … <h2>This titles a section</h2> … <h3>This titles a subsection</h3> … etc.
Since only the contents of the heading elements are read out in this mode, it’s best to avoid cutesy marketing phrases, and stick to summarizing the contents of a section.
More recently, document “landmarks” have come along, providing quick access to key parts of the page. Landmarks were first introduced as part of ARIA. Using the role attribute, you can define the function of specific regions of a page. Consider the following:
<div id="nav"> <ul> <li> <a href="/about/"><b class="hidden">A Bit </b>About<b class="hidden"> Me</b></a> </li> … </ul> </div>
In this example, the navigation list is sitting in a div with an id of “nav.” While that’s a meaningful identifier for the purposes of styling, scripting, and anchoring, the div is not actually exposed to assistive technology as navigation. Adding a role of “navigation,” however, makes that function explicit:
<div id="nav" role="navigation"> <ul> <li> <a href="/about/"><b class="hidden">A Bit </b>About<b class="hidden"> Me</b></a> </li> … </ul> </div>
There are numerous role values that qualify as landmarks, including banner, navigation, main, complementary, contentinfo, search, form, and region.
Landmarks also give users the opportunity to jump directly to a location within an interface, which is incredibly helpful. In a voice context, a user might be able to ask their digital assistant to “read me the navigation for this page” or “search for wooden baby toys,” and the assistant could use these landmarks to quickly respond to those commands.
It’s worth noting that most of these landmarks have equivalent HTML elements. This is because HTML5 and ARIA were being developed at the same time, and both were looking to address the same limitations of the web. Here’s a rundown of ARIA landmark roles with HTML equivalents:
- banner – first header element not inside sectioning content
- navigation – nav
- main – main
- complementary – aside
- contentinfo – first footer element not inside sectioning content
Each HTML5 element shown here is automatically assigned its corresponding ARIA role by modern browsers and is recognized by modern assistive technologies. However, in older browser and assistive technology combinations, the automatic role assignment may not happen. That’s why it’s not uncommon to see nav elements with a “navigation” role or similar, even though validators will flag it as unnecessary.
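A belt-and-suspenders page skeleton, sketched under the assumption that older browser and assistive technology combinations need to be supported, could pair each element with its role explicitly. The page contents are placeholders.

```html
<body>
  <!-- Each role repeats what the element already implies; it is only
       there for software that doesn't map the element on its own. -->
  <header role="banner">…</header>
  <nav id="nav" role="navigation">
    <ul>
      <li><a href="/about/">About</a></li>
    </ul>
  </nav>
  <main role="main">…</main>
  <aside role="complementary">…</aside>
  <footer role="contentinfo">…</footer>
</body>
```

If legacy support isn’t a concern, the role attributes can be dropped and the elements will carry the same meaning by themselves.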
One last bit I want to touch on before I wrap up is the div element:
<div> This is simply a generic division of content. </div>
We often employ a div when we want to group some elements together. That’s fine, but div is a meaningless element that adds nothing to the interface in terms of context. By contrast, other organizational elements do add value to a page:
- p – a paragraph; a voice synthesizer will naturally pause between them
- ol – a list of items whose order matters
- ul – a list of items whose order doesn’t matter
- li – an item in a list
- dl – a list of terms and their associated descriptions
- dt – a term described within a description list
- dd – a description of a term (or terms) in a description list
- blockquote – a long piece of quoted content
- figure – referenced content (images, tables, etc.)
- figcaption – the caption for a figure
Some of these are among the elements categorized as “flow” content. At a higher level, there are numerous organizational elements to choose from:
- article – a piece of content that can stand on its own
- section – a section of a document or article
- header – preamble content for a document, article, or section
- footer – supplementary information for a document, article, or section
- main – the primary content of a document
- nav – navigational content
- aside – complementary content
There are a ton of meaningful elements out there that can enable our digital assistants to do more for our customers. And the more we use them, the more useful our assistants become, and the more powerful our users feel. For instance, using article and heading elements can enable voice commands like “Read me the top three headlines in the New York Times today” without involving any sort of specialized data feed. A div gets you none of these benefits.
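To make the headline scenario concrete, here is a hypothetical front page fragment. The headlines, URLs, and summaries are invented for illustration and are not taken from any real site.

```html
<main>
  <h1>Today's Top Stories</h1>

  <!-- Each story stands on its own, so each gets its own article,
       and the heading inside it serves as the story's headline. -->
  <article>
    <h2><a href="/stories/first/">First hypothetical headline</a></h2>
    <p>A one-sentence summary of the first story.</p>
  </article>

  <article>
    <h2><a href="/stories/second/">Second hypothetical headline</a></h2>
    <p>A one-sentence summary of the second story.</p>
  </article>

  <article>
    <h2><a href="/stories/third/">Third hypothetical headline</a></h2>
    <p>A one-sentence summary of the third story.</p>
  </article>
</main>
```

With markup like this, an assistant could plausibly answer the “top three headlines” request by reading the heading of the first three article elements inside main, in source order.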
HTML is a truly robust and expressive language that is often overlooked and undervalued, but it has the incredible potential to nurture conversations with our users without requiring a lot of effort on our part. Simply taking the time to code web pages well will enable our sites to speak to our customers like they speak to each other. Thinking about how our sites are experienced as headless interfaces now will set the stage for more natural interactions between the real world and the digital one.