Semantics to Screen Readers

As a child of the ’90s, one of my favorite movie quotes is from Harriet the Spy: “there are as many ways to live as there are people in this world, and each one deserves a closer look.” Likewise, there are as many ways to browse the web as there are people online. We each bring unique context to our web experience based on our values, technologies, environments, minds, and bodies.

Assistive technologies (ATs), which are hardware and software that help us perceive and interact with digital content, come in diverse forms. ATs can use a whole host of user input, ranging from clicks and keystrokes to minor muscle movements. ATs may also present digital content in a variety of forms, such as Braille displays, color-shifted views, and decluttered user interfaces (UIs).

One of the more commonly known types of AT is the screen reader. Programs such as JAWS, Narrator, NVDA, and VoiceOver can take digital content and present it to users through voice output, may display this output visually on the user’s screen, and can have Braille display and/or screen magnification capabilities built in.

If you make websites, you may have tested your sites with a screen reader. But how do these and other assistive programs actually access your content? What information do they use? We’ll take a detailed step-by-step view of how the process works.

(For simplicity we’ll continue to reference “browsers” and “screen readers” throughout this article. These are essentially shorthands for “browsers and other applications,” and “screen readers and other assistive technologies,” respectively.)

The semantics-to-screen-readers pipeline

Accessibility application programming interfaces (APIs) create a useful link between user applications and the assistive technologies that wish to interact with them. Accessibility APIs facilitate communicating accessibility information about user interfaces (UIs) to the ATs. The API expects information to be structured in a certain way, so that whether a button is properly marked up in web content or is sitting inside a native app taskbar, a button is a button is a button as far as ATs are concerned. That said, screen readers and other ATs can do some app-specific handling if they wish.

On the web specifically, there are some browser and screen reader combinations where accessibility API information is supplemented by access to DOM structures. For this article, we’ll focus specifically on accessibility APIs as a link between web content and the screen reader.

Here’s the breakdown of how web content reaches screen readers via accessibility APIs:

The web developer uses host language markup (HTML, SVG, etc.), and potentially roles, states, and properties from the ARIA suite where needed to provide the semantics of their content. Semantic markup communicates what type an element is, what content it contains, what state it’s in, etc.

The browser rendering engine (alternatively referred to as a “user agent”) takes this information and maps it into an accessibility API. Different accessibility APIs are available on different operating systems, so a browser that is available on multiple platforms should support multiple accessibility APIs. Accessibility API mappings are maintained on a lower level than web platform APIs, so web developers don’t directly interact with accessibility APIs.

The accessibility API includes a collection of interfaces that browsers and other apps can plumb into, and generally acts as an intermediary between the browser and the screen reader. Accessibility APIs provide interfaces for representing the structure, relationships, semantics, and state of digital content, as well as means to surface dynamic changes to said content. Accessibility APIs also allow screen readers to retrieve and interact with content via the API.

Again, web developers don’t interact with these APIs directly; the rendering engine handles translating web content into information useful to accessibility APIs.

Examples of accessibility APIs

Windows: Microsoft Active Accessibility (MSAA), extended with another API called IAccessible2 (IA2)
Windows: UI Automation (UIA), the Microsoft successor to MSAA. A browser on Windows can choose to support MSAA with IA2, UIA, or both.
macOS: NSAccessibility (AXAPI)
Linux/GNOME: Accessibility Toolkit (ATK) and Assistive Technology Service Provider Interface (AT-SPI). This case is a little different in that there are actually two separate APIs: one to which browsers and other applications pass information (ATK) and one that ATs then call (AT-SPI).

The screen reader uses client-side methods from these accessibility APIs to retrieve and handle information exposed by the browser. In browsers where direct access to the Document Object Model (DOM) is permitted, some screen readers may also take additional information from the DOM tree. A screen reader can also interact with apps that use differing accessibility APIs.

No matter where they get their information, screen readers can dream up any interaction modes they want to provide to their users (I’ve provided links to screen reader commands at the end of this article). Testing by site creators can help identify content that feels awkward in a particular navigation mode, such as multiple links with the same text (“Learn more”).

Example of this pipeline: surfacing a button element to screen reader users

Let’s suppose for a moment that a screen reader wants to understand what object is next in the accessibility tree (which I’ll explain further in the next section), so it can surface that object to the user as they navigate to it. The flow will go a little something like this:

Diagram showing the client (screen reader) making a call to the accessibility API, which passes along the request to the provider (browser), which checks the content in the web document, which sends the information back up the chain — Diagram illustrating the steps involved in presenting the next object in a document; detailed list follows

The screen reader requests information from the API about the next accessible object, relative to the current object.
The API (as an intermediary) passes along this request to the browser.
At some point, the browser references DOM and style information, and discovers that the relevant element is a non-hidden button: <button>Do a thing</button>.
The browser maps this HTML button into the format the API expects, such as an accessible object with various properties: Name: Do a thing, Role: Button.
The API returns this information from the browser to the screen reader.
The screen reader can then surface this object to the user, perhaps stating “Button, Do a thing.”
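
The request-and-response flow above can be sketched in a few lines of TypeScript. This is only an illustration under simplifying assumptions: the names (ElementInfo, browserNextObject, apiGetNext, announce) are hypothetical and do not belong to any real accessibility API, and the role mapping covers just the button case from our example.

```typescript
// Hypothetical, simplified shapes; real accessibility APIs differ by platform.
interface ElementInfo {
  tagName: string;
  text: string;
  hidden: boolean;
}

interface AccessibleObject {
  role: string;
  name: string;
}

interface Located {
  index: number;
  object: AccessibleObject;
}

// Provider (browser): consults DOM and style info, skips hidden elements,
// and maps the next element into the shape the API expects.
function browserNextObject(elements: ElementInfo[], current: number): Located | undefined {
  for (let i = current + 1; i < elements.length; i++) {
    const element = elements[i];
    if (element === undefined || element.hidden) continue;
    const role = element.tagName === "button" ? "Button" : "Text";
    return { index: i, object: { role, name: element.text } };
  }
  return undefined;
}

// Accessibility API: an intermediary that forwards the request and the reply.
function apiGetNext(elements: ElementInfo[], current: number): Located | undefined {
  return browserNextObject(elements, current);
}

// Client (screen reader): decides how to present the object to the user.
function announce(object: AccessibleObject): string {
  return `${object.role}, ${object.name}`;
}

const page: ElementInfo[] = [
  { tagName: "div", text: "Not rendered", hidden: true },
  { tagName: "button", text: "Do a thing", hidden: false },
];
const next = apiGetNext(page, -1);
const spoken = next ? announce(next.object) : "End of document";
```

Here `spoken` ends up as “Button, Do a thing,” mirroring the last step of the list. A real screen reader is free to word and order that output however it likes.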

Suppose that the screen reader user would now like to “click” this button. Here’s how their action flows all the way back to web content:

Diagram showing a user using a 'primary action' command to a client (screen reader), which passes the command to the accessibility API, which passes the command along to the provider (browser), which passes the command as a click event to the web document — Diagram illustrating the steps involved in routing a screen reader click to web content; detailed list follows

The user provides a particular screen reader command, such as a keystroke or gesture.
The screen reader calls a method into the API to invoke the button.
The API forwards this interaction to the browser.
How a browser may respond to incoming interactions depends on the context, but in this case the browser can raise this as a “click” event through web APIs. The browser should give no indication that the click came from an assistive technology, as doing so would violate the user’s right to privacy.
The web developer has registered a JavaScript event listener for clicks; their callback function is now executed as if the user clicked with a mouse.
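
As a rough sketch of the same round trip, assuming a toy model rather than any real browser or platform API, the command can be traced from the screen reader down to the developer’s callback. All names here (WebButton, browserInvoke, apiInvoke, screenReaderPrimaryAction) are hypothetical.

```typescript
// Hypothetical, simplified model; not a real browser or accessibility API.
interface ClickEvent {
  type: "click";
}
type ClickListener = (clickEvent: ClickEvent) => void;

interface WebButton {
  listeners: ClickListener[];
}

// Web developer: registers a click callback on the button.
function addClickListener(button: WebButton, listener: ClickListener): void {
  button.listeners.push(listener);
}

// Provider (browser): raises an ordinary click event. The event carries no
// field revealing that an assistive technology triggered it.
function browserInvoke(button: WebButton): void {
  const clickEvent: ClickEvent = { type: "click" };
  for (const listener of button.listeners) listener(clickEvent);
}

// Accessibility API: forwards the invoke request to the browser.
function apiInvoke(button: WebButton): void {
  browserInvoke(button);
}

// Client (screen reader): turns the user's command into an API call.
function screenReaderPrimaryAction(button: WebButton): void {
  apiInvoke(button);
}

const logButton: WebButton = { listeners: [] };
const received: ClickEvent[] = [];
addClickListener(logButton, (clickEvent) => received.push(clickEvent));
screenReaderPrimaryAction(logButton);
```

The listener receives exactly one event whose only field is `type`, which is the point of the privacy note above: from the page’s perspective, nothing distinguishes this click from a mouse click.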

Now that we have a general sense of the pipeline, let’s go into a little more detail on the accessibility tree.

The accessibility tree

Screenshot showing the accessibility tools in Microsoft Edge — Dev Tools in Microsoft Edge showing the DOM tree and accessibility tree side by side; there are more nodes in the DOM tree

The accessibility tree is a hierarchical representation of elements in a UI or document, as computed for an accessibility API. In modern browsers, the accessibility tree for a given document is a separate, parallel structure to the DOM tree. “Parallel” does not necessarily mean there is a 1:1 match between the nodes of these two trees. Some elements may be excluded from the accessibility tree, for example if they are hidden or are not semantically useful (think non-focusable wrapper divs without any semantics added by a web developer).
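
To make the “parallel but not 1:1” idea concrete, here is a minimal sketch that derives accessible nodes from a simplified DOM-like structure. It assumes only two pruning rules (hidden subtrees are dropped, and non-focusable wrappers without a role are skipped while their children are kept); real engines apply many more. The types and the tiny role table are hypothetical.

```typescript
// Hypothetical, simplified model; real engines apply many more rules.
interface DomNode {
  tag: string;
  role?: string;
  hidden?: boolean;
  focusable?: boolean;
  children?: DomNode[];
}

interface AxNode {
  role: string;
  children: AxNode[];
}

// A tiny, illustrative subset of implicit role mappings.
const implicitRoles = new Map<string, string>([
  ["button", "Button"],
  ["label", "Label"],
  ["input", "Textbox"],
]);

// Returns the accessible nodes produced by a DOM node: none if it is hidden,
// its children alone if it is a non-semantic wrapper, otherwise one node.
function buildAxNodes(node: DomNode): AxNode[] {
  if (node.hidden) return [];
  const children: AxNode[] = [];
  for (const child of node.children ?? []) children.push(...buildAxNodes(child));
  const role = node.role ?? implicitRoles.get(node.tag);
  if (role === undefined) {
    return node.focusable ? [{ role: "Generic", children }] : children;
  }
  return [{ role, children }];
}

const formRow: DomNode = {
  tag: "div",
  children: [
    { tag: "label" },
    { tag: "input" },
    { tag: "p", role: "note", hidden: true },
  ],
};
const axRoles = buildAxNodes(formRow).map((axNode) => axNode.role);
```

The wrapper div and the hidden paragraph never reach the result, so `axRoles` holds only the label and the text box.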

This idea of a hierarchical structure is somewhat of an abstraction. The definition of what exactly an accessibility tree is in practice has been debated and partially defined in multiple places, so implementations may differ in various ways.

For example, it’s not actually necessary to generate accessible objects for every element in the DOM whenever the DOM tree is constructed. As a performance consideration, a browser could choose to deal with only a subset of objects and their relationships at a time—that is, however much is necessary to fulfill the requests coming from ATs. The rendering engine could make these computations during all user sessions, or only do so when assistive technologies are actively running.

Generally speaking, modern web browsers wait until after style computation to build up any accessible objects. Browsers wait in part because generated content (such as ::before and ::after) can contain text that can participate in calculation of the accessible object’s name. CSS styles can also impact accessible objects in various other ways: text styling can come through as attributes on accessible text ranges, and display property values can impact the computation of line text ranges. These are just a few ways in which style can impact accessibility semantics.

Browsers may also use different structures as the basis for accessible object computation. One rendering engine may walk the DOM tree and cross-reference style computations to build up parallel tree structures; another engine may use only the nodes that are available in a style tree in order to build up their accessibility tree.

User agent participants in the standards community are currently thinking through how we can better document our implementation details, and whether it might make sense to standardize more of these details further down the road.

Let’s now focus on the branches of this tree, and explore how individual accessibility objects are computed.

Building up accessible objects

From API to API, an accessible object will generally include a few things:

Role, or the type of accessible object (for example, Button). The role tells a user how they can expect to interact with the control. It is typically presented when screen reader focus moves onto the accessible object, and it can be used to provide various other functionalities, such as skipping around content via one type of object.
Name, if specified. The name is an (ideally short) identifier that better helps the user identify and understand the purpose of an accessible object. The name is often presented when screen reader focus moves to the object (more on this later), can be used as an identifier when presenting a list of available objects, and can be used as a hook for functionalities such as voice commands.
Description and/or help text, if specified. We’ll use “Description” as a shorthand. The Description can be considered supplemental to the Name; it’s not the main identifier but can provide further information about the accessible object. Sometimes this is presented when moving focus to the accessible object, sometimes not; this variation depends on both the screen reader’s user experience design and the user’s chosen verbosity settings.
Properties and methods surfacing additional semantics. For simplicity’s sake, we won’t go through all of these. For your awareness, properties can include details like layout information or available interactions (such as invoking the element or modifying its value).

Let’s walk through an example using markup for a simple mood tracker. We’ll use simplified property names and values, because these can differ between accessibility APIs.

<form>
  <label for="mood">On a scale of 1–10, what is your mood today?</label>
  <input id="mood" type="range"
       min="1" max="10" value="5"
       aria-describedby="helperText" />
  <p id="helperText">Some helpful pointers about how to rate your mood.</p>
  <!-- Using a div with button role for the purposes of showing how the accessibility tree is created. Please use the button element! -->
  <div tabindex="0" role="button">Log Mood</div>
</form>

First up is our form element. This form doesn’t have any attributes that would give it an accessible Name, and a form landmark without a Name isn’t very useful when jumping between landmarks. Therefore, HTML mapping standards specify that it should be mapped as a group.

Here’s the beginning of our tree:

Role: Group

Next up is the label. This one doesn’t have an accessible Name either, so we’ll just nest it as an object of role “Label” underneath the form:

Role: Group
- Role: Label

Let’s add the range input, which will map into various APIs as a “Slider.” Due to the relationship created by the for attribute on the label and id attribute on the input, this slider will take its Name from the label contents. The aria-describedby attribute is another id reference and points to a paragraph with some text content, which will be used for the slider’s Description. The slider object’s properties will also store “labelledby” and “describedby” relationships pointing to these other elements. And it will specify the current, minimum, and maximum values of the slider. If one of these range values were not available, ARIA standards specify what should be the default value. Our updated tree:

Role: Group
- Role: Label
- Role: Slider
  Name: On a scale of 1–10, what is your mood today?
  Description: Some helpful pointers about how to rate your mood.
  LabelledBy: [label object]
  DescribedBy: helperText
  ValueNow: 5
  ValueMin: 1
  ValueMax: 10
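
As a sketch of that fallback behavior: the function below assumes the slider defaults as I understand them from the ARIA specification (minimum 0, maximum 100, current value halfway between the two). Treat those numbers as an assumption and check the spec for the role you are using; the function name and shapes are hypothetical.

```typescript
// Hypothetical helper; default values are assumptions based on the ARIA
// slider role and should be verified against the specification.
interface RangeValues {
  min: number;
  max: number;
  now: number;
}

function resolveRange(input: { min?: number; max?: number; now?: number }): RangeValues {
  const min = input.min ?? 0;
  const max = input.max ?? 100;
  // With no current value, fall back to the midpoint of the range.
  const now = input.now ?? (min + (max - min) / 2);
  return { min, max, now };
}

const moodRange = resolveRange({ min: 1, max: 10, now: 5 });
const bareRange = resolveRange({});
```

For our mood slider every value is authored, so `moodRange` simply echoes 1, 10, and 5; `bareRange` shows what the assumed defaults would produce.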

The paragraph will be added as a simple paragraph object (“Text” or “Group” in some APIs):

Role: Group
- Role: Label
- Role: Slider
  Name: On a scale of 1–10, what is your mood today?
  Description: Some helpful pointers about how to rate your mood.
  LabelledBy: [label object]
  DescribedBy: helperText
  ValueNow: 5
  ValueMin: 1
  ValueMax: 10
- Role: Paragraph

The final element is an example of when role semantics are added via the ARIA role attribute. This div will map as a Button with the name “Log Mood,” as buttons can take their name from their children. This button will also be surfaced as “invokable” to screen readers and other ATs; special types of buttons could provide expand/collapse functionality (buttons with the aria-expanded attribute), or toggle functionality (buttons with the aria-pressed attribute). Here’s our tree now:

Role: Group
- Role: Label
- Role: Slider
  Name: On a scale of 1–10, what is your mood today?
  Description: Some helpful pointers about how to rate your mood.
  LabelledBy: [label object]
  DescribedBy: helperText
  ValueNow: 5
  ValueMin: 1
  ValueMax: 10
- Role: Paragraph
- Role: Button
  Name: Log Mood
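
For readers who think in data structures, the finished tree can be written down as a plain object. This is a sketch with the same simplified property names used above; the AccessibleObject shape and the `outline` helper are hypothetical, and the LabelledBy and DescribedBy relationships are left out for brevity.

```typescript
// Hypothetical, simplified shape; property names vary between real APIs.
interface AccessibleObject {
  role: string;
  name?: string;
  description?: string;
  properties?: Record<string, string | number>;
  children?: AccessibleObject[];
}

const moodTree: AccessibleObject = {
  role: "Group",
  children: [
    { role: "Label" },
    {
      role: "Slider",
      name: "On a scale of 1–10, what is your mood today?",
      description: "Some helpful pointers about how to rate your mood.",
      properties: { ValueNow: 5, ValueMin: 1, ValueMax: 10 },
    },
    { role: "Paragraph" },
    { role: "Button", name: "Log Mood" },
  ],
};

// Flattens the tree into indented lines, one per role, name, or property.
function outline(node: AccessibleObject, depth = 0): string[] {
  const indent = "  ".repeat(depth);
  const lines = [`${indent}Role: ${node.role}`];
  if (node.name !== undefined) lines.push(`${indent}Name: ${node.name}`);
  if (node.description !== undefined) lines.push(`${indent}Description: ${node.description}`);
  for (const [key, value] of Object.entries(node.properties ?? {})) {
    lines.push(`${indent}${key}: ${value}`);
  }
  for (const child of node.children ?? []) lines.push(...outline(child, depth + 1));
  return lines;
}

const moodOutline = outline(moodTree);
```

Joining `moodOutline` with newlines gives an indented listing much like the one we built up by hand.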

On choosing host language semantics

Our sample markup mentions that it is preferred to use the HTML-native button element rather than a div with a role of “button.” Our buttonified div can be operated as a button via accessibility APIs, as the ARIA attribute is doing what it should—conveying semantics. But there’s a lot you can get for free when you choose native elements. In the case of button, that includes focus handling, user input handling, form submission, and basic styling.
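
To give a sense of what “for free” means, here is a small sketch of just the keyboard part that a buttonified div would have to re-create. It is deliberately simplified: a native button activates on Enter and Space with slightly different timing for each key, which this sketch ignores. KeyLike and both function names are hypothetical.

```typescript
// Hypothetical helper; `KeyLike` mirrors only the `key` field of a KeyboardEvent.
interface KeyLike {
  key: string;
}

// Enter and Space are the keys a native button responds to.
function activatesButton(keyEvent: KeyLike): boolean {
  return keyEvent.key === "Enter" || keyEvent.key === " ";
}

// Runs `activate` for button keys and reports whether the key was handled,
// so a caller could also prevent default scrolling on Space.
function handleButtonKey(keyEvent: KeyLike, activate: () => void): boolean {
  if (!activatesButton(keyEvent)) return false;
  activate();
  return true;
}

let moodsLogged = 0;
handleButtonKey({ key: "Enter" }, () => { moodsLogged += 1; });
handleButtonKey({ key: "a" }, () => { moodsLogged += 1; });
```

Only the Enter press counts, so `moodsLogged` is 1. Focus handling, form submission, and styling would each need similar custom work.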

Aaron Gustafson has what he refers to as an “exhaustive treatise” on buttons in particular, but generally speaking it’s great to let the web platform do the heavy lifting of semantics and interaction for us when we can.

ARIA roles, states, and properties are still a great tool to have in your toolbelt. Some good use cases for these are:

providing further semantics and relationships that are not naturally expressed in the host language;
supplementing semantics in markup we perhaps don’t have complete control over;
patching potential cross-browser inconsistencies;
and making custom elements perceivable and operable to users of assistive technologies.

Notes on inclusion or exclusion in the tree

Standards define some rules around when user agents should exclude elements from the accessibility tree. Excluded elements can include those hidden by CSS, or the aria-hidden or hidden attributes; their children would be excluded as well. Children of particular roles (like checkbox) can also be excluded from the tree, unless they meet special exceptions. The full rules can be found in the “Accessibility Tree” section of the ARIA specification. That being said, there are still some differences between implementers, some of which include more divs and spans in the tree than others do.

Notes on name and description computation

How names and descriptions are computed can be a bit confusing. Some elements have special rules, and some ARIA roles allow name computation from the element’s contents, whereas others do not. Name and description computation could probably be its own article, so we won’t get into all the details here (refer to “Further reading and resources” for some links). Some short pointers:

aria-label, aria-labelledby, and aria-describedby take precedence over other means of calculating name and description.
If you expect a particular HTML attribute to be used for the name, check the name computation rules for HTML elements. In your scenario, it may be used for the full description instead.
Generated content (::before and ::after) can participate in the accessible name when said name is taken from the element’s contents. That being said, web developers should not rely on pseudo-elements for non-decorative content, as this content could be lost when a stylesheet fails to load or user styles are applied to the page.
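
The precedence idea in these pointers can be sketched as a short fallback chain. This is a heavily simplified assumption of the order (aria-labelledby, then aria-label, then a native label, then contents when the role allows it); the real computation has many more steps and special cases. The NameSources shape and `computeName` are hypothetical.

```typescript
// Hypothetical, heavily simplified stand-in for accessible name computation.
interface NameSources {
  labelledByText?: string; // text of the elements referenced by aria-labelledby
  ariaLabel?: string;
  nativeLabelText?: string; // for example, an associated label element
  contentText?: string; // text content, including generated content
  nameFromContent: boolean; // whether the role allows naming from contents
}

function computeName(sources: NameSources): string {
  const candidates = [
    sources.labelledByText,
    sources.ariaLabel,
    sources.nativeLabelText,
    sources.nameFromContent ? sources.contentText : undefined,
  ];
  // The first non-empty candidate wins; later ones are ignored.
  for (const candidate of candidates) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed;
  }
  return "";
}

const closeName = computeName({ ariaLabel: "Close", contentText: "X", nameFromContent: true });
const buttonName = computeName({ contentText: "Log Mood", nameFromContent: true });
const paragraphName = computeName({ contentText: "Some text", nameFromContent: false });
```

`closeName` is “Close” because the ARIA attribute outranks the contents, `buttonName` falls through to “Log Mood,” and `paragraphName` is empty because that role is assumed not to take its name from contents.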

When in doubt, reach out to the community! Tag questions on social media with “#accessibility.” “#a11y” is a common shorthand; the “11” stands for “11 middle letters in the word ‘accessibility.’” If you find an inconsistency in a particular browser, file a bug! Bug tracker links are provided in “Further reading and resources.”

Not just accessible objects

Besides a hierarchical structure of objects, accessibility APIs also offer interfaces that allow ATs to interact with text. ATs can retrieve content text ranges, text selections, and a variety of text attributes that they can build experiences on top of. For example, if someone writes an email and uses color alone to highlight their added comments, the person reading the email could increase the verbosity of speech output in their screen reader to know when they’re encountering phrases with that styling. However, it would be better for the email author to include very brief text labels in this scenario.

The big takeaway here for web developers is to keep in mind that the accessible name of an element may not always be surfaced in every navigation mode in every screen reader. So if your aria-label text isn’t being read out in a particular mode, the screen reader may be primarily using text interfaces and only conditionally stopping on objects. It may be worth your while to consider using text content—even if visually hidden—instead of text via an ARIA attribute. Read more thoughts on aria-label and aria-labelledby.

Accessibility API events

It is the responsibility of browsers to surface changes to content, structure, and user input. Browsers do this by sending the accessibility API notifications about various events, which screen readers can subscribe to; again, for performance reasons, browsers could choose to send notifications only when ATs are active.

Let’s suppose that a screen reader wants to surface changes to a live region (an element with role="alert" or aria-live):

Diagram showing a client (screen reader), which is already subscribed to live region events and can request more info about the live region, which receives a notification from the accessibility API, which gets a notification that a live region has changed from the provider (browser), which has a live region changed by the web document — Diagram illustrating the steps involved in announcing a live region via a screen reader; detailed list follows

The screen reader subscribes to event notifications; it could subscribe to notifications of all types, or just certain types as categorized by the accessibility API. Let’s assume in our example that the screen reader is at least listening to live region change events.
In the web content, the web developer changes the text content of a live region.
The browser (provider) recognizes this as a live region change event, and sends the accessibility API a notification.
The API passes this notification along to the screen reader.
The screen reader can then use metadata from the notification to look up the relevant accessible objects via the accessibility API, and can surface the changes to the user.
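
The subscribe-and-notify steps above resemble a familiar publish/subscribe pattern, sketched below. Everything here is a toy model under assumptions: the event type names, the `subscribe` and `notify` functions, and the `exposedText` lookup are hypothetical and do not match any platform API.

```typescript
// Hypothetical, simplified model of API event notifications.
type AxEventType = "liveRegionChanged" | "focusChanged";

interface AxEvent {
  type: AxEventType;
  objectId: string;
}

type AxEventHandler = (axEvent: AxEvent) => void;

// Accessibility API: stores subscriptions and relays provider notifications.
const subscriptions = new Map<AxEventType, AxEventHandler[]>();

function subscribe(type: AxEventType, handler: AxEventHandler): void {
  const handlers = subscriptions.get(type) ?? [];
  handlers.push(handler);
  subscriptions.set(type, handlers);
}

function notify(axEvent: AxEvent): void {
  for (const handler of subscriptions.get(axEvent.type) ?? []) handler(axEvent);
}

// Provider (browser): text exposed for each accessible object, keyed by id.
const exposedText = new Map<string, string>([["saveStatus", ""]]);

// Client (screen reader): listens only for live region changes, then looks up
// the changed object before surfacing it.
const announcements: string[] = [];
subscribe("liveRegionChanged", (axEvent) => {
  const text = exposedText.get(axEvent.objectId);
  if (text) announcements.push(text);
});

// The web developer updates the live region; the browser raises the event.
exposedText.set("saveStatus", "Mood saved");
notify({ type: "liveRegionChanged", objectId: "saveStatus" });
// A focus event has no subscriber here, so nothing is announced for it.
notify({ type: "focusChanged", objectId: "saveStatus" });
```

Only the live region change produces an announcement. The unhandled focus event shows how a change can be raised correctly and still go unannounced when the client is not listening for it.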

ATs aren’t required to do anything with the information they retrieve. This can make it a bit trickier as a web developer to figure out why a screen reader isn’t announcing a change: it may be that notifications aren’t being raised (for example, because a browser is not sending notifications for a live region dynamically inserted into web content), or the AT is not subscribed or responding to that type of event.

Testing with screen readers and dev tools

While conformance checkers can help catch some basic accessibility issues, it’s ideal to walk through your content manually using a variety of contexts, such as

using a keyboard only;
with various OS accessibility settings turned on;
and at different zoom levels and text sizes.

As you do this, keep in mind the Web Content Accessibility Guidelines (WCAG 2.1), which give general guidelines around expectations for inclusive web content. If you can test with users after your own manual test passes, all the better!

Robust accessibility testing could probably be its own series of articles. In this one, we’ll go over some tips for testing with screen readers, and catching accessibility errors as they are mapped into the accessibility API in a more general sense.

Screen reader testing

Screen readers exist in many forms: some are pre-installed on the operating system and others are separate applications that in some cases are free to download. The WebAIM screen reader user survey provides a list of commonly used screen reader and browser combinations among survey participants. The “Further reading and resources” section at the end of this article includes full screen reader user docs, and Deque University has a great set of screen reader command cheat sheets that you can refer to. Some actions you might take to test your content:

Read the next/previous item.
Read the next/previous line.
Read continuously from a particular point.
Jump by headings, landmarks, and links.
Tab around focusable elements only.
Get a summary of all elements of a particular type within the page.
Search the page for specific content.
Use table-specific commands to interact with your tables.
Jump around by form field; are field instructions discoverable in this navigational mode?
Use keyboard commands to interact with all interactive elements. Are your JavaScript-driven interactions still operable with screen readers (which can intercept key input in certain modes)? WAI-ARIA Authoring Practices 1.1 includes notes on expected keyboard interactions for various widgets.
Try out anything that creates a content change or results in navigating elsewhere. Would it be obvious, via screen reader output, that a change occurred?

Tracking down the source of unexpected behavior

If a screen reader does not announce something as you’d expect, here are a few different checks you can run:

Does this reproduce with the same screen reader in multiple browsers on this OS? It may be an issue with the screen reader or your expectation may not match the screen reader’s user experience design. For example, a screen reader may choose to not expose the accessible name of a static, non-interactive element. Checking the user docs or filing a screen reader issue with a simple test case would be a great place to start.
Does this reproduce with multiple screen readers in the same browser, but not in other browsers on this OS? The browser in question may have an issue, there may be compatibility differences between browsers (such as a browser doing extra helpful but non-standard computations), or a screen reader’s support for a specific accessibility API may vary. Filing a browser issue with a simple test case would be a great place to start; if it’s not a browser bug, the developer can route it to the right place or make a code suggestion.
Does this reproduce with multiple screen readers in multiple browsers? There may be something you can adjust in your code, or your expectations may differ from standards and common practices.
How do this element’s accessibility properties and structure show up in browser dev tools?
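
If it helps to see the first three checks as a decision procedure, here is one possible sketch. The categories paraphrase the checks above and are a rough heuristic, not a diagnosis; the type and function names are hypothetical.

```typescript
// Hypothetical helper; the categories paraphrase the checks listed above.
interface ReproResults {
  sameReaderOtherBrowsers: boolean; // reproduces with this screen reader in other browsers
  otherReadersSameBrowser: boolean; // reproduces with other screen readers in this browser
}

type Suspect = "screen reader" | "browser" | "your code or expectations" | "inconclusive";

function likelySource(results: ReproResults): Suspect {
  if (results.sameReaderOtherBrowsers && results.otherReadersSameBrowser) {
    return "your code or expectations";
  }
  if (results.sameReaderOtherBrowsers) return "screen reader";
  if (results.otherReadersSameBrowser) return "browser";
  return "inconclusive";
}

const firstGuess = likelySource({ sameReaderOtherBrowsers: false, otherReadersSameBrowser: true });
```

In this case `firstGuess` points at the browser, which suggests filing a browser issue with a simple test case. An “inconclusive” result is a cue to open the dev tools and inspect what is actually exposed.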

Inspecting accessibility trees and properties in dev tools

Major modern browsers provide dev tools to help you observe the structure of the accessibility tree as well as a given element’s accessibility properties. By observing which accessible objects are generated for your elements and which properties are exposed on a given element, you may be able to pinpoint issues that are occurring either in front-end code or in how the browser is mapping your content into the accessibility API.

Let’s suppose that we are testing this piece of code in Microsoft Edge with a screen reader:

<div class="form-row">
  <label>Favorite color</label>
  <input id="myTextInput" type="text" />
</div>

We’re navigating the page by form field, and when we land on this text field, the screen reader just tells us this is an “edit” control—it doesn’t mention a name for this element. Let’s check the tools for the element’s accessible name.

1. Inspect the element to bring up the dev tools.

Screenshot showing the Microsoft Edge dev tools inspecting an input element — The Microsoft Edge dev tools, with an input element highlighted in the DOM tree

2. Bring up the accessibility tree for this page by clicking the accessibility tree button (a circle with two arrows) or pressing Ctrl+Shift+A (Windows).

Screenshot showing the Microsoft Edge tools inspecting an input element with the Accessibility Tree panel open — The accessibility tree button activated in the Microsoft Edge dev tools

Reviewing the accessibility tree is an extra step for this particular flow but can be helpful to do.

When the Accessibility Tree pane comes up, we notice there’s a tree node that just says “textbox:,” with nothing after the colon. That suggests there’s not a name for this element. (Also notice that the div around our form input didn’t make it into the accessibility tree; it was not semantically useful.)

3. Open the Accessibility Properties pane, which is a sibling of the Styles pane. If we scroll down to the Name property: aha! It’s blank. No name is provided to the accessibility API. (Side note: some other accessibility properties are filtered out of this list by default; toggle the filter button, which looks like a funnel, in the pane to get the full list.)

4. Check the code. We realize that we didn’t associate the label with the text field; that is one strategy for providing an accessible name for a text input. We add for="myTextInput" to the label:

<div class="form-row">
  <label for="myTextInput">Favorite color</label>
  <input id="myTextInput" type="text" />
</div>

And now the field has a name:

Screenshot showing the Microsoft Edge tools inspecting an input element with the Accessibility Tree panel open, where the input's Name attribute now has a value — The accessible Name property set to the value of “Favorite color” inside Microsoft Edge dev tools
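
The same check can be expressed as a tiny lint-style function. This is a sketch that assumes only two naming strategies (aria-label and a label with a matching for attribute) and ignores others, such as wrapping the input in its label or using aria-labelledby. The record shapes and `fieldsWithoutName` are hypothetical.

```typescript
// Hypothetical, simplified records standing in for parsed markup.
interface LabelInfo {
  htmlFor?: string;
  text: string;
}

interface FieldInfo {
  id?: string;
  ariaLabel?: string;
}

// Returns the fields that receive no name from aria-label or a label's for attribute.
function fieldsWithoutName(fields: FieldInfo[], labels: LabelInfo[]): FieldInfo[] {
  const labelledIds = new Set<string>();
  for (const label of labels) {
    if (label.htmlFor !== undefined && label.text.trim() !== "") labelledIds.add(label.htmlFor);
  }
  return fields.filter((field) => {
    if (field.ariaLabel !== undefined && field.ariaLabel.trim() !== "") return false;
    return field.id === undefined || !labelledIds.has(field.id);
  });
}

const colorField: FieldInfo = { id: "myTextInput" };
const before = fieldsWithoutName([colorField], [{ text: "Favorite color" }]);
const after = fieldsWithoutName([colorField], [{ htmlFor: "myTextInput", text: "Favorite color" }]);
```

`before` flags the text field because the label is not associated with it; `after` is empty once the for attribute is added.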

In another use case, we have a breadcrumb component, where the current page link is marked with aria-current="page":

<nav class="breadcrumb" aria-label="Breadcrumb">
  <ol>
    <li>
      <a href="/cat/">Category</a>
    </li>
    <li>
      <a href="/cat/sub/">Sub-Category</a>
    </li>
    <li>
      <a aria-current="page" href="/cat/sub/page/">Page</a>
    </li>
  </ol>
</nav>

When navigating onto the current page link, however, we don’t get any indication that this is the current page. We’re not exactly sure how this maps into accessibility properties, so we can reference a specification like Core Accessibility API Mappings 1.2 (Core-AAM). Under the “State and Property Mapping” table, we find mappings for “aria-current with non-false allowed value.” We can check for these listed properties in the Accessibility Properties pane. Microsoft Edge, at the time of writing, maps into UIA (UI Automation), so when we check AriaProperties, we find that yes, “current=page” is included within this property value.

Screenshot showing the Microsoft Edge tools inspecting the current page link with the Accessibility Tree panel open, where the link's AriaProperties attribute has a value of current=page — The AriaProperties property including “current=page” inside Microsoft Edge dev tools

Now we know that the value is presented correctly to the accessibility API, but the particular screen reader is not using the information.
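
If you want to check such a value programmatically, for example in a test helper, the string can be split into pairs. This sketch assumes the UIA AriaProperties format of name=value pairs separated by semicolons and does not handle escaped delimiters; `parseAriaProperties` is a hypothetical helper, not part of any tool.

```typescript
// Hypothetical helper; assumes "name=value" pairs separated by semicolons
// and does not handle escaped "=" or ";" characters.
function parseAriaProperties(value: string): Map<string, string> {
  const result = new Map<string, string>();
  for (const pair of value.split(";")) {
    const separator = pair.indexOf("=");
    if (separator <= 0) continue; // skip empty or malformed entries
    result.set(pair.slice(0, separator).trim(), pair.slice(separator + 1).trim());
  }
  return result;
}

const linkProperties = parseAriaProperties("current=page;haspopup=false");
const currentValue = linkProperties.get("current");
```

Here `currentValue` is “page,” matching what we found in the Accessibility Properties pane. The second pair in the sample string is made up for illustration.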

As a side note, Microsoft Edge’s current dev tools expose these accessibility API properties quite literally. Other browsers’ dev tools may simplify property names and values to make them easier to read, particularly if they support more than one accessibility API. The important bit is to find if there’s a property with roughly the name you expect and whether its value is what you expect. You can also use this method of checking through the property names and values if mapping specs, like Core-AAM, are a bit intimidating!

Advanced accessibility tools

While browser dev tools can tell us a lot about the accessibility semantics of our markup, they don’t generally include representations of text ranges or event notifications. On Windows, the Windows SDK includes advanced tools that can help debug these parts of MSAA or UIA mappings: Inspect and AccEvent (Accessible Event Watcher). Using these tools presumes knowledge of the Windows accessibility APIs, so if this is too granular for you and you’re stuck on an issue, please reach out to the relevant browser team!

There is also an Accessibility Inspector in Xcode on macOS, with which you can inspect web content in Safari. This tool can be accessed by going to Xcode > Open Developer Tool > Accessibility Inspector.

Diversity of experience

Equipped with an accessibility tree, detailed object information, event notifications, and methods for interacting with accessible objects, screen readers can craft a browsing experience tailored to their audiences. In this article, we’ve used the term “screen readers” as a proxy for a whole host of tools that may use accessibility APIs to provide the best user experience possible. Assistive technologies can use the APIs to augment presentation or support varying types of user input. Examples of other ATs include screen magnifiers, cognitive support tools, speech command programs, and some brilliant new app that hasn’t been dreamed up yet. Further, assistive technologies of the same “type” may differ in how they present information, and users who share the same tool may further adjust settings to their liking.

As web developers, we don’t necessarily need to make sure that each instance surfaces information identically, because each user’s preferences will not be exactly the same. Our aim is to ensure that no matter how a user chooses to explore our sites, content is perceivable, operable, understandable, and robust. By testing with a variety of assistive technologies—including but not limited to screen readers—we can help create a better web for all the many people who use it.
