Illustration: A small piece of paper is torn from a very large book with some glasses, a bookmark, and a steaming mug of something delicious nearby.
Illustration by

Everyday Information Architecture: Auditing for Structure

A note from the editors: We’re pleased to share an excerpt from Chapter 4 of Lisa Maria Martin’s Everyday Information Architecture, from A Book Apart.

Just as we need to understand our content before we can recategorize it, we need to understand the system before we try to rebuild it.

Article Continues Below

Enter the structural audit: a review of the site focused solely on its menus, links, flows, and hierarchies. I know you thought we were done with audits back in Chapter 2, but hear me out! Structural audits have an important and singular purpose: to help us build a new sitemap.

This isn’t about recreating the intended sitemap—no, this is about experiencing the site the way users experience it. This audit is meant to track and record the structure of the site as it really works.

Setting up the template#section2

First, we’re gonna need another spreadsheet. (Look, it is not my fault that spreadsheets are the perfect system for recording audit data. I don’t make the rules.)

Because this involves building a spreadsheet from scratch, I keep a “template” at the top of my audit files—rows that I can copy and paste into each new audit (Fig 4.1). It’s a color-coded outline key that helps me track my page hierarchy and my place in the auditing process. When auditing thousands of pages, it’s easy to get dizzyingly lost, particularly when coming back into the sheet after a break; the key helps me stay oriented, no matter how deep the rabbit hole.

Fig 4.1: I use a color-coded outline key to record page hierarchy as I move through the audit. Wait, how many circles did Dante write about?

Color-coding#section3

Color is the easiest, quickest way to convey page depth at a glance. The repetition of black text, white cells, and gray lines can have a numbing effect—too many rows of sameness, and your eyes glaze over. My coloring may result in a spreadsheet that looks like a twee box of macarons, but at least I know, instantly, where I am.

The exact colors don’t really matter, but I find that the familiar mental model of a rainbow helps with recognition—the cooler the row color, the deeper into the site I know I must be.

The nested rainbow of pages is great when you’re auditing neatly nested pages—but most websites color outside the lines (pun extremely intended) with their structure. I leave my orderly rainbow behind to capture duplicate pages, circular links, external navigation, and other inconsistencies like:

  • On-page navigation. A bright text color denotes pages that are accessible via links within page content—not through the navigation. These pages are critical to site structure but are easily overlooked. Not every page needs to be displayed in the navigation menus, of course—news articles are a perfect example—but sometimes this indicates publishing errors.
  • External links. These are navigation links that go to pages outside the domain. They might be social media pages, or even sites held by the same company—but if the domain isn’t the one I’m auditing, I don’t need to follow it. I do need to note its existence in my spreadsheet, so I color the text as the red flag that it is. (As a general rule, I steer clients away from placing external links in navigation, in order to maintain a consistent experience. If there’s a need to send users offsite, I’ll suggest using a contextual, on-page link.)
  • Files. This mostly refers to PDFs, but can include Word files, slide decks, or anything else that requires downloading. As with external links, I want to capture anything that might disrupt the in-site browsing experience. (My audits usually filter out PDFs, but for organizations that overuse them, I’ll audit them separately to show how much “website” content is locked inside.)
  • Unknown hierarchy. Every once in a while, there’s a page that doesn’t seem to belong anywhere—maybe it’s missing from the menu, while its URL suggests it belongs in one section and its navigation scheme suggests another. These pages need to be discussed with their owners to determine whether the content needs to be considered in the new site.
  • Crosslinks. These are navigation links for pages that canonically live in a different section of the site—in other words, they’re duplicates. This often happens in footer navigation, which may repeat the main navigation or surface links to deeper-but-important pages (like a Contact page or a privacy policy). I don’t want to record the same information about the page twice, but I do need to know where the crosslink is, so I can track different paths to the content. I color these cells gray so they don’t draw my attention.

Note that coloring every row (and indenting, as you’ll see in a moment) can be a tedious process—unless you rely on Excel’s formatting brush. That tool applies all the right styles in just two quick clicks.

Outlines and page IDs#section4

Color-coding is half of my template; the other half is the outline, which is how I keep track of the structure itself. (No big deal, just the entire point of the spreadsheet.)

Every page in the site gets assigned an ID. You are assigning this number; it doesn’t correspond to anything but your own perception of the navigation. This number does three things for you:

  1. It associates pages with their place in the site hierarchy. Decimals indicate levels, so the page ID can be decoded as the page’s place in the system.
  2. It gives each page a unique identifier, so you can easily refer to a particular page—saying “2.4.1” is much clearer than “you know that one page in the fourth product category?”
  3. You can keep using the ID in other contexts, like your sitemap. Then, later, when your team decides to wireframe pages 1.1.1 and 7.0, you’ll all be working from the same understanding.

Let me be completely honest: things might get goofy sometimes with the decimal outline. There will come a day when you’ll find yourself casually typing out “1.2.1.2.1.1.1,” and at that moment, a fellow auditor somewhere in the universe will ring a tiny gong for you.

In addition to the IDs, I indent each level, which reinforces both the numbers and the colors. Each level down—each digit in the ID, each change in color—gets one indentation.

I identify top-level pages with a single number: 1.0, 2.0, 3.0, etc. The next page level in the first section would be 1.1, 1.2, 1.3, and so on. I mark the homepage as 0.0, which is mildly controversial—the homepage is technically a level above—but, look: I’ve got a lot of numbers to write, and I don’t need those numbers to tell me they’re under the homepage, so this is my system. Feel free to use the numbering system that work best for you.

Criteria and columns#section5

So we’ve got some secret codes for tracking hierarchy and depth, but what about other structural criteria? What are our spreadsheet columns (Fig 4.2)? In addition to a column for Page ID, here’s what I cover:

  • URL. I don’t consistently fill out this column, because I already collected this data back in my automated audit. I include it every twenty entries or so (and on crosslinks or pages with unknown hierarchy) as another way of tracking progress, and as a direct link into the site itself.
  • Menu label/link. I include this column only if I notice a lot of mismatches between links, labels, and page names. Perfect agreement isn’t required; but frequent, significant differences between the language that leads to a page and the language on the page itself may indicate inconsistencies in editorial approach or backend structures.
  • Name/headline. Think of this as “what does the page owner call it?” It may be the H1, or an H2; it may match the link that brought you here, or the page title in the browser, or it may not.
  • Page title. This is for the name of the page in the metadata. Again, I don’t use this in every audit—particularly if the site uses the same long, branded metadata title for every single page—but frequent mismatches can be useful to track.
  • Section. While the template can indicate your level, it can’t tell you which area of the site you’re in—unless you write it down. (This may differ from the section data you applied to your automated audit, taken from the URL structure; here, you’re noting the section where the page appears.)
  • Notes. Finally, I keep a column to note specific challenges, and to track patterns I’m seeing across multiple pages—things like “Different template, missing subnav” or “Only visible from previous page.” My only caution here is that if you’re planning to share this audit with another person, make sure your notes are—ahem—professional. Unless you enjoy anxiously combing through hundreds of entries to revise comments like “Wow haha nope” (not that I would know anything about that).
Fig 4.2: A semi-complete structural audit. This view shows a lot of second- and third-level pages, as well as pages accessed through on-page navigation.

Depending on your project needs, there may be other columns, too. If, in addition to using this spreadsheet for your new sitemap, you want to use it in migration planning or template mapping, you may want columns for new URLs, or template types. 

You can get your own copy of my template as a downloadable Excel file. Feel free to tweak it to suit your style and needs; I know I always do. As long as your spreadsheet helps you understand the hierarchy and structure of your website, you’re good to go.

Gathering data#section6

Setting up the template is one thing—actually filling it out is, admittedly, another. So how do we go from a shiny, new, naive spreadsheet to a complete, jaded, seen-some-stuff spreadsheet? I always liked Erin Kissane’s description of the process, from The Elements of Content Strategy:

Big inventories involve a lot of black coffee, a few late nights, and a playlist of questionable but cheering music prominently featuring the soundtrack of object-collecting video game Katamari Damacy. It takes quite a while to exhaustively inventory a large site, but it’s the only way to really understand what you have to work with.

We’re not talking about the same kind of exhaustive inventory she was describing (though I am recommending Katamari music). But even our less intensive approach is going to require your butt in a seat, your eyes on a screen, and a certain amount of patience and focus. You’re about to walk, with your fingers, through most of a website.

Start on the homepage. (We know that not all users start there, but we’ve got to have some kind of order to this process or we’ll never get through it.) Explore the main navigation before moving on to secondary navigation structures. Move left to right, top to bottom (assuming that is your language direction) over each page, looking for the links. You want to record every page you can reasonably access on the site, noting navigational and structural considerations as you go.

My advice as you work:

  • Use two monitors. I struggle immensely without two screens in this process, which involves constantly switching between spreadsheet and browser in rapid, tennis-match-like succession. If you don’t have access to multiple monitors, find whatever way is easiest for you to quickly flip between applications.
  • Record what you see. I generally note all visible menu links at the same level, then exhaust one section at a time. Sometimes this means I have to adjust what I initially observed, or backtrack to pages I missed earlier. You might prefer to record all data across a level before going deeper, and that would work, too. Just be consistent to minimize missed links.
  • Be alert to inconsistencies. On-page links, external links, and crosslinks can tell you a lot about the structure of the site, but they’re easy to overlook. Missed on-page links mean missed content; missed crosslinks mean duplicate work. (Note: the further you get into the site, the more you’ll start seeing crosslinks, given all the pages you’ve already recorded.)
  • Stick to what’s structurally relevant. A single file that’s not part of a larger pattern of file use is not going to change your understanding of the structure. Neither is recording every single blog post, quarterly newsletter, or news story in the archive. For content that’s dynamic, repeatable, and plentiful, I use an x in the page ID to denote more of the same. For example, a news archive with a page ID of 2.8 might show just one entry beneath it as 2.8.x; I don’t need to record every page up to 2.8.791 to understand that there are 791 articles on the site (assuming I noted that fact in an earlier content review).
  • Save. Save frequently. I cannot even begin to speak of the unfathomable heartbreak that is Microsoft Excel burning an unsaved audit to the ground.  

Knowing which links to follow, which to record, and how best to untangle structural confusion—that improves with time and experience. Performing structural audits will not only teach you about your current site, but will help you develop fluency in systems thinking—a boon when it comes time to document the new site.

About the Author

Lisa Maria Martin

Lisa Maria Martin is an independent consultant based in Boston. She practices content-driven information architecture, helping organizations to understand, organize, and structure their web content for empowering user experiences. She is the managing editor of A Book Apart, as well as a writer, speaker, workshop facilitator, and poet.

No Comments

Got something to say?

We have turned off comments, but you can see what folks had to say before we did so.

More from ALA

I am a creative.

A List Apart founder and web design OG Zeldman ponders the moments of inspiration, the hours of plodding, and the ultimate mystery at the heart of a creative career.
Career