You should put the same care into marking up your PDFs that you put into marking up websites
Issue № 201

Facts and Opinions About PDF Accessibility

PDF files on the web are sometimes annoying and very often unnecessary. But when they aren’t either of those things, we need to make them accessible for the same reasons we make other web content accessible.


Contrary to popular opinion – and also contrary to quasi-judicial claims in some places – PDF documents can be no less accessible than HTML. While this may be a shocking revelation, it is nonetheless true. This article will explain how PDF does and does not support accessibility.

Summary

      

  1. Most PDFs on the web should be HTML.
  2. Some documents really should be PDFs.
  3. You can add XML-like tags to give structure to a PDF.
    1. Tags weren’t available until a recent upgrade to the PDF file format.
  4. Most screen readers in common use can read PDFs.
    1. Screen readers had to be upgraded to understand tags.
    2. Screen readers have been continuously updated throughout their history, and even today some screen readers cannot handle parts of the HTML spec.
  5. Even an untagged PDF can be accessible if you’re using the right technology.
  6. Posting a PDF online with no HTML alternative does not automatically constitute discrimination.

Dedications

Let me dedicate this discussion to two nations that struggle under the yoke of lies and misunderstandings concerning PDF accessibility.

  1. First, to the people of Australia, whose federal human-rights body, the Human Rights and Equal Opportunity Commission (HREOC), takes the official position that posting a document online only as a PDF is inaccessible, hence a violation of the Disability Discrimination Act.

    I learned this the hard way from Bruce Maguire. He vanquished Juan Antonio Samaranch in a hearing with that selfsame HREOC over the inaccessibility of the Sydney Olympics website, and he now works for the HREOC. Maguire monologuized endlessly about the topic when I met him in 2004. When I got a word in edgewise, Maguire was able to agree, as most of us will, that HTML is the preferred format, but he also suggested Microsoft Word as an alternative to PDF. This may tell you all you need to know about HREOC’s understanding of accessibility and interoperability.

    I told Maguire then and I’m telling everyone now: Don’t believe the HREOC if they come after you claiming your PDFs are inaccessible, hence illegal. If you created your PDFs incorrectly, HREOC may be right, but only sometimes, and there are many cases where your PDFs may be just fine.

    If you get hauled in front of HREOC for “illegal, inaccessible” PDFs, consider this article a case for the defense, to paraphrase Christopher Hitchens. (See discussion below.)

  2. Second, to the people of Canada, where the federal government’s guidelines for its own websites – known euphemistically as Common Look and Feel, and the source of great pain – state falsely that PDF is “not directly accessible to persons with (primarily) visual impairments” and that only “minimum version 2.1” should be used. Oddly, there is no such thing as a PDF version 2.1.

    The Alliance for the Equality of Blind Canadians, a lobby group that definitely is not the Canadian National Institute “for” the Blind, passed a resolution at its 2005 annual meeting stating that “Portable Document Format (PDF) continues to provide barriers to blind, deaf-blind and partially sighted people and their enabling technologies; [t]herefore, be it resolved that the AEBC advocate that PDF not be used as a standard for providing documents on all websites.” Well, who was suggesting that in the first place?

A view of the landscape

Let’s begin at the beginning and discuss the entire landscape of PDF on the web before we learn what makes a PDF accessible or inaccessible.

The first thing to do with a PDF

…is Google the URL. Seriously. Most of the time, Google does a half-decent job of making a PDF readable in HTML. Poor character encoding, ill-constructed multicolumn PDFs, and document security features can prevent Google from indexing a PDF at all, or doing so readably. Nonetheless, that’s what I do first.

PDF is overused

There are too many PDFs on the web. Most every PDF should be something other than PDF. Any simple text-and-graphics document that is typeset in a single column should be provided as an ordinary HTML+CSS+JavaScript web page. Really, I can’t think of any exceptions for simple documents.

There aren’t many categories of online document that really should be PDFs and nothing else. And the list has decreased by one in the last year, since presentation slides can now be adequately handled by Eric Meyer’s S5 method. (Hence I have no excuse anymore for publishing my own presentation slides in PDF, so I’m going to stop.)

But if your document is one of the following, PDF may be fine:

      

  1. Footnoted, endnoted, or sidenoted, since there is no way to mark up any of those structures in HTML. (You can use a hack like sub or sup for the footnote reference, but there are no footnote, endnote, sidenote, or even note elements. That hack may be adequate for simple footnoted documents, but try rendering David Foster Wallace’s footnotes-within-footnotes in HTML 4.)
  2. An interactive form, since PDF interactivity can do more than HTML can. (Use with caution and only if HTML really cannot do what you want.) For examples, check Jeremy Tankard’s order forms, especially for TypeBookOne (PDF).
  3. A multimedia presentation, since later versions of PDF can truly embed multimedia rather than simply refer to or call multimedia, as HTML does. (Same warning as above.) PDF multimedia can include captions and/or audio descriptions.
  4. Combined accessible and inaccessible versions. A typical case is a scan of a historical document that also includes live text. (You really need that live text. The Smoking Gun’s scanned court documents wouldn’t pass muster here.) Another example – one that is legal in Canada under a copyright exemption – is a sign-language translation inside or alongside a written text or audio recording.
  5. Custom-crafted solely for printing. I really mean that, and not a document so badly designed that people have no choice but to print it out because reading onscreen is so tedious. Your service-bureau files, if they are on the web at all, can stay PDFs.
  6. Designed for annotation and round-trip travel: If you’re posting something to elicit comments, which are then sent back to you, PDF has useful structures that HTML doesn’t.
  7. A type specimen, which is all but impossible to create in HTML, unless the specimen involved is a “typeface” like Arial.
  8. A sample of a format that cannot be rendered in a browser (e.g., Illustrator or Photoshop documents) or can only be rendered unsatisfactorily (CAD drawings where GIF and JPEG don’t have enough resolution). (In theory you could use SVG for CAD, but SVG remains mostly theoretical, doesn’t it?) This case also includes PDF files meant as samples of PDF files.
  9. A record of a document’s state at a specific moment. In this context, PDF is useful as a preservation format even for HTML web pages.
  10. A document in a language whose script has no satisfactory support in web browsers. This example must be used with caution: In 2005, there aren’t many “minority” languages that cannot be rendered in a browser. Perhaps this case must be limited to scripts that have not been encompassed by Unicode (of which there are several). This can also be a subset of the type-sample case if your PDF is meant as an illustration or documentation of the writing system used by a language.
  11. Mathematical, since even MathML cannot render certain notations.
  12. Documents with a legally restricted format, like U.S. tax forms.
  13. Documents with digital rights management, which everybody hates and which is likely to pose accessibility barriers. (The use of 128-bit encryption with PDF is compatible with screen readers.)
  14. Multicolumnar, particularly if figures and illustrations are included, since multicolumn web layouts are a mere hack and are unreliable as a method of reproducing print layouts. (Your multicolumn document should be HTML if it is presented that way merely to save paper and it can work as a single column. It can be difficult to distinguish that case from a document that is structurally multicolumnar, and this category is somewhat iffy.)

So let me say the same thing one more time: If your document isn’t one of those or isn’t truly exceptional in some way, it’s your responsibility to do what the rest of us did and learn how to use HTML, CSS, and JavaScript correctly. Use those technologies unless you have an airtight reason not to.

PDF is not Acrobat, or even Adobe

Let’s get something else out of the way: Acrobat isn’t PDF and PDF isn’t Acrobat. Many programs other than Acrobat can display PDFs. GSview is a popular choice and works on Windows, Mac, OS/2, and Linux. Other options:

  • On Windows: Jaws PDF Editor (not the screen reader)
  • On Mac OS X: Preview, GraphicConverter, Safari, and OmniGraffle
  • On Linux or Unix: OpenOffice
  • On PocketPC and Windows CE: Primer PDF Viewer
  • On Symbian: PDF+
  • On Amiga (!): Apdf

And many programs other than Acrobat can create PDFs. Nearly every application on Mac OS X can save a PDF, for example. Utilities to export to PDF without Acrobat on platforms like Windows and Linux are too numerous to mention, but sites like VersionTracker list them. Moreover, you don’t even need application software to create a PDF; some elite developers directly write their own native PDF files.

Versions

Your Acrobat version number and the PDF file format version number are two different things. It may come as a surprise that PDF actually has a version number, but it does. There is no single “PDF” format any more than there’s a single HTML format.

      

  • The latest version of the Adobe PDF file format is Version 1.6 (released November 2004).
  • Each new version has added a few features, many of them structural. PDF tags, which are rather important for accessibility, were added in PDF 1.4.
  • PDF versions for archiving (PDF/A) and “exchange of print-ready pages” (PDF/X-1a) are already ratified or in the process of ratification by standards bodies.
  • There’s a working group to define an “accessible” PDF format (I’m on it) and another for engineering.

The easiest way to keep things straight is to take the current Acrobat release number, subtract 1, and put the result after the decimal point. Acrobat 7 can read PDF 1.6 documents, for example.
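
The rule of thumb is simple enough to write down. What follows is a minimal Python sketch, not anything Adobe ships: the function names are made up for illustration, the rule is only assumed to hold for the Acrobat releases discussed in this article, and the assumption that a newer Acrobat reads every older PDF version is mine. The one hard fact used is that a PDF file announces its format version in a header comment of the form `%PDF-1.x` near the start of the file (a document's catalog may state a later version, which this sketch ignores).

```python
import re
from typing import Optional


def pdf_version_from_header(data: bytes) -> Optional[str]:
    """Return the version in a '%PDF-x.y' header, or None if none is found."""
    # Only the start of the file is inspected; the header sits at or near byte 0.
    match = re.search(rb"%PDF-(\d)\.(\d)", data[:1024])
    if match is None:
        return None
    return f"{match.group(1).decode()}.{match.group(2).decode()}"


def pdf_version_for_acrobat(acrobat_release: int) -> str:
    """Rule of thumb: subtract 1 and put the result after the decimal point."""
    return f"1.{acrobat_release - 1}"


def acrobat_can_read(acrobat_release: int, pdf_version: str) -> bool:
    """Assumption: a release reads its own PDF version and every earlier one."""
    major, minor = (int(part) for part in pdf_version.split("."))
    return major == 1 and minor <= acrobat_release - 1


print(pdf_version_for_acrobat(7))                   # Acrobat 7 -> PDF 1.6
print(pdf_version_from_header(b"%PDF-1.4\n..."))    # a PDF 1.4 file
print(acrobat_can_read(5, "1.6"))                   # Acrobat 5 predates PDF 1.6
```

So a file that opens with `%PDF-1.4` is old enough for Acrobat 5 and recent enough to carry tags.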

Proprietary vs. open

You can’t make the categorical statement that PDF is “proprietary” and HTML is “open.” The World Wide Web Consortium copyrights its specs, for example, though with reasonable usage terms. Adobe publishes its versions of the PDF format, and has done so since Version 1.3.

I’ve never understood the objection that Adobe could change the PDF format overnight and render your documents useless. That objection applies to the mysterious Microsoft Word file format, but not here. (Word’s XML schemas have been published, but I’m not talking about those. Microsoft’s PR apparatchiks in the U.S. and Canada promised to get back to me about the actual state of disclosure of the Word file format, but never did.) PDF specs are published, and any of your documents that comply with the published spec will remain unchanged when the spec is updated. Just as a document validated against HTML 3.2 remained unchanged after XHTML 1.1 came out, your PDF 1.4 documents (for example) will continue to work into the indefinite future.

The entire discussion of proprietary vs. open is bogus. The relevant distinction is between published and secret. PDF and HTML are both published formats. End of story.

Multiple formats need to be accessible

The goal of the accessibility advocate is to improve accessibility for people with disabilities, period. We’re not interested in making only HTML web pages accessible. The entirety of web content is our purview, and that includes formats like PDF and indeed Flash. (Same goes for multimedia.)

To draw a historical analogy, when we in Canada and the U.S. managed to get one or two open-captioned TV shows on the air in the 1970s, we didn’t stop there. We invented a closed-captioning system that could be applied to all programs (and the Europeans reused their existing teletext system for the same purpose). Then we made sure that home videos and laserdiscs (remember those?) were captioned. Then we started releasing a very few prints of open-captioned movies.

Then we figured out a way to add audio descriptions for the blind to television programs. Then we hacked a method to produce described home videos. Then we developed closed-captioning and -description systems for first-run movies (and new systems for open captioning there). Then we used closed captions, subpictures, and audio tracks to make DVDs accessible. Then we developed methods to caption and describe online video. We kept up with technology and made each new format accessible.

Even if you never create PDFs yourself, I’m sure you will admit that it is necessary for this widespread format to be accessible for the same reasons we made the widespread format of HTML accessible.

We’re not just talking about blind people

And keep in mind that accessibility is not about making things work fine for blind people and no one else. Everybody falls into that trap at one time or another – the Web Accessibility Initiative included. In PDF accessibility, notable additional groups include deaf and hard-of-hearing users and people with learning disabilities. Motor or dexterity impairment becomes an issue in scrolling a PDF.

Keep this in mind the next time someone complains that a certain PDF is inaccessible because he or she couldn’t get it to work with a certain version of Jaws. We’re not working just for you.

Content vs. user agent

To finish this preliminary discussion, we need to understand the interaction between content and “user agents,” the latter being a term for browsers, media players, and other devices that present content to the user. As web authors, we’re so concerned with HTML and CSS that we either forget the role of the user agent completely – or we forget it until it bites us in the arse, as with CSS bugs in browsers. We tend not to notice the fact that web content is (in almost all cases) rendered by a browser; the user agent becomes invisible.

But because we have to switch to another program most of the time to read a PDF, we are suddenly reminded that a user agent is actually in play. This, too, is a source of confusion between Acrobat and PDF. The interaction of user agent and PDF content is as important as it is with HTML, as we’ll see; it’s just that we’re more conscious of that interaction.

The complaint that you have to use a “special program” to read a PDF document is bogus. You’re already using a special program to read an HTML document. It’s just that you use that program so much it no longer seems special.

Tags and structure

As with HTML, what makes PDFs “robust,” reformattable, and otherwise accessible to many people with disabilities is structure. An unstructured data format like a JPEG picture is hard to make accessible, at least to a blind person, but wrap it in the HTML structure of an img element and suddenly accessibility becomes a real option.

A PDF is a database of different data types. You can include a wide range of text, graphics, and multimedia formats inside a PDF, a fact that led to a common misunderstanding that PDFs are glorified pictures. (They certainly can be, but they aren’t necessarily.) There really was no such thing as a structure to PDF until tags were introduced in PDF 1.4. It is OK to call them tags and not elements.

PDF tags are XML-like and will be immediately understandable to anyone with HTML knowledge. Many tags are functionally equivalent to analogues in HTML, such as P, headings (including a generic, unnumbered Heading element), and Figure (image). But some of those tags have more features than their analogues in HTML. For images, you’ve got three levels of replacement text – “actual text,” useful for text rendered as an image, a drop capital, or an illuminated manuscript; “alternate text,” exactly as in HTML; and “title,” also as in HTML. You can, and still should, declare a language for your PDF document, just as with HTML.

PDF tags are extensible and you can create your own. However, there’s a predefined set.

A key difference here is that you cannot just fire up a text editor to add tags to your document, as you can with (X)HTML. Currently, you need to use application software, very much including Acrobat, to add the tags; because PDF is a binary format, that is unlikely to change.

As with semantic HTML, a tagged PDF can be reused and reformatted, since the application software knows what you meant by the data in the document. It knows that this text is a headline and this other text is a paragraph, so the software can, for example, reflow your text from a two-column to a single-column document. Reflow turns your multicolumn PDF into a zoom layout.
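
To see why tags make reflow possible, here is a minimal Python sketch. It does not use any real PDF library and is not how Acrobat works internally; the `Tag` class, its `column` and `top` fields, and both functions are hypothetical stand-ins. The point is only that a tag tree records the intended order, while page coordinates alone do not.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    role: str                      # e.g. "Sect", "H", "P"
    text: str = ""
    column: int = 0                # hypothetical: page column holding the text
    top: float = 0.0               # hypothetical: distance from top of page
    children: List["Tag"] = field(default_factory=list)


def flatten(tag: Tag) -> List[Tag]:
    """Depth-first walk of the tag tree, keeping only tags that carry text."""
    found = [tag] if tag.text else []
    for child in tag.children:
        found.extend(flatten(child))
    return found


def reflow(root: Tag) -> List[str]:
    """Single-column order: follow the tags and ignore page position."""
    return [tag.text for tag in flatten(root)]


def naive_reading_order(root: Tag) -> List[str]:
    """What you get without tags: fragments sorted top to bottom, left to right."""
    return [tag.text for tag in sorted(flatten(root), key=lambda t: (t.top, t.column))]


page = Tag("Sect", children=[
    Tag("H", "Headline", column=0, top=0),
    Tag("P", "First paragraph", column=0, top=10),
    Tag("P", "Second paragraph", column=0, top=50),
    Tag("P", "Third paragraph", column=1, top=10),
    Tag("P", "Fourth paragraph", column=1, top=50),
])

print(reflow(page))
print(naive_reading_order(page))
```

The tag-order walk returns the paragraphs first through fourth. The coordinate sort jumps from the first paragraph to the third, which is the familiar scrambled reading of an untagged two-column page.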

In general, we can say that a PDF is probably accessible if it is tagged. As with HTML, you can tag things improperly or unsemantically, though there is no concept of valid tagging with PDF. The mere presence of tags does not guarantee accessibility, because you might be using them wrong, but the absence of tags guarantees that the PDF itself is not accessible. Note the emphasis on itself; this is not the end of the story.
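
If you want a rough first look at whether a file is tagged at all, a crude byte scan can help. The sketch below is a heuristic of my own, not an official test. It relies on two facts from the published spec: a tagged PDF's document catalog carries a `/StructTreeRoot` entry and a `/MarkInfo` dictionary whose `/Marked` entry is `true`. It assumes the catalog is stored uncompressed and that `/MarkInfo` is written inline; files that compress their objects, or that refer to the dictionary indirectly, will be reported as untagged even when they are not. And, as noted above, a positive result says nothing about whether the tags are any good.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw bytes show the two markers of a tagged PDF."""
    has_structure_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(
        rb"/MarkInfo\s*<<[^>]*/Marked\s+true", pdf_bytes
    ) is not None
    return has_structure_tree and is_marked


# Two tiny hand-written fragments standing in for real files.
tagged_sample = (
    b"%PDF-1.4\n1 0 obj\n"
    b"<< /Type /Catalog /MarkInfo << /Marked true >> /StructTreeRoot 5 0 R >>\n"
    b"endobj\n"
)
untagged_sample = (
    b"%PDF-1.3\n1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
)

print(looks_tagged(tagged_sample))
print(looks_tagged(untagged_sample))
```

Treat a `False` here as "go and check properly," not as a verdict.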

The user agent’s job

Here is where the user agent comes in. Just as web browsers have had to be engineered, at vast effort and expense, to cope with tag-soup HTML, Acrobat in particular has had to be engineered to cope with real-world PDFs. It’s a bigger problem, since few PDFs are tagged and the free-form database structure of PDF lacks even the quasi-structure of tag-soup HTML.

In this case, the user agent overcomes the inaccessibility of the content, and that’s how it should be even with HTML: The entire chain from author to reader has to be accessible, and any link in the chain can take up the slack. In accordance with the classic advice to be strict in what you produce and lenient in what you accept, if the PDF author creates inaccessible content, the reader software should try to fix it.

  • Acrobat versions since 4.05 (with a Windows-only plug-in) have been at least adequately competent some of the time in making “inaccessible” PDFs functionally accessible.
  • Acrobat 5 and later can infer a reading order and reflow text.
  • Acrobat 6 and later can read text out loud on Windows and Macintosh, functioning as a de facto screen reader.

Note that this discussion mostly relates to blind or learning-disabled readers. A deaf person might just read the document with no trouble. Mobility or dexterity impairment is also involved here; later Acrobat versions can autoscroll a document so that you don’t have to tediously click or actuate a scrollbar using your slow adaptive technology.

Thus it is impossible to state that even an untagged PDF is inaccessible. Acrobat (or some other program) may be able to use artificial intelligence to hack its way through a document to make it adequately accessible. If someone complains that your PDF isn’t accessible, you need to ask them what program they’re using to read it. Given that Adobe Reader (né Acrobat Reader) is free for Windows, Mac OS, and Linux and has all the accessibility features listed above, and is, moreover, compatible with many screen readers, it is a bit of a stretch to say that an untagged PDF could not possibly be read accessibly. Some PDFs will be inaccessible even with a really good reading program, but many will work adequately well.

Screen readers

Another recurring complaint in PDF accessibility – also bogus – is that screen readers either cannot handle PDFs or require costly upgrades to handle them.

All leading screen readers in use on Windows can read PDFs, including Jaws, Window-Eyes, IBM Home Page Reader, and Hal.

Remember Bruce Maguire? His presentation at the Web Essentials 2004 conference in Sydney – in whose lunchroom he and I talked – stated the following:

The PDF format has become widely used for making documents available on Web pages. Despite considerable work done by Adobe, PDF remains a relatively inaccessible format to people who are blind or vision-impaired. Software exists to provide some access to the text of some PDF documents, but for a PDF document to be accessible to this software, it must be prepared in accordance with the guidelines that Adobe have developed. Even when these guidelines are followed (and there are 32 pages of them), the resulting document will only be accessible to those people who have the required software and the skills to use it. Many blind or vision-impaired people do not have the financial freedom to spend the $1,000+ typically required to upgrade their screen-reader software to take advantage of the latest accessibility features. Requiring a user to upgrade to this extent in order to read a standard document is like designing Web content presentation in such a way that most people will have to buy a new computer in order to read it. Clearly, this is not a reasonable approach to the discharge of a government’s social responsibility to provide relevant information to its citizens. In any case, some of the PDAs used by blind people have no facilities for accessing PDF files.

Let’s unpack these objections.

      

  • Preparing PDFs “in accordance with the guidelines that Adobe have developed” is in no way different from preparing HTML pages in accordance with the guidelines the W3C has developed. (Want to print those out? It’s a 34-page PDF.)
  • It’s not like PDFs are the only item on your computer for which you require software and skills. You require both of those to surf the web and use HTML pages.
  • It’s false to claim that blind people are “typically required” to pay “$1,000+” to upgrade their screen readers. Some programs that can read PDFs aloud are free, like Adobe Reader. Nobody is requiring an expensive software upgrade.
  • Let’s look at the upgrade prices for Windows screen readers. Assumptions: All prices in U.S. dollars; you already own a copy of a screen reader that cannot read PDFs (increasingly unlikely).
    1. A Jaws for Windows Software Maintenance Agreement gives you the two subsequent releases for $180 or $260. Hence an upgrade to a PDF-capable version might be “free” under this plan.
    2. A similar scheme provides for three upgrades of Window-Eyes for $299.
    3. Upgrading Home Page Reader from any Version 2.5 or 3 release to Version 3.04 is free. Otherwise you buy the whole package again for a discounted price of $79.
    4. Upgrading from V5 to V6 of Hal costs $160 or $220.
  • “PDAs used by blind people” need to be upgraded if they don’t understand PDF. Essentially, this objection boils down to “if it doesn’t work with what I’ve already got, it doesn’t work, period.” I guess time does not march on for these people. In that case, I hope you’re enjoying HTML 2.0 and your Geocities homepage.
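
For anyone who wants the arithmetic spelled out, here is a small Python sketch that sets the upgrade prices listed above against the “$1,000+” figure. The prices are the ones quoted in this article; the variable names and the grouping are mine, and the comparison assumes you need only one of these upgrades.

```python
# Upgrade prices in U.S. dollars, as listed above (low and high where two apply).
UPGRADE_PRICES_USD = {
    "Jaws (Software Maintenance Agreement)": [180, 260],
    "Window-Eyes (three upgrades)": [299],
    "Home Page Reader (free upgrade or discounted repurchase)": [0, 79],
    "Hal (V5 to V6)": [160, 220],
}

CLAIMED_UPGRADE_COST = 1000  # the "$1,000+" in the quoted presentation

# The most you could pay for any single upgrade on the list.
worst_case = max(max(prices) for prices in UPGRADE_PRICES_USD.values())

print(worst_case)
print(worst_case < CLAIMED_UPGRADE_COST)
```

The priciest single upgrade on the list is $299, less than a third of the claimed figure.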

The assumption seems to be that blind people only ever or only can or only must use Windows, and, as we all know, Windows screen readers are overpriced. Well, yes, they are, but you the blind person have many options now for computer accessibility.

Of course you may have to update your Windows screen reader and of course that might cost you money. You can’t complain that PDF is inaccessible (as it was for many years) and then act as though the problem hasn’t been addressed. Adobe rewrote the PDF spec to include tagging for accessibility, and, just as with any improved technology, your screen reader had to be upgraded to handle features that never existed before. You asked for something new and you need something new to make it work.

Most of the time when I run across this complaint, it strikes me as a peevish attempt to cling to the discredited idea that PDF isn’t accessible and Adobe (in particular) doesn’t care about the problem. Really, we need these complainers to grow up and face facts. You can’t ask for a format to be upgraded to include accessibility and then complain that your own software has to be upgraded.

If you don’t want to fork over the money for a Windows screen reader, you can use Mac or Linux. VoiceOver on Mac OS X 10.4 Tiger can read PDF 1.5 files and earlier, though not always very well. The Sun accessibility package for Linux (part of Solaris 10), which is free of charge, includes built-in screen reading. There’s now a version of Adobe Reader 7 for Linux, though it doesn’t have speech output.

If you’re really concerned about cost, install the free Linux software or install Tiger on a used Mac. (Actually, a new Mac Mini without monitor costs less than a new license for Jaws.) If you think that Windows-screen-reader makers are overcharging, complain to the makers or vote with your feet. The actual issue – PDF accessibility – is being handled.

Let’s be consistent about screen-reader flaws

Also, if you’re going to complain about how long it’s taking screen readers to handle PDF, even though that problem is behind us, let’s look at how well screen readers handle HTML.

HTML is a stable standard that screen readers have had a long time to get right. (HTML 4.01 was published in 1999, XHTML 1.0 in 2000 [revised 2002], XHTML 1.1 in 2001.) Yet screen readers are still catching up. How is this different from PDF support? It isn’t, except in one way: It’s worse, because HTML has been around longer. PDF support went from nothing to pretty good in the space of two years, while screen readers are still moping along barely able to handle the full HTML spec.

If you’re trying to suggest that the combination of PDF-plus-screen-reader is a problem, what happens if HTML-plus-screen-reader is also a problem? The complaint that screen readers have trouble with PDF and no trouble with HTML is false both ways. Why don’t we hear any complaints about having to upgrade screen readers to handle HTML?

Let’s look at the evidence.

Jaws
Version HTML support added
4.01

  1. headings
  2. longdesc
  3. accesskey
  4. onmouseover and onclick
  5. table headers

4.02

  1. table headers that contain links or graphics (whoops!)
  2. fieldset
  3. legend
  4. empty accesskey
  5. “[a]ccented characters in English when they follow immediately after a link”

6.0

scope (and that’s apparently it)

Window-Eyes
Version HTML support added
4.0 (apparently)

  1. frames
  2. tables with headers

4.5 and later

  1. accesskey
  2. abbr, acronym
  3. title (and alt [sic]) on form elements
  4. headings
  5. language codes
  6. lists
  7. longdesc
  8. object
  9. q, blockquote
  10. thead, tbody, tfoot while navigating through table cells; th
  11. select frames by document title or frame name, title, or longdesc
  12. title on most, if not all, elements (also alt)

IBM Home Page Reader
Version HTML support added
2.5

  1. caption, headers, and summary for tables
  2. alt for images and areas
  3. title and longdesc for images
  4. noframes and noscript
  5. apparently also understands “paragraphs, table cells/rows/columns, headings… lists, forms, maps, and select menus”
  6. “HPR supports HTML 4.0 including common deprecated elements, but not SMIL, CSS, MathML, or DOM”
  7. does not support accesskey
  8. “[s]tylesheets not supported in this release” (!)

3.04

  1. label for select
  2. optgroup
  3. legend for fieldset
  4. title attribute for abbr and acronym
  5. cite attribute for blockquote
  6. ruby (!)
  7. meta refresh
  8. Flash objects and the title for object
  9. frame title and longdesc
  10. table summary
  11. disabled attribute for form controls
  12. mouse and keyboard event handlers (such as onmouseover, onclick, onkeypress)
  13. start attribute for ol
  14. readonly attribute for input and textarea
  15. accesskey
  16. tabindex
  17. maxlength attribute for input
  18. “unsupported lang attributes [handled] by providing a setting to announce, but not speak, text which requires speech engines not available with HPR, such as Greek”

Notes:

  1. IBM and GW Micro have a habit of ritually destroying release notes for previous versions when new ones come out. Here, Window-Eyes release notes were gathered mostly through Internet Archive documents. A review of HPR 2.5 was used.
  2. Jaws 5.0 is not listed above. It was documented only in rambling audio files (14 MB .exe). Apparently it added support for lists and blockquote (used for “indentation,” the recordings tell us).
  3. HTML support in Hal is difficult to ascertain even after Dolphin Computer Access sent me the various release notes. Version 6.51 fixed a Flash problem and a problem with an offscreen-positioned skip-navigation link (marginally relevant to spec support); version 6.03 announced, hence recognized, links, frames and headings, also abbr, acronym, and label.

So you see, HTML support in screen readers has evolved and is still evolving. But all of a sudden when screen readers had to be upgraded to handle PDF, some critics pretended that such upgrades were unreasonable and unique. You’ve been upgrading your screen readers all along just to handle HTML documents using specifications that are up to six years old.

Authoring tools

Where PDF accessibility falls down embarrassingly is with “authoring tools,” the software used to create PDFs. Only a few programs can natively create a tagged PDF file, including InDesign; PageMaker 7.0 (!); FrameMaker 6.0 and later; and Microsoft Office with an Adobe export plug-in (Office 2000 and later only, Windows only). Products that use PDFlib 6.0 and later can produce tagged PDFs. There may be a few other minor utilities here and there.

The average person, however, will be faced with touching up an untagged or poorly tagged original. You pretty much have no choice but to use the tagging function built into Adobe Acrobat (the full version, not just the Reader, and for some functions you need the Pro version). There are already a few not-very-helpful tutorials on tagging with Acrobat, and, at the risk of disappointing my readers, I’m not going to write another one, as life is too short. However, the basics of what you have to do are easy to state:

      

  1. Open your PDF.
  2. The Description pane of the Document Properties screen (File menu) will tell you if the document is tagged or not.
  3. If it isn’t, dismiss that screen. Go to the Advanced menu and choose Accessibility → Add Tags to Document.
  4. Run a full accessibility check from that same menu.
  5. If the checker reports any problems, open the little-known Tags palette (View → Navigation Tabs → Tags). Use the disclosure triangles to step through your document’s new tag structure. You’re better off if you select Highlight Content from the palette’s Options menu, as Acrobat will then draw a hard-to-see border around the object whose tag you select.

     [Figure: Tags palette]

To handle the most common problems:

      

  • If Acrobat complains that your document lacks a language specification, find the topmost tag in your document (immediately within the self-referential Tags tag). Right- or Ctrl-click it and select Properties. Select a language from the pop-up menu in the Language field, or type your own two-letter language code.

  • For images lacking a text equivalent, do something similar, except you have to manually locate the Figure element that lacks the text equivalent. Context-click the Figure, select Properties, and fill in Alternate Text (exactly like alt in HTML) or Actual Text (for a picture of text).

    (There is a semi-automated way to find all Figures without text equivalents. If you run Advanced → Accessibility → Full Check and select “alternate descriptions are provided,” Acrobat will find all the figures without text equivalents and provide you links to them in its report.)

  • A document from a printed source may contain “artifacts” like headers and footers that you never want screen-reader users to hear. You can context-click on those items (which may be deemed Figure, Part, P, or something else) and Create Artifact, which will cause Acrobat and compliant screen readers to ignore them when voicing. (You can also use the Touch-Up Reading Order tool to select the artifact on the actual page and mark it as Background.)
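
The first two problems above can also be spotted, very roughly, without Acrobat. The sketch below is an assumption-laden illustration in Python, not a substitute for the Full Check. It assumes the document language is written as a literal string after `/Lang`, that figures appear as structure elements of type `/Figure`, and that text equivalents are stored under `/Alt`. The function names are hypothetical, and compressed object streams will hide all of these entries from a raw byte scan.

```python
import re
from typing import Optional, Tuple


def find_language(pdf_bytes: bytes) -> Optional[str]:
    """Return the first /Lang value written as a literal string, or None."""
    match = re.search(rb"/Lang\s*\(([^)]*)\)", pdf_bytes)
    return match.group(1).decode("latin-1") if match else None


def count_figures_and_alts(pdf_bytes: bytes) -> Tuple[int, int]:
    """Count Figure structure elements and /Alt entries.

    The two numbers are only a rough comparison: /Alt may also be set
    on elements that are not figures.
    """
    figures = len(re.findall(rb"/S\s*/Figure\b", pdf_bytes))
    alts = len(re.findall(rb"/Alt\s*[(<]", pdf_bytes))
    return figures, alts
```

If `find_language` returns nothing, or the figure count is higher than the `/Alt` count, that is a hint to open the Tags palette and look more closely. It is not proof of a problem.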

If this task seems tedious, it is, and it’s also quite inaccessible to many people with disabilities. Remember, we are not working toward a web in which nondisabled people create content for disabled people; people with disabilities must also be creators.

Acrobat is an unusual program in that it must arguably comply both with the Authoring Tools Accessibility Guidelines and with the User Agent Accessibility Guidelines, because you can create and view content using Acrobat. (And PDFs themselves are subject to Web Content Accessibility Guidelines; they will be covered in WCAG 2.0, which is expected to be technology-neutral.) Acrobat and PDF are not fully compliant with any of those guidelines, but few things are – and, when it comes to ATAG, nothing is.

Conclusion#section17

PDF accessibility is not as straightforward as HTML accessibility. But we need to stand up to the untruths that are spoken about PDF, especially since many of those untruths come from authorities with the power to find authors guilty of discrimination.

PDF accessibility is OK some of the time when it’s handled by competent authors with what few tools are available. All of those components need improvement, but let’s not pretend we don’t already have the power to create accessible PDFs. We do.

Acknowledgements#section18

      

  • Jacques Distler
  • Andy Dulson
  • Loretta Guarino Reid
  • Phill Jenkins
  • Greg Pisocky
  • Ted Padova

52 Reader Comments

  1. Joe, that’s an excellent piece of work – thanks!

    The main reason we are using PDFs is because “they are cheaper to create than HTML versions”. And yet, they are not tagged, nor tested for accessibility. Even worse – some of the PDFs are really just massive pictures of text.

    At least you’ve given me some hope of resolving the “PDF crisis” we have here. Step 1, tackle PDFs that shouldn’t be PDFs. Step 2, make sure the hand-authored PDFs are tagged. Step 3, see if we can’t replace FOP with PDFlib. The FOP documentation clearly states it does not generate tagged PDFs. Otherwise, start talking with the FOP guys and see if we can’t start work there.

  2. A few clarifications regarding PDFs and the Canadian Common Look and Feel standards:

    First off, a minor point: the correct URL for the CLF (Common Look and Feel) reference is http://www.tbs-sct.gc.ca/clf-nsi/inter/inter-01-02_e.asp

    Also, it should be clearly stated that the CLF specification is now almost 6 years old. In 1999, when the Government of Canada was drafting their specifications for Federal web “publishers”, the then-current practice was to “print” text articles, spreadsheets, etc. as PDF (one-button publishing) and then post these files to the web. (There was also a tendency to convert PowerPoint presentations to “web presentations” and then dump them upon an unsuspecting and helpless public.) Thus when the specifications were being written, it was better to err in favor of the non-mainstream than to continue to foist this type of detritus on the masses – in 1999 PDF *was* “not directly accessible to persons with (primarily) visual impairments”.

    It is true, and you have illustrated with depth and research, that today PDFs can be made more accessible if they are done properly. However, as your article points out, much care and “hand finessing” is required today to ensure that these documents are accessible. Will these obstacles be overcome? Sure, some day, but as your article also points out, more often than not the final output can, and should be, presented in a format other than PDF – *and I truly hope that the readers of your article remember that more than anything else*.

    Next, you neatly side-step the very real issue of PDF’s incompatibility with the current Web Content Accessibility Guidelines, which state clearly: Priority 2: 11.1 (http://www.w3.org/TR/WCAG10/wai-pageauth.html#tech-latest-w3c-specs) “Use W3C technologies when they are available and appropriate for a task and use the latest versions when supported.” For many developers working within the confines of these “Guidelines” (that have become, sadly, pseudo-standards – even though they were never written to be such), PDF is not a W3C technology. Is it “open”? Yes, but is it W3C sanctioned? No. But time changes everything…

    Finally, however, a thank-you. Well written, well researched, and relatively opinion neutral — exactly the kind of dialogue needed today.

  3. *Absolutely stellar article*!

    I work for a state government agency where I’m just a minor web coding droid, but I try my best to make my code as accessible as I possibly can.

    Accessibility is touted as an important issue here but usually is just paid lip-service in practice mostly because, in my opinion, no one really understands the issue or the limits of the technology – particularly when it comes to PDF documents. I’ve been adding as much header info to all my PDFs as I possibly could for the last few years in spite of the hassle I get from our IT folks that it’s unnecessary or a waste of time. I don’t think they’ll be saying that when we change over to a content management system later, but what do I know.

    I was just recently given permission to use Acrobat 7.0 and noticed several new and useful features, but I was unaware of the tagging tools! I’m going to be spending whatever free time I have to studying this feature and I’m going to bookmark, print, read and forward this article to as many folks as I can!

    What a great article to launch this _*incredible*_ redesign of A List Apart!

    Thanks!

  4. John, the requirement to use W3C technologies when appropriate is merely chauvinism on the part of the World Wide Web Consortium. We’ll use whatever accessible format we like, thank you very much.
    Nonetheless, I’m sure you noticed that I gave what I think is an exhaustive list of the circumstances when using the non-W3C format of PDF *is* appropriate. So that’s been handled.

  5. Thank you for a fair and smooth-running document. I believe it deserves more than one reading.

    By the way, I can’t tell you if I like the new design better than the old design. To me, the audience, they both worked. I never felt lost, or bored, or impatient, when searching for something … or just cruising. I guess there are less choices and more whitespace. I like that.

    Have a nice day. Guess I better pull out my FrameMaker 7 manual ….

    The audience thanks you.

  6. First off, thanks for a wonderful article. My comments are both pro-PDF and con-PDF.

    Re *PDF is overused*:

    May I nominate:

    15. Word documents with tables that people have dragged around to look nice. They maintain the proper number of columns when converted to PDF, but become unusable hash with extra and mismatched columns when converted to HTML.

    We used to spend hours trying to fix these. Now I’m a team of one instead of a team of three, and I couldn’t survive if I had to fix these documents.

    16. Documents that are so huge that when saved as Word HTML our other authoring tools can’t clean them up.

    Re *We’re not just talking about blind people*:

    And of course, there are low-vision folks who need Reader’s zoom feature (which is, nevertheless, clunky for viewing wide tables).

    Re *Content vs. user agent*:

    There is the issue that if the PDF opens in the browser window, the browser’s Back button no longer has the expected effect when a user mistakenly uses it partway through a PDF document and suddenly finds themselves out of the document instead of elsewhere in the document. This is similar to the AJAX problem.

    Re *Authoring tools*:

    One is not even home free with the tagging tools provided for Office 2000+. People need to be trained to produce clean source documents in the first place. I’ve run into the following problems, which apply both to saving as HTML and to converting to PDF:

    1. Office is set by default (changeable under Tools > AutoCorrect) to assign styles based on your formatting. This sometimes results in a table cell in the middle of a table being marked as

    or some such. Ideally, your IT department would disable this option as part of their installation procedure.

    2. Excel spreadsheets pasted into Word documents may be output as an image. The only way to ensure they stay as a table is to select the desired cells and paste into Word using Ctrl-V, *not* Paste Special. The downside of this is that sometimes the layout of the table will get messed up, but it will be semantically sound. (And let’s not forget folks who spread their column headings across three spreadsheet rows instead of using Wrap, or who put the title of the spreadsheet in the first row instead of in Page setup.)

    3. Some folks align tabular data using tabs instead of Word tables. This produces inaccessible data non-tables. Word’s Text to table can sometimes fix this, but there is often pre-and-post cleanup involved.

    4. Some folks align data in tables by adding extra table columns instead of pressing Ctrl-Tab to indent. This produces hard-to-navigate tables, requiring tedious manual cleanup effort.

    5. As mentioned above, people drag table columns around, sometimes producing extra columns (problem for HTML only, not for PDF, so far as I know).

    6. Some people put the title of the table *inside* the table, making it the first row of the table. Word’s *Split Table*, followed by *Table to text* on the title only, can cure these.

    Hope this helps.

  7. **ahem**…love the look and feel of the site — but has anyone tried actually PRINTING anything? I get three pages of 6pt font and then a bunch of (mostly) blank pages.

    ALA seems to be slipping in its old age… 🙂

  8. Hey, I didn’t know that you could do forms in PDF until I read this article, but I don’t really see why/when I should use PDF over regular HTML. The example you give – TypeBookOne – appears to be a series of ordinary text boxes (and asking for a credit card number over an unencrypted connection to boot). Why not use an HTML form?

    A good article nonetheless. I just hope it will be used by thoughtful developers to produce accessible pdf documents where appropriate, rather than by lazy developers to just dump their Word docs to the web.

  9. Sadly, yes, Chris, Jeremy Tankard’s _TypeBookOne_ example was less fabulous than his other PDFs, like his old order forms, which auto-calculated your total for you. I opted to leave it in and simply be ashamed.

  10. I support Tommy Olsson’s efforts to make the Web more universal, but it is not to be confused with Web accessibility for people with disabilities.

    Go to town, but just don’t use that term to describe it.

  11. A lot of the academic documents are written in LaTeX, and as a result, converted from PostScript to PDF.

    The majority of academic material is printed and used in the academic establishment… so it makes sense that the PDF format is used.

    However, when these academic papers get placed on the internet, they are placed online as PDF documents.

  12. Suggestion: XML and yesLogic for conversion to pdf.

    “SVG remains mostly theoretical, doesn’t it?”

    Whatever you say genius.

    “Footnoted, endnoted, or sidenoted, since there is no way to mark up any of those structures in HTML.”

    XML. Warning: ‘PDF authoring software’ == Terrifying. Lame. Bad idea. Nail in coffin. Horrible Horrible. Misery.

    “An interactive form, since PDF interactivity can do more than HTML can.”

    Oh really? And I’d want that because…

    “A multimedia presentation, since later versions of PDF can truly embed multimedia rather than simply refer to or call multimedia, as HTML does.”

    Doesn’t PDF embed the ActiveX control of said ‘multimedia’, precisely (give or take) what HTML does? The point is moot anyway because there it is in the XML permitting conversion to all things.

    “Combined accessible and inaccessible versions. A typical case is a scan of a historical document that also includes live text. (You really need that live text. The Smoking Gun’s scanned court documents wouldn’t pass muster here.)”

    What are you smoking… this is easily done in HTML and is no argument for using PDF.

    I really can’t be bothered carrying on… The author shows such an arrogant misunderstanding of the W3Cs objectives, intentions, and technology that this article was obsolete even before it was written.

  13. Yeah, I’m so out of touch with what the W3C is doing that I’ve attended two face-to-face meetings and am an Invited Expert with the WCAG Working Group.

    Good call, _genius_.

    Now, does anybody have criticisms they’d like to levy that actually make sense and to which they have the integrity to sign their names?

  14. PDF files on the net are something of a problem for me and others I’ve known…
    While there is no excusing the fact that they can be a Good Thing (thx, Martha) if used appropriately, they are often unnecessary, as Joe describes.

    My mother has taken constant issue with the PDF plugin (she is the average internet user). The issue involved, I think, is that if you must use a PDF, make sure it is properly identified as PDF! I tend to shy away from sites where I am apt to run across non-declared PDF files, linked as regular old web pages.

    Good job on the article for putting things in perspective.

  15. Joe,

    An excellent article indeed!

    Nevertheless, I believe that the reference of Word as an alternative option to PDF comes from the fact that Word is a more well known and used format amongst the masses. But perhaps, you could write about that?

    I know of people with visual impairments that usually copy the text from a website and pastes it into MS Word in order to increase font size or use it with a screen reader. Now if that is the best practice, hmmm I have my doubts. If Maguire were a bigger player in the Accessibility arena, it would be useful to know the context in which he commented about Microsoft Word and whether as a user or a member of the institution he works for.

    In any case, I am still to be bought into the idea of having a PDF as the sole conveyor of my content. There is an implication on findability: if I am not mistaken, just Google indexes PDF files.

    Nevertheless, I will be doing more PDF files after reading this article.

    Cheers,

    Luis

  16. Thank you for a very nice re-introduction into a very wide-spread, yet developer-wise, unknown territory.

    We are using PDF in production, and generate up to 200 PDF documents a day, yet no one in the organisation seems to know anything about the PDF anatomy.

    This article is sure to be an eye-opener for many people around me, thanks a lot.

    _“I’m with Joe: accessibility is about people with disabilities. Full stop.”_

    Nonsense. Accessibility should mean “access for all”. If a local cinema has a ramp and a large automatically-opening entrance door for wheelchairs, but non-disabled people have to climb two flights of narrow winding stairs to get in (through a tiny door that sometimes doesn’t open easily and is always a tight squeeze), then the cinema is not what non-disabled people might class “accessible”. A poor example but hopefully you get the point. Accessibility doesn’t mean “it also works for disabled people”, but that it is “accessible” (as far as possible). Full stop.

    _“I know of people with visual impairments that usually copy the text from a website and pastes it into MS Word in order to increase font size.”_

    Why not use Firefox or Opera? Then they can enlarge the text.

    (From the main article:) _“You can add XML-like tags to give structure to a PDF.” / “PDF tags are XML-like.”_

    As far as I can tell, PDF tags are actual XML, not XML-like. I’ve learnt of two main reasons for them so far, but there may be others. One is to enable _importing_ of XML, the other _exporting_. This means you can create a PDF template, for example a design for a menu. When your prices or products change, you can update the XML and not have to redesign a whole new template, or spend ages editing an existing one. Exporting gives you a standard XML document containing all your text and links to images etc in a way that can be reformatted any way you like (eg: turned into HTML).

    I have followed the advice in the article to view tags, but it seems the Standard version of Adobe Acrobat 7 isn’t capable of some things mentioned. I only have a “Quick Check” option, and there’s no Tags palette! (There is, however in InDesign CS2.) So I was not able to view the tags after adding them to a PDF. As Joe rightly says “for some functions you need the Pro version”.

  18. “I know of people with visual impairments that usually copy the text from a website and paste it into MS Word in order to increase font size.”

    “Why not use Firefox or Opera? Then they can enlarge the text.”

    For the same reason quite a few women (and certain men) wear high heeled shoes when no bone in the human foot was designed for that kind of strain, certain designers prefer to use notepad to Dreamweaver and some people prefer to watch films at home than going to the cinema: people get used to the choices they make when they are in control.

    One could just as easily ask why people still read in Braille when they could have freeware text-to-speech software installed on a Pentium II, get all the classics in TXT format from Project Gutenberg, and spend the afternoons just listening to Shakespeare or Harry Potter instead.

    This is the beauty of being human after all, one can hardly distinguish an “a” from a “b” and still they use whatever they have in hand (and in their hard drives) to access the world. Within our limitations we adapt using the resources around us and this is a feat too precious to be weighed against the browser one uses to access the Web or by the technology one could be using instead.

    But yes, whenever another opportunity arises, I will be, again, the first one to advise, guide and help whoever needs the help (be they blind or just a novice user) and if they feel comfortable with the challenge, Opera will be the first addition to their hard drives. But once again, that is just the cold part of the job; the technical elements become such nonsense when you see the face of self-accomplishment when one is able to reconnect with the world, write ‘silly’ poems, send an email, dream.

    After all accessibility in its nature is not a property of the digerati or an industry standard but more, a call for human beings in general, since the beginning of history.

  19. In my 3 years of creating so-called accessible PDFs, I have learned that, between the limited variety of PDF tags and the capabilities of the PDF readers (by that, I am referring to the Read Out Loud functionality of Acrobat or the use of JAWS to read PDFs), the accessibility of PDFs is no better than a text-only page containing the same text. Rather than waste a huge amount of space on this forum, I decided to post my own facts and opinions about PDF accessibility in my blog at http://pen-and-ink.ca/?p=40.

  20. I read Joe’s article with much interest. While it contains a lot that is very pertinent and interesting, I believe his aggressive, combative approach has resulted in a misrepresentation of the conversation he had with Bruce Maguire and the way the Human Rights and Equal Opportunity Commission (HREOC) in Australia operates.

    I introduced Joe to Bruce in the lunchroom at the WE04 Conference and was present during the conversation they had regarding PDFs. Joe portrays the conversation as monologue by Bruce with him trying to get a word in edgewise. Joe the shrinking violet unable to get a word in? Come on give me a break! To me it seemed more like an attack on Bruce by Joe, with Bruce desperately trying to get Joe to acknowledge his responses.

    This is how I remember the overall nature of the conversation: Joe was mainly interested in making the point that it is now possible to produce PDF documents that can be accessed by screen readers. Bruce agreed with this, but said that many PDF documents are not prepared correctly and a significant proportion of screen reader users in Australia do not use readers that can access even well prepared PDFs. (By the way, the reference to Microsoft Word was a side comment that Word was more accessible to most reader users than PDF).

    In essence, it was a classic dispute between a theoretician and a person who has to deal with something in a practical way on a daily basis. I knew the conversation was off the rails when Joe started talking about how if HREOC took someone to court over the use of PDFs, he would willingly be an expert witness against Bruce (and HREOC) to say PDFs were accessible (no ifs, buts or maybes!).

    Protecting the rights of people with disabilities can be a difficult and unpopular task, and advances in technology often add another dimension to the problem. A quick, non-web example: In Australia, builders of large multi-story buildings are required to provide a lift (elevator) with buttons that are positioned so that they can be used by someone in a wheelchair. An increasing number of disabled people now have wheelchairs that can elevate the user up to the level of a standing adult. Does this mean that a person in a standard wheelchair can no longer claim they are being discriminated against if the buttons in all the lifts are too high for them to reach? Should we change the requirement to provide buttons at wheelchair height?

    It is not a question of what is theoretically possible or not, or a question of how much a piece of hardware or software might cost. Minimising the discrimination against people with disabilities is a question of basic rights. Unlike Joe, I do not believe the people of Australia “struggle under the yoke of lies and misunderstandings” from HREOC about PDFs or any other matter.

    I know there is nothing like making a stir to attract a bit of attention. But, I feel it is sad that in his desire for the spotlight, Joe has found it necessary to attack someone who has done much in the fight to advance the rights of people with disabilities.

  21. In fact, Bruce did monologuize endlessly and I barely did get a word in edgewise. Joe is not a shrinking violet and tried many, many times to interrupt Bruce’s oft-inaccurate lecturing. Everything I wrote about HREOC and my Australian experience is true and accurate. It’s great that we have a slightly-differing eyewitness account, but I was the one trying to do the talking, not Roger.

    The issue of “my device doesn’t read your accessible PDF” is a separate question, one addressed in my article. I also specifically and factually addressed the true cost of software to read PDFs, which can be zero. Nobody has ever “minimized” the cost of such software; Bruce and HREOC are guilty of the opposite, exaggerating it. I researched the facts and reported them.

    The fact that “many documents are not prepared correctly” is only one part of the puzzle, as it is for Web pages. I explained why in the article. An untagged PDF may still be accessible. The non-Web example is off-topic; standing-height wheelchairs are not readily available or cheap, while PDF readers are.

    I certainly *would* return to Australia to testify in a hearing, at which point the exact contentions of plaintiff and respondent would be discussed. I *didn’t* tell Bruce I would categorically argue that PDFs are accessible, because they are no more categorically accessible than HTML is. Bruce did, however, tell me they’d get in competing experts who said PDFs weren’t accessible. As I told him then and reiterate now, the difference is I’ll have the facts.

  22. The tendency for low-vision users to compensate for lousy page design by copying and pasting text into Word has been documented by at least two studies I’ve read, including one by Theofanos and Redish. Zoom layouts solve the problem. Or will, once retail sites start using them.

  23. It has come to my attention that my fact checking was incomplete: the true facts negate *some* of the conclusions I made and I apologize to any/all who had read my own opinions on PDF accessibility.

  24. Yes, I think PDF tags actually are XML, but I also think that they’re in a separate category from the XML inputs that were mentioned in the comment.

    Could one of my esteemed colleagues from Adobe clarify that, please?

  25. Hold on. I just noticed that Roger Hudson essentially accuses me of hogging some kind of limelight at the expense of poor downtrodden Bruce Maguire. Sadly, no!

    Do a Google search for Maguire vs. SOCOG and see whose name comes up. Bruce’s victory against the Sydney Olympics is a notable precedent in Web accessibility that I extensively documented — because it’s _important_. Far be it from me to “minimi[ze] the discrimination against people with disabilities.” When did I start doing that?

    Is it just barely possible, Roger, that Bruce shouldn’t coast on his reputation, that his friends shouldn’t mistake reputation for accuracy, and that I’m not trying to boost my own reputation?

    I just wrote an article about PDFs. If this is so great for my reputation, when is it gonna start getting me dates?

  26. Working in a bankruptcy law firm, I have to access the courts’ systems every day to get “archived” copies of documents associated with a particular case. I do so by clicking on a link in a very basic HTML 4.1 page (beside the point), and depending on the court, Adobe Acrobat eventually opens with the document I chose. For the most part, the documents that come up would be a nightmare to reproduce in HTML; row & column spanning cells, mandatory headers & footers, time sensitive dates, etc.

    As far as I can tell, your list of exempt PDF situations doesn’t include situations like the one I outlined above, at least not specifically.

    Otherwise, great article. It always feels so good to prove the greedy nay-sayers wrong.

  27. Thank you very much for this detailed article. However, I want to remind you that Safari only uses Preview’s functionality to display PDFs, i.e. it can’t be mentioned as its own PDF reader.

    Best wishes,
    Philipp

  28. In response to the other users who had problems printing this great article, I too had the same issue but resolved it by opening the page in Opera 8 and printing from there.

  29. No, I don’t know of any usability studies of PDFs, with or without disabled test subjects. We could certainly use those.

    The AFB advice is outdated by several years, but will no doubt be given credence indefinitely despite new facts. Perhaps the AFB should use a strength of the Web (immediacy) and update its document.

  30. So are you saying you wrote a long article about PDFs being accessible, yet you don’t know if disabled users can actually use PDFs that are designed to be accessible?

    Or are you saying you have no evidence that accessible PDFs are usable?

    What are you saying?

  31. _“So are you saying you wrote a long article about PDFs being accessible, yet you don’t know if disabled users can actually use PDFs that are designed to be accessible? Or are you saying you have no evidence that accessible PDFs are usable?”_

    The author is not making either of those statements. It’s quite clear what the author is saying: he knows of no usability studies of PDFs; good ones would be welcome.

  32. Although many different applications allow simple creation of PDFs, sadly the internet is full of PDFs with no proper title and metadata. Although Google makes up for this itself, if people add their own title and metadata they have a better say in how search results for their files appear (more so if they are using some other search engine, for instance for local use). Very few applications will allow you a route to view and change the title, description and keywords, and in some cases the conversion process will take historical data from the file and use that.

    So yes, do look at your PDF as the result of a search, preferably with your local search engine (if you have one) as well as seeing what Google says about it. You may well have to buy the full version of Acrobat to edit the metadata (and you can use it to add XMP as well).

  33. C’mon. What is this?

    The article says PDFs are accessible. That means people with screen readers can use them.

    But you cannot cite a single piece of evidence from actual disabled users that they can use PDFs that are designed to be accessible.

    That was the point of the AFB white paper. Even though you can make a PDF technically accessible, in practice blind people still can’t use them, for a bunch of reasons.

    So, in the absence of any evidence that proves otherwise, everything in this article is moot.

    Yes, you can make PDFs accessible, but why bother if the people you are doing this for can’t use them anyway?

    Get out of your ivory towers.

  34. PDF accessibility is just like Web accessibility: We’ve got a set of technologies and guidelines we use. By doing so, we have confidence that people with disabilities can understand and use our content. There have been very few (really indeed very few) usability studies of compliant Web sites, and none that I know of pertaining to tagged PDF, for example. This doesn’t change the fact that PDF has accessibility features, or the related fact that many of the complaints about PDF that are advanced as reasons they’re inaccessible (e.g., “my device can’t read tagged PDF”) are off-topic. Rather like your complaints, Dominic.
    Link us to your curriculum vitae and I’m sure Adobe or others would consider hiring you to do such a study.

  35. By nature, I like PDF. It is a great way to ensure that data will show exactly the same across media. What I hate is the browser support; PDFs freeze a lot of browsers. I don’t know if it’s just me, but it drives me crazy when I lose my 20 open tabs in Firefox just because I didn’t notice that the link I was clicking on was a PDF, which I usually “save as” to read later in a decent PDF reader. I don’t know if it’s caused by malformed PDF tags or a bad browser implementation, but IMO it happens way too often.

  36. I received a PDF recently and thought I’d check it for accessibility. I saw that ‘Add Tags To Document’ wasn’t greyed out in the Accessibility menu. But when I tried the menu option, the following error came up. Anyone got any ideas about this?

    _“Acrobat [7.0 Standard] was unable to make this document accessible because of the following error:_

    _Bad PDF; could not read page structure. (Bad PDF; error in processing fonts: bad font)[1]_

    _Please note that some pages of this document may have been changed. Because of this failure, you are advised to not save these changes.”_

    So I checked the Properties screen but couldn’t see any problems. There was a single font used – Courier. I also noted the company logo and a box of text on the page used Times New Roman. Trying to select the text revealed it was part of an image! No way to select, copy or read out the text there!

  37. In my job we process PDF job application forms for a leading website, and we have to include database-related fields in the PDF. Strangely, it’s cheaper than HTML; however, the results are messy, file size is erratic, and navigation is cumbersome.

  38. Finally, on the fourth try, I’ve figured out printer settings so I can print this fine article without losing the right-hand end of each line. Could you help? I’m going to be recommending it to a couple hundred EPA colleagues, and we like to save paper. :^) Thanks!

  39. I agree with Joe’s statement that “Some documents really should be PDFs.” Specifically, some RECORDS (locked-down documents with long-term value) should be PDF because PDF is an ideal preservation format, mainly because it is flatter than other formats. Preservation of HTML files is made easier with PDF, because HTML files, with their multi-media multi-layered nature, make it difficult to determine the boundaries of the record and use standardized metadata for the different layers and media. The problem of preserving context and record relationships doesn’t go away with PDF, though.

  40. I went to a media briefing on accessible PDFs hosted by the RNIB on Thursday 20th October 2005. One of the impressive demonstrations was the RNIB 2005 Financial Report, all done as an accessible PDF, including tables – which are regarded as being difficult to make accessible.

    Hugh Huddy, from the RNIB, demonstrated navigating a balance sheet in a PDF document using Adobe Acrobat and Jaws. He navigated the balance sheet, as far as I could tell, in exactly the same manner you would if it were done in HTML. He switched into a table reading mode and had access to all the information in the table, along with the relevant headers for the cells. He also demonstrated navigating using links and header structure.

    It’s not an ivory tower usability study, but a practical, live demonstration proving we are able to create accessible PDFs. I guess it proved that PDFs, even complex ones, can be created to be accessible, and that they can be accessible to screen readers.

  41. I’d just like to point out, with regard to Mike Davies’s comment above, that what he says is true about the outcome of the RNIB’s briefing.

    What he doesn’t mention is that, after much struggling, the RNIB team of experienced developers had to bring in external consultants to enable them to properly tag their PDF files, as they just couldn’t do it..!!

    Does this really mean that PDFs are accessible? Surely that’s like saying a house without a roof will keep me dry in the rain, I just have to learn how to build a roof??

    On a more positive note, I’d love to see some independent user testing to see just how accessible properly tagged PDFs are, and also how the process of tagging can be made easier.

    My main issue with Joe’s article is that it’s now being touted around the UK public sector as validation for providing 90% of their web content in (un-tagged) PDFs, though no doubt Joe will be horrified to hear it.

  42. “What he doesn’t mention is that, after much struggling, the RNIB team of experienced developers had to bring in external consultants to enable them to properly tag their PDF files, as they just couldn’t do it..!!”

    As I understood the comment, it wasn’t experienced developers that were having the problems, but their regular PDF publishers. Hugh opined that part of the problem was that either they were too afraid they would get it wrong, or too afraid to even try. Yes, the RNIB brought in some technical help.

    As I understand it, this is no different from bringing in an accessibility consultant to help in-house web developers make their websites accessible. That does indeed happen.

    I’ve posted my notes on the event over at:
    http://www.isolani.co.uk/blog/access/RnibAccessiblePdfMediaBriefing
    As with all my live blogging attempts – mistakes are all mine.

  43. Coming in late here… just a clarification:

    I have a current complaint lodged with the HREOC and asked if the HREOC has a role in prosecuting, and the response given was:

    “HREOC is no longer a hearing commission, so we do not make determinations as to whether someone has been discriminated against or what the remedy should be. However, through our investigation and conciliation processes, HREOC may form a view as to whether or not there is an arguable case that discrimination may have occurred and we will express such a view to the parties, when appropriate.”

    So the HREOC will no longer crack down on anyone. (Which is sad when there’s a genuine grievance and the only option is to pay lawyers. No wonder I am in this situation: businesses know the little people can’t fight back.)

  44. I don’t know if this topic is related; I honestly could not look through all the threads. But I was looking for some advice on how to create a zoom tool on a website, like a lot of clothing websites have, to zoom in on an item. If anyone could help, that would be greatly appreciated.

    Thank you.

    Mendy

  45. I have some websites where we use PDFs to provide more information on a subject when visitors ask for it.
    In this case I think PDF is necessary and very useful.

    Thanks for your article and for the knowledge I gained by reading it.

More from ALA