Since technically we all work in publishing, it makes sense to turn our collective attention to the technical and logistical challenges of ebooks. They are a new frontier, but it looks a lot like the old web frontier, with HTML, CSS, and XML underpinning the main ebook standard, ePub.
There are key distinctions between ebook publishing’s current problems and what the web standards movement faced. The web was founded without an intent to disrupt any particular industry; it had no precedent, no analogy. E-reading antagonizes a large, powerful industry that’s scared of what this new way of reading brings—and they’re either actively fighting open standards or simply ignoring them.
Currently, there are few resources for learning how to build ePub documents, and the latest version of the ePub standard isn’t fully implemented on any modern e-reader. And the most popular e-reading platforms still encumber their work in DRM, which hurts sales and hinders long-term archiving.
Everybody suffers from our current system. Publishers won’t make as much money because of piracy, consumers find it frustrating to read on the terms they want, and writers lose money and exposure. It discourages quality long-form writing, hurts technological progress, and locks our knowledge away from future generations. It’s clear that we need something better—but to move forward, we must remind ourselves of where we came from.
It’s important to know how publishing generally works. The web has affected publishing in many ways, but the publishing industry remains a massive beast, employing hundreds of thousands of people at every step of the process. Writers want to make a good product, they want to reach as many readers as possible, and they want to be paid for their time and effort. Publishers traditionally help edit writers’ work and use distribution to connect writers and readers. Readers consume the writing, of course. They want to find good writing, they want to read on their own terms (in a physical book, on a smartphone, or on an e-reader), and they want affordability and easy access. We owe it to ourselves to understand the current landscape.
Reading’s renaissance is shifting our expectations. Cameron Koczon said that content is freeing itself from context, empowering readers to act on their own terms. We add articles to Instapaper and Readability; we buy books in paper or Kindle form; we switch to “print view” so we don’t have to read articles on thirty separate pages.
The explosion in reading, and the freedom to read on one’s own terms, makes periodical publishers skittish about their primary revenue stream: advertising. More than ever before, writing is being given away: countless paid magazines put their articles’ full text online, and many other publications avoid print distribution entirely. Because so many outlets aren’t charging, often linking to each other with scant attribution, readers may become less willing to pay for writing if they can get similar work somewhere else, and writers get paid less. After accounting for inflation, the hourly and per-article rates in Writer’s Market have dropped 81.3% on average since 1991. This affects the kind of writing being published—and it affects publishers, who are squeamish about posting full RSS feeds. Instead, they link a summary to their advertising-filled webpages. Writers, publishers, and readers exist in a feedback loop—and when the standards of one group suffer, the other two decline, too.
The publishing industry is massive, and no article will provide a complete summary, but for our purposes it entails several primary roles:
- Talent acquisition reaches out to prospective writers, encouraging them to write for the publication.
- Editorial works closely with authors to ensure that the writing is as good as possible.
- Layout and design creates book covers and selects type.
- Marketing promotes the book on the author’s behalf through placement and advertising.
- Distribution ensures the title appears in bookstores of all sizes.
- Fulfillment houses store customer data and manage subscription magazine distribution.
Executives manage the process. Printers put the books together. Booksellers get the writing in front of readers, acting as the public face of the publishing industry. And through all that is a chain of paper, binding cloth, glue, and ink suppliers that creates raw materials for those books, and a chain of printers, binderies, and warehouses that assembles and ships them around the world.
The internet has disrupted the publishing industry’s institutions. The “big six” publishers (Hachette, Macmillan, Penguin Group, HarperCollins, Random House, and Simon & Schuster) are protecting their assets amidst declining author rates, fraught ebook pricing negotiations, fear of piracy, and the increase of self-publishing. Supporting so much overhead, they need more income than they can make right now.
Historically, writers pitch their manuscripts to various publishers and receive a cash advance in exchange for distribution rights, editorial control, and intellectual property ownership. Writers have little control over layout, design, marketing, or distribution reach. Publishers manage finances and logistics on behalf of the author, so the writer can write the book without interference.
Writers used to have to go through the publishing industry to reach an audience. The web changed this. Everybody can publish a blog for free. Print-on-demand services give easy access to printed matter. Services like Newspaper Club allow us to publish broadsheets. Authors publish long-form essays as Kindle Singles, with no publishing industry in sight. Many authors have gone the end-to-end route, putting their projects up on Kickstarter to fund production and shipping. Designer Frank Chimero raised over $100,000 to publish his book; Kern and Burn, Wax Magazine, and Matter have all had successful projects. (Disclosure: I’ve launched two successful publishing projects over Kickstarter, Cadence & Slang and Distance; and I’ve backed many other publishing projects, including Frank’s and Kern and Burn’s.) Thanks to tools like Kickstarter, Square, Shopify, Fetch, Lulu, Blurb, and many others, it’s never been easier to get self-published work into the hands of readers profitably. All of these tools indirectly threaten the traditional publishing industry.
As DIY-esque hustle is infusing publishing and writers become publishers, they must make sure they’re making something great. Because writers have the tools and the readership, they’re relying less on traditional publishers—and they can make much more money than they ever could have through traditional publishers. The traditional publishing industry is starting to look more and more like the record industry did in the nineties.
Publishers are increasingly afraid of losing money. Writers are building their own industry. Everybody has their own interests in mind. This clash of perspectives results in the technical landscape we’ve inherited. But does everybody get what they really want? How do we benefit from what we have, and what can we do going forward?
Ebooks: the technical history
Michael Hart created the first “ebook” in 1971, after he received access to a mainframe at the University of Illinois and transcribed the Declaration of Independence word for word. Shortly after, Hart founded Project Gutenberg, a public, free repository for long-form text. Project Gutenberg has since digitized thousands of novels in the public domain, and distributed them, largely in plain text, to millions of readers.
As technology became more sophisticated, so did ebooks and the issues surrounding them. TeX, a publishing standard now largely used in mathematical and scientific texts, was created in 1978. DocBook, an XML-based schema primarily meant for technical writing, was authored in 1991.
In the past two decades, HTML and CSS have worked with browsers to enhance layout, and PDF, a once-closed standard developed by Adobe in 1993, has afforded freedoms on par with print design, with additional security to protect commercial typefaces and private documents.
Initially conceived in September 2007, the open ebook standard ePub uses a combination of HTML, CSS, and XML wrapped up as a Zip file. The ePub format cherry-picks HTML and CSS features, though. The current ePub standard is 3.0, published in October 2011 and built on HTML5 and CSS3, but no e-readers fully support it right now. The International Digital Publishing Forum (IDPF) governs its implementation. Apple, Amazon, Adobe, Barnes & Noble, Bowker (which controls the ISBN system in the United States), and many others are paying members of the IDPF. Other open standards, like HPub and Zhook, show a lot of promise and tend to be easier to implement than ePub, but they lack broad, cross-platform e-reader support.
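To make the "HTML, CSS, and XML wrapped up as a Zip file" idea concrete, here is a minimal Python sketch that assembles an ePub-shaped archive in memory and lists what is inside. It is a skeleton under stated assumptions, not a valid, complete ePub: the file names under `OEBPS/` are illustrative, and the package document that `container.xml` points to (`content.opf`) is deliberately left out. The parts that come from the ePub container rules are the `mimetype` entry, which is stored first and uncompressed, and the `META-INF/container.xml` file.

```python
import io
import zipfile

# container.xml tells a reading system where the package document lives.
# The path OEBPS/content.opf is an illustrative choice; that file is omitted here.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# A hypothetical one-chapter book: ordinary XHTML plus a stylesheet.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title><link rel="stylesheet" href="style.css"/></head>
  <body><h1>Chapter 1</h1><p>Hello, reader.</p></body>
</html>
"""


def build_epub_skeleton() -> bytes:
    """Return the bytes of a Zip archive laid out like an ePub container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/style.css", "h1 { font-size: 1.5em; }",
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def list_entries(epub_bytes: bytes) -> list[str]:
    """List the archive's entries in the order they were written."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return archive.namelist()


if __name__ == "__main__":
    for name in list_entries(build_epub_skeleton()):
        print(name)
```

Running the same `list_entries` function against a real, DRM-free ePub file's bytes is a quick way to see how much of an ebook is plain web content.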
Amazon, the largest ebook distributor, currently uses their own proprietary format, called Mobipocket (or Mobi for short), for the Kindle. An independent company developed Mobipocket in 2000; Amazon bought them in 2005. (Adobe has since implemented ePub export in InDesign, although its markup is not semantic and requires substantial cleanup to be standards-compliant.) Mobi’s capabilities are largely on par with ePub’s widely-implemented 2.0.1 standard. Many utilities allow conversion between Mobi and ePub, but there is one key difference: everything sold through the Kindle store must be encrypted and wrapped in DRM.
Publishing a book through Amazon’s Kindle store requires authors to convert the book with Kindle’s authoring tools, which mandates another workflow—and two copies to separately maintain.
Recently, Apple released a program called iBooks Author, allowing people to publish their own work and sell it on the iBookstore. iBooks Author’s exported format is similar to ePub 3, with the same technologies at play, but it does not comply with the ePub standard. Controversy flared up around the iBooks Author EULA, which called authors’ rights into question; shortly afterward, Apple rewrote the EULA terms. iBooks Author doesn’t allow any file type to be imported, it only exports books as PDF or plain text, and the iBookstore wraps all purchases, iBooks-authored or not, in DRM.
Semantic markup? Nope.
Currently, no page layout software can export semantically accurate markup. Publishing a semantic document in ePub is only possible if you’re willing to write the code yourself. InDesign and Calibre support ePub export, but InDesign exports all styles as <span> tags, and Calibre is extremely hands-off and unforgiving in the way that it packages an ePub file. Jutoh is an outfit that takes InDesign files and creates semantic ePub documents for clients, but this band-aids the larger problem of not having the right tools. Wider ePub adoption is hobbled because good tools aren’t available to publishers and independent writers.
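A rough way to see the difference between span-heavy export and semantic markup is to count the elements each one uses. The Python sketch below does that with the standard library's HTML parser. Both snippets are invented for illustration: the class names are hypothetical and are not actual InDesign output.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical export in the span-per-style manner described above.
# The class names are made up for this example.
SPAN_EXPORT = (
    '<p><span class="chapter-title">Chapter 1</span></p>'
    '<p><span class="body-italic">Moby-Dick</span> '
    '<span class="body">is long.</span></p>'
)

# The same content hand-written with elements that carry meaning.
SEMANTIC = "<h1>Chapter 1</h1><p><cite>Moby-Dick</cite> is long.</p>"


class TagCounter(HTMLParser):
    """Counts how many times each element's start tag appears."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(markup: str) -> Counter:
    parser = TagCounter()
    parser.feed(markup)
    parser.close()
    return parser.counts


if __name__ == "__main__":
    print("span export:", dict(count_tags(SPAN_EXPORT)))
    print("semantic:   ", dict(count_tags(SEMANTIC)))
```

In the span version, the heading and the book title are distinguishable only by class name, so an e-reader, screen reader, or conversion tool has nothing structural to work with. In the semantic version, the same information lives in the elements themselves.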
Publishers address publishing format fragmentation in different ways: some release an exclusive PDF or ePub or Kindle edition, targeting a specific format rather than designing for several. Others, like O’Reilly, bundle several versions together, so readers can fit the book to their reading context. But targeting a specific platform runs against the grain of current trends in web development, where we try to make one design fit many contexts. We have a long way to go before e-readers and publishers embrace standards and semantic markup.
The ebook landscape is broken
While we’ve progressed by establishing technical norms for creating ebooks, the electronic publishing process needs improvement in important ways. There is no simple way to create a semantically correct ebook; web developers could help here, though (as of this writing) many don’t create ebooks. The IDPF is the ePub equivalent of the W3C, and while it has many paying members, none of them are required to follow the ePub standard in their own work, which leads to considerable fragmentation among platforms. Apple’s iBooks Author format exports a glorified fork of ePub with multimedia functionality; Amazon’s Kindle format, Mobipocket, uses HTML and CSS, but the standard differs significantly from ePub.
This fragmentation hurts publishers because most don’t understand markup, dedicated developers are often too costly for their projects, and the online documentation for digital publishing is pretty scant. There is no Stack Overflow equivalent for addressing ePub issues; no community has been formed to try and fix things yet. Writers suffer because DRM encourages piracy. Readers suffer because rampant platform lock-in prevents long-term ownership: for example, if you buy a book for a Kindle, it won’t run in iBooks. It’s extremely difficult to copy the Kindle files that you’ve bought if you wish to back them up, because they’re wrapped in DRM. If it’s simpler and more convenient to just buy the paper book, then the existing ebook model is broken.
Distribution is a challenge, too
We’ve covered a little about how the publishing industry works and the ebook technical landscape, but the internet changes how books are being distributed and sold, and this is important.
The agency model
The publishing industry sells physical books using the wholesale model. Publishers handle manufacturing, and supply large sellers and intermediary wholesalers who then distribute to smaller independent bookstores. In this case, there’s a suggested retail price, but the store’s final price can be whatever they like. (That’s why books cost so little through Amazon, and so much through independent stores: Amazon has the luxury of selling even long tail books without the cost of storing them in a warehouse waiting for a buyer, and they pass those savings on to their customers.) Although ebooks don’t need to be manufactured or bought in bulk, publishers were using the wholesale model to sell ebooks, making it possible for Amazon to offer ebooks at very low retail prices.
Digital publishing has brought about a new way of selling books, though, called the agency model. In this model, “agents” (Apple, Barnes & Noble, etc.) act as liaison between publisher and consumer, and they take a cut of the final retail price. For example, Apple takes a 30% cut of the retail price of books sold through the iBookstore. Since publishers have typically offered 50% discounts to wholesalers, a 30% agency commission leaves more potential profit for publishers. Publishers also get to set the final pricing, and since they are cutting out most of the supply chain, more of the consumer dollar comes back to them. Amazon refuses to adopt the agency model, instead wanting to sell ebooks at a fixed maximum price.
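The arithmetic behind that comparison is simple enough to sketch in a few lines of Python. The 50% wholesale discount and 30% agency commission come from the figures above; the function names and the $10 price are assumptions made for illustration, and real contracts involve terms this ignores.

```python
def publisher_revenue_wholesale(list_price: float,
                                wholesale_discount: float = 0.50) -> float:
    """Publisher's take when a wholesaler buys at a discount off list price."""
    return list_price * (1 - wholesale_discount)


def publisher_revenue_agency(retail_price: float,
                             agent_commission: float = 0.30) -> float:
    """Publisher's take when an agent keeps a commission on the retail price."""
    return retail_price * (1 - agent_commission)


def break_even_agency_price(list_price: float,
                            wholesale_discount: float = 0.50,
                            agent_commission: float = 0.30) -> float:
    """Agency retail price at which the publisher earns the wholesale amount."""
    wholesale_take = publisher_revenue_wholesale(list_price, wholesale_discount)
    return wholesale_take / (1 - agent_commission)


if __name__ == "__main__":
    price = 10.00  # hypothetical ebook price
    print(f"wholesale take: {publisher_revenue_wholesale(price):.2f}")
    print(f"agency take:    {publisher_revenue_agency(price):.2f}")
    print(f"break-even agency price: {break_even_agency_price(price):.2f}")
```

At a hypothetical $10 price, the publisher keeps $5.00 per copy under wholesale terms and $7.00 under agency terms. Put another way, a publisher could set an agency price well below the wholesale list price and still earn the same amount per sale.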
On the other hand, consumers don’t want to pay as much for an ebook as they do for the print version. Pricing implications have already played out in other formats. For instance, it’s usually cheaper to buy an album’s digital edition on the iTunes Store than it is to buy the same album on vinyl, and in the latter case, many new records come with free download codes for the digital equivalent. The trends in other industries will affect consumer expectations in publishing, and in small ways they already have: a movement is working to open copyrighted works to the public, and the US Department of Justice is suing five of the “big six” and Apple for price fixing (iBookstore prices differ widely from Amazon’s), which will likely give Amazon the upper hand in setting ebook prices in the long run.
This hurts publishers because they make less money; it hurts writers because they cannot share their passions through the largest outlets; it hurts readers because it could have chilling effects on the quality of writing in the long run; and it will almost certainly cement Amazon’s long-term dominance. Amazon is the only winner, and now that (as of this writing) three of the big six have settled out of court, it seems highly likely that Amazon’s win will come with the government’s legal backing.
Amazon is winning
Publishers’ favorable agency-model negotiations with Apple have, at times, brought them into conflict with Amazon. Amazon fixes Kindle ebook prices to compete with Apple and attract more consumers, and it refuses to operate according to an agency model. Instead, it sells ebooks on wholesale terms. On occasion, Amazon will pull all of the titles of major publishers who refuse to back down in such negotiations. For instance, during the writing of this article, Amazon pulled some 5,000 Independent Publishing Group titles after the two failed to agree on renewing their contract. (IPG is the United States’s second-largest independent publisher.) And the Educational Development Corporation pulled its own titles in mid-April, saying “Amazon is squeezing everyone out of business. They’re a predator. We’re better off without them.”
While Amazon is the largest physical bookseller, they still have competition in that sector. But when it comes to e-readers, the Kindle dominates. Although precise numbers are unavailable, Amazon’s overall market share is set to explode from 15% to 50% of all books sold, period, once you factor in the Kindle. Amazon’s DRM on the closed Kindle platform keeps customers locked in. They no longer consider using competing e-readers, which forces readers and authors to accept whatever pricing Amazon negotiates with publishers. In the long run, too, DRM-wrapped ebooks can go “out of print”: there’s little incentive to maintain DRM authentication servers once the market moves on to a new standard.
Renting ebooks to libraries
Libraries are the largest consumer of books in the country, but their relationship with ebooks has been fraught. Libraries usually buy a printed book for a set number of checkouts; once a book has been worn out, usually after ten years’ use, they buy another copy to replace it. This generates a relatively continuous revenue stream for publishers, especially when applied to popular mass-market books that are cheap to produce.
Because physical books wear out at different rates and ebooks don’t wear out at all, publishers are sensitive to ebook distribution terms, lest they leave money on the table in the long run. Consequently, DRM policies for library ebooks differ from those on consumer ebooks. Libraries can “lend” an ebook a fixed number of times before they have to buy it again, usually at significant markup, and many proprietary, cumbersome systems exist that require a specific e-reader. Usually, you can’t download library-lent ebooks to a Kindle unless the library uses Amazon’s library system, which carries its own limitations. Card-carrying library members can preview some books remotely, but they can only view a limited number of pages. No solution addresses the issues around inter-library loan. Libraries use different systems for ebook lending, many of which don’t communicate effectively with one another, so if a publisher chooses one platform over another, many libraries will have to do without their titles in electronic form.
There are some innovators here, but they are few. The Internet Archive’s Open Library, for example, purchases physical books, digitizes them, and lends them out on a one-to-one basis. Some libraries are starting to cull their own physical archives now; they simply can’t keep old books around anymore. Obliging owners to pay again to use digital books means that those will also go out of circulation, just like our physical books do. We have a long way to go for libraries to use ebooks meaningfully—which is likely imperative to their survival.
Centralization of distribution channels
As various bookstores fold, reading (especially of novels and long-form nonfiction) is increasingly centralized among a few large providers. Right now, the United States’s biggest players are Amazon, Apple, and Barnes & Noble, which together account for around 41% of the digital market.[1]
What are the ramifications of this? There will be fewer large businesses to negotiate with, and waning incentive for small publishers to organize and bargain. More than ever, we need a rallying cause to effect meaningful change, because writers and publishers have little recourse but to accept the terms of the large booksellers.
The road ahead
The internet is disrupting many content-focused industries, and the publishing landscape is beginning its own transformation in response. Tools haven’t yet been developed to properly, semantically export long-form writing. Most books are encumbered by DRM, a piracy-encouraging practice long since abandoned by the music industry. In the second part of this article, I’ll discuss the ramifications of these practices for various publishers and propose a way forward, so we can continue sharing information openly, in a way that benefits publishers, writers, and readers.