Publication Standards Part 1: The Fragmented Present
Issue № 352


Since technically we all work in publishing, it makes sense to turn our collective attention to the technical and logistic challenges of ebooks. They are a new frontier, but it looks a lot like the old web frontier, with HTML, CSS, and XML underpinning the main ebook standard, ePub.


There are key distinctions between ebook publishing’s current problems and what the web standards movement faced. The web was founded without an intent to disrupt any particular industry; it had no precedent, no analogy. E-reading antagonizes a large, powerful industry that’s scared of what this new way of reading brings—and they’re either actively fighting open standards or simply ignoring them.

Currently, there are few resources for learning how to build ePub documents, and the latest version of the ePub standard isn’t fully implemented on any modern e-reader. The most popular e-reading platforms still encumber their books with DRM, which hurts sales and hinders long-term archiving.

Everybody suffers from our current system. Publishers won’t make as much money because of piracy, consumers find it frustrating to read on the terms they want, and writers lose money and exposure. It discourages quality long-form writing, hurts technological progress, and locks our knowledge away from future generations. It’s clear that we need something better—but to move forward, we must remind ourselves of where we came from.

The roles

It’s important to know how publishing generally works. The web has affected publishing in many ways, but the publishing industry remains a massive beast, employing hundreds of thousands of people at every step of the process. Writers want to make a good product, they want to reach as many readers as possible, and they want to be paid for their time and effort. Publishers traditionally help edit writers’ work and use distribution to connect writers and readers. Readers consume the writing, of course. They want to find good writing, they want to read on their own terms (in physical books, on a smartphone, or on an e-reader), and they want affordability and easy access. We owe it to ourselves to understand the current landscape.

Readers

Reading’s renaissance is shifting our expectations. Cameron Koczon said that content is freeing itself from context, empowering readers to act on their own terms. We add articles to Instapaper and Readability; we buy books in paper or Kindle form; we switch to “print view” so we don’t have to read articles on thirty separate pages.

Reading often involves conversation, too. We tweet about the writing we like. We highlight interesting passages and share them with our friends. We quote and reblog what we want to share.

The explosion in reading, and the freedom to read on one’s own terms, makes periodical publishers skittish about their primary revenue stream: advertising. More than ever before, writing is being given away: countless paid magazines put their articles’ full text online, and many other publications avoid print distribution entirely. Because so many outlets aren’t charging, often linking to each other with scant attribution, readers may become less willing to pay for writing if they can get similar work somewhere else, and writers get paid less. After accounting for inflation, the hourly and per-article rates in Writer’s Market have dropped 81.3% on average since 1991. This affects the kind of writing being published—and it affects publishers, who are squeamish about posting full RSS feeds. Instead, they link a summary to their advertising-filled webpages. Writers, publishers, and readers exist in a feedback loop—and when the standards of one group suffer, the other two decline, too.

Publishers

The publishing industry is massive, and no article will provide a complete summary, but for our purposes it entails several primary roles:

  • Talent acquisition reaches out to prospective writers, encouraging them to write for the publication.
  • Editorial works closely with authors to ensure that the writing is as good as possible.
  • Layout and design creates book covers and selects type.
  • Marketing markets the book for the author by placement and advertising.
  • Distribution ensures the title appears in bookstores of all sizes.
  • Fulfillment houses store customer data and manage subscription magazine distribution.

Executives manage the process. Printers put the books together. Booksellers get the writing in front of readers, acting as the public face of the publishing industry. And through all that is a chain of paper, binding cloth, glue, and ink suppliers that creates raw materials for those books, and a chain of printers, binderies, and warehouses that assembles and ships them around the world.

The internet has disrupted the publishing industry’s institutions. The “big six” publishers (Hachette, Macmillan, Penguin Group, HarperCollins, Random House, and Simon & Schuster) are protecting their assets amidst declining author rates, fraught ebook pricing negotiations, fear of piracy, and the increase of self-publishing. Supporting so much overhead, they need more income than they can make right now.

Writers

Historically, writers pitch their manuscripts to various publishers and receive a cash advance in exchange for distribution rights, editorial control, and intellectual property ownership. Writers have little control over layout, design, marketing, or distribution reach. Publishers manage finances and logistics on behalf of the author, so the writer can write the book without interference.

Writers used to have to go through the publishing industry to reach an audience. The web changed this. Everybody can publish a blog for free. Print-on-demand services give easy access to printed matter. Services like Newspaper Club allow us to publish broadsheets. Authors publish long-form essays as Kindle Singles, with no publishing industry in sight. Many authors have gone the end-to-end route, putting their projects up on Kickstarter to fund production and shipping. Designer Frank Chimero raised over $100,000 to publish his book; Kern and Burn, Wax Magazine, and Matter have all had successful projects. (Disclosure: I’ve launched two successful publishing projects over Kickstarter, Cadence & Slang and Distance; and I’ve backed many other publishing projects, including Frank’s and Kern and Burn’s.) Thanks to tools like Kickstarter, Square, Shopify, Fetch, Lulu, Blurb, and many others, it’s never been easier to get self-published work into the hands of readers profitably. All of these tools indirectly threaten the traditional publishing industry.

As a DIY-esque hustle infuses publishing and writers become publishers, they must make sure they’re making something great. Because writers have the tools and the readership, they’re relying less on traditional publishers, and they can make much more money than they ever could have through them. The traditional publishing industry is starting to look more and more like the record industry did in the nineties.

Publishers are increasingly afraid of losing money. Writers are building their own industry. Everybody has their own interests in mind. This clash of perspectives results in the technical landscape we’ve inherited. But does everybody get what they really want? How do we benefit from what we have, and what can we do going forward?

Ebooks: the technical history

Michael Hart created the first “ebook” in 1971, after he received access to a mainframe at the University of Illinois and transcribed the Declaration of Independence word for word. Shortly after, Hart founded Project Gutenberg, a public, free repository for long-form text. Project Gutenberg has since digitized thousands of novels in the public domain, and distributed them, largely in plain text, to millions of readers.

As technology became more sophisticated, so did ebooks and the issues surrounding them. TeX, a typesetting system now largely used for mathematical and scientific texts, was created in 1978. DocBook, an XML-based schema primarily meant for technical writing, was authored in 1991.

In the past two decades, HTML and CSS have worked with browsers to enhance layout, and PDF, a once-closed standard developed by Adobe in 1993, has afforded freedoms on par with print design, with additional security to protect commercial typefaces and private documents.

Initially conceived in September 2007, the open ebook standard ePub uses a combination of HTML, CSS, and XML wrapped up as a Zip file. The ePub format supports only a subset of HTML and CSS features, though. The current ePub standard is 3.0, published in October 2011 and built on HTML5 and CSS3, but no e-readers fully support it right now. The International Digital Publishing Forum (IDPF) governs its implementation. Apple, Amazon, Adobe, Barnes & Noble, Bowker (which controls the ISBN system in the United States), and many others are paying members of the IDPF. Other open standards, like HPub and Zhook, show a lot of promise and tend to be easier to implement than ePub, but they lack broad, cross-platform e-reader support.
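To make the "HTML, CSS, and XML wrapped up as a Zip file" idea concrete, here is a minimal Python sketch that assembles an ePub-shaped archive in memory. The file names under `OEBPS/`, the placeholder identifier, and the function name `build_epub_skeleton` are illustrative assumptions, not anything the standard or this article prescribes. The sketch also leaves out the navigation file a real ePub 2 book needs and has not been run through a validator, so treat it as a picture of the container layout rather than a finished book.

```python
import io
import zipfile

# Tells a reading system where the package document lives.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package document: metadata, a manifest of files, and the reading order.
# The identifier is a placeholder, and the navigation file is omitted.
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Book</dc:title>
    <dc:identifier id="book-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

# The book content itself is ordinary XHTML.
CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>Hello, reader.</p></body>
</html>
"""


def build_epub_skeleton() -> bytes:
    """Return the bytes of a Zip archive laid out like an ePub container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for name, body in [
            ("META-INF/container.xml", CONTAINER_XML),
            ("OEBPS/content.opf", CONTENT_OPF),
            ("OEBPS/chapter1.xhtml", CHAPTER_XHTML),
        ]:
            archive.writestr(name, body, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


if __name__ == "__main__":
    with open("example.epub", "wb") as handle:
        handle.write(build_epub_skeleton())
```

Everything a web developer already knows applies to the chapter file; the unfamiliar parts are the packaging and metadata around it.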

Amazon, the largest ebook distributor, currently uses their own proprietary format, called Mobipocket (or Mobi for short), for the Kindle. An independent company developed Mobipocket in 2000; Amazon bought them in 2005. (Adobe has since implemented ePub export in InDesign, although its markup is not semantic and requires substantial cleanup to be standards-compliant.) Mobi’s capabilities are largely on par with ePub’s widely-implemented 2.0.1 standard. Many utilities allow conversion between Mobi and ePub, but there is one key difference: everything sold through the Kindle store must be encrypted and wrapped in DRM.

Publishing a book through Amazon’s Kindle store requires authors to convert the book with Kindle’s authoring tools, which mandates another workflow—and two copies to separately maintain.

Recently, Apple released a program called iBooks Author, allowing people to publish their own work and sell it on the iBookstore. iBooks Author’s exported format is similar to ePub 3, with the same technologies at play, but it does not comply with the ePub standard. And controversy flared up around the iBooks Author EULA, where authors’ rights were called into question, shortly after which Apple rewrote the EULA terms. iBooks Author doesn’t allow any file type to be imported, only exports books as PDF or plain text, and the iBookstore wraps all purchases, iBooks-authored or not, in DRM.

Semantic markup? Nope.

Currently, no page layout software can export semantically accurate markup. Publishing a semantic document in ePub is only possible if you’re willing to write the code yourself. InDesign and Calibre support ePub export, but InDesign exports all styles as <span> tags, and Calibre is extremely hands-off and unforgiving in the way that it packages an ePub file. Jutoh is an outfit that takes InDesign files and creates semantic ePub documents for clients—but this band-aids the larger problem of not having the right tools. Wider ePub adoption is hobbled because good tools aren’t available to publishers and independent writers.
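As a rough illustration of the cleanup this implies, the Python sketch below rewrites style-only `<span>` elements into semantic tags using the standard library's `html.parser`. The class names in `SEMANTIC_TAGS` are hypothetical; real exports will use whatever style names the designer chose, and this is not how InDesign, Calibre, or any other tool mentioned here actually works.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping from exported style classes to semantic elements.
SEMANTIC_TAGS = {"italic": "em", "bold": "strong", "book-title": "cite"}


class SpanRewriter(HTMLParser):
    """Replace <span class="..."> with a semantic tag when the class is known."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.output = []
        self.open_spans = []  # replacement tag (or None) for each open span

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            new_tag = SEMANTIC_TAGS.get(dict(attrs).get("class"))
            self.open_spans.append(new_tag)
            if new_tag:
                self.output.append(f"<{new_tag}>")
                return
        # Unknown spans and all other tags pass through unchanged.
        self.output.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> pass through unchanged.
        self.output.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "span" and self.open_spans:
            new_tag = self.open_spans.pop()
            self.output.append(f"</{new_tag or 'span'}>")
        else:
            self.output.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded on input, so re-escape the text.
        self.output.append(escape(data, quote=False))


def rewrite_spans(markup: str) -> str:
    parser = SpanRewriter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.output)


if __name__ == "__main__":
    exported = ('<p>I read <span class="book-title">Moby-Dick</span> '
                '<span class="italic">twice</span>.</p>')
    print(rewrite_spans(exported))
    # prints: <p>I read <cite>Moby-Dick</cite> <em>twice</em>.</p>
```

A script like this only helps when the style names carry meaning in the first place, which is part of why hand-written markup remains the reliable route.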

Publishers address publishing format fragmentation in different ways: some release an exclusive PDF or ePub or Kindle edition, targeting a specific format rather than designing for several. Others, like O’Reilly, bundle several versions together, so readers can fit the book to their reading context. But targeting a specific platform runs against the grain of current trends in web development, where we try to make one design fit many contexts. We have a long way to go before e-readers and publishers embrace standards and semantic markup.

The ebook landscape is broken

While we’ve progressed by establishing technical norms for creating ebooks, the electronic publishing process needs improvement in important ways. There is no simple way to create a semantically correct ebook; web developers could help here, though (as of this writing) many don’t create ebooks. The IDPF is the ePub equivalent of the W3C, and while it has many paying members, none of them are required to follow the ePub standard in their own work, which leads to considerable fragmentation among platforms. Apple’s iBooks Author exports a glorified fork of ePub with multimedia functionality; Amazon’s Kindle format, Mobipocket, uses HTML and CSS, but it differs significantly from ePub.

This fragmentation hurts publishers because most don’t understand markup, dedicated developers are often too costly for their projects, and the online documentation for digital publishing is pretty scant. There is no Stack Overflow equivalent for addressing ePub issues; no community has been formed to try and fix things yet. Writers suffer because DRM encourages piracy. Readers suffer because rampant platform lock-in prevents long-term ownership: for example, if you buy a book for a Kindle, it won’t run in iBooks. It’s extremely difficult to copy the Kindle files that you’ve bought if you wish to back them up, because they’re wrapped in DRM. If it’s simpler and more convenient to just buy the paper book, then the existing ebook model is broken.

Distribution is a challenge, too

We’ve covered a little about how the publishing industry works and the ebook technical landscape, but the internet changes how books are being distributed and sold, and this is important.

The agency model

The publishing industry sells physical books using the wholesale model. Publishers handle manufacturing, and supply large sellers and intermediary wholesalers who then distribute to smaller independent bookstores. In this case, there’s a suggested retail price, but the store’s final price can be whatever they like. (That’s why books cost so little through Amazon, and so much through independent stores: Amazon has the luxury of selling even long tail books without the cost of storing them in a warehouse waiting for a buyer, and they pass those savings on to their customers.) Although ebooks don’t need to be manufactured or bought in bulk, publishers were using the wholesale model to sell ebooks, making it possible for Amazon to offer ebooks at very low retail prices.

Digital publishing has brought about a new way of selling books, though, called the agency model. In this model, “agents” (Apple, Barnes & Noble, etc.) act as liaison between publisher and consumer, and they take a cut of the final retail price. For example, Apple takes a 30% cut of the retail price of books sold through the iBookstore. Since publishers have typically offered 50% discounts to wholesalers, a 30% agency commission leaves more potential profit for publishers. Publishers also get to set the final pricing, and since they are cutting out most of the supply chain, more of the consumer dollar comes back to them. Amazon refuses to adopt the agency model, instead wanting to sell ebooks at a fixed maximum price.
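The arithmetic behind that comparison is simple enough to sketch in Python. The 50% wholesale discount and 30% agency commission come from the paragraph above; the $10.00 price and the function names are made up for illustration, and prices are kept in whole cents to avoid floating-point rounding.

```python
def publisher_take_wholesale(list_price_cents: int, discount_percent: int = 50) -> int:
    """Publisher revenue when a wholesaler buys at a discount off list price."""
    return list_price_cents * (100 - discount_percent) // 100


def publisher_take_agency(retail_price_cents: int, commission_percent: int = 30) -> int:
    """Publisher revenue when the agent keeps a commission on the retail price."""
    return retail_price_cents * (100 - commission_percent) // 100


price = 1000  # hypothetical $10.00 ebook, in cents
wholesale = publisher_take_wholesale(price)
agency = publisher_take_agency(price)

print(f"wholesale: ${wholesale / 100:.2f}")  # prints: wholesale: $5.00
print(f"agency:    ${agency / 100:.2f}")     # prints: agency:    $7.00
```

The sketch holds the price constant across both models. In practice the two numbers are not the same price: under wholesale the retailer chooses what the reader pays, while under agency the publisher does.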

On the other hand, consumers don’t want to pay as much for an ebook as they do for the print version. Pricing implications have already played out in other formats. For instance, it’s usually cheaper to buy an album’s digital edition on the iTunes Store than it is to buy the same album on vinyl, and many new records come with free download codes for the digital equivalent. Trends in other industries will affect consumer expectations in publishing, and in small ways they already have: a movement is working to open copyrighted works to the public, and the US Department of Justice is suing five of the “big six” and Apple for price fixing (iBookstore prices differ widely from Amazon’s), which will likely give Amazon the upper hand in setting ebook prices in the long run.

This hurts publishers because they make less money, it hurts writers because they cannot share their passions through the largest outlets, it hurts readers because it could have chilling effects on the quality of writing in the long run, and it will almost certainly cement Amazon’s long-term dominance. Amazon is the only winner, and now that (as of this writing) three of the big six have settled out of court, it seems highly likely that this dominance will come with the government’s legal backing.

Amazon is winning

Publishers’ favorable agency model negotiations with Apple have, at times, come to a head with Amazon. Amazon fixes Kindle ebook prices to compete with Apple and welcome more consumers, and they refuse to operate according to an agency model. Instead, they wholesale their ebooks. On occasion, Amazon will pull all of the titles of major publishers who refuse to back down in such negotiations. For instance, during the writing of this article, Amazon pulled some 5,000 Independent Publishers Group titles from the Kindle store when the two could not agree on a contract renewal. (IPG is the United States’s second-largest independent book distributor.) And the Educational Development Corporation pulled its own titles in mid-April, saying “Amazon is squeezing everyone out of business. They’re a predator. We’re better off without them.”

While Amazon is the largest physical bookseller, they still have competition in that sector. But when it comes to e-readers, the Kindle dominates. Although precise numbers are unavailable, Amazon’s overall market share is set to explode from 15% to 50% of all books sold, period, when you factor in the Kindle. Amazon’s DRM on the closed Kindle platform keeps customers locked in. Those customers no longer consider competing e-readers, which forces readers and authors to accept whatever pricing Amazon negotiates with publishers. In the long run, too, ebooks can go “out of print” with DRM, and there’s little incentive to maintain DRM authentication servers once the market moves on to a new standard.

Renting ebooks to libraries

Libraries are the largest consumer of books in the country, but their relationship with ebooks has been fraught. Libraries usually buy a printed book for a set number of checkouts; once a book has been worn out, usually after ten years’ use, they buy another copy to replace it. This generates a relatively continuous revenue stream for publishers, especially when applied to popular mass-market books that are cheap to produce.

Because physical books degrade and ebooks don’t, publishers are sensitive to ebook distribution terms, lest they leave money on the table in the long run. Consequently, DRM policies for libraries differ from those on consumer ebooks. Libraries can “lend” ebooks a fixed number of times before they have to buy them again, usually at significant markup, and many proprietary, cumbersome systems exist that require a specific e-reader. Usually, you can’t download library-lent ebooks to a Kindle unless the library uses Amazon’s library system, which carries its own limitations. Card-carrying library members can preview some books remotely, but they can only view a limited number of pages. No solution addresses the issues around inter-library loan. Libraries use different systems for ebook lending, many of which don’t communicate effectively with one another, so if a publisher chooses one platform over another, many libraries will have to do without their titles in electronic form.
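The "fixed number of checkouts" arrangement is easier to reason about as a small state machine. The Python sketch below models one such license. The class name, the cap used in the example, and the rule that only one patron can hold the copy at a time are all assumptions made for illustration; they do not describe any particular vendor's lending system.

```python
from dataclasses import dataclass


@dataclass
class MeteredLicense:
    """One library ebook license that expires after a set number of loans."""

    title: str
    checkouts_remaining: int  # hypothetical cap agreed with the publisher
    on_loan: bool = False     # assumption: one patron at a time

    def check_out(self) -> bool:
        """Lend the ebook if it is available and the license has loans left."""
        if self.on_loan or self.checkouts_remaining == 0:
            return False
        self.on_loan = True
        self.checkouts_remaining -= 1
        return True

    def check_in(self) -> None:
        """Return the ebook so the next patron can borrow it."""
        self.on_loan = False

    @property
    def must_repurchase(self) -> bool:
        """True once every paid-for loan is used up and the copy is back."""
        return self.checkouts_remaining == 0 and not self.on_loan


if __name__ == "__main__":
    license_ = MeteredLicense("Example Title", checkouts_remaining=2)
    print(license_.check_out())      # True: first loan
    print(license_.check_out())      # False: already on loan
    license_.check_in()
    print(license_.check_out())      # True: second and final loan
    license_.check_in()
    print(license_.must_repurchase)  # True: the library has to buy it again
```

Nothing in the file itself wears out; the expiry is purely a counter, which is why the terms attached to that counter matter so much to libraries.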

There are some innovators here, but they are few. The Internet Archive’s Open Library, for example, purchases physical books, digitizes them, and lends them out on a one-to-one basis. Some libraries are starting to cull their own physical archives now; they simply can’t keep old books around anymore. Obliging owners to pay again to use digital books means that those will also go out of circulation, just like our physical books do. We have a long way to go before libraries can use ebooks meaningfully, which is likely imperative to their survival.

Centralization of distribution channels

As various bookstores fold, reading (especially of novels and long-form nonfiction) is increasingly centralized among a few large providers. Right now, the United States’s biggest players are Amazon, Apple, and Barnes & Noble, who account for around 41% of the digital market (see note 1).

What are the ramifications of this? There will be fewer large businesses to negotiate with, and waning incentive for small publishers to organize and bargain. More than ever, we need a rallying cause to effect meaningful change, because writers and publishers have little recourse but to accept the terms of the large booksellers.

The road ahead

The internet is disrupting many content-focused industries, and the publishing landscape is beginning its own transformation in response. Tools haven’t yet been developed to properly, semantically export long-form writing. Most books are encumbered by DRM, a piracy-encouraging practice long since abandoned by the music industry. In the second part of this article, I’ll discuss the ramifications of these practices for various publishers and propose a way forward, so we can continue sharing information openly, in a way that benefits publishers, writers, and readers.

Notes

  • 1. And lest this turn into a “death of mom and pop stores” argument, Borders remains the elephant in that room. Some of the big players are dying; they’re just doing so more slowly, and less perceptibly, than the little guys.

16 Reader Comments

  1. Great article, I am very pleased to see this here. Personally I think the ebook standards are coming from the wrong direction and it is hurting their development. I would like to see focus on HTML standards for publishing and use ePub as a wrapper. For example, many ePub readers are webkit based which gets us off to a great start but they use CSS columns to implement pagination (http://youtu.be/_FVs5FVmVVc?t=21m58s) instead of an HTML spec dedicated to paging (http://people.opera.com/howcome/2012/reader/). Building models for pagination and layout from HTML first then moving to ePub as a wrapper would greatly increase consistency as well as control over layout for publishers. I am very pleased to see publishers like A Book Apart (http://www.abookapart.com/) and Pottermore (http://www.pottermore.com/) publishing ePubs and other book formats independently of Amazon and other big stores; however, the tie-in DRM to these e-readers is a major problem. I encourage everyone who cares about this stuff to join the digital publishing community group (http://www.w3.org/community/digipub/).

  2. Don’t have much to add, but anyone interested in ebooks should follow the hashtag #eprdctn on Twitter — so many bright people in the business of making ebooks get involved there.

  3. Great article, thank you for articulating these issues so well. I am currently making ebooks/ibooks…and hit up against a lot of this stuff. Having said that I am finding the opportunities here amazing and deeply satisfying. Our model involves curating the content for our books, so they are collections of articles, video and audio, and our focus is mainly on training, though I have a beautiful concept song/novel coming up. Working with authors, who seem genuinely grateful for the time taken to create these books and make it all work, has been fantastic. There are some really decent people out there doing this, #eprdctn definitely a great resource. Liz Castro – again amazing. I am pushing forward and trying to define what I do and how I do it as best as I can, and the greatest mainstay of this in my opinion is community.

    I look forward to Part 2 😉

  4. You did a great job of providing an intelligent, yet easily understood, breakdown of the struggles libraries are facing with ebooks. As the web developer for a public library this is something we are working to improve but there are a lot of barriers in front of us.

    Right now we have a petition at ebooksforlibraries.com and we’re working to open the lines of communication between publishers and libraries.

  5. > web developers could help here

    um… no.

    most emphatically not.

    it’s 2012, and in the last decade,
    web developers haven’t even managed
    to “fix” _the web_ yet… in fact,
    in too many ways, the web is just as
    screwed up as it used to be back in
    the old days of the “browser wars”;
    it’s just that today’s incompatibilities
    have migrated out to specialized areas
    of which the general public is unaware,
    because we’ve all had to give up and
    use templates, or hire costly “experts”.

    so please, just go away, and don’t try to
    “help” e-books. we’ll do fine without you.

    i’m serious. shoo!

    -bowerbird

  6. I wanted to correct a couple things:

    1. You do not have to use Amazon’s Kindle conversion software before uploading the file to the Kindle store. I use Calibre to convert from ePub to mobi and upload that file to Amazon.

    2. DRM is not required to upload to Kindle store. There’s an option that you can check on when uploading the book to decide whether you wish to use DRM.

    Producing my content in HTML, I’ve been able to mostly streamline the conversion process into PDF, ePub and mobi formats. One format would be nicer, of course.

  7. Eep–Confirming what snookca says: You don’t have to use Kindle’s conversion software and you don’t need to use DRM. (Thank goodness!)

  8. @Scott Kellum: The split between ePub, HTML/CSS, and native apps is probably the biggest concern of our project.

    @BillLudwig: Nice. I’ve seen this petition before, and I tweeted about it recently (http://twitter.com/pubstn/status/197783603093053441). You very much have our support; let me know if you need anything.

    @snookca, @amber: Oof, good catch. One follow-up: if you download a non-DRM’d Kindle book from the Kindle store, is it possible for you to save the mobi locally for backup purposes?

  9. @Nick, while I don’t have a kindle, I’m told that yes, you can save the mobi files locally for backup purposes. You just can’t read the ones with DRM.

  10. In the world of STM (scientific, technical and medical), reference, and other publishing disciplines we solved this problem years ago. Single source authoring using structured markup – where one creates the text using markup that identifies content, context and structure – is the key. How rich is the markup? As rich as the content creator determines it needs to be. How is it output? In any format someone would like – through XSL transformations. What will the next new format be? It doesn’t really matter – my content, even if authored 20 years ago, can be output to the latest and greatest new format by creating a transform. One transform that can change 1 page, 1000 pages, 1,000,000 pages – as many as you might have in your archives – into something new. A dozen transforms can change the same source text into a dozen different formats, and all the while my text can be kept current without having to worry about making changes in a dozen places.

    Sure. You don’t want to learn XML or have to use an XML editor. That’s your choice. And it’s a choice you should consciously make.

    As someone who has been involved in the standards-making process for almost two decades, I think readers must understand that most standards are created by companies and individuals with competing interests (unless authored by a single company in which case it isn’t really a standard, is it?); there is always some level of compromise. Only standards created after the fact will match currently-available technology; it takes a while (sometimes years, depending on an application’s lifecycle) for application developers to implement what has been codified. And in the case of HTML5 we don’t even have that; we have two competing bodies (WHATWG and W3C) representing very different interests; the end result is that we don’t *have* a standard; instead we have an ever-evolving feature set which is why most standards referencing HTML5 include a fixed list of supported features (typically those that are supported by the vast majority of currently-available browsers).

    The upshot? We really don’t want a world where there is only one standard, one format that each and every one of us must use, no matter what our specific requirements might be. The world would be a very dull place and innovation would grind to a halt.

  11. > Layout and design creates book covers and selects type.

    Really, as a designer, you should know that designers do more for interiors than just “select type.” When we get to design interiors, we actually design them, which goes much beyond just selecting type.

  12. @xmlgeek: All fair points. XML is an admirable starting point for open, interoperable formats that can survive archival pretty well. I’m fond of using plain text as much as possible when considering the long term.

    But technical writing is a very different beast from complex page layout software. And while standard publishing may be a solved problem for the former, it is far from it for the latter. That is what this essay concerns — and its publication in A List Apart was in part because the same issues are addressed by modern web developers.

    To be clear, I’m not advocating ePub 3 as The One True Path where everyone must fall in line. ePub 3 is the best way to handle modern digital publishing concerns _right now_, and the best hope to spread across e-reading platforms _right now_. That will change someday. When that day comes, we’ll hopefully have conversion tools and an easy way to read older formats — and (to be heinously reductive about it) I reckon that’s roughly how the web plays out today.

  13. @Nick, I must disagree. We’ve been producing some very sophisticated page layouts from XML for quite some time. Whether you’re using XSL-FO, SDL’s XPP, importing into InDesign, or any other method of your choosing – it’s been being published from SGML/XML sources for a couple of decades.

  14. First time I read your blog, and it is really amazing. And actually you are totally right: “Since technically we all work in publishing.”

  15. bowerbird, web developers don’t manage the web, they’d like to, but it’s companies that do. However, being a web developer, there’s something to be said for a “screwed up” web that requires people to “hire costly experts”…
