Building Books with CSS3

by Nellie McKesson, June 12, 2012

Published in CSS, Typography & Web Fonts, Workflow & Tools

Historically, it’s been difficult at best to create print-quality PDF books from markup alone, but CSS3 now brings the Paged Media Module, which targets print book formatting. “Paged” media exists as finite pages, like books and magazines, rather than as long scrolling stretches of text, like most websites. CSS3 allows us to style text, divide it into book pages, and set the page structure as a whole. You can dictate the size of the book, header and footer content, how to display cross references and tables of contents, whether to add guides and bleeds for commercial printing companies, and more. With a single CSS stylesheet, publishers can take XHTML source content and turn it into a laid-out, print-ready PDF. You can then take that same XHTML source, bypassing desktop page layout software like Adobe InDesign, and package it as an ePub file. It’s a lightweight and adaptable workflow, which gets you beautiful books faster.

XML, XSL, XHTML, and PDF processors

As the publishing industry moves toward digital-centric workflows, there’s a need for scalability—repeatable processes and workflows that work at small and large scales. Creating a well-formatted printed book is no longer enough; publishers often need to release several different formats for every book: print, ePub for the iPad, Nook, etc., .mobi for the Kindle, and so on. The hardest jump is the one from print to ePub—you need to put plain text, often from multiple documents or text flows, and non-linear elements such as images, into a cleanly-tagged linear flow, and package it with a table of contents and instructions on how to tie together the various files that make up the book (see the ePub Wikipedia page to learn more about the extra special sauce that’s part of the ePub format). Even InDesign’s ePub output still needs a lot of extra cleanup after the fact, because of the non-linear nature of print page design.

For many years, XML has been one way to achieve a scalable multi-destination publishing model. XML offers a structured and standardized way to tag book content. It converts easily to XHTML, which is the foundation for digital books. It also comes with XSL-FO, which is markup that converts to print-quality PDF layout. XSL-FO has been both the gateway to XML-source publishing, and a major roadblock: it’s a powerful language for structuring pages and formatting XML files, but it’s also intricate, unapproachable, and not very well known. However, by using XSL-FO and a PDF processor like Antenna House or Prince to read the markup, publishers can use a single XML source to flow neatly into XHTML and ePub and also to produce fully laid-out, print-quality PDFs.

With the combination of major PDF processors and paged media features in CSS3, XML- and XHTML-based publishing can move away from XSL-FO to tap the vast and talented web design community. PDF processors Antenna House 6.0 and Prince 8.0 come with built-in CSS support, along with a slew of their own CSS extensions. These processors read tagged files and convert them to PDF using user-supplied stylesheets. The paginated PDF you get uses the same extensive CSS available for web design, in addition to the specialized CSS3 features added just for paged media, like text strings, cross references, and printer marks. [1]

Cost is a factor in adopting this kind of workflow. The PDF processor is the biggest upfront cost besides the initial stylesheet development. As of this writing, Antenna House costs $1,700 for a single user license, or $7,000 for a server license. Prince’s licenses cost substantially less: $495 for a single user, or $3,800 for a server license. But compared to the ongoing cost of desktop page layout, a single upfront payment to install a PDF processor becomes a viable option. (Prince offers a demo version that watermarks the first page of each PDF but is otherwise fully functional. It’s a good way to experiment and evaluate the workflow.)

The open source command-line tool xhtml2pdf is built on Python and can convert HTML to PDF for free; however, its CSS support is much less robust than that of the for-pay tools, especially for CSS3 paged media features. Download the source code from GitHub. Here are some notes I whipped up after playing with xhtml2pdf for an hour.

Building a book

The new CSS3 features come from the Paged Media Module and the Generated Content for Paged Media Module (GCPM). I used the latest working draft and the latest editor’s draft of the Paged Media Module to develop my stylesheets. The spec is fairly stable and has entered the last call period (meaning the working group feels pretty good about it and is looking for final review before they recommend advancement). They’re still editing and it’s likely that they’ll release another Last Call Working Draft to finalize changes made during this review period.

The first step when working with print documents is to set up your page structure using the @page rule. This is akin to master pages in print layout, through which you can set the trim size (i.e., page dimensions), borders, running headers and footers, fonts, and so on: basically anything that you want to appear on every page. And of course, the cascade still applies. For example:

@page {
  size: 5.5in 8.5in;
}

This code sets the trim size of every page in the book, which you can then build on to style different sections of your book. The following page definitions add margins and padding for left and right hand pages only in parts of the file that use the “chapters” page rule:

@page chapters:left { /* left page setup */
  margin: 0.75in 0.75in 1.125in 0.62in;
  padding-left: 0.5in;
}
@page chapters:right { /* right page setup */
  margin: 0.75in 0.62in 1.125in 0.75in;
  padding-right: 0.5in;
}

The page names are yours to create, and each named page can be further broken up into :first (which styles the first page within an element that uses that page rule), :left, and :right. Invoke page rules like this:

section.chapter {
  page: chapters;
}
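As a minimal sketch of the :first pseudo-class mentioned above, the rule below gives the opening page of each chapter a deeper top margin. The 2in value is an assumption chosen purely for illustration. Per CSS 2.1, properties set in a :first rule override those set in :left or :right rules, so the rule should apply whichever side the chapter opens on.

```css
/* Sketch only: extra space above the chapter opener.
   The 2in value is illustrative, not a recommendation. */
@page chapters:first {
  margin-top: 2in;
}
```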

The Paged Media spec also has 17 predefined page areas that you can customize within your page rules. There’s the main page area, and then 16 other areas run along the edges, as follows:

top-left-corner	top-left	top-center	top-right	top-right-corner
left-top	main page area			right-top
left-middle				right-middle
left-bottom				right-bottom
bottom-left-corner	bottom-left	bottom-center	bottom-right	bottom-right-corner

You can style each of these page areas individually, if for example you want to add navigation tabs or running headers or footers (see below for more on those). The Paged Media Editor’s Draft has a great description of sizing and positioning of margin boxes. All but the corner margin boxes have variable widths (for boxes on the horizontal edges) or heights (for boxes along the vertical edges), and will stretch the full width or height available until they run into an obstacle (for example, neighboring content defined in one of the adjacent margin boxes). The example below adds a gray bleed to the outside edge of all index pages by adding a background color to just three of the vertical margin boxes. Because there’s no other content defined in the remaining boxes, the bleed will fill the full height of the page. You might accomplish a similar effect with a fixed position background image or by using page borders, but this method is simple, clean, and gives true bleeds (see Bleeds below).

@page indexmaster:right {
  /* the same gray fill in all three boxes on the outside (right) edge */
  @top-right-corner {
    background-color: #777777;
    background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
    padding-left: .8in;
    margin: -6pt -6pt -6pt 0;
  }
  @right-top {
    background-color: #777777;
    background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
    padding-left: .8in;
    margin: -6pt -6pt -6pt 0;
  }
  @bottom-right-corner {
    background-color: #777777;
    background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
    padding-left: .8in;
    margin: -6pt -6pt -6pt 0;
  }
}

To keep the bleed on the outside edge, the left and right pages need to be defined separately. The margins, padding, and margin boxes will all need slight adjustments for the corresponding left page bleed. (You may also have noticed that there are two color definitions in the above code; see CMYK Colors below for more about that.)
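For reference, the matching left-page rule might look something like the sketch below. It simply mirrors the right-page example: the margin boxes move to the left edge, the padding flips sides, and the negative margin moves from the right to the left. The values are assumptions carried over from the example above and would likely need tuning against your own page margins.

```css
/* Sketch of a mirrored left-page bleed; values are assumed, not tested. */
@page indexmaster:left {
  @top-left-corner {
    background-color: #777777;
    background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
    padding-right: .8in;
    margin: -6pt 0 -6pt -6pt; /* bleed off the top, bottom, and left */
  }
  @left-top {
    background-color: #777777;
    background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
    padding-right: .8in;
    margin: -6pt 0 -6pt -6pt;
  }
  @bottom-left-corner {
    background-color: #777777;
    background-color: device-cmyk(0.0, 0.0, 0.0, 0.5);
    padding-right: .8in;
    margin: -6pt 0 -6pt -6pt;
  }
}
```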

Counters

Counters aren’t new, but really come in handy for paged media, allowing you to add automatic numbering to chapters, figures, examples, and so on with just a few lines of CSS, like this:

section.chapter > div.titlepage > div > div > h2.title:before {
  counter-increment: ChapterNo;
  content: "Chapter " counter(ChapterNo);
}
div.figure-title:before {
  counter-increment: FigureNo;
  content: "Figure " counter(ChapterNo) "-" counter(FigureNo) ": ";
}
section.chapter {
  counter-reset: ChapterNo FigureNo;
}

In the above code, we created two counters, one for chapter numbering and one for figure numbering, and then reset them both starting at every new chapter. (Bear in mind counter-reset cascades, which means that if you want to reset a few counters on the same element but you define them separately, only the last definition will be honored. To get all the counters to reset, you need to run them together, as shown above.) Additionally, we used the ChapterNo counter within the figure title, to do things like this: “Figure 5-11:.” In this case, the ChapterNo counter is actually applied to the figure title’s parent element—section.chapter. The PDF processor will look progressively further and further up until it finds an instance of the specified counter that applies to the element in question.
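One caveat: under the counter scoping rules in CSS 2.1, a counter-reset on every section.chapter restarts ChapterNo along with FigureNo, so a strictly conforming processor may number every chapter “Chapter 1.” If you run into that, one possible variant is sketched below. It assumes the chapters sit inside the div.book container used elsewhere in this article; it resets ChapterNo once on the book, increments it on the chapter element itself, and restarts only the figure counter in each chapter.

```css
/* Sketch of an alternative counter setup; assumes section.chapter
   elements are descendants of div.book. */
div.book {
  counter-reset: ChapterNo;       /* start chapter numbering once */
}
section.chapter {
  counter-increment: ChapterNo;   /* one step per chapter */
  counter-reset: FigureNo;        /* figures restart in each chapter */
}
section.chapter > div.titlepage > div > div > h2.title:before {
  content: "Chapter " counter(ChapterNo);
}
div.figure-title:before {
  counter-increment: FigureNo;
  content: "Figure " counter(ChapterNo) "-" counter(FigureNo) ": ";
}
```

Because the chapter counter now lives on section.chapter, both the chapter title and every figure title inside the chapter see the same value.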

Strings

You can turn almost any element into a string that you can then invoke in your CSS to appear in other places throughout your document. Headers and footers, where you have the page number and some text appear on each page, make good use of strings—for example, the book title on the left-hand page, and the chapter title on the right (CSS3 also includes some built-in handling for running elements; see below for why I chose to use strings instead).

Use string-set on any element to make the contents of the element reusable. Make up a name for it, and name the content you want to include:

div.book > div.titlepage > div > div > h1.title {
  string-set: Booktitle self;
}
section.chapter > div.titlepage > div > div > h2.title {
  string-set: Chapter self before;
}

In the top example, the name of the string is “Booktitle,” and I use the very simple “self” to say that I want the string to include whatever the content of that element is. In the second block, I tell the string to include both the content of the element and any content I added using the :before selector (as I did to add the chapter numbers in the Counters section, above).

To invoke the strings, reference them in the content property:

@page :left { /* left page setup */
  @bottom-left { /* verso running footer */
    content: counter(page)" "string(Booktitle);
  }
}
@page :right { /* right page setup */
  @bottom-right { /* recto running footer */
    content: string(Chapter)" "counter(page);
  }
}

Strings can be quite powerful and can include various types of content within the string-set property, including counters (I use the page counter in the above examples to display the current page number on each page as well), before/after text, and static text. You can also define multiple strings within one string-set property.
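To illustrate defining multiple strings in one declaration, here is a small sketch. It assumes the comma-separated form described in the GCPM draft and introduces a second, hypothetical string name, ChapterPlain, that holds the chapter title without its generated “Chapter N” prefix. Check your processor’s CSS reference before relying on this syntax.

```css
/* Sketch: two named strings set from the same heading.
   "ChapterPlain" is a made-up name for illustration. */
section.chapter > div.titlepage > div > div > h2.title {
  string-set: Chapter self before, ChapterPlain self;
}
```

Either string can then be invoked with string(Chapter) or string(ChapterPlain) in a margin box.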

CSS3-GCPM actually includes special values just for running elements: running(), used with the position property, and element(), used with the content property. Used together, they convert any element into a running header or footer. However, the danger here is that when you convert an element to a running element in this way, it no longer appears in its original place: running() acts more like a float that also repeats on subsequent pages. Since I want my headers to appear both in their places inline and as running elements, I used strings instead.
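For comparison, a running element set up along the lines of the GCPM draft might look like the sketch below. The p.running-title element and the RunningTitle name are hypothetical; the idea is that you would add a dedicated element to the source just to feed the header, since it is lifted out of the normal flow.

```css
/* Sketch: p.running-title and RunningTitle are invented for illustration. */
p.running-title {
  position: running(RunningTitle);  /* remove from the flow and name it */
}
@page :right {
  @top-right {
    content: element(RunningTitle); /* place the element in the margin box */
  }
}
```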

Cross references

Most long documents (like books) include cross references, which usually look something like this: “See page 127.” Within an XML or HTML workflow, cross references can be set up as live links that jump to another section. Live cross reference links are a basic feature of all digital books, including web-optimized PDFs, but they naturally won’t be useful in the print book, which needs a page number instead. Since the source content is unpaginated, though, it’s hard to know what location the text should refer to. You won’t know the print page number until you send the text through the PDF processor, and in any case that page number is inaccurate when it comes to reflowable eBooks. The answer is to use generated text, which relies on target-counter() and target-text().

For example, say you have this cross reference in your HTML:

<p>See <a class="xref" href="#section25" title="Working with Cross References">Chapter 5, <em>Working with Cross References</em></a>.</p>

By adding this style to your CSS:

a.xref:after {
  content: " (page " target-counter(attr(href, url), page) ")";
}

You’ll end up with:

See Chapter 5, Working with Cross References (page 127)

There are a few things going on in that CSS. First, we supplied a static text string that will add a space, an opening parenthesis, the word “page”, and another space before any generated content. The next bit, target-counter, tells the renderer to pull in a specific counter related to the element. Then, within the parentheses, we tell the renderer that we need the “page” counter that applies to the target of the element’s href attribute; i.e., the renderer should follow the link to its target (#section25), figure out what page it’s on, and display that page number. To wrap it up, we have one last text string to add a closing parenthesis. If the pagination changes the next time we run the document through the PDF processor, the page number will update automatically.

The target-text() function takes things a step further by pulling in all the text from another element somewhere else in the document. For a simple example, let’s say we need to do something about a hard-coded cross reference to a print page number, like this one:

<p>See <a class="xref" href="#section25" title="Working with Cross References">page 110</a></p>

…

<h2 class="title" id="section25">Working with Cross References</h2>

Again, we want to make sure that the cross reference always displays an accurate page number, but we also want to include the name of the section being referenced to match the formatting of our previous example. And so, the following:

a.xref {
  content: target-text(attr(href, url), content()) " (page "
    target-counter(attr(href, url), page) ")";
}

…will give us this:

See Working with Cross References (page 127)

The target-text function works much like target-counter: it follows the URL to its target, and when we supply it with a value of content(), it pulls in the content of the element we’re linking to. The last piece of our cross reference is to add the referenced chapter number within the cross reference text. If we’ve already set up automatic chapter numbering using counters, as we did above in Counters, then we can pull that in as well:

a.xref {
  content: "Chapter " target-counter(attr(href, url), ChapterNo) ", "
    target-text(attr(href, url), content()) " (page "
    target-counter(attr(href, url), page) ")";
}

For our desired end result:

See Chapter 5, Working with Cross References (page 127)

And now for an important warning: Antenna House won’t break generated text across lines. If the imported text is too long to fit in the page area, it’ll just stretch off past the page edge. Antenna House will, however, break static text strings that you include in the content property. For example, in the above, it will break anywhere in “Chapter ”, “ (page ”, and “)”, but it won’t break within the actual chapter title, or in the page or chapter numbers (though those latter two are so short that a break inside them is unlikely anyway). This makes generated text somewhat risky and only appropriate for short lines; more about this in the Footnotes section below.

Table of contents

A table of contents can be set up in the XHTML as a series of nested unordered lists, with each list item linked to the section in question. This works great for ebooks, but print books need to display the page number for the section as well. Just like cross references, you can use target-counter to set that up:

div.toc ul li.preface a:after { 
 content: leader(dotted) " " target-counter(attr(href, url), page);
}

The leader(dotted) function adds a leader tab between the text of the table of contents entry and the generated page number, like so:

Working with Cross References…………………………………….. 127

There are three predefined leader styles: dotted, solid, and space. You can also supply your own string; for example, leader("~") will create a line of tildes.
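As a quick sketch of a custom leader in context, the rule below fills the gap with spaced dots for chapter-level entries. The li.chapter class is an assumption modeled on the li.preface selector above; your table of contents markup may use different class names.

```css
/* Sketch: li.chapter is a hypothetical class name. */
div.toc ul li.chapter a:after {
  content: leader(". ") " " target-counter(attr(href, url), page);
}
```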

Multi-column layouts

Multi-column layouts are another new feature of CSS3. They allow you to split any div into multiple columns using column-count. For example, to set only the index of a book in two columns, while leaving the majority of the text in a single column, add column-count: 2 to the index div:

div.titlepage+div.indexnote+div.index {
  column-count: 2;
  column-gap: 12pt;
}

The column-count property sets the number of columns, and the column-gap property sets the space between the columns. You can also add column-width to specify the optimal width of each column. The columns will span the entire available page area by default.
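Here is a minimal sketch of the column-width approach. Rather than fixing the number of columns, it asks the processor to fit as many columns of roughly the given width as the page area allows. The 1.5in figure is an assumption for illustration; with a text area a little over 3.5 inches wide and a 12pt gap, it should yield two columns.

```css
/* Sketch: let the processor decide how many ~1.5in columns fit. */
div.index {
  column-width: 1.5in;
  column-gap: 12pt;
}
```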

Breaks

If you’ve done digital book production, then you’re most likely familiar with CSS’ page break properties: page-break-before, page-break-after, and page-break-inside.

As defined in CSS2.1, page-break-before and -after accept the following values: auto, always, avoid, left, right, and inherit. You can use them to force breaks around elements, or use page-break-inside to prevent breaks from occurring inside elements. (This is useful for keeping all paragraphs of a sidebar on the same page, for example.) Assigning a value of left or right forces one or two page breaks so that the element starts on the next left or right page, respectively, leaving a blank page in between if necessary. This is useful for book chapters, where you want every chapter to start on a right-hand page. You’d define the chapter section as follows:

section.chapter {
  page-break-before: right;
}

CSS3 adds a few extra properties for multi-column layouts: break-before, break-after, and break-inside. These function the same as the page-break rules, but at the column level, and add a few extra possible values: page (force a page break), column (force a column break), avoid-page (avoid a page break), and avoid-column (avoid a column break).
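The sketch below shows how these properties might be combined. The div.sidebar, div.indexentry, and h3 selectors are assumptions about the markup, not taken from a real stylesheet, and support for the column-level values varies by processor.

```css
/* Sketch: selectors are hypothetical. */
div.sidebar {
  page-break-inside: avoid;   /* keep the whole sidebar on one page */
}
div.index h3 {
  break-after: avoid-column;  /* don't strand a letter heading at a column foot */
}
div.index div.indexentry {
  break-inside: avoid-column; /* keep each entry within a single column */
}
```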

Footnotes

CSS3-GCPM adds special handling just for footnotes. First you’ve got the @footnote rule, which defines the part of the page reserved just for footnotes (if you’ve got any). We also have a new kind of float, float: footnote, which is where the real magic happens. When you apply a float of footnote to an element, the entire contents of the element get floated down to the bottom of the page, into the @footnote page area. They lose the normal inherited formatting, and instead get styled with any formatting you’ve defined for the @footnote area. Additionally, at the point of reference, a marker is added (in superscript by default) that corresponds to the number (or symbol) next to your newly floated content. You can style the in-text marker, called the footnote-call, and the floated footnote number, called the footnote-marker, with two new pseudo-elements: ::footnote-call and ::footnote-marker.
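A minimal sketch of styling the footnote area and its markers follows. The rule weights, spacing, and font sizes are illustrative assumptions, and the nesting of @footnote inside @page follows the GCPM draft; your processor’s documentation has the final say.

```css
/* Sketch: all values are illustrative. */
@page {
  @footnote {
    border-top: 0.5pt solid black;  /* thin rule above the notes */
    padding-top: 6pt;
  }
}
::footnote-call {
  font-size: 70%;
  vertical-align: super;            /* in-text reference mark */
}
::footnote-marker {
  font-weight: bold;                /* number beside the floated note */
}
```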

Now here’s the disconnect: my XHTML source files included all footnotes as endnotes, where the footnote text sat at the end of each section. My print design called for the footnotes to appear on the page on which they were referenced. In spite of this, I almost got footnotes working without any XHTML changes by just using generated text and the CSS3 footnote tools. Ultimately this plan failed because, as noted above, generated text in some PDF processors doesn’t like to break across lines but will instead just run off the margin if it gets too long. For books with footnotes just a couple of words long, there’s no problem, but that’s rarely the case. [2]

I ended up editing the XHTML to move the footnotes to the exact position where they’re referenced and wrap them in a span with class="footnote". I chose spans mainly because that would leave the footnotes inline, without adding an extra paragraph break (as a div or p would).

Here’s the new HTML:

<p>As you saw in the earlier section,<span class="footnote"><p>If you 
were paying attention, of course.</p></span> generated text doesn't break 
across lines.</p>

Yep, you’re seeing that right: we’ve got a p within a span within a p. It’s not valid XHTML (a span isn’t allowed to contain a p), but it does the trick. And with this simple CSS:

span.footnote {
  float: footnote;
}

We get this:

As you saw in the earlier section,¹ generated text doesn’t break across lines.

1 If you were paying attention, of course.

Another part of the CSS3 footnote arsenal is a predefined counter—footnote—that applies to all elements with float: footnote. The footnote counter resets the same as any other counter (see Counters above), allowing you to restart footnote numbering as needed (for example, you might set the numbering to restart at the beginning of each new chapter).
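For instance, restarting footnote numbering in each chapter might look like the sketch below. Because only the last counter-reset declaration on an element is honored, footnote has to be listed alongside whatever other counters that element already resets; FigureNo is shown here as an example.

```css
/* Sketch: keep every counter this element resets in the same list. */
section.chapter {
  counter-reset: FigureNo footnote;
}
```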

You can customize the way footnotes are marked—with numbers, letters, symbols, or any other value supported in list-style-type. There’s also a predefined “footnotes” style that rotates through and then multiplies four different glyphs: asterisk, double asterisk, cross, and double cross. Footnotes will be numbered with decimal numbers like 1, 2, 3, etc., by default. To change to lowercase letters, you’d do the following:

::footnote-call { 
  content: counter(footnote, lower-alpha); 
} 
::footnote-marker { 
  content: counter(footnote, lower-alpha); 
}

Make sure to set the list type for both the footnote call and footnote marker, unless you want to seriously confuse your readers.

PDF bookmarks

Bookmarking is irrelevant when you’re dealing with print media, but is a handy (and I would argue, essential) component for web-optimized PDFs. Bookmarking adds a linked table of contents to the navigation panel of a PDF reader, allowing users to jump to specific sections. You can create bookmarks to almost any element, and you can tell the PDF how to nest and display the bookmarks all in your CSS.

Here we’ve got two levels of bookmarks, nesting level-one headings inside chapter titles. Instead of having all the levels expanded and displayed when the PDF is opened, we’ve set them to a state of “closed.” Users will only see the chapter titles, and can click to expand the tree and see the nested section headings if they wish:

section.chapter > div.titlepage > div > div > h2.title { 
  bookmark-level: 1; 
  bookmark-state: closed;
}
div.sect1 > div.titlepage > div > div > h2.title { 
  bookmark-level: 2; 
  bookmark-state: closed;
}

Bookmarks will automatically include the entirety of an element’s content, including any text you added with the :before and :after pseudo-elements. However, you can restrict the bookmark to display only a subset of the element’s information by using the bookmark-label property. For example:

section.chapter > div.titlepage > div > div > h2.title { 
  bookmark-level: 1; 
  bookmark-state: closed;
  bookmark-label: content();
}

The example above will display only the actual text of the element, and ignore any before/after content. Note that all text is imported without any formatting, and you can’t specify combinations of content values within a single bookmark-label declaration.

You can also choose to display a specific text string that will overwrite the contents of the HTML element. For example, if you want to add a bookmark to the copyright page, but the words “Copyright Page” don’t actually appear anywhere in the text:

div.copyrightpage { 
  bookmark-level: 1; 
  bookmark-state: closed;
  bookmark-label: "Copyright Page";
}

Fonts

When it comes to adding custom fonts to your CSS, you may be relieved to know that it’s the same old CSS you’re used to: use @font-face to declare the font, and use font-family to invoke it. Remember to include fallback fonts, especially for body text where you may need to use symbols that aren’t included in your main body font set. Again, this is the same CSS that people have been using for ages:

font-family: "Gotham", "Arial Unicode", sans-serif;

Arial Unicode includes a huge number of glyphs, and so is usually a pretty safe sans-serif fallback.
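Put together, a declaration and its use might look like the following sketch. The font file path is hypothetical; point it at wherever your licensed font files actually live, and check your processor’s documentation for the font formats it accepts.

```css
/* Sketch: the url() path is a placeholder, not a real location. */
@font-face {
  font-family: "Gotham";
  src: url("fonts/Gotham-Book.otf") format("opentype");
}
body {
  font-family: "Gotham", "Arial Unicode", sans-serif;
}
```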

Most commercial printers require fonts to be embedded in every PDF file. The methods for this vary depending on the PDF processor, so you’ll need to read the documentation carefully if you want to build embedded fonts into your workflow. You could also embed fonts after conversion with PitStop or another PDF post-processing tool.

There are a lot of nice features for fonts coming with CSS3, but they’re still unstable and neither Antenna House nor Prince has added support yet (though Antenna House—and Prince to a more limited extent—has some nice extensions for working with fonts). Check out the Fonts module to get a sense of what’s coming. Development that improves text formatting on a larger scale, including specifying word- and line-breaks, spacing, and so on is underway. Prince and Antenna House have implemented some of the features to varying degrees, as they had been defined at the time of release. You can check out the spec, though I encourage you to check with your PDF processor’s CSS reference before you experiment, as there may be variations.

Final touches for printing

There are a few final steps to take if you’re planning to print your document commercially.

Image resolution

Image resolution is crucial for printed media. A general guideline is to set each image’s resolution somewhere in the 200 to 300dpi range (specific requirements depend on each book’s needs). Most PDF processors will impose their own default resolutions on images during conversion, but you can choose to preserve the resolution of the source files instead:

img {
  image-resolution: from-image;
}

You can also set the value to normal, to let the processor choose the resolution, or you can provide a specific numerical dpi value. (Messing around with image resolution is tricky, though, so do your homework first!)
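As a sketch of the numeric form, the rule below pins one class of images to 300dpi, so a 1200-pixel-wide image would print 4 inches wide. The img.screenshot class is an assumption for illustration, and some processors expose this property under a vendor-prefixed name, so check the CSS reference first.

```css
/* Sketch: img.screenshot is a hypothetical class. */
img.screenshot {
  image-resolution: 300dpi;
}
```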

CMYK colors

You should be thinking about CMYK colors throughout building your stylesheet. You specify CMYK colors similarly to how you specify RGB colors:

hr {
  color: device-cmyk(0.0, 0.0, 0.0, 0.3);
}

Each value should be a number between 0 and 1 (percentage values actually also work, though only the decimal values are endorsed by the W3C spec right now). Specify the percentage of Cyan, Magenta, Yellow, and Black ink to be used, in that order. You can also build that in with fallbacks by stacking color definitions, for cases where you need to repurpose your stylesheets for multiple presentations (web, print, etc):

hr {
  color: #B3B3B3;
  color: device-cmyk(0.0, 0.0, 0.0, 0.3);
}

If the device reading the code doesn’t understand CMYK, it’ll use the web-friendly version.

Printer marks and bleed

During commercial printing, books are actually printed on a larger page size than the final version and are then cut down. The cutting is usually pretty exact, but can vary by up to a few sixteenths of an inch. To ensure that any images or colors at the edges of the page actually reach the edge, without strips of white left behind after trimming, you need to set them to run off the page edge a bit. Then you’ll need to tell the processor to render that little bit of extra content beyond the edge, and to add crop marks to guide the printer:

@page {
  bleed: 6pt;
  marks: crop;
}

It’s that easy. Of course, you’ll need to be creative with bleeding elements, using negative margins and positioning to get them to actually bleed—the processor won’t automatically add extra color or content beyond the limits of the element, it’ll only show content that already exists.

Final notes and further reading

You can read through the full list of CSS that Antenna House supports, but I warn you that the documentation is limited at best and not always clearly worded. Prince’s documentation is slightly better.

Both Antenna House and Prince have their own extensions built on top of the standard CSS3, which are worth checking out. Here are Antenna House’s extensions. Prince’s extensions are listed inline with regular CSS support, and are less robust. Additionally, if the CSS documentation isn’t helping, it may be useful to read the documentation for the related XSL-FO property. They’ve been in use longer and are more fleshed out, and the functionality is usually the same or very similar. I wasn’t able to find documentation on Prince for this, but here is Antenna House’s documentation.

Remember that CSS3 is still a developing spec; CSS3.info keeps a fairly up-to-date list of the status of the various CSS3 modules. Don’t let that stop you from dipping a toe/foot/leg/neck in the water, though! Here, I limited myself to some book-building basics—page dimensions and margins, cross references, strings, headers and footers, and printer-friendly colors, images, and bleeds—but CSS3 has a lot more to offer when it comes to paged media, and I encourage you to see how much you can do (and remember, CSS2.1 still works, too).

Notes

1. If you’re starting with XML source files, you’ll find it much easier to convert to XHTML first before styling with CSS. Luckily Bob Stayton already built the XSL to help you do that: http://sourceforge.net/projects/docbook/files/epub3/.
2. Because where’s the fun in footnotes if you can’t wax poetic a little bit?

22 Reader Comments

afonsoduarte says:

June 12, 2012 at 10:23 am

Nice and thorough article.

Do you know what the browser support is like for the Paged Media Module? Prince and Antenna House are great, but there would be no need for them if we could just use cmd + p and the native (in OS X anyway) save as PDF.

Nellie McKesson says:

June 12, 2012 at 11:48 am

@afonsoduarte AFAIK, there’s no browser support for paged media yet, though I believe the intention is that paged media will work on the web eventually as well. It would be pretty awesome to be able to set web content up in fixed pages–goodbye epub, hello streaming books!

Charlie Clark says:

June 12, 2012 at 12:50 pm

Opera released a labs build with support for this. It’s fantastic and completely transforms the experience of reading in the browser.

http://dev.opera.com/articles/view/labs-more-fun-using-the-web-with-getusermedia-and-native-pages/

sjs says:

June 12, 2012 at 1:04 pm

I’ve used mPDF in the past to create PDFs for print using CSS. It supports functions like page breaks, but unfortunately doesn’t do so via CSS.

There are *a lot* of issues with mPDF, but if you can fight through them, it’s the best *free* tool for HTML->CSS that can do page breaks.

Nellie McKesson says:

June 12, 2012 at 1:08 pm

A couple Google+ readers also just pointed me to this tool: http://code.google.com/p/wkhtmltopdf/

It’s built on webkit, so it’s not trying to reinvent the wheel. I haven’t played with it yet, but I’ll poke around as soon as I can.

barefootliam says:

June 12, 2012 at 2:44 pm

If you go into a bookshop today the chances are you’ll be _surrounded_ by books produced from markup. The world’s largest publishers (and many smaller publishers) use XML (including XHTML) and XSLT to produce XSL-FO and make books. So when you start out with, “While historically, it’s been difficult at best to create print-quality PDF books from markup alone” I can accept that markup is difficult for many people but maybe the effect of the sentence is misleading.

It’s great to see CSS starting to catch up (overall it’s still a long way behind, but vendor extensions are bridging the gap, and in some ways CSS 3 is ahead). We can expect improvements in the CSS support for print and for paged media in general over the coming months and years.
Nellie McKesson says:

June 12, 2012 at 3:23 pm

@barefootliam Thanks for pushing for clarification. Yeah, at O’Reilly we’ve been using an XML->XSL-FO toolchain for many years to produce our books. The difficult part is the FO, as it’s a tricky language that creates a barrier between the common man and beautiful PDFs created from markup. XSL-FO isn’t an *impossible* language to learn, but it doesn’t have the simplicity that CSS has, which is why CSS3 support is so potentially revolutionary. The time it takes us to iterate and create new templates with CSS3 is 1/3 or less the time it would have taken with XSL-FO, and the formatting control that was previously silo’d in the hands of elite FO coders is now available to anyone with CSS knowledge.
barryvan says:

June 12, 2012 at 9:13 pm

It’s probably worth pointing out that WeasyPrint [1] is an open-source HTML+CSS->PDF system, written in Python. Simon Sapin, the lead developer, is very active on the W3 CSS mailing lists. It’s very definitely not at the same level as Prince and the like, but it’s certainly worth checking out.

[1] http://weasyprint.org/
bowerbird says:

June 12, 2012 at 9:39 pm

nellie, after just skimming your article,
my head hurts! so it’s no wonder that
you drink bourbon and play the drums.
if my job was this difficult, i would too.

fortunately, there are ways to do this
that are _much_ simpler to accomplish.
i’m sure you’ll learn about them soon…

-bowerbird
Jeffrey Zeldman says:

June 13, 2012 at 7:39 am

Dear Bowerbird:

Snark aside, to which specific methods of easy epub creation are you referring? URLs, please. Thanks.
bowerbird says:

June 13, 2012 at 8:34 pm

jeffrey zeldman said:
> Snark aside, to which specific methods
> of easy epub creation are you referring?

well, mr. zeldman, i will be
happy to give you a pointer.

but first you should tell me
if you consider “snark” to
be a good and funny thing,
or a bad and nasty thing…

because i just said “i love you”
(yes, _you_, jeffrey zeldman)
in a comment earlier today:
> http://blog.readability.com/2012/06/announcement/#comment-3418

so… you know, it would be…
ironic if you’re criticizing me
while i’m professing fandom.

(but… you’re in luck, because
i like irony as well as snark!)

and even if it was “snark”, hell,
i’m sure nellie can take it well.

after all, she uses sticks to beat
an instrument to produce music,
_and_ drinks her share of bourbon,
so i’d guess she’s a tough chic…

(and if you’re wondering about her
sense’o’humor, grok her twitter pic.)

anyway, you let me know, jeffrey!
i will check back here tomorrow…

meanwhile, these _underscores_
and *asterisks* i have used here,
but not the non-link above, might
give a little clue about my answer.

-bowerbird

p.s. oh, i should also note that
this article is really on the topic
of turning (x)html into a .pdf,
rather than an .epub, but i will
be happy to give you a pointer
on the overall question of using
a “master” source-text to attain
many different formats we want
(e.g., .epub, .mobi, .pdf, .html5).
amit says:

June 14, 2012 at 4:10 am

nice article
barefootliam says:

June 14, 2012 at 4:15 pm

@Nellie McKesson, yes, I agree 100% that FO is too hard.

The challenge will be to make CSS as powerful without also making it too hard.

The current drafts for CSS don’t support multiple streams of footnotes, or “page 3 or 5” or “this page left intentionally blank” or stacked marginalia or collapsing index entries into ranges; there are products (e.g. from Antenna House) that have extended CSS as a basis for some or all of this, but such extensions are product-specific and in some cases may even be customer-specific rather than general.

So I’m hoping that over the next few years we get to the point where HTML + CSS can be used for the majority of the world’s books, without an unacceptable loss of functionality or beauty.
Nellie McKesson says:

June 14, 2012 at 5:35 pm

@barefootliam I agree that there’s still plenty to add/polish in the paged media spec, and your point about being careful not to turn CSS into FO by building in all that same messy functionality is a very good one.

We actually haven’t had to lean too heavily on Antenna House extensions (I think we really only use them for floats, which are a pain right now), and have managed to replicate or improve our old FO layout. Granted our book designs aren’t exactly the most intricate things, but the point is I think it’s possible, and closer than you think!
barefootliam says:

June 14, 2012 at 11:43 pm

@Nellie McKesson actually I’ve a pretty good idea how close it is 🙂 and agree it’s possible (and that floats are a pain). We’re hoping to write up some of the places CSS would need enhancements, in the W3C Print and Page Layout community group, and then to move on to proposals and actually making it happen.

Really I just wanted to remind readers here that in fact people have been producing (with varying amounts of difficulty) printed books from XML (and SGML before it) for decades, and that it’s not even a rare and unusual thing to do 🙂
Webdesign Den Haag says:

June 18, 2012 at 1:08 pm

Thanks for sharing! Great article but kinda tough
Christoph Päper says:

June 20, 2012 at 12:48 pm

It’s been ‘::before’ and ‘::after’ for quite *some* time now.
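
For readers following along, a minimal sketch of the two spellings. The `chapter` counter and the `h1.chapter` selector are hypothetical names, not taken from the article. The single-colon form is the CSS2 spelling and is still accepted for backward compatibility; whether a given PDF formatter accepts the double-colon form is worth checking against its documentation.

```css
/* Hypothetical counter: reset once for the book, incremented per chapter title. */
body { counter-reset: chapter; }
h1.chapter { counter-increment: chapter; }

/* CSS2 spelling (single colon). */
h1.chapter:before { content: "Chapter " counter(chapter) ". "; }

/* CSS3 spelling (double colon). Only one of these two rules is needed;
   they are shown together purely for comparison. */
h1.chapter::before { content: "Chapter " counter(chapter) ". "; }
```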
Christoph Päper says:

June 21, 2012 at 5:13 am

The comma between name and type in the ‘attr()’ pseudo-function is gone; it only appears in front of the fallback value. So it’s “attr(href url)”.
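
A sketch of the difference in a cross-reference rule, assuming a formatter that implements `target-counter()` from the Generated Content for Paged Media draft. The `a.xref` selector is a hypothetical name. The typed form of `attr()` has changed between drafts, and some formatters document only the untyped `attr(href)`, so treat the exact spelling as something to verify against your tool.

```css
/* Older draft spelling: comma between the attribute name and its type. */
a.xref::after { content: " (page " target-counter(attr(href, url), page) ")"; }

/* Spelling described in the comment above: no comma before the type. */
a.xref::after { content: " (page " target-counter(attr(href url), page) ")"; }

/* Untyped form, for formatters that do not accept a type keyword. */
a.xref::after { content: " (page " target-counter(attr(href), page) ")"; }
```

Only one of the three rules should be kept in a real stylesheet; they are listed together for comparison.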
whole1959 says:

June 21, 2012 at 8:45 am

Long before XHTML, XML, and HTML, publishing companies and large typesetting foundries used SGML. In fact, HTML was derived from SGML. SGML was complex and bulky, so the approach described above is a good one: it is always good to separate data from form and function.
bowerbird says:

June 21, 2012 at 5:46 pm

still no response from mr. zeldman, over a week later.

i guess he wasn’t interested in conversation after all.

-bowerbird
Smartycrowd says:

June 24, 2012 at 3:39 am

Do you really think that CSS is starting to catch up? I hope you are indeed right, barefootliam, that we can expect improvements in the CSS support for print and for paged media in general over the coming months and years.
TellMeMore says:

March 13, 2013 at 3:06 pm

This might seem like a silly question, but it is actually a serious one. What if I wanted to publish looseleaf supplements to existing publications? That is, updated page groups to replace pages in existing printed books. (The books are in something like a three-ring binder, so pages can easily be replaced.)

Some of the unusual things I’d want to do would be:
* specify arbitrary page breaks (but only temporarily–for printing purposes)
* override automatic list numbering (because what if my updated page starts with list item 32?)
* have the ability to do plenty of manipulation with page numbers (if I’m putting in more pages than I took out, I’ll be making “point pages.”)
* “scrape” information from headings, manipulate it, and show it in dictionary-style guide heads/ears, as with Word’s “StyleRef” when used inside a field in a page header

… So, for anyone in the know… can XHTML with CSS3 do any or all of these things?
I’m looking in the direction of PrinceXML. Am I likely to find what I need?
Thanks!
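
A rough sketch of how the four items above might map onto CSS, assuming a formatter that implements the paged media and generated content drafts (page counters, named strings, margin boxes). Support varies by product and version, so this does not confirm that any particular tool covers looseleaf work. All class and counter names here are hypothetical, and the start values 31 and 47 are arbitrary.

```css
/* 1. Arbitrary page break: add the class only for the print run. */
.force-break { page-break-before: always; }

/* 2. Override list numbering with a custom counter so the first
      item on the replacement page is numbered 32. */
ol.resume { counter-reset: item 31; list-style: none; }
ol.resume > li { counter-increment: item; }
ol.resume > li::before { content: counter(item) ". "; }

/* 3. Set the page counter at the start of a replacement page group.
      Point pages such as 12.1 would need more counter work than this. */
.supplement-start { counter-reset: page 47; }

/* 4. Capture heading text in a named string and show the first and
      last entries on each page as dictionary-style guide heads. */
h2.entry { string-set: guide content(); }

@page {
  @top-left      { content: string(guide, first); }
  @top-right     { content: string(guide, last); }
  @bottom-center { content: counter(page); }
}
```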