Printing a Book with CSS: Boom!
Issue № 208

HTML and CSS, two of our favorite acronyms, are normally associated with web pages. And deservedly so: HTML is the dominant document format on the web and CSS is used to style most HTML pages. But, are they suitable for off-screen use? Can CSS be used for serious print jobs? To find out, we decided to take the ultimate challenge: to produce the next edition of our book directly from HTML and CSS files. In this article we sketch our solution and quote from the style sheet used. Towards the end we describe the book microformat (boom!) we developed in the process.

The studious reader may want to fetch a sample HTML file, sample style sheet, as well as the PDF file generated by Prince. The PDF file is similar to the one we sent to the printer. We encourage you to base your own book on the sample file and tell us how it goes.

Print vs. pixel

A printed book has many features not seen on screens. There are page numbers, headers and footers, a table of contents, and an index. The content must be split into pages of fixed size, and cross-references within the book (for example, “see definition on page 35”) must be resolved. Finally, the content must be converted to PDF, which is sent to the printer.

Web browsers are good at dealing with pixels on a screen, but not very good at printing. To print a full book we turned to Prince, a dedicated batch processor which converts XML to PDF by way of CSS. Prince supports the print-specific features of CSS2, as well as functionality proposed for CSS3.

CSS2

CSS2 has a notion of paged media (think sheets of paper), as opposed to continuous media (think scrollbars). Style sheets can set the size of pages and their margins. Page templates can be given names and elements can state which named page they want to be printed on. Also, elements in the source document can force page breaks. Here is a snippet from the style sheet we used:

@page {
  size: 7in 9.25in;
  margin: 27mm 16mm 27mm 16mm;
}

Having a US-based publisher, we were given the page size in inches. We, being Europeans, continued with metric measurements. CSS accepts both.

After setting up the page size and margins, we needed to make sure there were page breaks in the right places. The following excerpt shows how page breaks are generated after chapters and appendices:

div.chapter, div.appendix {
  page-break-after: always;
}

Also, we used CSS2 to declare named pages:

div.titlepage {
  page: blank;
}

That is, the title page is to be printed on pages with the name “blank.” CSS2 described the concept of named pages, but their value only becomes apparent when headers and footers are available. For this we have to turn to CSS3.

CSS3

The CSS Working Group has published a CSS3 Module for Paged Media. It describes additional functionality required for printing. We will start by looking at running headers and footers.

Headers and footers

Here is an example:

@page :left {
  @top-left {
    content: "Cascading Style Sheets";
  }
}

The example above puts a string (“Cascading Style Sheets”) in the top left corner of all left-hand side pages of the book. All pages? Not quite. A subsequent rule removes the header from pages named “blank”:

@page blank :left {
  @top-left {
    content: normal;
  }
}

Recall from earlier that all <div class="titlepage"></div> elements are to be printed on “blank” pages. Given the style sheet above, “blank” left-hand side pages will be printed without a header.

Stealing strings

Our book consists of many chapters and the title of each chapter is displayed in a header on right-hand side pages. To achieve this, the title string must be copied from an element with the string-set property:

h1 {
  string-set: header content();
}

Just like there were named pages in the previous section, CSS3 also has named strings. In the example above, the string named “header” is assigned the chapter headings. Each time a chapter heading is encountered, the chapter title is copied into this string. The string can be referred to in other parts of the style sheet:

@page :right {
  @top-right {
    content: string(header, first); 
  }
}

In the example above, the right-hand side header is set to be the value of the “header” string. The keyword “first” indicates that we want the first value of “header” in case there are several assignments on that page.
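The header fragments in this section can be gathered into one minimal sketch. The right-hand rule for “blank” pages is an assumption made by symmetry with the left-hand rule; it is not quoted from the book's style sheet.

```css
/* Copy each chapter title into the named string "header". */
h1 {
  string-set: header content();
}

/* Left-hand pages carry the book title. */
@page :left {
  @top-left {
    content: "Cascading Style Sheets";
  }
}

/* Right-hand pages carry the first chapter title set on that page. */
@page :right {
  @top-right {
    content: string(header, first);
  }
}

/* Pages named "blank" drop the header on the left... */
@page blank :left {
  @top-left {
    content: normal;
  }
}

/* ...and, as an assumption, on the right as well. */
@page blank :right {
  @top-right {
    content: normal;
  }
}
```

Because the “blank” rules name both a page and a side, they are more specific than the plain :left and :right rules and override them only on the title page.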

Page numbers

Like headers, page numbers are a navigational aid in books. Setting the page numbers is easy:

@page :left {
  @bottom-left {
    content: counter(page);
  }
}

One requirement from our publisher was to use roman numerals in the first part of the book. This part is referred to as “front-matter”. Here is the style sheet for roman page numbers in the front-matter:

@page front-matter :left {
  @bottom-left {
    content: counter(page, lower-roman);
  }
}

The numbering systems are the same as for the list-style-type property and lower-roman is one of them. The counter called “page” is predefined in CSS.
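The excerpts above cover left-hand pages only. A sketch of the right-hand counterparts might look like the following; the choice of the bottom-right margin box is an assumption (mirroring the left-hand rules), not a quote from the book's style sheet.

```css
/* Assumed mirror of the left-hand rule: arabic numbers in the outer corner. */
@page :right {
  @bottom-right {
    content: counter(page);
  }
}

/* Assumed mirror for the front-matter: lower-case roman numerals. */
@page front-matter :right {
  @bottom-right {
    content: counter(page, lower-roman);
  }
}
```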

Cross-references

The web is a huge collection of cross-references: all hyperlinks are cross-references. Cross-references in books are similar in nature, but presented differently. Instead of the blue underlined text we know from our screens, books contain text such as “see the figure on page 35.” The number “35” is unknown to the authors of the book—one can only find the page number by formatting the content. Therefore, the number “35” cannot be typed into the manuscript but must be inserted by the formatter. To do so, the formatter needs a pointer to the figure. In HTML, this is done with an A element:

<a class="pageref" href="#figure">see the figure</a>

The corresponding style sheet looks like this:

a.pageref::after {
  content: " on page " target-counter(attr(href), page);
}

The example above needs some explanation. The selector refers to a generated pseudo-element (::after) which comes after the content of the A element. The first part of that pseudo-element is the string “ on page ”. After that comes the most interesting part, the target-counter function, which fetches the value of the “page” counter at the location pointed to by the “href” attribute. The result is that the string “ on page ” is concatenated with the number “35”.
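The same function is not limited to the “page” counter. As a hedged sketch, a reference such as “see Fig. 2.3” could be generated from author-defined counters. The counter names “chapter” and “figure” and the class name “figref” are hypothetical, each figure DIV is assumed to carry an id that the link points to, and whether a given formatter resolves target-counter for author-defined counters should be checked against its documentation.

```css
/* Hypothetical counters: number chapters, and restart figures per chapter. */
body {
  counter-reset: chapter;
}
div.chapter {
  counter-increment: chapter;
  counter-reset: figure;
}
div.figure {
  counter-increment: figure;
}

/* Prefix each caption with, for example, "Fig. 2.3 ". */
div.figure p.caption::before {
  content: "Fig. " counter(chapter) "." counter(figure) " ";
}

/* For <a class="figref" href="#some-figure-id">see</a>, append " Fig. 2.3". */
a.figref::after {
  content: " Fig. " target-counter(attr(href), chapter) "."
           target-counter(attr(href), figure);
}
```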

Table of contents

Similar magic is invoked to generate a table of contents (TOC). Given a bunch of hyperlinks pointing to chapters, sections and other TOC entries, the style sheet describes how to present the hyperlinks as TOC. Here is a sample TOC entry:

<ul class="toc">
<li><a href="#intro">Introduction</a></li>
<li><a href="#html"><abbr title="HyperText Markup Language">HTML</abbr></a></li>
</ul>

The style sheet for the TOC uses the same target-counter to fetch a page number:

ul.toc a::after {
  content: leader('.') target-counter(attr(href), page);
}

Also, a new function, leader, is used to generate “leaders.” In typography, a “leader” is a line that guides the eye from the textual entry to the page number. In our example, a set of dots is added between the text and the page number:

Introduction..................1
HTML..........................3

Note that this functionality is experimental; no Working Draft for leaders has been published yet.

The book microformat: boom!

As you probably have guessed by now, we succeeded in producing our book using HTML and CSS. In doing so, we also developed a set of conventions for marking up a book in HTML. HTML has the wonderful class attribute which lets anyone extend the semantics of HTML documents while building on HTML’s universally known semantics. So, in our book, we used a rich set of HTML elements and added a bunch of class names.

Since then, the concept of “microformats” has entered the web and we are happy to discover that we actually developed (at least the beginnings of) a microformat for books. We think other authors will be able to use the boom! microformat and improve upon it in the process.

Sections of a book

The chapters in the first part of the book, such as preface, foreword, and table of contents, are enclosed in a DIV with a corresponding class name. The chapters in the main body are DIVs with a class of “chapter” and the appendices are DIVs with class “appendix.” In the style sheet, the class names are primarily used to select the correct named page with the correct headers and footers.

Although HTML has six levels of headings (H1, H2, etc.) to distinguish chapter headings, section headings, and subsection headings, it is convenient to enclose sections in an element, if only to be able to style the end of a section. We used a DIV with class “section”.
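To illustrate how these class names connect to the page rules, here is a minimal sketch. The class names “preface”, “foreword”, and “toc” and the spacing value are assumptions chosen for illustration; “titlepage”, “chapter”, “appendix”, “section”, “blank”, and “front-matter” come from the excerpts in this article.

```css
/* The title page goes on pages named "blank" (no running header). */
div.titlepage {
  page: blank;
}

/* Assumed class names for front-matter chapters: roman page numbers. */
div.preface, div.foreword, div.toc {
  page: front-matter;
}

/* Chapters and appendices each end with a page break. */
div.chapter, div.appendix {
  page-break-after: always;
}

/* Wrapping sections in a DIV makes the end of a section styleable. */
div.section {
  margin-bottom: 1.5em; /* illustrative value */
}
```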

Tables and figures

HTML doesn’t have a dedicated element for figures with captions, but it is easy to create one by specializing a DIV:

<div class="figure">
  <p class="caption">...</p>
  <p class="art"><img src="..." alt="..."/></p>
</div>

The TABLE element has a CAPTION element, but support is spotty. We, therefore, used a similar strategy for marking up tables:

<div class="table">
  <p class="caption">...</p>
  <table class="lined">
    ...
  </table>
</div>

We used a variety of figure styles (normal, wide, on the side, etc.) and table styles (normal, wide, lined, top-floating, etc.) in our book. An element can be given several class names, so that, say, a table can be both “lined” and “wide.” We have cut down on the number of alternatives in the sample document for the sake of simplicity.

Side notes and side bars

A DIV with class “sidenote” is used for side remarks, related to the (following) text in the main body but not necessarily shown in-line. A typical way to show them is to put them in the margin.

A “sidebar” is longer than a “sidenote.” The latter is typically only one paragraph, maybe two; the former is several paragraphs or includes lists or other material. In the sample document there is one sidebar that floats to the top, uses the full width of the page, and is given a gray background.
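One way to place a side note in the margin uses only ordinary CSS floats. This is a sketch under stated assumptions: the side notes are assumed to sit inside the “section” DIVs, all measurements are illustrative, and it is not the rule set from the sample style sheet.

```css
/* Reserve a gutter to the left of the main text. */
div.section {
  margin-left: 30mm;
}

/* Pull each side note out of the text column and into that gutter.
   The negative margin is larger than the width, so the float takes
   no room away from the main text. */
div.sidenote {
  float: left;
  clear: left;
  width: 26mm;
  margin-left: -30mm;
  font-size: 80%;
}

/* A sidebar spans the full text width on a gray background. */
div.sidebar {
  background: #ddd;
  padding: 0.5em 1em;
}
```

Floating the sidebar to the top of the page, as described above, relies on formatter-specific functionality and is left out of this sketch.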

Summing up

The Prince formatter has opened up the processing pipeline from HTML and CSS to PDF. It is now possible, even feasible, to use HTML as the document format for books. This makes it easier to cross-publish content on the web and in print.

That said, authors who attempt to use the techniques described in this article will face some technical issues along the way. For example, we have not discussed how to generate the TOC structures and how to display wide tables. We have also left some room for improvement in the boom! microformat. However, compared to the headaches of actually writing a book, formatting is now a joy!

Bert Bos

Bert Bos proposed and implemented his own style sheet language before joining forces with Håkon Wium Lie at W3C in 1995. He was the co-author of the original CSS specification and launched W3C's internationalization activities. He is currently the Style Sheets activity lead at W3C.

69 Reader Comments

  1. The people who are slagging this approach are completely missing the point, IMO. For me, the good part is not so much the use of CSS to format the book, but the use of XHTML to mark it up. The printing back-end can be ripped out and replaced with whatever works for you — FO (yuck), groff, LaTeX — or load up the HTML in M$ Word or OpenOffice, apply a stylesheet, and print.

    We can talk about DocBook until we’re blue in the face, but it’s such an incredibly complex DTD that most writers would give up before finishing the first chapter of the first document. DITA is a step in the right direction, but it’s probably still too complex for non-gearheads without a fair amount of motivation. Just about everyone knows enough XHTML to write a document, and there are plenty of tools — Free and commercial — that provide a pretty GUI for people who need it.

    As long as writers have to associate XML with complex large-scale publishing systems with six-figure deployment costs and five-figure support costs, it will be “eXcellent, Maybe Later” outside of Fortune 100 companies. HTML brought on-line publishing to the masses through a simple syntax; now it can bring single-source on-line/paper/PDF publishing to the masses as well.

  2. Others have pointed out the advertizing. I would also point out that Håkon Wium Lie has been long known to be an XSL foe (pun intended), so his opinion on XSL should be taken with a grain of salt.

    But now to Prince itself—

    It seems like a good way of bringing web pages to print. Someone mentioned its applicability to the blog-streaming world. Prince can fill this niche well. But going from web to print has, up till now, been carried out by printing according to a print stylesheet, not downloading a PDF. What Prince has over, for instance, Firefox is that the latter doesn’t support the CSS page model properly. When browsers come up to that functionality, Prince will be out of job.

    Mr Lie might answer that the niche isn’t web to casual print, it’s XHTML to books, with the XHTML not necessarily ever being hosted on a web server, and books like the kind we get from Framemaker or Quark. However, that’s a niche Prince, or more accurately an XHTML to PDF tool, can’t fill either. Maybe CSS is already up to the task of heavy formatting (and I doubt that), but XHTML isn’t up to the task of rich markup. XHTML is a limited tagset. You know it when you have to use span tags where in general XML you’d use an element. You know it because the new OpenDocument format for office applications didn’t duplicate XHTML, it was formulated with its own tagset, which is much bigger than that of XHTML. XHTML is suitable for the simplest books, but anything beyond that, like any random book you pick up in the college library, requires a more feature-rich markup language. XHTML for books could only be hobbyists’ fare, and maybe not even that, since hobbyists are far more likely to opt for WYSIWYG tools than textual stuff.

    In short, I don’t see Boom finding its niche among any of the possibilities. It’s overkill for simple web to print, and underpowered for professional typesetting.

  3. bq. Håkon Wium Lie has been long known to be an XSL foe (pun intended)

    🙂 It’s the «FO» part I have a problem with. Formatting objects don’t have any semantics and should therefore not be represented in XML. It’s just a bunch of font tags. Which is why I “once wrote”:http://www.xml.com/pub/a/1999/05/xsl/xslconsidered_1.html?page=4

    bq. I can understand why overworked undergraduates think FONT is cool, but I’m very disappointed when a group of highly skilled adults tell kids to stop playing, form a committee – and then come out with a set of supercharged FONT tags

    Anyway, your main argument is not CSS vs. XSL-FO, it’s against the use of HTML as the basis for our markup. You write:

    bq. XHTML isn’t up to the task of rich markup. XHTML is a limited tagset.

    Indeed, the tagset is limited, but HTML has a wonderful extension mechanism: the «class» attribute. Using the class attribute, you can convert any XML document into HTML and back — without losing information.

    bq. You know it because the new OpenDocument format for office applications didn’t duplicate XHTML, it was formulated with its own tagset, which is much bigger than that of XHTML.

    I think this was a big mistake. By basing OpenDocument on HTML (much the same way Bert and I did Boom on HTML), the format would have had a huge installed base from the beginning: 1 billion browsers.

  4. bq. Formatting objects don’t have any semantics and should therefore not be represented in XML.

    Why not? In the letters that make up the initialism XML, I don’t see anything that stands for Semantics. XML is just a toolchest for building any markup language you wish, and one of those happens to be the page layout language called XSL-FO. And it isn’t “just a bunch of font tags” any more than CSS is. I think we both know XSL-FO is to be generated from XSLT rather than written by hand, and when generated from an XSLT script it’s equivalent to a CSS stylesheet in separating content and presentation.

    bq. Indeed, the tagset is limited, but HTML has a wonderful extension mechanism: the «class» attribute. Using the class attribute, you can convert any XML document into HTML and back—without losing information.

    Doesn’t the use of a kluge indicate the inappropriateness of the format? And, um, talking about semantics, are you aware that class attributes are style directives, containing no more semantics than I or B or TT tags? You’re like proposing the use of such constructs as divs with style attributes instead of H1/H2/H3 tags, which Ian Hixie complained about on his blog, but on steroids!

    This isn’t the right tool for the job. Anyone who so much prefers CSS to XSL can use CSS to style XML, and that would be better. I shudder to the thought of using HTML, with kluges and all, for preparing a college grammar book. But even CSS isn’t wholly satisfactory—you’ve had to write an external script to generate the TOC, while you can do it with XSLT, and then style with FO in the same gulp. Looking at it that way, the XSL approach could be said to be more deskilling than the HTML/CSS one.

    bq. I think this was a big mistake. By basing OpenDocument on HTML (much the same way Bert and I did Boom on HTML), the format would have had a huge installed base from the beginning: 1 billion browsers.

    But OpenDocument is for office applications, not for browsers. You seem to be very Web-centric. Additionally, even if ODF were based on HTML, the type of HTML that office applications generate approaches the elegance of Frontpage’s output.

  5. How novel, using a language based on SGML (Standard Generalized Markup Language) to make a printer paint a page instead of a browser.

    Ah yes – my first program documentation (1983) was generated using IBM DCF on a 3090 mainframe. And our documents had strange markup like

    ,

    etc etc – and it had ‘stylesheets’ to add ornamentation (other than bold or underline) etc etc when IBM released its first advanced function printers (3800-3 and 3820s).

    IIRC it was a superset (or was that a subset?) of SGML. It even generated tocs, indexes etc etc

    Amazing its still around … http://www.printers.ibm.com/internet/wwsites.nsf/vwwebpublished/dcfhome_z_ww

    Kim Mihaly

  6. TeX is different thing. I wrote a number of papers, and even a whole book using it. These days I use XSL and produce PDF. This is XML based and more flexible. Still I personally like TeX much more. But PDF and TeX are about actually printing content in high quality.

    The point of this article seems to be, that using XHTML/CSS *can* be used to publish a *real* book. It’s a prove-of-concept by the persons who created CSS — and that’s nice.

    Printing web-pages is always a pain. And I hope that’s what this is about. Printer-friendly pages are never really what they claim to be. XHTML/CSS can not compete with PDF. But it can complement it. And make it easier to import web-pages into publishing systems.

  7. Very nice paper, thanks. And Prince may be just the tool I need for a “print-on-demand” adjunct to my (free) ebooks site (http://etext.library.adelaide.edu.au)

    I’ve been tinkering for a long time with ebooks — mostly public domain novels and essays (which it is true to say are quite simple compared to technical works), using HTML. My main interest has been in formatting books for the web rather than print, but there’s always that lingering, “wouldn’t it be nice” feeling that it would be great to be able to print them too, if desired. And I have had some limited success in that direction using rudimentary CSS (see the FAQ), which produces a nice result if you don’t mind A4 and don’t much care about page numbering etc.

    But it is very pleasing to see someone pushing the envelope to see what can be done with CSS. Now, if only my browsers supported all those features, I’d be very happy.

    Of course, with or without Prince, there’s no reason I should not use the CSS3 features, even if they are not currently supported. They will be one day, and then my ebooks will be ready and waiting!

    (And I’ve heard all the “wrong tool” arguments from the ebook crowd already, thanks! LaTex, Docbook, XSL, yadda yadda. Most of them are still producing ugly results whatever the tool.)

  8. I opened up this article mainly because I’ve been looking for a means to create invoices and proposals quickly, easily, and with some customization.

    I’ve always hated opening up Indesign or Msoft Word just to fudge a couple variables and print. I’m on a slow laptop, and it can seem like forever to load up these bloated apps, only to close them after seconds of use.

    I love the possibility that I can create a printed page template, and only have to open a simple text editor to edit and then send it to a browser (which is always on!) and hit print.

    I know everyone’s been knocking the book format application, but I’m very excited about other possible applications.

  9. I am righting a docbook “book”. Some of the fonts are not looking good and would like to enhace the fonts. I learned that I could use CSS for enhancing fonts output in html. I am trying to see if someone has gone thru the experience and have used a good CSS stylesheet file. I would like to get a copy of the CSS stylesheet if possible. Or point me to good location where I can get a good CSS file.

    Thanks,
    Raj

  10. bq. I am righting a docbook “book”. Some of the fonts are not looking good and would like to enhace the fonts. I learned that I could use CSS for enhancing fonts output in html. I am trying to see if someone has gone thru the experience and have used a good CSS stylesheet file. I would like to get a copy of the CSS stylesheet if possible. Or point me to good location where I can get a good CSS file.

    Prince ships with a CSS style sheet that does rudimentary styling of DocBook files. It’s called “docbook.css” and it should be a reasonable starting point.

  11. While the article is great, i don’t get the point.
    I tried printing you HTML file via the latest version of firefox and ie and the footers (page numbers) do not appear.
    if we need to have another program to convert to pdf to then be able to print, what is the use? why cannot CSS just work with browsers for printing?
    I am trying to create documents that can be visible on the web and then when somene wants to print them, have a page number and a footer appear and your solution does not work, why is the question? since it is CSS and XHTML, there should not be a problem.
    i have to revert to the idea of making 2 style scheets one for print and one for screen which is also a bad solution.
    Unless i missed something

  12. I don’t think this is ready for any serious use. Consider for example

    • That you can’t have a ul inside a p,
    • or any block level element

    which really breaks orphan control code. There is simply no way that a client figure out like that a sentence like this belongs to the paragraph above, and isn’t a paragraph on its own.

    (La)TeX has rather advanced algorithms to do this right, since it seriously breaks text flow. It may not be the kind of thing that people point their fingers at, but if you ask them, they tell you that your text was “heavy”. Any typographer worth his salt, and a serious book publisher will give this high priority.

    Don’t get me wrong: I would really like to see a LaTeX replacement, as the HTML tools are much more widespread than LaTeX, and LaTeX is often a pain to write. However, it is important to realise that there are many good reasons why people use it for high-quality work, and that is not going to change before certain flaws in the original design of HTML are corrected, and I know it breaks your heart, Håkon, but that means backwards-incompatible changes must be made to HTML.

    Also, it means that we have to put some effort into high-quality printing in the UAs, and I don’t see us doing that…?

  13. I like the idea, since we already have the API documentation of our software generated as HTML. It must not be perfect for printing (I would prefer Latex over XSL-FO/Docbook, but that is not important here).

    What I am really missing is the possibility to create a reference to a numbered element: E.g. images are numbered with a chapter prefix and a counter for the image, i.e. a caption like “Fig. 2.3” for the third image in the second chapter.

    bq. %&&/%%%$$
    %&&&%%%&%
    %%&&&&&&&
    {text-align:center}Fig. 1.1

    This could be easily done using counters. But now I would like to create a reference in the text to this image like, e.g. “see Fig. 2.3” where the “Fig 2.3” is automatically generated. Is this possible?

    bq. bla bla bla (see Fig 1.1) bla bla bla

  14. I’m involved in a project that requires a “clean” print option, but none of the developers have specific expertise in printing (nicely) from XHTML. Frankly, we have been dreading the day when we would have to buckle down an learn an unfamiliar print technology. After stumbling on this thread I picked up the free Prince demo today. Within an hour, I was outputting reasonably complex pdf layouts with tables borders, backgrounds and images (oooooooh… ahhhhhhh…) and without touching the original content. The CSS2 implementation is refreshingly solid (this the week that IE7 and FF2 were released). Now, my team is genuinely excited about printing. Offset press publishing might be a stretch, but Prince proves that more modest goals are achievable with a fraction of the effort.
