Printing a Book with CSS: Boom!

by Bert Bos, Håkon Wium Lie

68 Reader Comments

  1. The pages have no corner trims and no bleed; the black text separates into CMYK leaving the black at only 90%. We would not be happy to go to press with this file.

    Sure. The sample file is optimized for screen use rather than print. Thanks for your partial list of requirements for PDF Prince—feel free to email me your complete list :-)

  2. In documentation management and production, we’ve been trying to get away from the conversion game for years. Tools that convert one format of document or text into another abound. If you’re familiar with help authoring tools, think Doc-to-Help, for example.

    Although you’re using style sheets and proper markup, you’re still trying to convert one format (HTML) into another (Print/PDF). In the long run, this is not the way to go. And your PDF will suffer as a result. Try to build bookmarks from that PDF.

    True single-sourcing demands that content and format be separated completely. And there are tools out there that do that splendidly, where content is stored in unformatted data blocks and published at run-time using CSS (for HTML/XHTML/XML) or MS Word DOT templates for print. Incidentally, an MS Word DOT file is to MS Word precisely what CSS is to HTML. It only controls a lot more (like headers and footers) a lot better for print – because that’s what it was designed to do.

  3. Although you’re using style sheets and proper markup, you’re still trying to convert one format (HTML) into another (Print/PDF). In the long run, this is not the way to go.

    In general, I agree with your views that single source is good and document conversion is bad. This is why CSS has the concept of media types; the same source document (in, say, HTML) should be usable on all sorts of devices. The sample document has style sheets for print and screen and support for other media types can easily be added if you want more control of the final presentation.

    The only reason for converting our files to PDF is to send them to the printer. They only accept PDF files, and, as such, PDF works quite well. Until printers accept HTML/CSS natively, using PDF is a good solution, and it doesn’t change the fact that we have a single source.

  4. I have been trying to solve this web-to-print quandary for a project over the past month, and after trying to use RPDF with Ruby on Rails, I am retreating to a simply printable, CSS-styled web page.

    So, this article really got me excited! Then after reading it and hopping over to princexml.com to get Prince, I feel like this article is an advert for the $350 Prince software.

    It bums me out to read an article on my old favorite ALA, which is all about standards, openness & accessibility on the web, only to find out that I have to buy expensive proprietary software to put the knowledge to use.

    Håkon, I really appreciate you for all of the standards goodness you bring to us, but this is a TOTAL BUMMER.

  5. I feel like this article is an advert for the $350 Prince software.

    This article is about open standards, namely HTML and CSS. At the time of writing, Prince is the only software that can process our code. Not referring to it would have been negligent. We hope other software will start supporting our code; this was one of the reasons for writing the article in the first place.

    Did you notice that you can download and use a fully functional demo version of Prince for free? You can also publish academic works without buying a license. And, if you compare Prince with other standards-based software that produces printed materials (some XSL tools come to mind), Prince is a bargain.

  6. Håkon, I think you will agree with me that if you want to present some content stored in XML to a user, you often need to do two things: (1) transform your document and (2) assign visual characteristics to individual components of the document.

    During transformation you can do things like building a table of contents, numbering chapters and figures, or adding some fixed content such as the word “Figure” in front of each figure name. CSS is able to do some basic modifications to a document, like adding figure numbers. More complex transformations, like building a ToC, must be done by something more powerful. My tool of choice for this task would be XSLT, but you can use any language which is able to read, manipulate and store an XML document. You told me previously that the ToC for your book was created with some script.

    After the document is transformed (well, yes, in CSS this happens not after but at the same time), visual characteristics like fonts, colors, spacing, margins, etc. are applied to elements in the source document. If I understand your position correctly, you are advocating CSS over XSL-FO here because CSS syntax is easier and you are assigning properties directly to elements from the source XML document. I think that I can agree with your position here… but only as long as you are using CSS with some general, document-oriented XML format like XHTML or DocBook. Let me explain.

    If you have a book in XHTML you can easily add a ToC to the document, because XHTML contains general markup for paragraphs and lists, and a ToC is nothing more than a list of chapter titles with links.

    But you can’t do this with more specific XML formats. For example imagine a simple invoice:


    <invoice>
      <!-- … invoice metadata here … -->
      <item>
        <description>Pilsner Beer</description>
        <qty>6</qty>
        <unitPrice>1.69</unitPrice>
      </item>
      <item>
        <description>Sausage</description>
        <qty>3</qty>
        <unitPrice>0.59</unitPrice>
      </item>
    </invoice>


    You probably would present it as a table.

    During document transformation you need to add a new row with the table header, a new column with subtotals, and finally a new row for the total. But the XML schema of the invoice doesn’t allow you to specify such information.

    This example clearly shows that there are classes of documents which must be transformed to some more general markup prior to the assignment of visual characteristics. XSL-FO is such an intermediate markup. I can imagine that you can also use XHTML+CSS for this purpose. But then you are losing a big advantage of CSS: your CSS rules no longer work against the original markup, but against intermediate XHTML code.
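
    As a rough illustration of that transformation step (header row, subtotal column, total row), here is a minimal Python sketch that turns the invoice above into an XHTML-style table. It is an editorial example, not something from the thread: the function name `invoice_to_table`, the class names and the column labels are all made up for illustration, and the elided invoice metadata is simply left out. XSLT or any other XML-capable language could do the same job.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The invoice from the comment, without the elided metadata.
INVOICE_XML = """
<invoice>
  <item>
    <description>Pilsner Beer</description>
    <qty>6</qty>
    <unitPrice>1.69</unitPrice>
  </item>
  <item>
    <description>Sausage</description>
    <qty>3</qty>
    <unitPrice>0.59</unitPrice>
  </item>
</invoice>
"""


def invoice_to_table(invoice_xml: str) -> ET.Element:
    """Build a <table> with a header row, per-item subtotals and a total row."""
    invoice = ET.fromstring(invoice_xml)
    table = ET.Element("table", {"class": "invoice"})

    # Header row: absent from the source markup, so it is generated here.
    header = ET.SubElement(table, "tr")
    for label in ("Description", "Qty", "Unit price", "Subtotal"):
        ET.SubElement(header, "th").text = label

    total = Decimal("0")
    for item in invoice.findall("item"):
        qty = int(item.findtext("qty"))
        unit_price = Decimal(item.findtext("unitPrice"))
        subtotal = qty * unit_price  # computed value, not in the source
        total += subtotal
        row = ET.SubElement(table, "tr", {"class": "item"})
        for value in (item.findtext("description"), qty, unit_price, subtotal):
            ET.SubElement(row, "td").text = str(value)

    # Total row: also generated, spanning the first three columns.
    footer = ET.SubElement(table, "tr", {"class": "total"})
    ET.SubElement(footer, "th", {"colspan": "3"}).text = "Total"
    ET.SubElement(footer, "td").text = str(total)
    return table


table = invoice_to_table(INVOICE_XML)
print(ET.tostring(table, encoding="unicode"))
```

    The output uses only generic table markup, so a style sheet would then target selectors such as `table.invoice` or `tr.total` rather than the original `item` and `qty` elements, which is the trade-off the comment describes.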

    So my conclusion from this is: CSS can be used for formatting documents that are written in some very generic, free-text-oriented vocabulary like XHTML. For more rigidly structured XML formats CSS can be used, but it is no longer easier to use than XSL-FO.

    The difference in complexity is mainly caused by the fact that all XHTML elements have some default formatting behaviour. Once you are not using XHTML, there is no big difference between:

    CSS:

    … { display: block;
        color: red;
        font-weight: bold; }


    and XSL-FO:

    <fo:block color="red" font-weight="bold">…</fo:block>


    It is just a matter of syntax, because the basic formatting models of XSL-FO and CSS are very similar and many XSL-FO properties were taken directly from CSS.

    But if there is a way to handle my invoice example using only CSS without introducing another intermediate format, I would like to know.

    Jirka

  7. And, if you compare Prince with other standards-based software that produces printed materials (some XSL tools come to mind), Prince is a bargain.

    I use an XSL-FO toolchain for print production, namely XEP from RenderX. It’s even a little bit cheaper than Prince and its feature list is more comprehensive, IMHO. For example, hyphenation is done directly by XEP. Hyphenation patterns are weighted, because some places inside a word are more appropriate as hyphenation points than others. This is something you can’t achieve with soft hyphens placed into the document. Other XSL-FO implementations offer similar functionality for a similar price.

    But it is good to have more competition on the XML formatting market.

  8. If I understand your position correctly, you are advocating CSS over XSL-FO here because CSS syntax is easier and you are assigning properties directly to elements from the source XML document. I think that I can agree with your position here… but only as long as you are using CSS with some general, document-oriented XML format like XHTML or DocBook.

    Yes, this is an important part of the argument. CSS is well suited for structured document formats where the content comes roughly in the order of presentation. I believe content should be in this near-presentation state when it “crosses the wire”. Styling should be applied as close to the reader as possible, i.e. in the client.

    The other argument for using CSS in printing is that one can reuse many of the CSS style sheets written for the web.

    This example clearly shows that there are classes of documents which must be transformed to some more general markup prior to the assignment of visual characteristics.

    I agree completely. And CSS hasn’t been designed for that purpose. XSLT has, and is perfectly fine to use. It’s Turing-complete and can perform the computations needed to calculate your columns. My only problem with XSLT is that it has “Style” in its name.

    You told me previously that the ToC for your book was created with some script.

    Yes, we use Bert Bos’ multitoc (http://www.w3.org/Tools/HTML-XML-utils/) to generate a TOC. There have been proposals for how to handle this in CSS, but it’s probably too much of a transformation thing to make it into the CSS standards.
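
    The idea of a generated ToC (a list of heading titles with links) can be sketched in a few lines of Python. This is not multitoc and does not reproduce its options or output; `build_toc`, the sample headings and the `toc` class are hypothetical, and the input is assumed to be well-formed XHTML whose headings already carry `id` attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; a real book would be read from a file.
XHTML = """
<body>
  <h1 id="intro">Introduction</h1>
  <p>Some text.</p>
  <h2 id="media">Media types</h2>
  <h1 id="print">Printing</h1>
</body>
"""


def build_toc(body: ET.Element, levels=("h1", "h2")) -> ET.Element:
    """Return a <ul> with one linked entry per heading, in document order."""
    toc = ET.Element("ul", {"class": "toc"})
    for elem in body.iter():
        if elem.tag in levels and elem.get("id"):
            # The heading level is kept as a class so it can be styled.
            entry = ET.SubElement(toc, "li", {"class": elem.tag})
            link = ET.SubElement(entry, "a", {"href": "#" + elem.get("id")})
            link.text = "".join(elem.itertext())
    return toc


toc = build_toc(ET.fromstring(XHTML))
entries = [(li.get("class"), li[0].get("href"), li[0].text) for li in toc]
for entry in entries:
    print(entry)
```

    Because the result is ordinary list markup, the same style sheet that formats the rest of the book can format the ToC.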


    XSL-FO is such an intermediate markup. I can imagine that you can also use XHTML+CSS for this purpose. But then you are losing a big advantage of CSS: your CSS rules no longer work against the original markup, but against intermediate XHTML code.

    I don’t see any problem with working against ‘intermediate code’. I think the XHTML code is what you should offer on the web since it uses well-known semantics. Your invoice example uses tag names not universally known. That’s fine as an internal format, but shouldn’t be published on the web. Also, I don’t think XSL-FO should be published on the web (http://people.opera.com/howcome/1999/foch.html), but that’s a different debate :-)

  9. Does anyone know how to set a background color to have alpha transparency using CSS? I know this won’t be officially supported until CSS3, but I believe some browsers already support the feature.

  10. I really like this demonstration of the developing capabilities of CSS. As I see it, there are situations where you want to use CSS + XHTML for multiple presentations (views), as when you want to print content that’s mainly aimed at the web browser.

    The single source idea is certainly a good one, and solutions like Apache Cocoon use XSLT to transform the structure of an originating XML document, producing XHTML for the browser or mobile platform and XSL-FO for printing purposes. It can use FOP (Formatting Objects Processor) to get PDF for printing.

    The XSLT “having Style in it” is a bit confusing, but as you know (Håkon was only expert from the start :-), XSL was introduced as “a style sheet for XML/XHTML”, to separate content from presentation. This was taken over by CSS, and XSL took another route. Modern browsers can take whatever domain-specific XML document and render it using CSS styles.

    XSLT is the XSL for Transformation, using an XSLT engine to transform one document into another, possibly reordering or filtering out parts of the original content.

    XSL-FO became the styling part of the XSL standards, better used for printing purposes.

    I completely agree that XSL-FO wouldn’t be suitable for sending documents to a browser. Even if the browser could render the document, it’s far too verbose and not easily human readable, and View source has taught us so much.

    XSL-FO is complicated, and the possibility of using XML/XHTML + CSS to render print-quality documents is good news.

  11. The people who are slagging this approach are completely missing the point, IMO. For me, the good part is not so much the use of CSS to format the book, but the use of XHTML to mark it up. The printing back-end can be ripped out and replaced with whatever works for you — FO (yuck), groff, LaTeX — or load up the HTML in M$ Word or OpenOffice, apply a stylesheet, and print.

    We can talk about DocBook until we’re blue in the face, but it’s such an incredibly complex DTD that most writers would give up before finishing the first chapter of the first document. DITA is a step in the right direction, but it’s probably still too complex for non-gearheads without a fair amount of motivation. Just about everyone knows enough XHTML to write a document, and there are plenty of tools — Free and commercial — that provide a pretty GUI for people who need it.

    As long as writers have to associate XML with complex large-scale publishing systems with six-figure deployment costs and five-figure support costs, it will be “eXcellent, Maybe Later” outside of Fortune 100 companies. HTML brought on-line publishing to the masses through a simple syntax; now it can bring single-source on-line/paper/PDF publishing to the masses as well.

  12. Others have pointed out the advertising. I would also point out that Håkon Wium Lie has been long known to be an XSL foe (pun intended), so his opinion on XSL should be taken with a grain of salt.

    But now to Prince itself—

    It seems like a good way of bringing web pages to print. Someone mentioned its applicability to the blog-streaming world. Prince can fill this niche well. But going from web to print has, up till now, been carried out by printing according to a print stylesheet, not downloading a PDF. What Prince has over, for instance, Firefox is that the latter doesn’t support the CSS page model properly. When browsers come up to that functionality, Prince will be out of a job.

    Mr Lie might answer that the niche isn’t web to casual print, it’s XHTML to books, with the XHTML not necessarily ever being hosted on a web server, and books like the kind we get from Framemaker or Quark. However, that’s a niche Prince, or more accurately an XHTML to PDF tool, can’t fill either. Maybe CSS is already up to the task of heavy formatting (and I doubt that), but XHTML isn’t up to the task of rich markup. XHTML is a limited tagset. You know it when you have to use span tags where in general XML you’d use an element. You know it because the new OpenDocument format for office applications didn’t duplicate XHTML, it was formulated with its own tagset, which is much bigger than that of XHTML. XHTML is suitable for the simplest books, but anything beyond that, like any random book you pick up in the college library, requires a more feature-rich markup language. XHTML for books could only be hobbyists’ fare, and maybe not even that, since hobbyists are far more likely to opt for WYSIWYG tools than textual stuff.

    In short, I don’t see Boom finding its niche among any of the possibilities. It’s overkill for simple web to print, and underpowered for professional typesetting.

  13. Great idea. I will have to read the book to give a better review, but excellent job on making it via this method.

  14. Håkon Wium Lie has been long known to be an XSL foe (pun intended)

    :) It’s the «FO» part I have a problem with. Formatting objects don’t have any semantics and should therefore not be represented in XML. It’s just a bunch of font tags. Which is why I once wrote (http://www.xml.com/pub/a/1999/05/xsl/xslconsidered_1.html?page=4):


    I can understand why overworked undergraduates think FONT is cool, but I’m very disappointed when a group of highly skilled adults tell kids to stop playing, form a committee – and then come out with a set of supercharged FONT tags.

    Anyway, your main argument is not CSS vs. XSL-FO, it’s against the use of HTML as the basis for our markup. You write:

    XHTML isn’t up to the task of rich markup. XHTML is a limited tagset.

    Indeed, the tagset is limited, but HTML has a wonderful extension mechanism: the «class» attribute. Using the class attribute, you can convert any XML document into HTML and back—without losing information.
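
    A minimal sketch of that round trip, assuming a deliberately naive mapping: every XML element becomes a `div` whose `class` holds the original tag name. The functions `xml_to_html` and `html_to_xml` are hypothetical and this is not how Boom itself is marked up; attributes and namespaces in the source are ignored here, and a real mapping would pick more meaningful HTML elements than `div` where they exist.

```python
import xml.etree.ElementTree as ET


def xml_to_html(elem: ET.Element) -> ET.Element:
    """Replace each element with a <div> whose class is the original tag name."""
    div = ET.Element("div", {"class": elem.tag})
    div.text, div.tail = elem.text, elem.tail
    for child in elem:
        div.append(xml_to_html(child))
    return div


def html_to_xml(div: ET.Element) -> ET.Element:
    """Inverse mapping: restore the tag name from the class attribute."""
    elem = ET.Element(div.get("class"))
    elem.text, elem.tail = div.text, div.tail
    for child in div:
        elem.append(html_to_xml(child))
    return elem


source = "<item><description>Pilsner Beer</description><qty>6</qty></item>"
as_html = xml_to_html(ET.fromstring(source))
round_trip = html_to_xml(as_html)

print(ET.tostring(as_html, encoding="unicode"))
print(ET.tostring(round_trip, encoding="unicode"))
```

    In this restricted case the second line printed matches the source, so no element names are lost; whether class names carry enough meaning is the point debated in the following comments.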

    You know it because the new OpenDocument format for office applications didn’t duplicate XHTML, it was formulated with its own tagset, which is much bigger than that of XHTML.

    I think this was a big mistake. By basing OpenDocument on HTML (much the same way Bert and I did Boom on HTML), the format would have had a huge installed base from the beginning: 1 billion browsers.

  15. Formatting objects don’t have any semantics and should therefore not be represented in XML.

    Why not? In the letters that make up the initialism XML, I don’t see anything that stands for Semantics. XML is just a toolchest for building any markup language you wish, and one of those happens to be the page layout language called XSL-FO. And it isn’t “just a bunch of font tags” any more than CSS is. I think we both know XSL-FO is to be generated from XSLT rather than written by hand, and when generated from an XSLT script it’s equivalent to a CSS stylesheet in separating content and presentation.

    Indeed, the tagset is limited, but HTML has a wonderful extension mechanism: the «class» attribute. Using the class attribute, you can convert any XML document into HTML and back—without losing information.

    Doesn’t the use of a kluge indicate the inappropriateness of the format? And, um, talking about semantics, are you aware that class attributes are style directives, containing no more semantics than I or B or TT tags? You’re like proposing the use of such constructs as divs with style attributes instead of H1/H2/H3 tags, which Ian Hixie complained about on his blog, but on steroids!

    This isn’t the right tool for the job. Anyone who so much prefers CSS to XSL can use CSS to style XML, and that would be better. I shudder at the thought of using HTML, with kluges and all, for preparing a college grammar book. But even CSS isn’t wholly satisfactory: you’ve had to write an external script to generate the TOC, while you can do it with XSLT, and then style with FO in the same gulp. Looking at it that way, the XSL approach could be said to be more deskilling than the HTML/CSS one.

    I think this was a big mistake. By basing OpenDocument on HTML (much the same way Bert and I did Boom on HTML), the format would have had a huge installed base from the beginning: 1 billion browsers.

    But OpenDocument is for office applications, not for browsers. You seem to be very Web-centric. Additionally, even if ODF were based on HTML, the type of HTML that office applications generate approaches the elegance of Frontpage’s output.

  16. Personally I am more familiar with CSS than with Microsoft Word. Now I can type my essays in Dreamweaver :)

  17. How novel, using a language based on SGML (Standard Generalized Markup Language) to make a printer paint a page instead of a browser.

    Ah yes – my first program documentation (1983) was generated using IBM DCF on a 3090 mainframe. And our documents had strange markup like <h1>,  etc etc – and it had ‘stylesheets’ to add ornamentation (other than bold or underline) etc etc when IBM released its first advanced function printers (3800-3 and 3820s).

    IIRC it was a superset (or was that a subset?) of SGML. It even generated TOCs, indexes etc etc.

    Amazing it’s still around … http://www.printers.ibm.com/internet/wwsites.nsf/vwwebpublished/dcfhome_z_ww

    Kim Mihaly

  18. Just give me the needed CSS print functionality, and a web browser that supports it.  Then all I’ll need to do is File -> Print to Postscript/PDF.  Even better:

    % firefox --print http://www.alistapart.com/print/me --output ala.pdf

  19. TeX is a different thing. I wrote a number of papers, and even a whole book, using it. These days I use XSL and produce PDF. This is XML-based and more flexible. Still, I personally like TeX much more. But PDF and TeX are about actually printing content in high quality.

    The point of this article seems to be that XHTML/CSS can be used to publish a real book. It’s a proof of concept by the people who created CSS, and that’s nice.

    Printing web pages is always a pain. And I hope that’s what this is about. Printer-friendly pages are never really what they claim to be. XHTML/CSS cannot compete with PDF. But it can complement it, and make it easier to import web pages into publishing systems.

  20. Very nice paper, thanks. And Prince may be just the tool I need for a “print-on-demand” adjunct to my (free) ebooks site (http://etext.library.adelaide.edu.au)

    I’ve been tinkering for a long time with ebooks—mostly public domain novels and essays (which it is true to say are quite simple compared to technical works), using HTML. My main interest has been in formatting books for the web rather than print, but there’s always that lingering, “wouldn’t it be nice” feeling that it would be great to be able to print them too, if desired. And I have had some limited success in that direction using rudimentary CSS (see the FAQ), which produces a nice result if you don’t mind A4 and don’t much care about page numbering etc.

    But it is very pleasing to see someone pushing the envelope to see what can be done with CSS. Now, if only my browsers supported all those features, I’d be very happy.

    Of course, with or without Prince, there’s no reason I should not use the CSS3 features, even if they are not currently supported. They will be one day, and then my ebooks will be ready and waiting!

    (And I’ve heard all the “wrong tool” arguments from the ebook crowd already, thanks! LaTeX, DocBook, XSL, yadda yadda. Most of them are still producing ugly results whatever the tool.)

  21. I opened up this article mainly because I’ve been looking for a means to create invoices and proposals quickly, easily, and with some customization.

    I’ve always hated opening up Indesign or Msoft Word just to fudge a couple variables and print. I’m on a slow laptop, and it can seem like forever to load up these bloated apps, only to close them after seconds of use.

    I love the possibility that I can create a printed page template, and only have to open a simple text editor to edit and then send it to a browser (which is always on!) and hit print.

    I know everyone’s been knocking the book format application, but I’m very excited about other possible applications.

  22. I am writing a DocBook “book”. Some of the fonts are not looking good and I would like to enhance them. I learned that I could use CSS for enhancing font output in HTML. I am trying to see if someone has gone through the experience and has used a good CSS stylesheet file. I would like to get a copy of the CSS stylesheet if possible. Or point me to a good location where I can get a good CSS file.

    Thanks,
    Raj

  23. I am writing a DocBook “book”. Some of the fonts are not looking good and I would like to enhance them. I learned that I could use CSS for enhancing font output in HTML. I am trying to see if someone has gone through the experience and has used a good CSS stylesheet file. I would like to get a copy of the CSS stylesheet if possible. Or point me to a good location where I can get a good CSS file.

    Prince ships with a CSS style sheet that does rudimentary styling of DocBook files. It’s called “docbook.css” and it should be a reasonable starting point.

     

  24. thanks

  25. While the article is great, I don’t get the point.
    I tried printing your HTML file via the latest versions of Firefox and IE, and the footers (page numbers) do not appear.
    If we need another program to convert to PDF before we can print, what is the use? Why can’t CSS just work with browsers for printing?
    I am trying to create documents that are visible on the web and that, when someone wants to print them, get a page number and a footer, and your solution does not work. Why, is the question. Since it is CSS and XHTML, there should not be a problem.
    I have to revert to the idea of making two style sheets, one for print and one for screen, which is also a bad solution.
    Unless I missed something.

  26. I don’t think this is ready for any serious use. Consider for example
    <ul>
    <li>That you can’t have a ul inside a p,</li>
    <li>or any block level element</li>
    </ul>
    which really breaks orphan control code. There is simply no way for a client to figure out that a sentence like this belongs to the paragraph above, and isn’t a paragraph on its own.

    (La)TeX has rather advanced algorithms to do this right, since it seriously breaks text flow. It may not be the kind of thing that people point their fingers at, but if you ask them, they tell you that your text was “heavy”. Any typographer worth his salt, and a serious book publisher will give this high priority.

    Don’t get me wrong: I would really like to see a LaTeX replacement, as the HTML tools are much more widespread than LaTeX, and LaTeX is often a pain to write. However, it is important to realise that there are many good reasons why people use it for high-quality work, and that is not going to change before certain flaws in the original design of HTML are corrected, and I know it breaks your heart, Håkon, but that means backwards-incompatible changes must be made to HTML.

    Also, it means that we have to put some effort into high-quality printing in the UAs, and I don’t see us doing that…?

  27. I like the idea, since we already have the API documentation of our software generated as HTML. It need not be perfect for printing (I would prefer LaTeX over XSL-FO/DocBook, but that is not important here).

    What I am really missing is the possibility to create a reference to a numbered element: E.g. images are numbered with a chapter prefix and a counter for the image, i.e. a caption like “Fig. 2.3” for the third image in the second chapter.

    &&/%%$$
      &&&%%&%
      %%&&&&&&&
      Fig. 1.1


    This could be easily done using counters. But now I would like to create a reference in the text to this image like, e.g. “see Fig. 2.3” where the “Fig 2.3” is automatically generated. Is this possible?

    bla bla bla (see Fig 1.1) bla bla bla
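
    One way to get such references without relying on the formatter is a small preprocessing pass. The Python sketch below is an assumption-laden illustration, not a CSS answer to the question: it numbers figures per chapter and then writes the label into every reference. The markup conventions (`div.chapter`, `div.figure`, `a.figref`) and the function `number_figures` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the first reference points forward to its figure.
DOC = """
<body>
  <div class="chapter">
    <p>bla bla bla (see <a class="figref" href="#boat"/>) bla bla bla</p>
    <div class="figure" id="boat"><p class="caption">A boat</p></div>
  </div>
  <div class="chapter">
    <div class="figure" id="map"><p class="caption">A map</p></div>
    <p>compare <a class="figref" href="#boat"/> with <a class="figref" href="#map"/></p>
  </div>
</body>
"""


def number_figures(body: ET.Element) -> dict:
    """Label figures as 'Fig. <chapter>.<n>' and fill in every reference."""
    labels = {}
    # First pass: walk the chapters in order and label each figure.
    chapters = body.findall("div[@class='chapter']")
    for chapter_no, chapter in enumerate(chapters, start=1):
        figures = chapter.findall(".//div[@class='figure']")
        for figure_no, figure in enumerate(figures, start=1):
            label = f"Fig. {chapter_no}.{figure_no}"
            labels[figure.get("id")] = label
            caption = figure.find("p[@class='caption']")
            caption.text = f"{label}: {caption.text}"
    # Second pass: references may point forwards or backwards,
    # so they are resolved only after all labels are known.
    for link in body.iter("a"):
        if link.get("class") == "figref":
            link.text = labels[link.get("href").lstrip("#")]
    return labels


body = ET.fromstring(DOC)
labels = number_figures(body)
print(ET.tostring(body, encoding="unicode"))
```

    The two-pass structure is the design choice that matters here: a single pass would fail on the first reference, which appears before the figure it points to.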

  28. I’m involved in a project that requires a “clean” print option, but none of the developers have specific expertise in printing (nicely) from XHTML. Frankly, we have been dreading the day when we would have to buckle down and learn an unfamiliar print technology. After stumbling on this thread I picked up the free Prince demo today. Within an hour, I was outputting reasonably complex PDF layouts with table borders, backgrounds and images (oooooooh… ahhhhhhh…) and without touching the original content. The CSS2 implementation is refreshingly solid (this in the week that IE7 and FF2 were released). Now, my team is genuinely excited about printing. Offset press publishing might be a stretch, but Prince proves that more modest goals are achievable with a fraction of the effort.
