Printing a Book with CSS: Boom!

by Bert Bos, Håkon Wium Lie

68 Reader Comments

  1. This article reminded me of Emeril AND showed me a great new way for displaying book content at the same time. That’s hard to do. Congrats.

  2. Blatantly ignoring the point of the article, as a pre-press technician for a local printer myself, I would be crestfallen to receive a PDF such as the sample.pdf included with the article.

    The pages have no corner trims and no bleed; the black text separates into CMYK leaving the black at only 90%.  We would not be happy to go to press with this file, which highlights the pitfalls of using unsuitable tools for the job.  I wonder if QuarkXPress will reliably import HTML and CSS? ;)

    Now, if you’ll excuse me, I must find a suitable orifice for this 128pp Microsoft Word booklet.

  3. It’s nice that (X)HTML/CSS can be clobbered to produce an actual book but it really did make me wince that the ‘hacks’ you were using were already present in something that was designed specifically for this type of job.

    Pretty much everything is here:
    http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf

    (La)TeX is expected when sending material to the printers, and if you ever have to publish work for a journal you must use LaTeX….

    Sorry but (X)HTML/CSS really is not up to the job.  Don’t re-invent the wheel.

  4. I think it would be better to use an xml file with xhtml markup tags and a xsl fo stylesheet to make the pdf. Then you can use sans serif fonts for online and serif fonts for pdf files, and swap images between 72 dpi and 300/600. Some tools also handle cmyk and metatags. Maybe in the near future we can even send along jdf info. Imo html and css is just too limited for decent use. I also think it is wrong to create new tags with print in mind. That way we end up with a lot of xml solutions that all do the same thing.

  5. Sorry but (X)HTML/CSS really is not up to the job. Don’t re-invent the wheel.

    It depends on the purpose of your document. If it is a document that is primarily to be distributed and read online, but with the option for people to print it if they wish, this seems like a perfect combination.

    LaTeX is a relatively specialised product requiring non-standard tools – far more so than HTML/CSS – so for small users it may not be practical to use it.

    Yes, for a major print job, a dedicated print tool such as PDF or LaTeX would be better – but this would then be unsuitable for screen reading. So if the document is not worth doing twice – once for paper and once for screen – HTML/CSS is probably the best way of producing a single format that works well in both media … and others (eg Braille, screen reader).

  6. The TABLE element has a CAPTION element, but support is spotty.

    Advocating generic (X)HTML (with classes) over existing semantic elements demands more than a casual dismissal. I’d really like to see statements like this clarified (a link would suffice). I’m sure there are good reasons. Share them :)

  7. …But still it’s a great use of css to print webpages.

    It shouldn’t be used as a press-ready document. Of course not; XHTML was not designed for that, but to give documents structure. Yet, if you would like to make articles on your site available for download, this is a great way of “polishing” them into nice-looking stuff :)

    It may not be the “best” way to print a book. But it is a great way of printing webpages.

  8. Really nice post. I think not everybody is skilled enough to use it now, but maybe in the future.

    “HTML and CSS, two of our favorite acronyms”

    I always thought HTML and CSS were abbreviations. Can anybody tell me the difference between acronyms and abbreviations?

  9. This is a great idea, one I’ve certainly never thought of using, but I don’t think it’s ready for use on large projects. Companies like Quark and Adobe shell out big bucks to produce programs that do the same thing, why not leave it to them? As others have mentioned, though, I hope these programs will have support for importing X/HTML and CSS in the future. The possibilities would be virtually limitless.

    This raises one somewhat unrelated question, though. How do you think users would react if the web began acting like print media, with numbered pages, simplified layouts, etc.? I think it would simplify a lot of the processes average web users go through, and it would be a little more calming to first-time computer users used to looking at newspapers and magazines.

  10. I don’t think that at this point CSS is mature enough to fulfil a truly professional role in book-printing.

    I do, however, encourage any attempt to make it mature, since anything that aims to lessen the gap between print and web is a Good Thing.

    I would enjoy articles here on ALA that delve into the nooks and crannies of current CSS-printability, get out of it what they can and in turn bring shortcomings to light, because CSS-print is as dark a subject today as CSS itself was around 1998.

    But it is a great way of printing webpages.

    And hopefully, a great way of webbing print pages. ;)

  11. A response to Abdelrahman Osama’s post about abbreviations and acronyms. I believe you are right, and as far as I understand it (here goes) the difference is that an acronym is a pronounceable abbreviation (an abbreviation that forms a word), whereas an abbreviation is just one or more letters taken from one or several words.

    Acronyms: MODEM, LASER
    Abbreviations: CSS, HTML, IBM

    And now to the article =) :
    As several people have said before, I too believe that HTML+CSS can’t (yet ;)) solve all the problems, but this is a great solution for printable (web) articles, tutorials, and the like that span several pages, including “books” that have been published to the web.

    And it’s fun to watch CSS maturing and showing off some muscles in different territories.

  12. I think that the people who have criticised XHTML / CSS as not being ready for styling a printed book have a valid point. This article shows that XHTML / CSS are not yet up there with dedicated tools such as LaTeX. But the sample markup and style sheet may well be the best-written print style sheet to date!

    I’m guessing that almost all of the CSS magic used in the sample is already supported by cutting-edge browsers such as Firefox. If browsers were able to output to print as well as Prince can – and they’re really not that far off – this would basically eliminate the barrier between web and print altogether. My biggest concern is that they had to use so much print-specific markup, such as specialised classes, in order to achieve the desired output.

  13. No, this technique is nowhere close to replacing an experienced print designer; nor is it going to replace proper pre-press preparation for “real” print jobs. I see it, however, giving opportunities in our blog-happy nation to provide automated PDF versions of constantly updated content. One conceivably could print out a customized date range of some news or blog-oriented site with headers, an index/TOC and page numbers. It would make catching up on favorite websites that much easier when I’m riding the train in to work.

  14. While I agree with the many who have commented before I have (that CSS really isn’t capable of all the details needed to create an arbitrarily styled book), I found this to be a great primer on getting your web pages to the print medium. It’s much more accessible than the W3C guidelines.

    Thanks for it!

  15. If you want to make accessible PDFs for online reading, you need a tool that, unlike the current version of Prince, converts your HTML markup to PDF tags. I’ve seen a lot of interesting PDF generation tools that fail to take advantage of PDF tags. Maybe the feature is hard to implement, but until Prince produces tagged PDFs, I’ll have to stick with Word and Acrobat.

  16. While I think the original focus of the article is to show that HTML and CSS can use the CSS3 Paged Media Module to prepare an HTML document for printed material, I think using it solely for books is mistaken.

    Also, format editors like LaTex are proprietary and will, in the near future be marginalized by free (as in beer) tools that will use common semantics and transforms.

    I think the two authors show us that it is possible to create printable books using HTML and XML (using the Prince tool as the transform). However, I believe the smarter application lies in the Print command within our browsers. Heavily annotated webpages, especially health care provider pharma, would best be suited for this CSS3 module as a linkable “Print this page” URL on the intended page.

    I myself have created a CSS module for print jobs using the Firefox add-on, Greasemonkey. Whereas the NY Times prints incredibly small font pages, the Greasemonkey CSS module allows me to format the articles into a much more useable printed document format.

  17. If I remember correctly, Joe Clark, in the writing of his book, Building Accessible Web Sites, did something like this: he wrote using HTML and produced his book from the HTML content although I don’t think he (or his publisher) took the output directly from HTML. Furthermore, he had a dual purpose in mind, to post the book to his web site so for his purposes, writing the book using HTML was perfectly fine.

    This is certainly an interesting idea and learning about and using the paged media CSS properties alone is well worth this article but, like some others, I am not convinced that HTML is the right source for all books as others have also stated.

  18. If I remember correctly, Joe Clark, in the writing of his book, Building Accessible Web Sites, did something like this: he wrote using HTML and produced his book from the HTML content although I don’t think he (or his publisher) took the output directly from HTML. Furthermore, he had a dual purpose in mind, to post the book to his web site so for his purposes, writing the book using HTML was perfectly fine.

    This is certainly an interesting idea and learning about and using the paged media CSS properties alone is well worth this article but, like some others, I am not convinced that HTML is the right source for all books as others have also stated.

  19. After I clicked submit, I got called away from my desk. When I returned, I didn’t see my comment (Opera 9TP) so I clicked again (several minutes had passed, short memory).

  20. Was I the only person surprised to see how poorly Firefox (even the beta) handles the print stylesheet? Even IE handles it better.

  21. Docbook (http://docbook.org/) is a better tool for this job. From their FAQ (http://www.dpawson.co.uk/docbook/reference.html#d16e16):

    “DocBook provides a system for writing structured documents using SGML or XML. It is particularly well-suited to books and papers about computer hardware and software, though it is by no means limited to them.
    In short, DocBook is an easy-to-understand and widely used DTD. Dozens of organizations use DocBook for millions of pages of documentation, in various print and online formats, worldwide.”

    By marking a book up in Docbook XML, you can export your book to any suitable format (via an XSLT transformation). There are already tools to convert Docbook documents to html, xhtml, PDF, and more. If writing/marking up a book is your goal, you would be better off using Docbook than reinventing the wheel and writing your own microformat.

  22. Prince costs $350, works from the command line on Mac and Linux, and doesn’t do auto-hyphenating. I own Lie’s book; a whole lot of manual hyphenating must have gone into it. It would take less time to reformat in free LaTeX than to manually hyphenate an entire book.

  23. Also, format editors like LaTex are proprietary and will, in the near future be marginalized by free (as in beer) tools that will use common semantics and transforms.

    In response to Thom Wiley’s views on LaTeX, it should be noted that LaTeX is free as in beer and also as in liberty (it’s distributed under the LaTeX Project Public License). It’s a template extension of an underlying language, TeX, that’s got its own markup defined, multiple viewers, editors, and tools, and it’s been around for more than two decades (TeX was first released in 1978, and LaTeX followed in the mid-1980s). You could say that LaTeX is to TeX what CSS/(X)HTML is to SGML.

    CSS is a great tool, but it’s not really made for the print job. There’s way too many things that need to be considered that CSS simply doesn’t give you control over in real printing/publishing (Fine typesetting and typography, anyone?) It would be a nice way of getting long sets of texts off the web and onto paper, but it’d still look better and be more readable if you threw an HTML page through a text processor (sed or otherwise) and converted it into a LaTeX document, then printed it.

    LaTeX isn’t as obscure as one might think. While it’s certainly not a desktop publishing type solution, it’s been around for long enough that it’s a very widely used method of typesetting academic papers and documents.

    Maybe, if CSS gained a slew of options that’d allow authors to set and define every aspect that was needed for print, it’d succeed in this field. But for now, it’s a somewhat half baked solution, and its options need a whole lot more implementation (which might never get done) to really fit the bill, even for the smallest self-read page jobs.

    That all being said, it’s a very neat concept.

  24. I’m not planning on writing any books with HTML/CSS, but there’s a lot here that can be put into my next print stylesheet.

    One thing though, what’s with the first sub-heading: “Print vs. paper”? Aren’t they the same thing? Shouldn’t it be “Print vs. pixel” or “Screen vs. paper”?

  25. Was I the only person surprised to see how poorly Firefox (even the beta) handles the print stylesheet? Even IE handles it better.

    What upsets me is that in Opera (even the latest version), ticking the “Print background colour” option appears to include the page background as defined in the screen stylesheet, not the (usually white) one in the print stylesheet. And then the background screen colour clashes with the foreground print colours as these are different from the screen colours – very upsetting!

    And it’s very irritating that you can’t allow eg <td> background colours without the page background. I don’t want to print a page of blue, but small blocks of background colour are fine.

  26. Very nice, very very nice, I always wondered if it was possible to print in a decent way using XHTML and CSS ;) and I think it’s great!
    In this tutorial I have seen a large use of unknown-to-me syntax, where can I find a full guide to CSS syntax (apart from w3c specifications?).

    Thanks
    Tommaso Urli

  27. What upsets me is that in Opera (even the latest version), ticking the “Print background colour” option appears to include the page background as defined in the screen stylesheet, not the (usually white) one in the print stylesheet.

    The next preview of Opera 9 will probably solve that. See the post about it here: http://my.opera.com/community/forums/topic.dml?id=102299

  28. I pretty much agree with the consensus that HTML+CSS are not really tools for printing books. CSS isn’t really designed to turn HTML or XML into print media as much as it’s geared toward styling HTML or XML into something suitable to be printed. HTML doesn’t even have enough semantics to properly mark up most college-level papers, much less semantics for proper printed media. HTML is lightweight and feature-poor.

    One side note: I think the ‘rel’ and ‘rev’ attributes should have been used with ‘link’ and ‘a’ elements in the HTML.

  29. What about mentioning that this is written by a member of the Prince team, touting a product that costs US$349 as the only alternative to produce output from the “microformat” proposed in the article?

    Not that the article itself is bad, but a disclaimer would be nice. Or is it there and I read too fast?

  30. Related to what Jared Hales said, I think it would be interesting if someone were to write the necessary XSL stylesheets to transform DocBook to boom! Last time I checked, the stylesheets for XHTML included in the standard distribution outputted non-semantic XHTML 1.0 Transitional. Outputting to boom! instead would, I think, be a big improvement for those looking to get their DocBook content to web.

  31. I am a web guy but I have exported to QuarkXPress without converting the colours to CMYK to a company who have digital printers. I presume they simply convert it automatically?

  32. I think it would be better to use an xml file with xhtml markup tags and a xsl fo stylesheet to make the pdf.

    I don’t. I challenge you to write the XSL style sheet that generates a similar PDF file from our XHTML source file. It’s certainly possible, but much more troublesome. For more arguments along this line, see “Printing XML: Why CSS Is Better than XSL” (http://www.xml.com/pub/a/2005/01/19/print.html).

  33. this is written by a member of the Prince team … a disclaimer would be nice. Or is it there and I read too fast?

    There was a disclosure in our original text. Somehow it disappeared in the publication process, but it’s now back in the bio section. Thanks for notifying us.

  34. If writing/marking up a book is your goal, you would be better off using Docbook than reinventing the wheel and writing your own microformat.

    Docbook is nice, and you can quite easily adapt the sample CSS style sheet to work with Docbook. The main benefit of using the Boom! microformat is that you can display the document in a billion browsers… and also print it!

  35. This is a great idea, one I’ve certainly never thought of using, but I don’t think it’s ready for use on large projects. Companies like Quark and Adobe shell out big bucks to produce programs that do the same thing, why not leave it to them?

    Why leave it to them if we can achieve the same using simple web standards? I think XML/HTML + CSS is ready for large projects and the article shows how.

    A batch-formatting approach (which both CSS and XSL use) cannot compete with a human designer in (say) glamour magazines, but most published books can be produced with simple web standards. Our book is relatively advanced compared to (say) a novel.

  36. Prince costs $350, works from the command line on Mac and Linux, and doesn’t do auto-hyphenating. I own Lie’s book; a whole lot of manual hyphenating must have gone into it.

    Actually, we used a Perl script which added soft hyphens (&shy;) in the right places. Given soft-hyphen entities, Prince will do the right thing. It would be nice for Prince to fully automate the process, though; I know it’s on the todo list. (That’s one of the great perks of being on the YesLogic board; I get to influence the todo list :-) Email me if you need the script.
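
    To make the soft-hyphen idea concrete, here is a minimal Python sketch. It is not the Perl script mentioned above; the `HYPHENATION` table and the `add_soft_hyphens` function are hypothetical, and a real script would need a full hyphenation dictionary or pattern set and would have to skip markup rather than operate on plain text.

```python
import re

SOFT_HYPHEN = "\u00ad"  # the character that the &shy; entity refers to

# Hypothetical hand-made table (assumption for this sketch); a real tool
# would load a complete hyphenation dictionary or pattern file instead.
HYPHENATION = {
    "hyphenation": "hy-phen-ation",
    "stylesheet": "style-sheet",
    "typography": "ty-pog-ra-phy",
}


def add_soft_hyphens(text):
    """Insert soft hyphens into every word found in HYPHENATION."""

    def replace(match):
        word = match.group(0)
        pattern = HYPHENATION.get(word.lower())
        if pattern is None:
            return word  # unknown word: leave it untouched
        out = []
        i = 0
        for ch in pattern:
            if ch == "-":
                out.append(SOFT_HYPHEN)
            else:
                # Copy the original letter so capitalisation survives.
                out.append(word[i])
                i += 1
        return "".join(out)

    return re.sub(r"[A-Za-z]+", replace, text)


result = add_soft_hyphens("Hyphenation and typography")
print(result.replace(SOFT_HYPHEN, "|"))  # show the break points visibly
```

    The inserted character is U+00AD, the same character the &shy; entity refers to, so a formatter that honours soft hyphens can break the word at those points.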

  37. One thing though, what’s with the first sub-heading: “Print vs. paper”? Aren’t they the same thing? Shouldn’t it be “Print vs. pixel” or “Screen vs. paper”?

    My goodness, yes! Blush. Sloppy authors, I’d say. Fixed. Thanks.

  38. Very nice, very very nice, I always wondered if it was possible to print in a decent way using XHTML and CSS ;) and I think it’s great! In this tutorial I have seen a large use of unknown-to-me syntax, where can I find a full guide to CSS syntax (apart from w3c specifications?).

    Thanks for your kind words. Alas, some of the syntax isn’t even described in W3C specifications yet. The article you were reading explains some of the more advanced extensions, and any book on CSS will tell you the basics. I hope that most of the functionality will be described in W3C Working Drafts in the next 6 months. And some of them are even quite readable.

  39. Advocating generic (X)HTML (with classes) over [the caption element] demands more than a casual dismissal. I’d really like to see statements like this clarified (a link would suffice). I’m sure there are good reasons. Share them :)

    Ideally, I would like to use the caption element; I believe in HTML semantics. The element is troublesome, however, for two reasons. First, it appears inside the table element while you typically want it to be presented outside of the table. Second, various browsers have tried to add support for the caption element and its attributes. Some have failed (http://www.blooberry.com/indexdot/html/tagpages/c/caption.htm), and as a result you enter a minefield when trying to use it. At least, that’s how I felt when I tried. Maybe I didn’t try hard enough. I’d be happy to see you find a way to achieve the same formatting by using the caption element.

  40. I don’t think that at this point CSS is mature enough to fulfil a truly professional role in book-printing.

    Perhaps not. Certainly, one can point to features that are missing (e.g. automatic line numbers). However, most books don’t use any such advanced features. I’d estimate that CSS and Prince could produce 90% of all books published in Latin scripts. I’m happy to leave the remaining 10% to a guy and his Quark.

  41. The pages have no corner trims and no bleed; the black text separates into CMYK leaving the black at only 90%. We would not be happy to go to press with this file

    Sure. The sample file is optimized for screen use rather than print. Thanks for your partial list of requirements for PDF Prince—feel free to email me your complete list :-)

  42. In documentation management and production, we’ve been trying to get away from the conversion game for years. Tools that convert one format of document or text into another abound. If you’re familiar with help authoring tools, think Doc-to-Help, for example.

    Although you’re using style sheets and proper markup, you’re still trying to convert one format (HTML) into another (Print/PDF). In the long run, this is not the way to go. And your PDF will suffer as a result. Try to build bookmarks from that PDF.

    True single-sourcing demands that content and format be separated completely. And there are tools out there that do that splendidly, where content is stored in unformatted data blocks and published at run-time using CSS (for HTML/XHTML/XML) or MS Word DOT templates for print. Incidentally, an MS Word DOT file is to MS Word precisely what CSS is to HTML. It only controls a lot more (like headers and footers) a lot better for print – because that’s what it was designed to do.

  43. Although you’re using style sheets and proper markup, you’re still trying to convert one format (HTML) into another (Print/PDF). In the long run, this is not the way to go.

    In general, I agree with your views that single source is good and document conversion is bad. This is why CSS has the concept of media types; the same source document (in, say, HTML) should be usable on all sorts of devices. The sample document has style sheets for print and screen and support for other media types can easily be added if you want more control of the final presentation.

    The only reason for converting our files to PDF is to send them to the printer. They only accept PDF files, and, as such, PDF works quite well. Until printers accept HTML/CSS natively, using PDF is a good solution which doesn’t change the fact that we have a single source.

  44. I have been trying to solve this web-to-print quandary for a project over the past month, and after trying to use RPDF with Ruby on Rails, I am retreating to a simply printable, CSS-styled web page.

    So, this article really got me excited! Then after reading it and hopping over to princexml.com to get Prince, I feel like this article is an advert for the $350 Prince software.

    It bums me out to read an article on my old favorite ALA, which is all about standards, open-ness & accessibility on the web, only to find out that I have to buy expensive proprietary software to put the knowledge to use.

    Håkon, I really appreciate you for all of the standards goodness you bring to us, but this is a TOTAL BUMMER.

  45. I feel like this article is an advert for the $350 Prince software.

    This article is about open standards, namely HTML and CSS. At the time of writing, Prince is the only software that can process our code. Not referring to it would have been negligent. We hope other software will start supporting our code; this was one of the reasons for writing the article in the first place.

    Did you notice that you can download and use a fully functional demo version of Prince for free? You can also publish academic works without buying a license. And, if you compare Prince with other standards-based software that produces printed materials (some XSL tools come to mind), Prince is a bargain.

  46. Håkon, I think you will agree with me that if you want to present some content stored in XML to a user, you often need to do two things: (1) transform your document and (2) assign visual characteristics to individual components of the document.

    During transformation you can do things like building a table of contents, numbering chapters and figures, or adding fixed content like the word “Figure” in front of each figure name. CSS is able to make some basic modifications to a document, like adding figure numbers. More complex transformations, like building a ToC, must be done by something more powerful. My tool of choice for this task would be XSLT, but you can use any language that is able to read, manipulate and store an XML document. You told me previously that the ToC for your book was created with some script.

    After the document is transformed (well, yes, in CSS this happens not after but at the same time), visual characteristics like fonts, colors, spacing, margins, etc. are applied to elements in the source document. If I understand your position correctly, you are advocating CSS over XSL-FO here because CSS syntax is easier and you are assigning properties directly to elements from the source XML document. I think that I can agree with your position here… but only as long as you are using CSS with some general, document-oriented XML format like XHTML or DocBook. Let me explain.

    If you have a book in XHTML you can easily add a ToC to this document, because XHTML contains general markup for paragraphs and lists, and a ToC is nothing more than a list of chapter titles with links.

    But you can’t do this with more specific XML formats. For example imagine a simple invoice:


    <invoice>
      … invoice metadata here …
      <item>
        <description>Pilsner Beer</description>
        <qty>6</qty>
        <unitPrice>1.69</unitPrice>
      </item>
      <item>
        <description>Sausage</description>
        <qty>3</qty>
        <unitPrice>0.59</unitPrice>
      </item>
    </invoice>


    You probably would present it as a table.

    During document transformation you need to add a new row with the table header, a new column with subtotals, and finally a new row for the total. But the XML schema of the invoice doesn’t allow you to specify such information.

    This example clearly shows that there are classes of documents which must be transformed to some more general markup prior to the assignment of visual characteristics. XSL-FO is such an intermediate markup. I can imagine that you can also use XHTML+CSS for this purpose. But then you are losing a big advantage of CSS: your CSS rules are no longer working against the original markup, but against intermediate XHTML code.
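
    As a rough illustration of the transformation step described above, the Python sketch below turns the invoice into an XHTML-style table, adding the header row, the subtotal column and the total row. The function name `invoice_to_table`, the column titles and the class names are assumptions made for this example; XSLT or any other language that can read and write XML would do the same job.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The invoice from the comment, with the elided metadata left out.
INVOICE = """
<invoice>
  <item>
    <description>Pilsner Beer</description>
    <qty>6</qty>
    <unitPrice>1.69</unitPrice>
  </item>
  <item>
    <description>Sausage</description>
    <qty>3</qty>
    <unitPrice>0.59</unitPrice>
  </item>
</invoice>
"""


def invoice_to_table(xml_text):
    """Build a generic table element from the invoice markup."""
    invoice = ET.fromstring(xml_text.strip())
    table = ET.Element("table", {"class": "invoice"})

    # New header row that does not exist in the source document.
    header = ET.SubElement(table, "tr")
    for title in ("Description", "Qty", "Unit price", "Subtotal"):
        ET.SubElement(header, "th").text = title

    total = Decimal("0")
    for item in invoice.findall("item"):
        qty = int(item.findtext("qty"))
        unit_price = Decimal(item.findtext("unitPrice"))
        subtotal = qty * unit_price  # computed column
        total += subtotal
        row = ET.SubElement(table, "tr")
        for value in (item.findtext("description"), qty, unit_price, subtotal):
            ET.SubElement(row, "td").text = str(value)

    # New final row carrying the computed total.
    total_row = ET.SubElement(table, "tr", {"class": "total"})
    ET.SubElement(total_row, "th", {"colspan": "3"}).text = "Total"
    ET.SubElement(total_row, "td").text = str(total)
    return table


table = invoice_to_table(INVOICE)
print(ET.tostring(table, encoding="unicode"))
```

    The resulting table is generic markup, so a style sheet written against it targets `table`, `tr` and `td` rather than the original `invoice` and `item` elements, which is the trade-off discussed here.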

    So my conclusion from this is: CSS can be used for formatting documents that are written in some very generic, free-text-oriented vocabulary like XHTML. For more rigidly structured XML formats CSS can be used, but it is no longer easier to use than XSL-FO.

    The difference in complexity is mainly caused by the fact that all XHTML elements have some default formatting behaviour. Once you are not using XHTML, there is no big difference between:

    CSS:

    … { display: block;
        color: red;
        font-weight: bold; }


    and XSL-FO:

    <fo:block color="red" font-weight="bold">…</fo:block>


    It is just a matter of syntax, because the basic formatting models of XSL-FO and CSS are very similar and many XSL-FO properties were taken directly from CSS.

    But if there is a way to handle my invoice example using only CSS, without introducing another intermediate format, I would like to know.

    Jirka

  47. And, if you compare Prince with other standards-based software that produces printed materials (some XSL tools come to mind), Prince is a bargain.

    I use an XSL-FO toolchain for print production, namely XEP from RenderX. It’s even a little bit cheaper than Prince and its feature list is more comprehensive, IMHO. For example, hyphenation is done directly by XEP. Hyphenation patterns are weighted, because some places inside a word are more appropriate as hyphenation points. This is something you can’t achieve with soft hyphens placed into the document. Other XSL-FO implementations offer similar functionality for a similar price.

    But it is good to have more competition on the XML formatting market.

  48. If I understand your position correctly, you are advocating CSS over XSL-FO here because CSS syntax is easier and you are assigning properties directly to elements from the source XML document. I think that I can agree with your position here… but only as long as you are using CSS with some general, document-oriented XML format like XHTML or DocBook.

    Yes, this is an important part of the argument. CSS is well suited for structured document formats where the content comes roughly in the order of presentation. I believe content should be in this near-presentation state when it “crosses the wire”. Styling should be applied as close to the reader as possible, i.e. in the client.

    The other argument for using CSS in printing is that one can reuse many of the CSS style sheets written for the web.

    This example clearly shows that there are classes of documents which must be transformed to some more general markup prior to the assignment of visual characteristics.

    I agree completely. And CSS hasn’t been designed for that purpose. XSLT has, and is perfectly fine to use. It’s Turing-complete and can perform the computations needed to calculate your columns. My only problem with XSLT is that it has “Style” in its name.

    You told me previously that the ToC for your book was created with some script.

    Yes, we use Bert Bos’ multitoc (http://www.w3.org/Tools/HTML-XML-utils/) to generate a TOC. There have been proposals for how to handle this in CSS, but it’s probably too much of a transformation thing to make it into the CSS standards.
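
    To illustrate the kind of transformation a TOC script performs, here is a minimal Python sketch. It is not how multitoc works internally; the function names `build_toc` and `render_toc`, the generated `tocN` ids, and the sample document are all assumptions made for this example. It collects the h1 to h3 headings of a well-formed XHTML document in document order and gives each one an id that a TOC entry can link to.

```python
import xml.etree.ElementTree as ET

def build_toc(xhtml_source):
    """Return (level, id, title) tuples for every h1-h3, in document order."""
    root = ET.fromstring(xhtml_source)
    levels = {"h1": 1, "h2": 2, "h3": 3}
    entries = []
    counter = 0
    for elem in root.iter():
        # Strip the namespace, if any, to get the bare tag name.
        name = elem.tag.split("}")[-1]
        if name in levels:
            counter += 1
            # Reuse an existing id, or assign a generated one (hypothetical scheme).
            anchor = elem.get("id") or f"toc{counter}"
            elem.set("id", anchor)
            # Flatten inline markup such as <em> into plain title text.
            title = "".join(elem.itertext()).strip()
            entries.append((levels[name], anchor, title))
    return entries

def render_toc(entries):
    """Render the entries as an indented plain-text outline with link targets."""
    return "\n".join(
        f"{'  ' * (level - 1)}{title} -> #{anchor}"
        for level, anchor, title in entries
    )

sample = """<html><body>
<h1 id="intro">Introduction</h1>
<p>Text</p>
<h2>Why <em>CSS</em>?</h2>
<h1>Printing</h1>
</body></html>"""

toc = build_toc(sample)
print(render_toc(toc))
```

    The point of the sketch is that the TOC is derived content: it reorders and duplicates text from elsewhere in the document, which is why it sits more naturally in a transformation step than in a style sheet.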


    XSL-FO is such an intermediate markup. I can imagine that you can also use XHTML+CSS for this purpose. But you are losing a big advantage of CSS then: your CSS rules are no longer working against the original markup, but against intermediate XHTML code.

    I don’t see any problem with working against ‘intermediate code’. I think the XHTML code is what you should offer on the web since it uses well-known semantics. Your invoice example uses tag names not universally known. That’s fine as an internal format, but shouldn’t be published on the web. Also, I don’t think XSL-FO should be published on the web (http://people.opera.com/howcome/1999/foch.html), but that’s a different debate :-)

  49. Does anyone know how to set a background color to have alpha transparency using CSS?  I know this won’t be supported until CSS 3.0, but I believe some browsers already support the feature.

  50. I really like this demonstration of the developing capabilities of CSS. As I see it, there are situations where you want to use CSS + XHTML for multiple presentations ( views ), as when you want to print content that’s mainly aimed at the web browser.

    The single source idea is certainly a good one, and solutions like Apache Cocoon use XSLT to transform an originating XML document into XHTML for the browser or mobile platform and XSL-FO for printing purposes. It can use FOP (Formatting Objects Processor) to get PDF for printing.

    The XSLT “having Style in it” is a bit confusing, but as you know (Håkon was only expert from the start :-), XSL was introduced as “a style sheet for XML/XHTML”, to separate content from presentation. This was taken over by CSS and XSL took another route. Modern browsers can take whatever domain-specific XML document and render it using CSS styles.

    XSLT is the XSL for Transformation, using an XSLT engine to transform one document into another, possibly reordering or filtering out parts of the original content.

    XSL-FO became the styling part of the XSL standards, better used for printing purposes.

    I completely agree that XSL-FO wouldn’t be suitable for sending documents to a browser. Even if the browser could render the document, it’s far too verbose and not easily human-readable, and View Source has taught us so much.

    XSL-FO is complicated, and the possibility of using XML/XHTML + CSS to render print-quality documents is good news.

  51. The people who are slagging this approach are completely missing the point, IMO. For me, the good part is not so much the use of CSS to format the book, but the use of XHTML to mark it up. The printing back-end can be ripped out and replaced with whatever works for you — FO (yuck), groff, LaTeX — or load up the HTML in M$ Word or OpenOffice, apply a stylesheet, and print.

    We can talk about DocBook until we’re blue in the face, but it’s such an incredibly complex DTD that most writers would give up before finishing the first chapter of the first document. DITA is a step in the right direction, but it’s probably still too complex for non-gearheads without a fair amount of motivation. Just about everyone knows enough XHTML to write a document, and there are plenty of tools — Free and commercial — that provide a pretty GUI for people who need it.

    As long as writers have to associate XML with complex large-scale publishing systems with six-figure deployment costs and five-figure support costs, it will be “eXcellent, Maybe Later” outside of Fortune 100 companies. HTML brought on-line publishing to the masses through a simple syntax; now it can bring single-source on-line/paper/PDF publishing to the masses as well.

  52. Others have pointed out the advertising. I would also point out that Håkon Wium Lie has long been known to be an XSL foe (pun intended), so his opinion on XSL should be taken with a grain of salt.

    But now to Prince itself—

    It seems like a good way of bringing web pages to print. Someone mentioned its applicability to the blog-streaming world. Prince can fill this niche well. But going from web to print has, up till now, been carried out by printing according to a print stylesheet, not downloading a PDF. What Prince has over, for instance, Firefox is that the latter doesn’t support the CSS page model properly. When browsers come up to that functionality, Prince will be out of job.

    Mr Lie might answer that the niche isn’t web to casual print, it’s XHTML to books, with the XHTML not necessarily ever being hosted on a web server, and books like the kind we get from Framemaker or Quark. However, that’s a niche Prince, or more accurately an XHTML to PDF tool, can’t fill either. Maybe CSS is already up to the task of heavy formatting (and I doubt that), but XHTML isn’t up to the task of rich markup. XHTML is a limited tagset. You know it when you have to use span tags where in general XML you’d use an element. You know it because the new OpenDocument format for office applications didn’t duplicate XHTML, it was formulated with its own tagset, which is much bigger than that of XHTML. XHTML is suitable for the simplest books, but anything beyond that, like any random book you pick up in the college library, requires a more feature-rich markup language. XHTML for books could only be hobbyists’ fare, and maybe not even that, since hobbyists are far more likely to opt for WYSIWYG tools than textual stuff.

    In short, I don’t see Boom finding its niche among any of the possibilities. It’s overkill for simple web to print, and underpowered for professional typesetting.

  53. Great idea I will have to read the book to give a better review but excellent job on making it via this method.

  54. Håkon Wium Lie has been long known to be an XSL foe (pun intended)

    :) It’s the «FO» part I have a problem with. Formatting objects don’t have any semantics and should therefore not be represented in XML. It’s just a bunch of font tags. Which is why I once wrote (http://www.xml.com/pub/a/1999/05/xsl/xslconsidered_1.html?page=4):


    “I can understand why overworked undergraduates think FONT is cool, but I’m very disappointed when a group of highly skilled adults tell kids to stop playing, form a committee – and then come out with a set of supercharged FONT tags”

    Anyway, your main argument is not CSS vs. XSL-FO, it’s against the use of HTML as the basis for our markup. You write:

    XHTML isn’t up to the task of rich markup. XHTML is a limited tagset.

    Indeed, the tagset is limited, but HTML has a wonderful extension mechanism: the «class» attribute. Using the class attribute, you can convert any XML document into HTML and back—without losing information.
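
    As a rough illustration of that round trip, here is a small Python sketch. It is an assumption-laden toy, not a description of how Boom was produced: the function names and the invoice-like sample tags are made up for the example, every element becomes a `div`, and attributes and namespaces in the source XML are ignored (a fuller version would have to carry them along as well).

```python
import xml.etree.ElementTree as ET

def xml_to_html(elem):
    """Map each XML element to a <div> whose class holds the original tag name."""
    div = ET.Element("div", {"class": elem.tag})
    div.text = elem.text
    div.tail = elem.tail
    for child in elem:
        div.append(xml_to_html(child))
    return div

def html_to_xml(div):
    """Invert xml_to_html: restore the tag name from the class attribute."""
    elem = ET.Element(div.get("class"))
    elem.text = div.text
    elem.tail = div.tail
    for child in div:
        elem.append(html_to_xml(child))
    return elem

# Hypothetical source vocabulary, chosen only for illustration.
source = "<invoice><customer>ACME</customer><total>42.00</total></invoice>"

as_html = xml_to_html(ET.fromstring(source))
restored = html_to_xml(as_html)

print(ET.tostring(as_html, encoding="unicode"))
print(ET.tostring(restored, encoding="unicode"))
```

    In this restricted case the restored document serializes to the same string as the source, and a CSS rule such as `.total { font-weight: bold }` can address the HTML version just as `total { font-weight: bold }` would address the original XML.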

    You know it because the new OpenDocument format for office applications didn’t duplicate XHTML, it was formulated with its own tagset, which is much bigger than that of XHTML.

    I think this was a big mistake. By basing OpenDocument on HTML (much the same way Bert and I did Boom on HTML), the format would have had a huge installed base from the beginning: 1 billion browsers.

  55. Formatting objects don’t have any semantics and should therefore not be represented in XML.

    Why not? In the letters that make up the initialism XML, I don’t see anything that stands for Semantics. XML is just a toolchest for building any markup language you wish, and one of those happens to be the page layout language called XSL-FO. And it isn’t “just a bunch of font tags” any more than CSS is. I think we both know XSL-FO is to be generated from XSLT rather than written by hand, and when generated from an XSLT script it’s equivalent to a CSS stylesheet in separating content and presentation.

    Indeed, the tagset is limited, but HTML has a wonderful extension mechanism: the «class» attribute. Using the class attribute, you can convert any XML document into HTML and back—without losing information.

    Doesn’t the use of a kluge indicate the inappropriateness of the format? And, um, talking about semantics, are you aware that class attributes are style directives, containing no more semantics than I or B or TT tags? It’s like proposing the use of such constructs as divs with style attributes instead of H1/H2/H3 tags, which Ian Hixie complained about on his blog, but on steroids!

    This isn’t the right tool for the job. Anyone who so much prefers CSS to XSL can use CSS to style XML, and that would be better. I shudder at the thought of using HTML, with kluges and all, for preparing a college grammar book. But even CSS isn’t wholly satisfactory: you’ve had to write an external script to generate the TOC, while you can do it with XSLT, and then style with FO in the same gulp. Looking at it that way, the XSL approach could be said to be more deskilling than the HTML/CSS one.

    I think this was a big mistake. By basing OpenDocument on HTML (much the same way Bert and I did Boom on HTML), the format would have had a huge installed base from the beginning: 1 billion browsers.

    But OpenDocument is for office applications, not for browsers. You seem to be very Web-centric. Additionally, even if ODF were based on HTML, the type of HTML that office applications generate approaches the elegance of Frontpage’s output.

  56. Personally I am more familiar with CSS than with Microsoft Word. Now I can type my essays in Dreamweaver :)

  57. How novel, using a language based on SGML (Standard Generalized Markup Language) to make a printer paint a page instead of a browser.

    Ah yes – my first program documentation (1983) was generated using IBM DCF on a 3090 mainframe. And our documents had strange markup like <h1>,  etc etc – and it had ‘stylesheets’ to add ornamentation (other than bold or underline) etc etc when IBM released its first advanced function printers (3800-3 and 3820s).

    IIRC it was a superset (or was that a subset?) of SGML. It even generated TOCs, indexes, etc.

    Amazing it’s still around … http://www.printers.ibm.com/internet/wwsites.nsf/vwwebpublished/dcfhome_z_ww

    Kim Mihaly

  58. Just give me the needed CSS print functionality, and a web browser that supports it.  Then all I’ll need to do is File -> Print to Postscript/PDF.  Even better:

    % firefox --print http://www.alistapart.com/print/me --output ala.pdf

  59. TeX is a different thing. I wrote a number of papers, and even a whole book, using it. These days I use XSL and produce PDF. This is XML-based and more flexible. Still, I personally like TeX much more. But PDF and TeX are about actually printing content in high quality.

    The point of this article seems to be that XHTML/CSS can be used to publish a real book. It’s a proof of concept by the people who created CSS, and that’s nice.

    Printing web pages is always a pain. And I hope that’s what this is about. Printer-friendly pages are never really what they claim to be. XHTML/CSS cannot compete with PDF. But it can complement it, and make it easier to import web pages into publishing systems.

  60. Very nice paper, thanks. And Prince may be just the tool I need for a “print-on-demand” adjunct to my (free) ebooks site (http://etext.library.adelaide.edu.au)

    I’ve been tinkering for a long time with ebooks—mostly public domain novels and essays (which it is true to say are quite simple compared to technical works), using HTML. My main interest has been in formatting books for the web rather than print, but there’s always that lingering, “wouldn’t it be nice” feeling that it would be great to be able to print them too, if desired. And I have had some limited success in that direction using rudimentary CSS (see the FAQ), which produces a nice result if you don’t mind A4 and don’t much care about page numbering etc.

    But it is very pleasing to see someone pushing the envelope to see what can be done with CSS. Now, if only my browsers supported all those features, I’d be very happy.

    Of course, with or without Prince, there’s no reason I should not use the CSS3 features, even if they are not currently supported. They will be one day, and then my ebooks will be ready and waiting!

    (And I’ve heard all the “wrong tool” arguments from the ebook crowd already, thanks! LaTeX, DocBook, XSL, yadda yadda. Most of them are still producing ugly results whatever the tool.)

  61. I opened up this article mainly because I’ve been looking for a means to create invoices and proposals quickly, easily, and with some customization.

    I’ve always hated opening up InDesign or Microsoft Word just to fudge a couple of variables and print. I’m on a slow laptop, and it can seem like forever to load up these bloated apps, only to close them after seconds of use.

    I love the possibility that I can create a printed page template, and only have to open a simple text editor to edit and then send it to a browser (which is always on!) and hit print.

    I know everyone’s been knocking the book format application, but I’m very excited about other possible applications.

  62. I am writing a DocBook “book”. Some of the fonts are not looking good and I would like to enhance them. I learned that I could use CSS to enhance the font output in HTML. I am trying to see if someone has gone through the experience and has used a good CSS stylesheet file. I would like to get a copy of the CSS stylesheet if possible. Or point me to a good location where I can get a good CSS file.

    Thanks,
    Raj

  63. I am writing a DocBook “book”. Some of the fonts are not looking good and I would like to enhance them. I learned that I could use CSS to enhance the font output in HTML. I am trying to see if someone has gone through the experience and has used a good CSS stylesheet file. I would like to get a copy of the CSS stylesheet if possible. Or point me to a good location where I can get a good CSS file.

    Prince ships with a CSS style sheet that does rudimentary styling of DocBook files. It’s called “docbook.css” and it should be a reasonable starting point.

     

  64. thanks

  65. While the article is great, I don’t get the point.
    I tried printing your HTML file with the latest versions of Firefox and IE, and the footers (page numbers) do not appear.
    If we need another program to convert to PDF before we can print, what is the use? Why can’t CSS just work with browsers for printing?
    I am trying to create documents that are visible on the web and, when someone wants to print them, have a page number and a footer appear. Your solution does not work for that, and the question is why: since it is CSS and XHTML, there should not be a problem.
    I have to fall back on making two style sheets, one for print and one for screen, which is also a bad solution.
    Unless I missed something.

  66. I don’t think this is ready for any serious use. Consider for example
    <ul>
    <li>That you can’t have a ul inside a p,</li>
    <li>or any block level element</li>
    </ul>
    which really breaks orphan control code. There is simply no way that a client can figure out that a sentence like this belongs to the paragraph above, and isn’t a paragraph on its own.

    (La)TeX has rather advanced algorithms to do this right, since it seriously breaks text flow. It may not be the kind of thing that people point their fingers at, but if you ask them, they tell you that your text was “heavy”. Any typographer worth his salt, and a serious book publisher will give this high priority.

    Don’t get me wrong: I would really like to see a LaTeX replacement, as the HTML tools are much more widespread than LaTeX, and LaTeX is often a pain to write. However, it is important to realise that there are many good reasons why people use it for high-quality work, and that is not going to change before certain flaws in the original design of HTML are corrected, and I know it breaks your heart, Håkon, but that means backwards-incompatible changes must be made to HTML.

    Also, it means that we have to put some effort into high-quality printing in the UAs, and I don’t see us doing that…?

  67. I like the idea, since we already have the API documentation of our software generated as HTML. It need not be perfect for printing (I would prefer LaTeX over XSL-FO/DocBook, but that is not important here).

    What I am really missing is the possibility to create a reference to a numbered element: E.g. images are numbered with a chapter prefix and a counter for the image, i.e. a caption like “Fig. 2.3” for the third image in the second chapter.

    &&/%%$$
      &&&%%&%
      %%&&&&&&&
      {text-align:center}Fig. 1.1


    This could easily be done using counters. But now I would like to create a reference in the text to this image, e.g. “see Fig. 2.3”, where the “Fig. 2.3” is automatically generated. Is this possible?

    bla bla bla (see Fig 1.1) bla bla bla
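
    Whether a style sheet alone can resolve such references is the open question here. As a fallback, the numbering can be done in a preprocessing step, much like the script-generated TOC mentioned earlier in the thread. The Python sketch below is one hypothetical way to do it: the `{chapter}`, `{fig:label}` and `{ref:label}` markers and the function name `number_figures` are invented for this example and are not part of CSS, XHTML, or Prince.

```python
import re

def number_figures(text):
    """Replace {fig:label} and {ref:label} markers with 'Fig. C.N' numbers."""
    chapter = 0
    figure = 0
    numbers = {}
    # First pass: walk chapter and figure markers in document order.
    for kind, label in re.findall(r"\{(chapter|fig):?(\w*)\}", text):
        if kind == "chapter":
            chapter += 1
            figure = 0  # the figure counter restarts in each chapter
        else:
            figure += 1
            numbers[label] = f"Fig. {chapter}.{figure}"
    # Second pass: drop chapter markers, then fill in captions and references.
    # Because all numbers are known by now, forward references work too.
    text = re.sub(r"\{chapter\}", "", text)
    return re.sub(r"\{(?:fig|ref):(\w+)\}", lambda m: numbers[m.group(1)], text)

sample = (
    "{chapter}Intro {fig:logo} "
    "{chapter}Printing, see {ref:press}. {fig:paper} {fig:press}"
)
result = number_figures(sample)
print(result)
```

    The two-pass structure is the essential part: a reference can appear before the figure it points to, so every figure has to be counted before any reference is filled in.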

  68. I’m involved in a project that requires a “clean” print option, but none of the developers have specific expertise in printing (nicely) from XHTML. Frankly, we have been dreading the day when we would have to buckle down and learn an unfamiliar print technology. After stumbling on this thread I picked up the free Prince demo today. Within an hour, I was outputting reasonably complex PDF layouts with table borders, backgrounds and images (oooooooh… ahhhhhhh…) and without touching the original content. The CSS2 implementation is refreshingly solid (this in the week that IE7 and FF2 were released). Now, my team is genuinely excited about printing. Offset press publishing might be a stretch, but Prince proves that more modest goals are achievable with a fraction of the effort.
