HTML and CSS, two of our favorite acronyms, are normally associated with web pages. And deservedly so: HTML is the dominant document format on the web and CSS is used to style most HTML pages. But, are they suitable for off-screen use? Can CSS be used for serious print jobs? To find out, we decided to take the ultimate challenge: to produce the next edition of our book directly from HTML and CSS files. In this article we sketch our solution and quote from the style sheet used. Towards the end we describe the book microformat (boom!) we developed in the process.
The studious reader may want to fetch a sample HTML file, sample style sheet, as well as the PDF file generated by Prince. The PDF file is similar to the one we sent to the printer. We encourage you to base your own book on the sample file and tell us how it goes.
Print vs. pixel
A printed book has many features not seen on screens. There are page numbers, headers and footers, a table of contents, and an index. The content must be split into pages of fixed size, and cross-references within the book (for example, “see definition on page 35”) must be resolved. Finally, the content must be converted to PDF, which is sent to the printer.
Web browsers are good at dealing with pixels on a screen, but not very good at printing. To print a full book we turned to Prince, a dedicated batch processor which converts XML to PDF by way of CSS. Prince supports the print-specific features of CSS2, as well as functionality proposed for CSS3.
CSS2
CSS2 has a notion of paged media (think sheets of paper), as opposed to continuous media (think scrollbars). Style sheets can set the size of pages and their margins. Page templates can be given names and elements can state which named page they want to be printed on. Also, elements in the source document can force page breaks. Here is a snippet from the style sheet we used:
@page { size: 7in 9.25in; margin: 27mm 16mm 27mm 16mm; }
Having a US-based publisher, we were given the page size in inches. We, being Europeans, continued with metric measurements. CSS accepts both.
After setting up the page size and margins, we needed to make sure there are page breaks in the right places. The following excerpt shows how page breaks are generated after chapters and appendices:
div.chapter, div.appendix { page-break-after: always; }
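The same family of CSS2 properties can also discourage breaks in awkward places. The rules below are a minimal sketch, not quoted from the book's style sheet; the selectors are assumptions based on the markup described later in this article.

```css
/* Illustrative only: not taken from the book's style sheet. */

/* Avoid a page break directly after a heading, so the heading
   stays with the text that follows it. */
h1, h2, h3 { page-break-after: avoid; }

/* Try to keep a figure and its caption together on one page. */
div.figure { page-break-inside: avoid; }
```

The `avoid` value is a request rather than a guarantee; a formatter may still break if the content cannot fit otherwise.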
Also, we used CSS2 to declare named pages:
div.titlepage { page: blank; }
That is, the title page is to be printed on pages with the name “blank.” CSS2 described the concept of named pages, but their value only becomes apparent when headers and footers are available. For this we have to turn to CSS3.
CSS3
The CSS Working Group has published a CSS3 Module for Paged Media. It describes additional functionality required for printing. We will start by looking at running headers and footers.
Headers and footers
Here is an example:
@page :left { @top-left { content: "Cascading Style Sheets"; } }
The example above puts a string (“Cascading Style Sheets”) in the top left corner of all left-hand side pages of the book. All pages? Not quite. A subsequent rule removes the header from pages named “blank”:
@page blank :left { @top-left { content: normal; } }
Recall from earlier that all <div class="titlepage"> elements are to be printed on “blank” pages. Given the style sheet above, “blank” left-hand side pages will be printed without a header.
Stealing strings
Our book consists of many chapters and the title of each chapter is displayed in a header on right-hand side pages. To achieve this, the title string must be copied from an element with the string-set property:
h1 { string-set: header content(); }
Just like there were named pages in the previous section, CSS3 also has named strings. In the example above, the string named “header” is assigned the chapter headings. Each time a chapter heading is encountered, the chapter title is copied into this string. The string can be referred to in other parts of the style sheet:
@page :right { @top-right { content: string(header, first); } }
In the example above, the right-hand side header is set to be the value of the “header” string. The keyword “first” indicates that we want the first value of “header” in case there are several assignments on that page.
Page numbers
Like headers, page numbers are a navigational aid in books. Setting the page numbers is easy:
@page :left { @bottom-left { content: counter(page); } }
One requirement from our publisher was to use roman numerals in the first part of the book. This part is referred to as “front-matter”. Here is the style sheet for roman page numbers in the front-matter:
@page front-matter :left { @bottom-left { content: counter(page, lower-roman); } }
The numbering systems are the same as for the list-style-type property, and lower-roman is one of them. The counter called “page” is predefined in CSS.
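The rules shown so far cover left-hand pages only. A sketch of the matching pieces might look like the following; the class names used to select the “front-matter” page and the choice of the bottom-right corner are assumptions made for illustration, not quotes from the book's style sheet.

```css
/* Assumption: right-hand pages mirror left-hand pages, with the
   number in the outer (right) corner. */
@page :right { @bottom-right { content: counter(page); } }

/* Assumption: the same mirroring for roman numerals in the front matter. */
@page front-matter :right {
  @bottom-right { content: counter(page, lower-roman); }
}

/* Hypothetical class names: front-matter sections ask to be
   printed on the named page "front-matter". */
div.preface, div.foreword, div.toc { page: front-matter; }
```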
Cross-references
The web is a huge collection of cross-references: all hyperlinks are cross-references. Cross-references in books are similar in nature, but presented differently. Instead of the blue underlined text we know from our screens, books contain text such as “see the figure on page 35.” The number “35” is unknown to the authors of the book; one can only find the page number by formatting the content. Therefore, the number “35” cannot be typed into the manuscript but must be inserted by the formatter. To do so, the formatter needs a pointer to the figure. In HTML, this is done with an A element:
<a class="pageref" href="#figure">see the figure</a>
The corresponding style sheet looks like this:
a.pageref::after { content: " on page " target-counter(attr(href), page) }
The example above needs some explanation. The selector refers to a generated pseudo-element (::after) which comes after the content of the A element. The first part of that pseudo-element is the string “ on page ”. After that comes the most interesting part, the target-counter function, which fetches the value of the “page” counter at the location pointed to by the “href” attribute. The result is that the string “ on page ” is concatenated with the number “35”.
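The same mechanism should extend to counters other than “page”. The following is a sketch under the assumption that the formatter accepts author-defined counters in target-counter; the counter name “chapter” and the class “chapref” are hypothetical and do not appear in the book's style sheet.

```css
/* Hypothetical counter "chapter": start it at the root and
   increment it once per chapter. */
body { counter-reset: chapter; }
div.chapter { counter-increment: chapter; }

/* Hypothetical class "chapref": append the chapter number found
   at the link target, e.g. "see the discussion in chapter 4". */
a.chapref::after {
  content: " in chapter " target-counter(attr(href), chapter);
}
```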
Table of contents
Similar magic is invoked to generate a table of contents (TOC). Given a bunch of hyperlinks pointing to chapters, sections and other TOC entries, the style sheet describes how to present the hyperlinks as TOC. Here is a sample TOC entry:
<ul class="toc"> <li><a href="#intro">Introduction</a></li> <li><a href="#html"><abbr title="HyperText Markup Language">HTML</abbr></a></li> </ul>
The style sheet for the TOC uses the same target-counter function to fetch a page number:
ul.toc a::after { content: leader('.') target-counter(attr(href), page); }
Also, a new function, leader, is used to generate “leaders.” In typography, a “leader” is a line that guides the eye from the textual entry to the page number. In our example, a set of dots is added between the text and the page number:
Introduction....................1
HTML............................3
Note that this functionality is experimental; no Working Draft for leaders has been published yet.
The book microformat: boom!
As you probably have guessed by now, we succeeded in producing our book using HTML and CSS. In doing so, we also developed a set of conventions for marking up a book in HTML. HTML has the wonderful class attribute which lets anyone extend the semantics of HTML documents while building on HTML’s universally known semantics. So, in our book, we used a rich set of HTML elements and added a bunch of class names.
Since then, the concept of “microformats” has entered the web and we are happy to discover that we actually developed (at least the beginnings of) a microformat for books. We think other authors will be able to use the boom! microformat and improve upon it in the process.
Sections of a book
The chapters in the first part of the book, such as preface, foreword, and table of contents, are enclosed in a DIV with a corresponding class name. The chapters in the main body are DIVs with a class of “chapter” and the appendices are DIVs with class “appendix.” In the style sheet, the class names are primarily used to select the correct named page with the correct headers and footers.
Although HTML has six levels of headings (H1, H2, etc.) to distinguish chapter headings, section headings, and subsection headings, it is convenient to enclose sections in an element, if only to be able to style the end of a section. We used a DIV with class “section”.
Tables and figures
HTML doesn’t have a dedicated element for figures with captions, but it is easy to create one by specializing a DIV:
<div class="figure"> <p class="caption">...</p> <p class="art"><img src="..." alt="..."/></p> </div>
The TABLE element has a CAPTION element, but support is spotty. We, therefore, used a similar strategy for marking up tables:
<div class="table"> <p class="caption">...</p> <table class="lined"> ... </table> </div>
We used a variety of figure styles (normal, wide, on the side, etc.) and table styles (normal, wide, lined, top-floating, etc.) in our book. An element can be given several class names, so that, say, a table can be both “lined” and “wide.” We have cut down on the number of alternatives in the sample document for the sake of simplicity.
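To illustrate how such class names can be picked up by a style sheet, here is a small sketch. The class names come from the markup above; the property values are assumptions chosen for the example and are not the ones used in the book.

```css
/* Illustrative values only. */

/* Keep a figure or table together with its caption. */
div.figure, div.table { margin: 1em 0; page-break-inside: avoid; }

/* Captions are ordinary paragraphs, so they are easy to style. */
p.caption { font-style: italic; font-size: 0.9em; }

/* A "lined" table gets a rule under each row. */
table.lined { border-collapse: collapse; }
table.lined th, table.lined td { border-bottom: thin solid black; }

/* Matches only tables carrying both classes, e.g. class="lined wide". */
table.lined.wide { width: 100%; }
```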
Side notes and side bars
A DIV with class “sidenote” is used for side remarks, related to the (following) text in the main body but not necessarily shown in-line. A typical way to show them is to put them in the margin.
A “sidebar” is longer than a “sidenote.” The latter is typically only one paragraph, maybe two; the former is several paragraphs or includes lists or other material. In the sample document there is one sidebar that floats to the top, uses the full width of the page, and is given a gray background.
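A rough sketch of how the two classes might be styled follows. The values are assumptions, and the sketch deliberately stays within plain CSS2: placing notes in the page margin and floating a sidebar to the top of a page, as the sample document does, depend on formatter support and are not shown here.

```css
/* Illustrative values only, not from the sample style sheet. */

/* A short side remark, set narrow and small beside the main text. */
div.sidenote {
  float: left;
  width: 10em;
  margin: 0 1em 0.5em 0;
  font-size: 0.85em;
}

/* A longer sidebar: full width, gray background, kept in one piece. */
div.sidebar {
  background: #ddd;
  padding: 1em;
  page-break-inside: avoid;
}
```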
Summing up
The Prince formatter has opened up the processing pipeline from HTML and CSS to PDF. It is now possible, even feasible, to use HTML as the document format for books. This makes it easier to cross-publish content on the web and in print.
That said, authors who attempt to use the techniques described in this article will face some technical issues along the way. For example, we have not discussed how to generate the TOC structures and how to display wide tables. We have also left some room for improvement in the boom! microformat. However, compared to the headaches of actually writing a book, formatting is now a joy!
This article reminded me of Emeril AND showed me a great new way for displaying book content at the same time. That’s hard to do. Congrats.
Blatantly ignoring the point of the article, as a pre-press technician for a local printer myself, I would be crestfallen to receive a PDF such as the sample.pdf included with the article.
The pages have no corner trims and no bleed; the black text separates into CMYK leaving the black at only 90%. We would not be happy to go to press with this file, which highlights the pitfalls of using unsuitable tools for the job. I wonder if QuarkXPress will reliably import HTML and CSS? 😉
Now, if you’ll excuse me, I must find a suitable orifice for this 128pp Microsoft Word booklet.
It’s nice that (X)HTML/CSS can be clobbered to produce an actual book but it really did make me wince that the ‘hacks’ you were using were already present in something that was designed specifically for this type of job.
Pretty much everything is here:
http://www.ctan.org/tex-archive/info/lshort/english/lshort.pdf
(La)TeX is expected when sending material to the printers, and if you ever have to publish work for a journal you must use LaTeX….
Sorry but (X)HTML/CSS really is not up to the job. Don’t re-invent the wheel.
I think it would be better to use an xml file with xhtml markup tags and a xsl fo stylesheet to make the pdf. Then you can use sans serif fonts for online and serif fonts for pdf files, and swap images between 72 dpi and 300/600. Some tools also handle cmyk and metatags. Maybe in the near future we can even send along jdf info. Imo html and css are just too limited for decent use. I also think it is wrong to create new tags with print in mind. That way we end up with a lot of xml solutions that all do the same thing.
bq. Sorry but (X)HTML/CSS really is not up to the job. Don’t re-invent the wheel.
It depends on the purpose of your document. If it is a document that is primarily to be distributed and read online, but with the option for people to print it if they wish, this seems like a perfect combination.
LaTeX is a relatively specialised product requiring non-standard tools – far more so than HTML/CSS – so for small users it may not be practical to use it.
Yes, for a major print job, a dedicated print tool such as PDF or LaTeX would be better – but this would then be unsuitable for screen reading. So if the document is not worth doing twice – once for paper and once for screen – HTML/CSS is probably the best way of producing a single format that works well in both media … and others (eg Braille, screen reader).
bq. The TABLE element has a CAPTION element, but support is spotty.
Advocating generic (X)HTML (with classes) over existing semantic elements *demands* more than a casual dismissal. I’d really like to see statements like this clarified (a link would suffice). I’m sure there are good reasons. Share them 🙂
…But still it’s a great use of css to print webpages.
It shouldn’t be used as press-ready document. Of course not, XHTML was not designed for that, but to give documents structure. Yet, if you would like to make articles in your site available for download, this is a great way of “polishing” them into nice looking stuff 🙂
It may not be the “best” way to print a book. But it is a great way of printing webpages.
Really nice post, I think not every body is skilled enough to use it now, but maybe in the future.
bq. HTML and CSS, two of our favorite acronyms
I always thought HTML and CSS are abbreviations, can any body tell me the difference between acronyms and abbreviations ?
This is a great idea, one I’ve certainly never thought of using, but I don’t think it’s ready for use on large projects. Companies like Quark and Adobe shell out big bucks to produce programs that do the same thing, why not leave it to them? As others have mentioned, though, I hope these programs will have support for importing X/HTML and CSS in the future. The possibilities would be virtually limitless.
This raises one somewhat unrelated question though. How do you think users would react if the web began acting like print media, with numbered pages, simplified layouts, etc? I think it would simplify a lot of the processes average web users go through and it would be a little more calming to first-time computer users used to looking at newspapers and magazines.
I don’t think that at this point CSS is mature enough to fulfil a truly professional role in book-printing.
I do, however, encourage any attempt to _make_ it mature, since anything that aims to lessen the gap between print and web is a Good Thing.
I would enjoy articles here on ALA that delve into the nooks and crannies of current CSS-printability, get out of it what they can and in turn bring shortcomings to light, because CSS-print is as dark a subject today as CSS itself was around 1998.
bq. But it is a great way of printing webpages.
And hopefully, a great way of webbing print pages. 😉
A response to Abdelrahman Osama’s post about abbreviations and acronyms. I believe you are right, and as far as I understand it the difference is that an acronym is a pronounceable abbreviation (an abbreviation which forms a word), whereas an abbreviation is just one or more letters taken from one or several words.
Acronyms: MODEM, LASER
Abbreviations: CSS, HTML, IBM
And now to the article =) :
As several people have said before, I too believe that HTML+CSS can’t (yet ;)) solve all the problems, but this is a great solution for printable (web) articles, tutorials, and the like that span several pages, including “books” that have been published to the web.
And it’s fun to watch CSS maturing and showing off some muscles in different territories.
I think that the people who have criticised XHTML / CSS as not being ready for styling a printed book have a valid point. This article shows that XHTML / CSS are not yet up there with dedicated tools such as LaTeX. But the sample markup and style sheet may well be the best-written print style sheet to date!
I’m guessing that almost all of the CSS magic used in the sample is already supported by cutting-edge browsers such as Firefox. If browsers were able to output to print as well as Prince can – and they’re really not that far off – this would basically eliminate the barrier between web and print altogether. My biggest concern is that they had to use so much print-specific markup, such as specialised classes, in order to achieve the desired output.
No, this technique is nowhere close to replacing an experienced print designer; nor is it going to replace proper pre-press preparation for “real” print jobs. I see it, however, giving opportunities in our blog-happy nation to provide automated PDF versions of constantly updated content. One conceivably could print out a customized date range of some news or blog-oriented site with headers, an index/TOC and page numbers. It would make catching up on favorite websites that much easier when I’m riding the train in to work.
While I agree with the many who have commented before I have (that CSS really isn’t capable of all the details needed to create an arbitrarily styled book), I found this to be a great primer on getting your web pages to the print medium. It’s much more accessible than the W3C guidelines.
Thanks for it!
If you want to make accessible PDFs for online reading, you need a tool that, unlike the current version of Prince, converts your HTML markup to PDF tags. I’ve seen a lot of interesting PDF generation tools that fail to take advantage of PDF tags. Maybe the feature is hard to implement, but until Prince produces tagged PDFs, I’ll have to stick with Word and Acrobat.
While I think the original focus of the article is to show that HTML and CSS can use the CSS3 Paged Media Module to prepare an HTML document for printed material, I think using it solely for books is mistaken.
Also, format editors like LaTex are proprietary and will, in the near future be marginalized by free (as in beer) tools that will use common semantics and transforms.
I think the two authors show us that it is possible to create printable books using HTML and XML (using the Prince tool as the transform). However, I believe the smarter application lies in the Print command within our browsers. Heavily annotated webpages, especially health care provider pharma, would best be suited for this CSS3 module as a linkable “Print this page” URL on the intended page.
I myself have created a CSS module for print jobs using the Firefox add-on, Greasemonkey. Whereas the NY Times prints incredibly small font pages, the Greasemonkey CSS module allows me to format the articles into a much more useable printed document format.
If I remember correctly, Joe Clark, in the writing of his book, Building Accessible Web Sites, did something like this: he wrote using HTML and produced his book from the HTML content although I don’t think he (or his publisher) took the output directly from HTML. Furthermore, he had a dual purpose in mind, to post the book to his web site so for his purposes, writing the book using HTML was perfectly fine.
This is certainly an interesting idea and learning about and using the paged media CSS properties alone is well worth this article but, like some others, I am not convinced that HTML is the right source for all books as others have also stated.
Was I the only person surprised to see how poorly Firefox (even the beta) handles the print stylesheet? Even IE handles it better.
“Docbook”:http://docbook.org/ is a better tool for this job. From their “FAQ”:http://www.dpawson.co.uk/docbook/reference.html#d16e16 :
bq. “DocBook provides a system for writing structured documents using SGML or XML. It is particularly well-suited to books and papers about computer hardware and software, though it is by no means limited to them.
In short, DocBook is an easy-to-understand and widely used DTD. Dozens of organizations use DocBook for millions of pages of documentation, in various print and online formats, worldwide.”
By marking a book up in Docbook XML, you can export your book to any suitable format (via an XSLT transformation). There are already tools to convert Docbook documents to html, xhtml, PDF, and more. If writing/marking up a book is your goal, you would be better off using Docbook than reinventing the wheel and writing your own microformat.
Prince costs $350, works from the command line on Mac and Linux, and doesn’t do auto-hyphenating. I own Lie’s book; a whole lot of manual hyphenating must have gone into it. It would take less time to reformat in free LaTeX than to manually hyphenate an entire book.
bq. Also, format editors like LaTex are proprietary and will, in the near future be marginalized by free (as in beer) tools that will use common semantics and transforms.
In response to Thom Wiley’s views on LaTeX, it should be noted that LaTeX is free (as in beer) and also as in liberty (it’s under the GNU license). It’s a template extension of an underlying language–TeX–that’s got its own markup defined, multiple viewers, editors, and tools, and it’s been around for more than a decade (TeX began to be formally defined around 1989). You could say that LaTeX is what CSS/(X)HTML is to SGML.
CSS is a great tool, but it’s not really made for the print job. There’s way too many things that need to be considered that CSS simply doesn’t give you control over in real printing/publishing (Fine typesetting and typography, anyone?) It would be a nice way of getting long sets of texts off the web and onto paper, but it’d still look better and be more readable if you threw an HTML page through a text processor (sed or otherwise) and converted it into a LaTeX document, then printed it.
LaTeX isn’t as obscure as one might think. While it’s certainly not a desktop publishing type solution, it’s been around for long enough that it’s a very widely used method of typesetting academic papers and documents.
Maybe, if CSS gained a slew of options that’d allow authors to set and define every aspect that was needed for print, it’d succeed in this field. But for now, it’s a somewhat half baked solution, and its options need a whole lot more implementation (which might never get done) to really fit the bill, even for the smallest self-read page jobs.
That all being said, it’s a very neat concept.
I’m not planning on writing any books with HTML/CSS, but there’s a lot here that can be put into my next print stylesheet.
One thing though, what’s with the first sub-heading: “Print vs. paper”? Aren’t they the same thing? Shouldn’t it be “Print vs. pixel” or “Screen vs. paper”?
bq. Was I the only person surprised to see how poorly Firefox (even the beta) handles the print stylesheet? Even IE handles it better.
What upsets me is that in Opera (even the latest version), ticking the “Print background colour” option appears to include the page background _as defined in the screen stylesheet_, not the (usually white) one in the print stylesheet. And then the background screen colour clashes with the foreground print colours as these are different from the screen colours – very upsetting!
And it’s very irritating that you can’t allow _eg_
Very nice, very very nice, I always wondered if it was possible to print in a decent way using XHTML and CSS 😉 and I think it’s great!
In this tutorial I have seen a large use of unknown-to-me syntax, where can I find a full guide to CSS syntax (apart from w3c specifications?).
Thanks
Tommaso Urli
bq. What upsets me is that in Opera (even the latest version), ticking the “Print background colour” option appears to include the page background as defined in the screen stylesheet, not the (usually white) one in the print stylesheet.
Next preview of Opera 9 will probably solve that. See post about that “here”:http://my.opera.com/community/forums/topic.dml?id=102299 .
I pretty much agree with the consensus that HTML+CSS are not really tools for printing books. CSS isn’t really designed to turn HTML or XML into print media as much as it’s geared toward styling HTML or XML into something suitable to be printed. HTML doesn’t even have enough semantics to properly markup most college-level papers much less semantics for proper printed media. HTML is lightweight and feature poor.
*One side note*: I think the use of ‘rel’ and ‘rev’ attributes should have been used with ‘link’ and ‘a’ elements in the HTML.
What about mentioning that this is written by a member of the Prince team, touting a product that costs U$349 as the only alternative to produce output from the “microformat” proposed in the article?
Not that the article itself is bad, but a disclaimer would be nice. Or is it there and I read too fast?
Related to what Jared Hales said, I think it would be interesting if someone were to write the necessary XSL stylesheets to transform DocBook to boom! Last time I checked, the stylesheets for XHTML included in the standard distribution outputted non-semantic XHTML 1.0 Transitional. Outputting to boom! instead would, I think, be a big improvement for those looking to get their DocBook content to web.
I am a web guy but I have exported to Quark Express without converting the colours to CMYK to a company who have digital printers. I presume they simply convert it automatically?
bq. I think it would be better to use an xml file with xhtml markup tags and a xsl fo stylesheet to make the pdf.
I don’t. I challenge you to write the XSL style sheet that generates a similar PDF file from our XHTML source file. It’s certainly possible, but much more troublesome. For more arguments along this line, see “Printing XML: Why CSS Is Better than XSL”:http://www.xml.com/pub/a/2005/01/19/print.html
bq. this is written by a member of the Prince team … a disclaimer would be nice. Or is it there and I read too fast?
There was a disclosure in our original text. Somehow it disappeared in the publication process, but it’s now back in the bio section. Thanks for notifying us.
bq. If writing/marking up a book is your goal, you would be better off using Docbook than reinventing the wheel and writing your own microformat.
Docbook is nice, and you can quite easily adapt the sample CSS style sheet to work with Docbook. The main benefit of using the Boom! microformat is that you can display the document in a billion browsers… and also print it!
bq. This is a great idea, one I’ve certainly never thought of using, but I don’t think it’s ready for use on large projects. Companies like Quark and Adobe shell out big bucks to produce programs that do the same thing, why not leave it to them?
Why leave it to them if we can achieve the same using simple web standards? I think XML/HTML + CSS *is* ready for large projects and the article shows how.
A batch-formatting approach (which both CSS and XSL use) cannot compete with a human designer in (say) glamour magazines, but most published books can be produced with simple web standards. Our book is relatively advanced compared to (say) a novel.
bq. Prince costs $350, works from the command line on Mac and Linux, and doesn’t do auto-hyphenating. I own Lie’s book; a whole lot of manual hyphenating must have gone into it.
Actually, we used a perl script which added soft hyphens (&shy;) in the right places. Given soft hyphen entities, Prince will do the right thing. It would be nice for Prince to fully automate the process though; I know it’s on the todo list. (That’s one of the great perks of being on the YesLogic board; I get to influence the todo list 🙂) Email me if you need the script.
bq. One thing though, what’s with the first sub-heading: “Print vs. paper”? Aren’t they the same thing? Shouldn’t it be “Print vs. pixel” or “Screen vs. paper”?
My goodness, yes! Blush. Sloppy authors, I’d say. Fixed. Thanks.
bq. Very nice, very very nice, I always wondered if it was possible to print in a decent way using XHTML and CSS 😉 and I think it’s great! In this tutorial I have seen a large use of unknown-to-me syntax, where can I find a full guide to CSS syntax (apart from w3c specifications?).
Thanks for your kind words. Alas, some of the syntax isn’t even described in W3C specifications yet. The article you were reading explains some of the more advanced extensions, and any book on CSS will tell you the basics. I hope that most of the functionality will be described in W3C Working Drafts in the next 6 months. And some of them are even quite readable.
bq. Advocating generic (X)HTML (with classes) over [the caption element] demands more than a casual dismissal. I’d really like to see statements like this clarified (a link would suffice). I’m sure there are good reasons. Share them 🙂
Ideally, I would like to use the caption element; I believe in HTML semantics. The element is troublesome, however, for two reasons. First, it appears *inside* the table element while you typically want it to be presented *outside* of the table. Second, various browsers have tried to add support for the caption element and its attributes. Some have “failed”:http://www.blooberry.com/indexdot/html/tagpages/c/caption.htm , and as a result you enter a minefield when trying to use it. At least, that’s how I felt when I tried. Maybe I didn’t try hard enough. I’d be happy to see you find a way to achieve the same formatting by using the caption element.
bq. I don’t think that at this point CSS is mature enough to fulfil a truly professional role in book-printing.
Perhaps not. Certainly, one can point to features that are missing (e.g. automatic line numbers). However, most books don’t use any such advanced features. I’d estimate that CSS and Prince could produce 90% of all books published in Latin scripts. I’m happy to leave the remaining 10% to a guy and his Quark.
bq. The pages have no corner trims and no bleed; the black text separates into CMYK leaving the black at only 90%. We would not be happy to go to press with this file
Sure. The sample file is optimized for screen use rather than print. Thanks for your partial list of requirements for PDF Prince — feel free to email me your complete list 🙂
In documentation management and production, we’ve been trying to get away from the conversion game for years. Tools that convert one format of document or text into another abound. If you’re familiar with help authoring tools, think Doc-to-Help, for example.
Although you’re using style sheets and proper markup, you’re still trying to convert one format (HTML) into another (Print/PDF). In the long run, this is not the way to go. And your PDF will suffer as a result. Try to build bookmarks from that PDF …
True single-sourcing demands that content and format be separated completely. And there are tools out there that do that splendidly, where content is stored in unformatted data blocks and published at run-time using CSS (for HTML/XHTML/XML) or MS Word DOT templates for print. Incidentally, an MS Word DOT file is to MS Word precisely what CSS is to HTML. It only controls a lot more (like headers and footers) a lot better for print – because that’s what it was designed to do.
bq. Although you’re using style sheets and proper markup, you’re still trying to convert one format (HTML) into another (Print/PDF). In the long run, this is not the way to go.
In general, I agree with your views that single source is good and document conversion is bad. This is why CSS has the concept of media types; the same source document (in, say, HTML) should be usable on all sorts of devices. The sample document has style sheets for print and screen and support for other media types can easily be added if you want more control of the final presentation.
The only reason for converting our files to PDF is to send them to the printer. They only accept PDF files, and, as such, PDF works quite well. Until printers accept HTML/CSS natively, using PDF is a good solution which doesn’t change the fact that we have a single source.
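To illustrate the media-type mechanism mentioned above, here is a minimal sketch of one style sheet serving both screen and print from the same HTML source. The selectors and values (`div.navigation`, the fonts and sizes) are assumptions for illustration and are not taken from the sample style sheets.

```css
/* Rules applied only on screens (values are illustrative assumptions) */
@media screen {
  body { font-family: sans-serif; max-width: 40em; margin: 0 auto; }
}

/* Rules applied only when the same document is printed or converted to PDF */
@media print {
  body { font-family: serif; font-size: 10pt; }
  div.navigation { display: none; }   /* hypothetical class for on-screen navigation */
  h1 { page-break-before: always; }   /* CSS2: start each chapter on a new page */
}
```

The HTML source stays untouched in both cases; only the block that matches the output medium takes effect.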
I have been trying to solve this web-to-print quandary for a project over the past month, and after trying to use RPDF with Ruby on Rails, I am retreating to a simply printable, CSS-styled web page.
So, this article really got me excited! Then after reading it and hopping over to princexml.com to get Prince, I feel like this article is an advert for the $350 Prince software.
It bums me out to read an article on my old favorite ALA, which is all about standards, open-ness & accessibility on the web, only to find out that I have to buy expensive proprietary software to put the knowledge to use.
Håkon, I really appreciate you for all of the standards goodness you bring to us, but this is a TOTAL BUMMER.
bq. I feel like this article is an advert for the $350 Prince software.
This article is about open standards, namely HTML and CSS. At the time of writing, Prince is the only software that can process our code. Not referring to it would have been negligent. We hope other software will start supporting our code — this was one of the reasons for writing the article in the first place.
Did you notice that you can download and use a fully functional demo version of Prince for free? You can also publish academic works without buying a license. And, if you compare Prince with other standards-based software that produces printed materials (some XSL tools come to mind), Prince is a bargain.
Håkon, I think you will agree with me that if you want to present some content stored in XML to a user, you often need to do two things: (1) transform your document and (2) assign visual characteristics to individual components of the document.
During transformation you can do things like building a table of contents, numbering chapters and figures, or adding some fixed content like the word “Figure” in front of each figure name. CSS is able to do some basic modifications to a document, like adding figure numbers. More complex transformations, like building a ToC, must be done by something more powerful. My tool of choice for this task would be XSLT, but you can use any language which is able to read, manipulate and store an XML document. You told me previously that the ToC for your book was created with some script.
After the document is transformed (well, in CSS this happens not after but at the same time), visual characteristics like fonts, colors, spacing, margins, etc. are applied to elements in the source document. If I understand your position correctly, you are advocating CSS over XSL-FO here because CSS syntax is easier and you are assigning properties directly to elements from the source XML document. I think that I can agree with your position here… but only as long as you are using CSS with some general, document-oriented XML format like XHTML or DocBook. Let me explain.
If you have a book in XHTML you can easily add a ToC to this document, because XHTML contains general markup for paragraphs and lists, and a ToC is nothing more than a list of chapter titles with links.
But you can’t do this with more specific XML formats. For example imagine a simple invoice:
… invoice metadata here …
You probably would present it as a table.
During document transformation you need to add a new row with the table header, a new column with subtotals, and finally a new row for the total. But the XML schema of the invoice doesn’t allow you to specify such information.
This example clearly shows that there are classes of documents which must be transformed to some more general markup prior to the assignment of visual characteristics. XSL-FO is such an intermediate markup. I can imagine that you can also use XHTML+CSS for this purpose. But you are losing a big advantage of CSS then: your CSS rules no longer work against the original markup, but against intermediate XHTML code.
So my conclusion from this is: CSS can be used for formatting documents that are written in some very generic, free-text-oriented vocabulary like XHTML. For more rigidly structured XML formats CSS can be used, but it is no longer easier to use than XSL-FO.
The difference in complexity is mainly caused by the fact that all XHTML elements have some default formatting behaviour. Once you are not using XHTML, there is no big difference between:
CSS:
… { display: block; color: red; font-weight: bold; }
and XSL-FO:
It is just a matter of syntax, because the basic formatting models of XSL-FO and CSS are very similar and many XSL-FO properties were taken directly from CSS.
But if there is a way to handle my invoice example using only CSS, without introducing another intermediate format, I would like to know.
Jirka
bq. And, if you compare Prince with other standards-based software that produces printed materials (some XSL tools come to mind), Prince is a bargain.
I use an XSL-FO toolchain for print production, namely XEP from RenderX. It’s even a little bit cheaper than Prince and its feature list is more complex IMHO. For example, hyphenation is done directly by XEP. Hyphenation patterns are weighted, because some places inside a word are more appropriate as hyphenation points than others. This is something you can’t achieve with soft hyphens placed into the document. Other XSL-FO implementations offer similar functionality for a similar price.
But it is good to have more competition on the XML formatting market.
bq. If I understand your position correctly, you are advocating CSS over XSL-FO here because CSS syntax is easier and you are assigning properties directly to elements from the source XML document. I think that I can agree with your position here… but only as long as you are using CSS with some general, document-oriented XML format like XHTML or DocBook.
Yes, this is an important part of the argument. CSS is well suited for structured document formats where the content comes roughly in the order of presentation. I believe content should be in this near-presentation state when it “crosses the wire”. Styling should be applied as close to the reader as possible, i.e. in the client.
The other argument for using CSS in printing is that one can reuse many of the CSS style sheets written for the web.
bq. This example clearly shows that there are classes of documents which must be transformed to some more general markup prior to the assignment of visual characteristics.
I agree completely. And CSS hasn’t been designed for that purpose. XSLT has, and is perfectly fine to use. It’s Turing-complete and can perform the computations needed to calculate your columns. My only problem with XSLT is that it has “Style” in its name.
bq. You told me previously that the ToC for your book was created with some script.
Yes, we use Bert Bos’ “multitoc”:http://www.w3.org/Tools/HTML-XML-utils/ to generate a TOC. There have been proposals for how to handle this in CSS, but it’s probably too much of a transformation thing to make it into the CSS standards.
bq. XSL-FO is such an intermediate markup. I can imagine that you can also use XHTML+CSS for this purpose. But you are losing a big advantage of CSS then: your CSS rules no longer work against the original markup, but against intermediate XHTML code.
I don’t see any problem with working against ‘intermediate code’. I think the XHTML code is what you should offer on the web since it uses well-known semantics. Your invoice example uses tag names not universally known. That’s fine as an internal format, but shouldn’t be published on the web. Also, I “don’t think XSL-FO should be published on the web”:http://people.opera.com/howcome/1999/foch.html — but that’s a different debate 🙂
Does anyone know how to set a background color with alpha transparency using CSS? I know this isn’t officially supported until CSS3, but I believe some browsers already support the feature.
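For what it is worth, the CSS3 Color module defines `rgba()` colors for this purpose. A minimal sketch, assuming a hypothetical `div.note` element; a browser without CSS3 color support should ignore the second declaration and keep the opaque fallback:

```css
div.note {
  background-color: rgb(255, 255, 255);        /* opaque fallback */
  background-color: rgba(255, 255, 255, 0.5);  /* white at 50% opacity */
}
```

Unlike the `opacity` property, which fades the whole element including its text, `rgba()` only affects the color it is used for.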
I really like this demonstration of the developing capabilities of CSS. As I see it, there are situations where you want to use CSS + XHTML for multiple presentations ( views ), as when you want to print content that’s mainly aimed at the web browser.
The single source idea is certainly a good one, and solutions like Apache Cocoon use XSLT for transforming an originating XML document for structure to produce XHTML for the browser or mobile platform and XSL-FO for printing purposes. It can use FOP ( Formatting Objects Processor ) to get PDF for printing.
The XSLT “having Style in it” is a bit confusing, but as you know ( Håkon was only expert from the start :-), XSL was introduced as “a style sheet for XML/XHTML”, to separate content from presentation. This was taken over by CSS and XSL took another route. Modern browsers can take any domain-specific XML document and render it using CSS styles.
XSLT is the XSL for Transformation, using an XSLT engine to transform one document into another, possibly reordering or filtering out parts of the original content.
XSL-FO became the styling part of the XSL standards, better used for printing purposes.
I completely agree that XSL-FO wouldn’t be suitable for sending documents to a browser. Even if the browser could render the document, it’s far too verbose and not easily human readable, and View source has taught us so much.
XSL-FO is complicated, and the possibility of using XML/XHTML + CSS to render print-quality documents is good news.
The people who are slagging this approach are completely missing the point, IMO. For me, the good part is not so much the use of CSS to format the book, but the use of XHTML to mark it up. The printing back-end can be ripped out and replaced with whatever works for you — FO (yuck), groff, LaTeX — or load up the HTML in M$ Word or OpenOffice, apply a stylesheet, and print.
We can talk about DocBook until we’re blue in the face, but it’s such an incredibly complex DTD that most writers would give up before finishing the first chapter of the first document. DITA is a step in the right direction, but it’s probably still too complex for non-gearheads without a fair amount of motivation. Just about everyone knows enough XHTML to write a document, and there are plenty of tools — Free and commercial — that provide a pretty GUI for people who need it.
As long as writers have to associate XML with complex large-scale publishing systems with six-figure deployment costs and five-figure support costs, it will be “eXcellent, Maybe Later” outside of Fortune 100 companies. HTML brought on-line publishing to the masses through a simple syntax; now it can bring single-source on-line/paper/PDF publishing to the masses as well.
Others have pointed out the advertising. I would also point out that Håkon Wium Lie has long been known to be an XSL foe (pun intended), so his opinion on XSL should be taken with a grain of salt.
But now to Prince itself—
It seems like a good way of bringing web pages to print. Someone mentioned its applicability to the blog-streaming world. Prince can fill this niche well. But going from web to print has, up till now, been carried out by printing according to a print stylesheet, not downloading a PDF. What Prince has over, for instance, Firefox is that the latter doesn’t support the CSS page model properly. When browsers catch up to that functionality, Prince will be out of a job.
Mr Lie might answer that the niche isn’t web to casual print, it’s XHTML to books, with the XHTML not necessarily ever being hosted on a web server, and books like the kind we get from Framemaker or Quark. However, that’s a niche Prince, or more accurately an XHTML to PDF tool, can’t fill either. Maybe CSS is already up to the task of heavy formatting (and I doubt that), but XHTML isn’t up to the task of rich markup. XHTML is a limited tagset. You know it when you have to use span tags where in general XML you’d use an element. You know it because the new OpenDocument format for office applications didn’t duplicate XHTML, it was formulated with its own tagset, which is much bigger than that of XHTML. XHTML is suitable for the simplest books, but anything beyond that, like any random book you pick up in the college library, requires a more feature-rich markup language. XHTML for books could only be hobbyists’ fare, and maybe not even that, since hobbyists are far more likely to opt for WYSIWYG tools than textual stuff.
In short, I don’t see Boom finding its niche among any of the possibilities. It’s overkill for simple web to print, and underpowered for professional typesetting.
Great idea. I will have to read the book to give a better review, but excellent job on making it via this method.
bq. Håkon Wium Lie has been long known to be an XSL foe (pun intended)
🙂 It’s the «FO» part I have a problem with. Formatting objects don’t have any semantics and should therefore not be represented in XML. It’s just a bunch of font tags. Which is why I “once wrote”:http://www.xml.com/pub/a/1999/05/xsl/xslconsidered_1.html?page=4
bq. I can understand why overworked undergraduates think FONT is cool, but I’m very disappointed when a group of highly skilled adults tell kids to stop playing, form a committee – and then come out with a set of supercharged FONT tags
Anyway, your main argument is not CSS vs. XSL-FO, it’s against the use of HTML as the basis for our markup. You write:
bq. XHTML isn’t up to the task of rich markup. XHTML is a limited tagset.
Indeed, the tagset is limited, but HTML has a wonderful extension mechanism: the «class» attribute. Using the class attribute, you can convert any XML document into HTML and back — without losing information.
bq. You know it because the new OpenDocument format for office applications didn’t duplicate XHTML, it was formulated with its own tagset, which is much bigger than that of XHTML.
I think this was a big mistake. By basing OpenDocument on HTML (much the same way Bert and I did Boom on HTML), the format would have had a huge installed base from the beginning: 1 billion browsers.
bq. Formatting objects don’t have any semantics and should therefore not be represented in XML.
Why not? In the letters that make up the initialism XML, I don’t see anything that stands for Semantics. XML is just a toolchest for building any markup language you wish, and one of those happens to be the page layout language called XSL-FO. And it isn’t “just a bunch of font tags” any more than CSS is. I think we both know XSL-FO is to be generated from XSLT rather than written by hand, and when generated from an XSLT script it’s equivalent to a CSS stylesheet in separating content and presentation.
bq. Indeed, the tagset is limited, but HTML has a wonderful extension mechanism: the «class» attribute. Using the class attribute, you can convert any XML document into HTML and back—without losing information.
Doesn’t the use of a kluge indicate the inappropriateness of the format? And, um, talking about semantics, are you aware that class attributes are style directives, containing no more semantics than I or B or TT tags? It’s like proposing the use of such constructs as divs with style attributes instead of H1/H2/H3 tags, which Ian Hickson complained about on his blog, but on steroids!
This isn’t the right tool for the job. Anyone who so much prefers CSS to XSL can use CSS to style XML, and that would be better. I shudder at the thought of using HTML, with kluges and all, for preparing a college grammar book. But even CSS isn’t wholly satisfactory: you’ve had to write an external script to generate the TOC, while you can do it with XSLT, and then style with FO in the same gulp. Looking at it that way, the XSL approach could be said to be more deskilling than the HTML/CSS one.
bq. I think this was a big mistake. By basing OpenDocument on HTML (much the same way Bert and I did Boom on HTML), the format would have had a huge installed base from the beginning: 1 billion browsers.
But OpenDocument is for office applications, not for browsers. You seem to be very Web-centric. Additionally, even if ODF were based on HTML, the type of HTML that office applications generate approaches the elegance of Frontpage’s output.
Personally, I am more familiar with CSS than with Microsoft Word. Now I can type my essays in Dreamweaver 🙂
How novel, using a language based on SGML (Standard Generalized Markup Language) to make a printer paint a page instead of a browser.
Ah yes – my first program documentation (1983) was generated using IBM DCF on a 3090 mainframe. And our documents had strange markup like
,
etc etc – and it had ‘stylesheets’ to add ornamentation (other than bold or underline) etc etc when IBM released its first advanced function printers (3800-3 and 3820s).
IIRC it was a superset (or was that a subset?) of SGML. It even generated TOCs, indexes, etc.
Amazing it’s still around … http://www.printers.ibm.com/internet/wwsites.nsf/vwwebpublished/dcfhome_z_ww
Kim Mihaly
Just give me the needed CSS print functionality, and a web browser that supports it. Then all I’ll need to do is File -> Print to Postscript/PDF. Even better:
% firefox --print http://www.alistapart.com/print/me --output ala.pdf
TeX is a different thing. I wrote a number of papers, and even a whole book, using it. These days I use XSL and produce PDF. This is XML-based and more flexible. Still, I personally like TeX much more. But PDF and TeX are about actually printing content in high quality.
The point of this article seems to be that XHTML/CSS *can* be used to publish a *real* book. It’s a proof of concept by the people who created CSS, and that’s nice.
Printing web pages is always a pain. And I hope that’s what this is about. Printer-friendly pages are never really what they claim to be. XHTML/CSS cannot compete with PDF. But it can complement it, and make it easier to import web pages into publishing systems.
Very nice paper, thanks. And Prince may be just the tool I need for a “print-on-demand” adjunct to my (free) ebooks site (http://etext.library.adelaide.edu.au)
I’ve been tinkering for a long time with ebooks — mostly public domain novels and essays (which it is true to say are quite simple compared to technical works), using HTML. My main interest has been in formatting books for the web rather than print, but there’s always that lingering, “wouldn’t it be nice” feeling that it would be great to be able to print them too, if desired. And I have had some limited success in that direction using rudimentary CSS (see the FAQ), which produces a nice result if you don’t mind A4 and don’t much care about page numbering etc.
But it is very pleasing to see someone pushing the envelope to see what can be done with CSS. Now, if only my browsers supported all those features, I’d be very happy.
Of course, with or without Prince, there’s no reason I should not use the CSS3 features, even if they are not currently supported. They will be one day, and then my ebooks will be ready and waiting!
(And I’ve heard all the “wrong tool” arguments from the ebook crowd already, thanks! LaTeX, DocBook, XSL, yadda yadda. Most of them are still producing ugly results whatever the tool.)
I opened up this article mainly because I’ve been looking for a means to create invoices and proposals quickly, easily, and with some customization.
I’ve always hated opening up InDesign or MS Word just to fudge a couple of variables and print. I’m on a slow laptop, and it can seem to take forever to load up these bloated apps, only to close them after seconds of use.
I love the possibility that I can create a printed page template, and only have to open a simple text editor to edit and then send it to a browser (which is always on!) and hit print.
I know everyone’s been knocking the book format application, but I’m very excited about other possible applications.
I am writing a DocBook “book”. Some of the fonts are not looking good and I would like to enhance them. I learned that I could use CSS to improve the font output in HTML. I am trying to see if someone has gone through this experience and has used a good CSS style sheet file. I would like to get a copy of the CSS style sheet if possible. Or point me to a good location where I can get a good CSS file.
Thanks,
Raj
bq. I am writing a DocBook “book”. Some of the fonts are not looking good and I would like to enhance them. I learned that I could use CSS to improve the font output in HTML. I am trying to see if someone has gone through this experience and has used a good CSS style sheet file. I would like to get a copy of the CSS style sheet if possible. Or point me to a good location where I can get a good CSS file.
Prince ships with a CSS style sheet that does rudimentary styling of DocBook files. It’s called “docbook.css” and it should be a reasonable starting point.
thanks
While the article is great, I don’t get the point.
I tried printing your HTML file with the latest versions of Firefox and IE, and the footers (page numbers) do not appear.
If we need another program to convert to PDF before we can print, what is the use? Why can’t CSS just work with browsers for printing?
I am trying to create documents that are visible on the web and that, when someone wants to print them, get a page number and a footer. Your solution does not work for this, and the question is why. Since it is CSS and XHTML, there should not be a problem.
I have to revert to the idea of making two style sheets, one for print and one for screen, which is also a bad solution.
Unless I missed something.
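For context, page-number footers of this kind rely on page margin boxes from the CSS3 paged media draft, which Prince implements. A browser that does not implement these boxes will skip the rules, which would explain the missing footers. A minimal sketch, where the page size, margins, footer text and font size are assumptions rather than the sample's actual values:

```css
@page {
  size: A4;
  margin: 25mm 20mm;

  /* Margin box at the bottom of every page; requires a formatter
     that supports CSS3 paged media margin boxes. */
  @bottom-center {
    content: "Page " counter(page);  /* the page counter is maintained by the formatter */
    font-size: 9pt;
  }
}
```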
I don’t think this is ready for any serious use. Consider for example
which really breaks orphan control code. There is simply no way that a client can figure out that a sentence like this belongs to the paragraph above, and isn’t a paragraph on its own.
(La)TeX has rather advanced algorithms to do this right, since it seriously breaks text flow. It may not be the kind of thing that people point their fingers at, but if you ask them, they tell you that your text was “heavy”. Any typographer worth his salt, and a serious book publisher will give this high priority.
Don’t get me wrong: I would really like to see a LaTeX replacement, as the HTML tools are much more widespread than LaTeX, and LaTeX is often a pain to write. However, it is important to realise that there are many good reasons why people use it for high-quality work, and that is not going to change before certain flaws in the original design of HTML are corrected, and I know it breaks your heart, Håkon, but that means backwards-incompatible changes must be made to HTML.
Also, it means that we have to put some effort into high-quality printing in the UAs, and I don’t see us doing that…?
I like the idea, since we already have the API documentation of our software generated as HTML. It need not be perfect for printing (I would prefer LaTeX over XSL-FO/DocBook, but that is not important here).
What I am really missing is the possibility to create a reference to a numbered element: E.g. images are numbered with a chapter prefix and a counter for the image, i.e. a caption like “Fig. 2.3” for the third image in the second chapter.
bq. %&&/%%%$$
%&&&%%%&%
%%&&&&&&&
{text-align:center}Fig. 1.1
This could be easily done using counters. But now I would like to create a reference in the text to this image like, e.g. “see Fig. 2.3” where the “Fig 2.3” is automatically generated. Is this possible?
bq. bla bla bla (see Fig 1.1) bla bla bla
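One possible approach, sketched under assumptions: the `target-counter()` function proposed for CSS3 (the kind of mechanism used for “see page 35” cross-references) takes a counter name, so in principle it can fetch the chapter and figure counters at the link target as well. The markup here (`div.figure` with an `id`, `p.caption`, an empty `a.figref` link) is hypothetical, and whether a given formatter supports `target-counter()` for counters other than `page` needs to be checked.

```css
/* Hypothetical markup:
   <h1>Chapter title</h1>
   <div class="figure" id="fig-layout"> <img ...> <p class="caption">Page layout</p> </div>
   ... (see <a class="figref" href="#fig-layout"></a>) ...                               */

body { counter-reset: chapter; }
h1 { counter-increment: chapter; counter-reset: figure; }  /* figures restart in each chapter */
div.figure { counter-increment: figure; }

/* Caption prefix, e.g. "Fig. 2.3 " */
div.figure p.caption::before {
  content: "Fig. " counter(chapter) "." counter(figure) " ";
}

/* Reference text built from the counter values at the element the link points to */
a.figref::before {
  content: "Fig. " target-counter(attr(href), chapter) "." target-counter(attr(href), figure);
}
```

Because the reference text is generated, renumbering figures or moving them between chapters would not require touching the prose.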
I’m involved in a project that requires a “clean” print option, but none of the developers have specific expertise in printing (nicely) from XHTML. Frankly, we have been dreading the day when we would have to buckle down and learn an unfamiliar print technology. After stumbling on this thread I picked up the free Prince demo today. Within an hour, I was outputting reasonably complex PDF layouts with table borders, backgrounds and images (oooooooh… ahhhhhhh…) and without touching the original content. The CSS2 implementation is refreshingly solid (this in the week that IE7 and FF2 were released). Now, my team is genuinely excited about printing. Offset press publishing might be a stretch, but Prince proves that more modest goals are achievable with a fraction of the effort.