22 Reader Comments
afonsoduarte
Nice and thorough article.
Do you know what the browser support is like for the Paged Media Module? Prince and Antenna House are great, but there would be no need for them if we could just use Cmd+P and the native (in OS X, anyway) Save as PDF.
Nellie McKesson
@afonsoduarte AFAIK, there’s no browser support for paged media yet, though I believe the intention is that paged media will work on the web eventually as well. It would be pretty awesome to be able to set web content up in fixed pages—goodbye epub, hello streaming books!
Charlie Clark
Opera released a labs build with support for this. It’s fantastic and completely transforms the experience of reading in the browser.
http://dev.opera.com/articles/view/labs-more-fun-using-the-web-with-getusermedia-and-native-pages/
sjs
I’ve used mPDF in the past to create PDFs for print using CSS. It supports functions like page breaks, but unfortunately doesn’t do so via CSS.
There are a lot of issues with mPDF, but if you can fight through them, it’s the best free tool for HTML->PDF that can do page breaks.
Nellie McKesson
A couple Google+ readers also just pointed me to this tool: http://code.google.com/p/wkhtmltopdf/
It’s built on webkit, so it’s not trying to reinvent the wheel. I haven’t played with it yet, but I’ll poke around as soon as I can.
barefootliam
If you go into a bookshop today the chances are you’ll be surrounded by books produced from markup. The world’s largest publishers (and many smaller publishers) use XML (including XHTML) and XSLT to produce XSL-FO and make books. So when you start out with, “While historically, it’s been difficult at best to create print-quality PDF books from markup alone” I can accept that markup is difficult for many people but maybe the effect of the sentence is misleading.
It’s great to see CSS starting to catch up (overall it’s still a long way behind, but vendor extensions are bridging the gap, and in some ways CSS 3 is ahead). We can expect improvements in the CSS support for print and for paged media in general over the coming months and years.
Nellie McKesson
@barefootliam Thanks for pushing for clarification. Yeah, at O’Reilly we’ve been using an XML->XSL-FO toolchain for many years to produce our books. The difficult part is the FO, as it’s a tricky language that creates a barrier between the common man and beautiful PDFs created from markup. XSL-FO isn’t an impossible language to learn, but it doesn’t have the simplicity that CSS has, which is why CSS3 support is so potentially revolutionary. The time it takes us to iterate and create new templates with CSS3 is 1/3 or less the time it would have taken with XSL-FO, and the formatting control that was previously silo’d in the hands of elite FO coders is now available to anyone with CSS knowledge.
barryvan
It’s probably worth pointing out that WeasyPrint [1] is an open-source HTML+CSS->PDF system, written in Python. Simon Sapin, the lead developer, is very active on the W3C CSS mailing lists. It’s very definitely not at the same level as Prince and the like, but it’s certainly worth checking out.
[1] http://weasyprint.org/
bowerbird
nellie, after just skimming your article,
my head hurts! so it’s no wonder that
you drink bourbon and play the drums.
if my job was this difficult, i would too.
fortunately, there are ways to do this
that are much simpler to accomplish.
i’m sure you’ll learn about them soon…
-bowerbird
Jeffrey Zeldman
Dear Bowerbird:
Snark aside, to which specific methods of easy epub creation are you referring? URLs, please. Thanks.
bowerbird
jeffrey zeldman said:
> Snark aside, to which specific methods
> of easy epub creation are you referring?
well, mr. zeldman, i will be
happy to give you a pointer.
but first you should tell me
if you consider “snark” to
be a good and funny thing,
or a bad and nasty thing…
because i just said “i love you”
(yes, you, jeffrey zeldman)
in a comment earlier today:
> http://blog.readability.com/2012/06/announcement/#comment-3418
so… you know, it would be…
ironic if you’re criticizing me
while i’m professing fandom.
(but… you’re in luck, because
i like irony as well as snark!)
and even if it was “snark”, hell,
i’m sure nellie can take it well.
after all, she uses sticks to beat
an instrument to produce music,
and drinks her share of bourbon,
so i’d guess she’s a tough chick…
(and if you’re wondering about her
sense’o’humor, grok her twitter pic.)
anyway, you let me know, jeffrey!
i will check back here tomorrow…
meanwhile, these underscores
and asterisks i have used here,
but not the non-link above, might
give a little clue about my answer.
-bowerbird
p.s. oh, i should also note that
this article is really on the topic
of turning (x)html into a .pdf,
rather than an .epub, but i will
be happy to give you a pointer
on the overall question of using
a “master” source-text to attain
many different formats we want
(e.g., .epub, .mobi, .pdf, .html5).
amit
nice article
barefootliam
@Nellie McKesson, yes, I agree 100% that FO is too hard.
The challenge will be to make CSS as powerful without also making it too hard.
The current drafts for CSS don’t support multiple streams of footnotes, or “page 3 or 5” or “this page left intentionally blank” or stacked marginalia or collapsing index entries into ranges; there are products (e.g. from Antenna House) that have extended CSS as a basis for some or all of this, but such extensions are product-specific and in some cases may even be customer-specific rather than general.
So I’m hoping that over the next few years we get to the point where HTML + CSS can be used for the majority of the world’s books, without an unacceptable loss of functionality or beauty.
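To make the “collapsing index entries into ranges” item concrete, here is a minimal Python sketch of that step as a toolchain might do it outside of CSS. The function name `collapse_page_ranges` is hypothetical, and the output style (plain hyphens, comma-separated) is an assumption; real index styles vary, for example by using en dashes or by leaving two consecutive pages unmerged.

```python
def collapse_page_ranges(pages):
    """Collapse page numbers into index-style ranges, e.g. 3, 4, 5, 9 -> "3-5, 9".

    Hypothetical helper: the hyphen/comma output style is an assumption.
    """
    ordered = sorted(set(pages))  # drop duplicate references, sort ascending
    if not ordered:
        return ""

    ranges = []
    start = prev = ordered[0]
    for page in ordered[1:]:
        if page == prev + 1:
            # Still inside a run of consecutive pages: extend it.
            prev = page
            continue
        # Run ended: record it and start a new one at this page.
        ranges.append((start, prev))
        start = prev = page
    ranges.append((start, prev))  # record the final run

    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


print(collapse_page_ranges([12, 3, 4, 5, 9, 13, 5]))
```

The point of the sketch is only that the logic needs the final page numbers, which are known after layout; that is why it is hard to express in a stylesheet alone.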
Nellie McKesson
@barefootliam I agree that there’s still plenty to add/polish in the paged media spec, and your point about being careful not to turn CSS into FO by building in all that same messy functionality is a very good one.
We actually haven’t had to lean too heavily on Antenna House extensions (I think we really only use them for floats, which are a pain right now), and have managed to replicate or improve our old FO layout. Granted our book designs aren’t exactly the most intricate things, but the point is I think it’s possible, and closer than you think!
barefootliam
@Nellie McKesson actually I’ve a pretty good idea how close it is :-) and agree it’s possible (and that floats are a pain). We’re hoping to write up some of the places CSS would need enhancements, in the W3C Print and Page Layout community group, and then to move on to proposals and actually making it happen.
Really I just wanted to remind readers here that in fact people have been producing (with varying amounts of difficulty) printed books from XML (and SGML before it) for decades, and that it’s not even a rare and unusual thing to do :-)
Webdesign Den Haag
Thanks for sharing! Great article but kinda tough
Christoph Päper
It’s been ‘::before’ and ‘::after’ for quite some time now.
Christoph Päper
The comma between name and type in the ‘attr()’ pseudo-function is gone; it only appears in front of the fallback value. So it’s “attr(href url)”.
whole1959
Long before XHTML, XML, and HTML, publishing companies and large typesetting foundries used SGML. In fact, HTML was derived from SGML. SGML was complex and bulky, so the approach described above is a good one, as it is always good to separate data from form and function.
bowerbird
still no response from mr. zeldman, over a week later.
i guess he wasn’t interested in conversation after all.
-bowerbird
Smartycrowd
Do you really think that CSS is starting to catch up? I hope you are indeed right, barefootliam, that we can expect improvements in the CSS support for print and for paged media in general over the coming months and years.
TellMeMore
This might seem like a silly question, but it is actually a serious one. What if I wanted to publish looseleaf supplements to existing publications? That is, updated page groups to replace pages in existing printed books. (The books are in something like a three-ring binder, so pages can easily be replaced.)
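Some of the unusual things I’d want to do would be:
specify arbitrary page breaks (but only temporarily, for printing purposes)
override automatic list numbering (because what if my updated page starts with list item 32?)
have the ability to do plenty of manipulation with page numbers (if I’m putting in more pages than I took out, I’ll be making “point pages.”)
“scrape” information from headings, manipulate it, and show it in dictionary-style guide heads/ears, as with Word’s “StyleRef” when used inside a field in a page header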
… So, for anyone in the know… can XHTML with CSS3 do any or all of these things?
I’m looking in the direction of PrinceXML. Am I likely to find what I need?
Thanks!
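For the “point pages” item, the numbering itself is simple to sketch outside of any stylesheet. The Python below is an illustration only: `point_page_labels` is a hypothetical helper, and the scheme it uses (extra pages take the last original page number plus a decimal suffix, such as 33.1) is an assumption, since looseleaf publishers follow different conventions.

```python
def point_page_labels(first, last, new_count):
    """Label the replacement pages for the original page range first..last.

    Assumed scheme: original numbers are reused in order, and any extra
    pages become "point pages" after the last original page (33.1, 33.2, ...).
    """
    original = list(range(first, last + 1))
    if new_count < len(original):
        # Removing pages needs a different convention; out of scope here.
        raise ValueError("fewer replacement pages than originals")

    labels = [str(page) for page in original]
    extra = new_count - len(original)
    labels += [f"{last}.{i}" for i in range(1, extra + 1)]
    return labels


# Replace original pages 32-33 with five new pages.
print(point_page_labels(32, 33, 5))
```

Whether a given CSS formatter can print labels like these in the page margin is a separate question; the sketch only shows the labels a supplement would need.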