49 Reader Comments
dpapathanasiou
Sigil is an open source project which is also challenging InDesign. It is a WYSIWYG editor which runs on Mac, Windows, and Linux.
Elizabeth Castro
I was dismayed that in your article about eBook Standards, you did not address the issue of eReaders deliberately not following the spec in their benevolent but misguided attempt to display ebooks more attractively to readers (as described at the beginning of this article: http://www.pigsgourdsandwikis.com/2010_03_01_archive.html)
I think it is absolutely essential for Web standardistas to stand up right now and call for eReaders to follow those specs. Otherwise, it is only a matter of time before someone does develop an eReader that listens to the book designer regardless of the standards (Netscape anyone?) But by then it will be too late.
Further, I am so incredibly tired of people being paternalistic about what should be allowed and what shouldn’t. Whether you think video in a book is useful is not relevant to eBook standards. I can think of many instances, particularly in technical books, in which a short video tutorial would be much more instructive than a series of screenshots.
And Apple says not to use fonts when designing eBooks for the iBookstore, because it “creates a bad user experience”. Excuse me? Get your hands off my design. If you don’t like it, don’t buy it.
Regardless, it’s not for you, Apple, or anyone else to decide on aesthetic grounds. Either it follows the standards, or it does not.
(and frankly, it’s a pain constructing a complicated rebuttal in this tiny text box!)
David Leader
When I posted initially I hadn’t actually looked at any ebooks, which is why the idea that they should be done in HTML seemed so bizarre to me. Since then I’ve acquired an iPhone (not an iPad) and have downloaded a few free ebooks in different formats, and the eReader. I can see now why Mr Clark was going on about em-dashes and block paragraphs (hatred of the latter being one area I agree with him) – and to which I’d add dumb quotes – because the typographic execution in the examples I’ve seen (and those in screen shots from paid books) varies from poor to dreadful. I certainly couldn’t bring myself to read anything on my iPhone set like that, and I see little prospect of any improvement, except perhaps on the iPad, where the Kennedy book that Jobs presented may raise expectations. Mr Clark seems to have got his wish for the triumph of HTML in eBooks. He should have perhaps been a bit more careful what he was wishing for.
Brian Kim
Thanks Joe Clark. Good points. Many times change for the good means diminishing status for icons.
Thanks a list apart. Another fine topic well presented.
With respect to ebooks: I purchased two books a couple of months ago from O’Reilly.
Automating System Administration with Perl, 2Ed
CSS Cookbook, 3Ed
Purchased – both print and ebook. It was a bundle, and I was curious. For my purposes, with Acrobat 8 Pro, the PDF version has more utility than EPUB. At least for now. BTW, I don’t have a reader.
I viewed the EPUBs with Adobe Digital Editions 1.7.4 (I think) on a Vista PC.
But for those interested, I looked into the EPUB files with PKZIP. Perhaps some of the other commenters may be interested.
I unzipped the EPUB file and looked inside the files with Wordpad. There’s both HTML and XML inside.
E.g., for Automating System Administration with Perl, 2Ed
There’s XML here –
in OEBPS/content.opf
in OEBPS/toc.ncx
in META-INF/container.xml
and HTML
in OEBPS/index.html
Kind of quirky, it looks like this –
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html ><head>
… etc.
EPUB looks more like a hybrid.
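For anyone who wants to repeat the inspection above without PKZIP and Wordpad, here is a minimal sketch in Python using the standard `zipfile` module. The function names are made up for illustration, and the sample archive is built in memory rather than taken from the O’Reilly files. The `mimetype` entry is not from the comment above; it is added on the assumption that the archive follows the OCF container layout.

```python
import io
import zipfile


def list_epub_entries(source):
    """Return the sorted member names of an EPUB (a ZIP archive).

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return sorted(archive.namelist())


def build_sample_epub():
    """Build a tiny in-memory archive that mimics the layout listed above.

    The file contents are placeholders, not real EPUB metadata.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("OEBPS/content.opf", "<package/>")
        archive.writestr("OEBPS/toc.ncx", "<ncx/>")
        archive.writestr("OEBPS/index.html", "<html/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_epub_entries(build_sample_epub()):
        print(name)
```

Passing the path of a real .epub file to `list_epub_entries` should show the same mix of XML and HTML members.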
I tried looking here too –
OEBPS Container Format (OCF) 1.0 Specification Documents
HTML Version
http://www.openebook.org/ocf/ocf1.0/download/ocf10.htm
See APPENDIX B: Example
Have a good week.
Brian
abathur
I’m a writer, a poet at least. My current project, though—a custom document management system to resolve some bad version control problems I run into as I write at home and on the road—led me to see what ALA had to say about the topic.
While I agree with the value of HTML for presenting these documents (which represents the potential for a massive open-standards win over the proprietary systems to which we lend our literary art and scholarship) I’m not sure the XHTML-now approach is the best. I’m inevitably biased by my own proclivities, but I think it’s easier to expect writers, editors and publishers (“book people” as Clark says) to produce well-formed XML with semantic markup in terms they already understand.
I don’t think it’s hard to convince a poet that she can write her poems in XML with tags like and <TITLE> and <STANZA> and <LINE> but it’s going to be a much greater leap (and will, undoubtedly, be pushed off on someone else) to expect her to understand how to use HTML’s not-quite-perfect-yet markup in a semantic manner. Even if a few variations of the XML tags crop up, the sets used will be relatively simple, and this sort of parsing made relatively trivial.
Let writers and publishers invoke their natural vocabularies through XML, and let the standardistas make decisions about how to parse this XML into display-ready XHTML as necessary. Then our parsers can adapt to the further evolution of HTML as a language (and common practice in e-publishing, no doubt) by continually adapting comprehensive XML documents into the markup that makes the most sense today, and markup which makes sense 20 years from now.
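A rough sketch of the kind of parsing described above, using Python’s standard `xml.etree.ElementTree`. The `<TITLE>`, `<STANZA>` and `<LINE>` tags come from the comment. The `<POEM>` root element, the sample text, the function name and the choice of `div`, `h2`, `p` and `span` on the output side are all assumptions made for illustration, not an agreed vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: <POEM> as the root element is an assumption.
POEM_XML = """\
<POEM>
  <TITLE>Example</TITLE>
  <STANZA>
    <LINE>first line</LINE>
    <LINE>second line</LINE>
  </STANZA>
</POEM>
"""


def poem_to_xhtml(xml_text):
    """Map the poet's own tags onto one possible XHTML rendering."""
    poem = ET.fromstring(xml_text)
    container = ET.Element("div", {"class": "poem"})

    title = poem.find("TITLE")
    if title is not None:
        heading = ET.SubElement(container, "h2")
        heading.text = title.text

    for stanza in poem.findall("STANZA"):
        paragraph = ET.SubElement(container, "p", {"class": "stanza"})
        lines = stanza.findall("LINE")
        for index, line in enumerate(lines):
            span = ET.SubElement(paragraph, "span", {"class": "line"})
            span.text = line.text
            # Put a line break between lines, but not after the last one.
            if index < len(lines) - 1:
                ET.SubElement(paragraph, "br")

    return ET.tostring(container, encoding="unicode")


if __name__ == "__main__":
    print(poem_to_xhtml(POEM_XML))
```

Only `poem_to_xhtml` would need to change if the target markup changed later; the poet’s source document would stay as written.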
Brian Kim
I don’t know anything about the business of publishing, but I use CSS, and this was interesting to me,
“This book has been written entirely in HTML and CSS.”
… page 353, Chapter 18 – The CSS Saga, the last sentence of the last paragraph.
Cascading Style Sheets: Designing for the Web, 3/E
Hakon Wium Lie
Bert Bos,
ISBN-10: 0321193121
ISBN-13: 9780321193124
Publisher: Addison-Wesley Professional
Copyright: 2005
Format: Paper; 416 pp
Published: 04/25/2005
http://www.pearsonhighered.com/educator/product/Cascading-Style-Sheets-Designing-for-the-Web/9780321193124.page#takeacloserlook
The sample chapter is “Chapter 3 The amazing em unit and other best practices”
http://www.pearsonhighered.com/assets/hip/us/hip_us_pearsonhighered/samplechapter/0321193121.pdf
I’m not a poet, but I humbly suggest that markup – HTML, XHTML, or XML, should be easy for a poet. Or for each poem.
To me, from a formatting standpoint, a poem has a lot to do with where the line ends and the “paragraphs”.
“Music is the space between the notes.” Debussy
In this case, I would “try” surrounding the poem with a div and an id. Then use classes with repeating block elements to make related sets of lines look “right”.
If I were generating the content, I might try writing in Notepad without markup tags, and separating each set of lines with a pair of carriage returns and linefeeds. Then, when done, I would turn word wrap off, and save.
This would make it easy to add the markup at the beginning of each “line” or “paragraph”. And somewhat simple to add the closing tag.
Then, into my web page, with something like –
<body>
<div id="poem">
</div> <!-- close poem -->
</body>
… Cut and Paste from the notepad file.
Perhaps the classes might be defined like –
#poem p.haiku
#poem p.limerick
#poem p.the-bop
Just a thought.
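The Notepad workflow above could also be scripted. Here is a minimal sketch in Python, assuming stanzas are separated by a blank line and that one class name applies to the whole poem. The function name and the use of `<br />` between lines are illustrative choices, not something from the comment.

```python
from html import escape


def poem_to_html(plain_text, css_class):
    """Wrap blank-line-separated stanzas in <p> elements inside div#poem."""
    # Treat Windows (CR+LF) and Unix line endings the same way.
    normalized = plain_text.replace("\r\n", "\n").strip("\n")
    stanzas = [block for block in normalized.split("\n\n") if block.strip()]

    parts = ['<div id="poem">']
    for stanza in stanzas:
        lines = [escape(line.strip()) for line in stanza.split("\n") if line.strip()]
        parts.append(
            '<p class="%s">%s</p>' % (escape(css_class, quote=True), "<br />\n".join(lines))
        )
    parts.append("</div> <!-- close poem -->")
    return "\n".join(parts)


if __name__ == "__main__":
    sample = "line one\r\nline two\r\n\r\nline three\r\n"
    print(poem_to_html(sample, "haiku"))
```

The returned string can be pasted between the `<body>` tags, with rules such as `#poem p.haiku` supplying the look.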
Sometimes it can be straightforward to do a stand-alone markup document, as compared to a multitude of documents and media types in a web site.
Have a good weekend.
elmimmo
Dashes. As commonly used in print books, em dash (—) with no spaces on either side does not work in onscreen text.
Not only that, in Spanish you just do not use em dash (—) with no spaces on either side. Em dashes —if used correctly in that language, that is— function just like parentheses if in the middle of the sentence —although you do not close them at the end of one.
Even if the above is incorrect usage in English, I just wanted to illustrate.
They require a space before the interruption (and after, if there is no period), and just like parentheses, you absolutely do not break the line between it and the letter that sits next to it.
Unfortunately, “Unicode’s Line Breaking Algorithm”:http://unicode.org/reports/tr14/ is English-centric (booh!) and says that em dash “provides a line break opportunity before and after the character”, a complete aberration in Spanish typesetting (should be the exact opposite). As a result, pretty much any engine that displays text on screen, modern or old (including of course any browser or ebook reader out there) is chopping lines in Spanish text leaving orphan em dashes at the end of lines. No single ebook or webpage is surviving this. Unless one goes and manually litters all em dashes with zero width no-break spaces at both sides, which is rather gross.
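The manual workaround mentioned above can at least be automated. Here is a minimal sketch in Python that glues each em dash to the non-space character next to it. It uses U+2060 WORD JOINER rather than the zero width no-break space (U+FEFF), since Unicode recommends the word joiner for this purpose; that substitution and the function name are choices made here, not part of the comment. Whether a given browser or ebook reader honours the character is not guaranteed.

```python
import re

EM_DASH = "\u2014"
WORD_JOINER = "\u2060"  # used here in place of U+FEFF


def protect_em_dashes(text):
    """Insert a word joiner between an em dash and an adjacent non-space character."""
    # Dash followed by a letter or punctuation: forbid a break after the dash.
    text = re.sub(
        EM_DASH + r"(?=[^\s" + WORD_JOINER + "])",
        EM_DASH + WORD_JOINER,
        text,
    )
    # Dash preceded by a letter or punctuation: forbid a break before the dash.
    text = re.sub(
        r"(?<=[^\s" + WORD_JOINER + "])" + EM_DASH,
        WORD_JOINER + EM_DASH,
        text,
    )
    return text


if __name__ == "__main__":
    sample = "Los guiones \u2014si se usan bien\u2014 funcionan como par\u00e9ntesis."
    print(repr(protect_em_dashes(sample)))
```

The side of the dash that faces a space is left alone, so the normal break opportunity at the space is kept.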
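matt bear
“Peter’s comment”:http://www.alistapart.com/comments/ebookstandards/P10/#16
We need this also as a base to support annotations. Everyone (Adobe, Stanza, Ibis, Apple) seems to be introducing proprietary solutions for annotations, in some flavor of XML (that gets written into a file and added to the epub manifest? or which lives only in the reader?), but eventually we want to share and aggregate annotations.
“Joe’s comment”:http://www.alistapart.com/comments/ebookstandards/P10/#20
Nonetheless, the problem of a defined structure for annotations is real and rather pressing for production workflow. (Why else are people addicted to the methadone of MS Word?) I have no solution, but then again, I can’t be expected to have one.
Supporting “marginalia”:https://secure.wikimedia.org/wikipedia/en/wiki/Marginalia (annotations, etc.) has been a goal for a long time, and not just for ePubs. See these references:
“Seeing the picture – Crowdsourcing annotations for books (and eBooks)”:http://blog.lib.uiowa.edu/hardinmd/2009/06/08/crowdsourcing-annotations-for-books-and-ebooks/
“From Personal to Shared Annotations”:http://www.csdl.tamu.edu/~marshall/CCM-AJB.pdf
“Social Annotations in Digital Library Collections”:http://www.dlib.org/dlib/november08/gazan/11gazan.html
“How to express and exchange annotations”:https://github.com/nichtich/marginalia/wiki/Support-of-PDF-annotations focuses on PDF annotation methods.
“The Fascinator”:https://fascinator.usq.edu.au/trac/wiki/Annotate/existing also has some information, as does “Wikipedia’s Web Annotation article”:https://secure.wikimedia.org/wikipedia/en/wiki/Web_annotation
“ncarr’s comment”:http://www.alistapart.com/comments/ebookstandards/P20/#25
This can be overcome by simply adding unique identifiers to the document objects for use as anchors. This is almost a job for microformats…
I disagree. I think it should be based on “DocBook”:http://www.docbook.org/ or some other XML format. (In DocBook, it’s a solved problem.) DocBook has support for several missing features of ePub: <chapter>, <section>, <sidebar>, <equation>, <figure>, <footnote>, <annotation>, <set> (a collection of books, like an encyclopedia or The Art of Computer Programming), as well as support for “MathML”:http://www.w3.org/Math/ and “SVG”:http://www.w3.org/Graphics/SVG/ .
Still, as simple as the solution is, it would be great if someone would take the lead and publish some basic conventions so that the problem didn’t have to be solved and re-solved over and over and over again. The citation industry is both progressive and vibrant but it is geared towards problems a lot bigger than putting a few footnotes in an ebook.
I agree. Having some high quality examples would be beneficial.
“Daniel Bennet’s comment”:http://www.alistapart.com/comments/ebookstandards/P30/#38
First there should be a URL that is associated with the publication. It may be that there are thousands of posted versions of public domain books, but each should have a corresponding URL. On the possibility that each version should have differences, intentional or not, having a separate URL for each instance is important.
This actually exists. See “Digital Object Identifier”:https://secure.wikimedia.org/wikipedia/en/wiki/Digital_object_identifier (though this also “has problems”:https://secure.wikimedia.org/wikipedia/en/wiki/Baen_Books#Baen_Digital_Object_Identifiers_.28DOI.29 .)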
Beginner eBook Publishing
This is by far one of the most well written and thoroughly researched articles on ebooks and html. Having a little knowledge in html can make a huge difference in creating ebooks. Without it good luck keeping the formatting of your original document. Keep up the great work!