The Look That Says Book
Issue № 313


The vast majority of books and magazines are typeset using hyphenation and justification (written as H&J from here on in). In print, it’s everywhere: All lines of text except the last lines of paragraphs are stretched out to the same length. Flush left and flush right. Hyphens are used to break words at the end of lines to help prevent gaps in word spacing. Like this:


We hold these truths to be self-ev­i­dent, that all men are cre­at­ed e­qual, that they are en­dowed by their Cre­a­tor with cer­tain un­al­ien­a­ble Rights, that a­mong these are Life, Lib­er­ty and the pur­suit of Hap­pi­ness. That to se­cure these rights, Gov­ern­ments are in­sti­tut­ed a­mong Men, de­riv­ing their just pow­ers…

In contrast, nearly all text on the web is set flush left, with no hyphens at the end of lines. (This assumes a left-to-right Latinate language like English.) In the world of print, this is sometimes called “ragged right” or a “hard rag” because of the sawtoothed edge created on the right by the uneven line lengths. Today on the web, it’s nearly universal:

We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. That to secure these rights, Governments are instituted among Men, deriving their just powers…

This no longer needs to continue as it has. And if the many criticisms of iPad typography are any guide, for many design niches like eBooks, it shouldn’t continue if customer expectations are to be met.

But what to do? Well, few web designers are aware of it, but H&J can be a part of their work today. First, a quick look at the history.

Just a dash, please

The hyphen was carried forward from the world of handwritten manuscripts and into the world of print with Johannes Gutenberg’s system of movable type. However, in movable type, the hyphen also solved a mechanical problem:

The Gutenberg printing press required words made up of individual letters of type to be held in place by a surrounding non-printing rigid frame. Gutenberg solved the problem of making each line the same length to fit the frame by inserting a hyphen as the last element at the right side margin. This interrupted the letters in the last word, requiring the remaining letters be carried over to the start of the line below.

Gutenberg’s hyphen was a short, double line, inclined to the right at a sixty degree angle. It looked like this:


Fig 1. Example of Gutenberg’s hyphen.

For Gutenberg, the hyphen served a dual purpose. It provided the spacer block necessary to bring the line of type flush to the inside of the holding frame, while at the same time, it printed a character that announced its purpose to the reader. The hyphen says to the reader, in effect: “Pardon me while I break this word and end the line right here. I’m doing this to preserve the overall look of the text. Ignore me as best you can.”

In this, the hyphen makes a small demand in exchange for a larger aesthetic payoff. If you take a long look at a column of type from one of Gutenberg’s bibles, you’ll find vibrancy and balance. Now, the mechanical problems of movable type are long gone, of course, and typesetting has been digital for decades. Yet H&J is still predominant: the payoff remains.

The hyphen says: “Hey, it still looks good, right?” And it’s hard to argue with the habits and expectations of readers that have built up over five centuries of practice. If you want the look that says book, hyphenation and justification bring the weight of history to bear.

Using hyphenation and justification today

When it comes to new browser features, Flash-y effects get the glory, so it’s no surprise that support for a special Unicode character called the soft hyphen would go largely unnoticed. But the soft hyphen is the key to good-looking hyphenation and justification. And over the years it’s gained support in every A-grade browser: IE6+, Opera 7.1+, Safari 2+, Firefox 3+, and Chrome. This, combined with a little JavaScript jiggery, makes H&J a viable design technique today.

The soft hyphen

What’s a soft hyphen? The HTML spec says:

In HTML, there are two types of hyphens: The plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur.

Those browsers that interpret soft hyphens must observe the following semantics: If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.

In HTML, the plain hyphen is represented by the “-” character (&#45; or &#x2D;). The soft hyphen is represented by the character entity reference &shy; (&#173; or &#xAD;)

OK. So how does it work and what do you do? Here are the main considerations:

Coding the word breaks

When you insert &shy; (or &#173;) within a word, it signals the browser that it’s okay to break the word in that particular spot if doing so helps preserve the integrity of the word spacing. In other words, when deciding whether to break a word at the end of a line, the browser will give a greater priority to maintaining uniform word spacing. Let’s say, for example, the word is “constitution.” “Constitution” can be carved up at three spots, like this: con-sti-tu-tion.

So HTML like this (con&shy;sti&shy;tu&shy;tion) tells the browser that if it needs to wrap a part of that word to the next line to preserve word spacing, it’s okay to wrap it. And if it does, the word can be broken up at any one of the three spots where &shy; is inserted. (Note: As you’ll see, hard coding it like this in the HTML is not recommended. This is just an explanation of how it works.)
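To make the mechanics concrete, here is a minimal JavaScript sketch of the same idea. The function name and the "|" marker convention are made up for this illustration; the only fixed fact is that the soft hyphen is the character U+00AD, the one the HTML entity refers to.

```javascript
// U+00AD is the soft hyphen, the character the HTML entity stands for.
const SOFT_HYPHEN = "\u00AD";

// Hypothetical helper: takes a word whose break points are marked with "|"
// and returns the word with an invisible soft hyphen at each mark.
function withSoftHyphens(marked) {
  return marked.split("|").join(SOFT_HYPHEN);
}

const hyphenated = withSoftHyphens("con|sti|tu|tion");

// Twelve letters plus three soft hyphens.
console.log(hyphenated.length); // 15

// The browser may break the word at any of the three marked spots.
console.log(hyphenated.split(SOFT_HYPHEN).join(" / ")); // con / sti / tu / tion
```

The string still renders as “constitution” when no break is needed; the three extra characters only show up as a hyphen where a line actually breaks.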

Hyphens appear where needed automatically

The soft hyphen is an actual character in the font. But the browser will only display it if the word is broken at the end of a line. This show/hide behavior happens automatically.

Apply soft hyphens at all possible breaks

Text on the web can change: Column widths resize along with different window sizes, devices, zoom levels, and text size selections. There is no practical way to predict exactly where and how lines of text will wrap. This is an unavoidable side effect of one of the great features of electronic text.

Completely at odds with the fixed nature of print, this leads inescapably to the right way to apply soft hyphens in HTML: Soft hyphens should be inserted at all possible hyphenation points. Now, at first glance this may seem inelegant and wasteful, but when soft hyphens are added programmatically, as you’ll soon see, it’s not a problem at all.
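As a rough sketch of what “added programmatically” can mean, the snippet below walks through a string and inserts a soft hyphen at every known break point of every word it recognizes. The three-word lookup table is invented for the example; a real script would draw on a full hyphenation dictionary or pattern set rather than a hand-written list.

```javascript
const SHY = "\u00AD"; // soft hyphen

// Hypothetical, hand-written break table, just for illustration.
const BREAKS = new Map([
  ["constitution", "con|sti|tu|tion"],
  ["government", "gov|ern|ment"],
  ["liberty", "lib|er|ty"],
]);

// Insert a soft hyphen at every listed break point of every known word,
// keeping the original capitalization of the text.
function hyphenateText(text, breaks = BREAKS) {
  return text.replace(/[A-Za-z]+/g, (word) => {
    const marked = breaks.get(word.toLowerCase());
    if (!marked) return word; // unknown words are left alone

    let out = "";
    let pos = 0;
    for (const part of marked.split("|")) {
      if (out) out += SHY;
      out += word.slice(pos, pos + part.length);
      pos += part.length;
    }
    return out;
  });
}

const result = hyphenateText("Liberty and the Constitution");
console.log(result.split(SHY).join("-")); // Lib-er-ty and the Con-sti-tu-tion
```

Every possible break is marked, and the browser decides at layout time which ones, if any, it needs.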

As an example, here is a sample page with soft hyphens hard coded into the HTML text. (The online tool Hypho-o was used to insert the soft hyphens.) Resizing the browser window or zooming larger or smaller will reflow the text and show how the browser preserves word spacing while hyphens appear and disappear at the ends of each line as needed.

The downsides of hard coding

Hard coding soft hyphens is a good path to understanding how they work, but a bad thing to do in practice. Soft hyphens make the HTML text hard to read and edit. Additionally, they may create difficulties for search engines. Users can’t turn soft hyphenation on and off with a simple UI widget. Using JavaScript to apply soft hyphens makes a lot more sense and works quite well.

Hyphenator.js

By far the most mature library for hyphenation in HTML is Hyphenator.js by Mathias Nater. Hyphenator.js relies on the same data compression algorithms and hyphenation dictionaries found in products like TeX (for which they were originally developed by Franklin Liang in 1983), OpenOffice, and the HTML to PDF converter Prince, which implements the CSS3 Paged Media Module.
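For the curious, the pattern-matching idea behind Liang’s approach can be sketched in a few lines. This is not Hyphenator.js code, and the nine patterns below are only a small illustrative sample of the kind found in TeX-style pattern files (real files hold thousands). Digits in a pattern rate the gap they sit in; where patterns overlap, the highest digit wins, and a break is allowed wherever the winning digit is odd. The function names and the leftMin/rightMin defaults are assumptions for this example.

```javascript
// Illustrative sample only; a real pattern file holds thousands of entries.
const SAMPLE_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"];

// Split "hy3ph" into its letters ("hyph") and the digit for each gap:
// levels[k] rates the gap just before letter k.
function parsePattern(pattern) {
  const letters = [];
  const levels = [0];
  for (const ch of pattern) {
    if (ch >= "0" && ch <= "9") {
      levels[levels.length - 1] = Number(ch);
    } else {
      letters.push(ch);
      levels.push(0);
    }
  }
  return { text: letters.join(""), levels };
}

function hyphenate(word, patterns = SAMPLE_PATTERNS, mark = "\u00AD", leftMin = 2, rightMin = 2) {
  const padded = "." + word.toLowerCase() + "."; // dots mark the word edges
  const points = new Array(padded.length + 1).fill(0);

  // Overlay every matching pattern, keeping the highest digit per gap.
  for (const { text, levels } of patterns.map(parsePattern)) {
    for (let start = padded.indexOf(text); start !== -1; start = padded.indexOf(text, start + 1)) {
      levels.forEach((level, k) => {
        points[start + k] = Math.max(points[start + k], level);
      });
    }
  }

  // An odd digit allows a break; leftMin/rightMin avoid tiny stubs.
  let out = "";
  for (let j = 0; j < word.length; j++) {
    const allowed = j >= leftMin && word.length - j >= rightMin;
    if (allowed && points[j + 1] % 2 === 1) out += mark;
    out += word[j];
  }
  return out;
}

console.log(hyphenate("hyphenation", SAMPLE_PATTERNS, "-")); // hy-phen-ation
console.log(hyphenate("nation", SAMPLE_PATTERNS, "-"));      // na-tion
```

With the default mark, the function inserts soft hyphens, which is the form a browser can act on.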

Here is a simple page containing both English and German text. There is a toggle widget in the upper right for turning hyphenation on and off. There is also a bookmarklet version of Hyphenator.js.

Based on a Project Gutenberg HTML edition of Joseph Conrad’s Heart Of Darkness, here are some simple examples of the first chapter, each using the same modified version of Hyphenator.js 2.0 and the Sizzle selector engine, with the font size adjusted for the following devices:

Hyphenator.js also has a merge-and-pack tool for creating an optimized and minified single JavaScript file, as well as instructions for rolling your own. Remember that hyphenation is basically a search and replace. If there’s a lot of hyphenation on the page, some delay in page display may be unavoidable. Hyphenator.js also inserts the zero width space (ZWS) character for intelligent URL line wrapping.

The zero width space (ZWS)

The zero width space is essential to getting a good result with H&J. It’s encoded as &#8203;. Kingdesk Web Design, which has done considerable work on the problem of hyphenation, describes the zero width space this way:

Similar to the soft hyphen, the zero space character communicates allowable line breaks within strings of text. But unlike the soft hyphen, it does not show a hyphen at line’s end. This is ideal for forcing consistent wrapping of long URLs. It also can be used to force line breaks in uncooperative web browsers after hard hyphens in words like “zero-space” and “soft hyphen”.

To control line wrapping problems when long strings are created with “hard” hyphens (or the en dash (&ndash;) or em dash (&mdash;) characters), or when the browser might be confused on where to break a string when using characters such as ( ) [ ] { } « » % ° · / ! ?, the ZWS can provide the browser with useful hints on what to do.

For example, to preserve readability, the following tells the browser it’s okay to wrap after a hard hyphen but not before:

The zero-&#8203;width space.

For wrapping long URLs, the ZWS is inserted following forward slashes:

http://&#8203;code.&#8203;google.&#8203;com/&#8203;p/&#8203;hyphenator/
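Both uses of the ZWS are easy to script. The sketch below is one possible approach, not what Hyphenator.js does internally, and both function names are invented for the example. It adds U+200B after each run of forward slashes in a URL and after hard hyphens that sit between two letters.

```javascript
const ZWS = "\u200B"; // zero width space, &#8203; in HTML

// Allow a line break after each run of slashes ("//" stays together).
// The lookahead skips a trailing slash, where no break is needed.
function breakableUrl(url) {
  return url.replace(/\/+(?=.)/g, (slashes) => slashes + ZWS);
}

// Allow a break after a hard hyphen, never before it.
// Only hyphens sitting between two letters are touched.
function breakAfterHardHyphens(text) {
  return text.replace(/([A-Za-z])-(?=[A-Za-z])/g, "$1-" + ZWS);
}

const url = breakableUrl("http://code.google.com/p/hyphenator/");
console.log(url.split(ZWS).join("|")); // http://|code.google.com/|p/|hyphenator/

const phrase = breakAfterHardHyphens("The zero-width space.");
console.log(phrase.split(ZWS).join("|")); // The zero-|width space.
```

The “|” in the logged output only makes the invisible break opportunities visible; on the page nothing shows unless the line actually wraps there.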

All of this is preferably done with JavaScript. But as a matter of page load time and practicalities, hard coding the ZWS here and there as you need to doesn’t have any serious downsides.

Select/copy/paste

The soft hyphen is a character in the font with its own Unicode designation. This means that in a copy/paste operation, the soft hyphen travels right along with the other characters.

In a plain text editor it might show up as a question mark. In MS Word, the soft hyphens will be stripped, unless you choose “text only” formatting. Search engines like Google or Bing will ignore them when pasted into the search box.

The bottom line is that browsers—rightly or wrongly—don’t strip out the soft hyphens automatically on copy. And whether the soft hyphens are hard coded or inserted with script makes no difference. The only surefire solution is to strip the soft hyphens on copy using a script. Thankfully, this was worked out in Sweet Justice—an English-only hyphenation script—by Facebook developer Carlos Bueno. (Source on Github.) This is also the solution in Hyphenator.js as of version 3.0.
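The general shape of such a fix can be sketched as follows. This is an illustration only, not the code used by Sweet Justice or Hyphenator.js: it assumes a modern browser that exposes `clipboardData` on the copy event, and the helper name is invented. The cleaning function is plain string work; the event wiring is skipped when there is no `document`, so the snippet also runs outside a browser.

```javascript
// Remove soft hyphens (U+00AD) and zero width spaces (U+200B).
function stripInvisibleBreaks(text) {
  return text.replace(/[\u00AD\u200B]/g, "");
}

// In a browser, put the cleaned selection on the clipboard instead of
// the raw one. Skipped when no document is available.
if (typeof document !== "undefined") {
  document.addEventListener("copy", (event) => {
    const selected = String(window.getSelection());
    event.clipboardData.setData("text/plain", stripInvisibleBreaks(selected));
    event.preventDefault(); // stop the browser from copying the raw text
  });
}

console.log(stripInvisibleBreaks("con\u00ADsti\u00ADtu\u00ADtion")); // constitution
```

Note that this sketch copies plain text only; preserving rich formatting on copy would take more work.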

The issue of how browsers will handle soft hyphens and other “empty space” characters like ZWS going forward remains to be seen.

Find on this page

Similar to the select/copy/paste problem is find. As of this writing, only Firefox does this correctly in conformance with the HTML spec: “For operations such as searching and sorting, the soft hyphen should always be ignored.” The browser is supposed to ignore the soft hyphens when searching for a word. But in every browser tested other than Firefox, the search goes wrong after the first syllable “con” in the word “constitution” because of the inserted soft hyphen. Similarly, soft hyphens can also cause unwanted spaces within strings when sending text using right click context menus and the like. The receiving apps usually ignore the spaces even though they’re visible, but still, it’s unsettling to the user.
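What the spec asks for is simple to express in code. The following sketch (a hypothetical function, not any browser’s implementation) searches a string while ignoring soft hyphens, and reports the match position in the original, hyphenated text.

```javascript
const SHY_CHAR = "\u00AD"; // soft hyphen

// Search text for query, ignoring soft hyphens in both.
// Returns the index of the match in the original text, or -1.
function findIgnoringSoftHyphens(text, query) {
  let stripped = "";
  const originalIndex = []; // stripped position -> position in text
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== SHY_CHAR) {
      stripped += text[i];
      originalIndex.push(i);
    }
  }
  const needle = query.split(SHY_CHAR).join("");
  if (needle === "") return -1;
  const at = stripped.indexOf(needle);
  return at === -1 ? -1 : originalIndex[at];
}

const page = "the con\u00ADsti\u00ADtu\u00ADtion";
console.log(page.indexOf("constitution"));                  // -1, the naive search fails
console.log(findIgnoringSoftHyphens(page, "constitution")); // 4
```

The naive search stumbles after “con”, exactly as described above, while the soft-hyphen-aware version finds the word where it starts.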

The solutions to these annoyances lie squarely with browser makers.

Book ’em Danno!

High-res displays like the iPhone Retina, convenient e-reading devices like the iPad, and web fonts have brought a new focus on web typography. Hyphenation and justification is an important and time-honored technique. Hopefully the information here will help make it an option for onscreen reading sooner rather than later.

About the Author

Richard Fink

Richard Fink is a web developer and analyst who focuses on web readability. Blogging a lot lately about fonts at Readable Web, he'll also be speaking this year at FontConf in Minneapolis/St. Paul, and the annual ATYPI conference being held in Dublin, Ireland, in 2010.

46 Reader Comments

  1. Clever though this may be, I can’t think why you would want to do it. Force justified text is much less readable than ragged, and causes problems for people with visual difficulties and learning difficulties in terms of comprehension.

    Force justified text is used purely for economic reasons, ie fitting as much text in as possible, very important when paper was a very expensive commodity. Stick to ragged text – it’s better all round.

  2. I think this is the case when using standard CSS justification because the browser does not automatically hyphenate text. This introduces undesirable gaps between words. The article discusses adding hyphenation points which eliminate gaps and ensure readability, just as in print.

  3. I think it’s brilliant that ALA exists to bring content like this. It’s not the kind of subject that will be brought up regularly on most mainstream web-centric websites.

    Working in book publishing, I can see that it’s extremely important to have decent H&J in HTML. More so for eBooks than for web content.

    I think traditional publishers are going to be putting a lot of pressure on eBook producers to keep them as close as possible to the printed books, so whether ragged-right or justified is better on the eyes is a moot point. What’s important is that we can provide an eBook experience which matches (and hopefully exceeds) the paper-based experience.

  4. As I explained to Richard before this article was published, he did *not* make the case that Web typography will in any respect be improved by any of the following:

    * justification
    * hyphenation
    * pollution of source text with unnecessary soft hyphens
    * reliance of source text on JavaScript libraries just to restore essential functions like copy and paste or find

    The indisputable fact remains that hyphenation is taken care of at the *last* stage of layout or rendering *by the layout or rendering engine*. Your page-layout program hyphenates for you. Only if the program makes a mistake do you fix it, and even then you have to be vigilant that later edits do not reverse the original need for the fix. In the E-book case, the E-book reader does the hyphenation. Such purely automated hyphenation will be only marginally better than none.

    In case this is not clear, no, at no time ever should or must or do humans enter thousands of soft-hyphen characters on the off chance software will use them. *Software does the hyphenation*; humans fix its errors — ideally humans who know what they’re doing.

    There very much *is* a substitute for H&J and we’re using it right now. It is called unhyphenated flush-left text. Richard may not want to read a printed novel typeset that way, but everyone except him and a couple of other people with not enough knowledge of the topic seem quite happy with its use on real Web sites. The Web, for the umpteenth time, is not print, nor is it hot-metal type.

    The proposition that Web authors pollute their source text with soft hyphens, which the article admits irreversibly alters that text and makes certain formerly-easy tasks impossible, is much worse than advocating presentational markup like FONT and CENTER (and BLOCKQUOTE to indent) because at least the latter leaves your copy more or less alone. It is just nuts to then state “The bottom line is that browsers — rightly or wrongly — don’t strip out the soft hyphens automatically on copy.”

    Optional returns are as old as WordPerfect 5.1 and are another thing the rendering engine has to handle.

    Web pages are not continuously changing their set width. Your width vs. my width may be two different things, as might a cellphone’s and an iPad’s and a 30-inch display’s, but that is not the same as the article’s implied condition of continuously variable column width happening before our very eyes, like a snake undulating on the waves.

    The Gutenberg history is as irrelevant as the history of print typography. We are not typesetting for print.

    Richard cannot actually prove that “The vast majority of books and magazines are typeset using hyphenation and justification” for the simple reason Richard has not examined “the vast majority” of them.

    The article’s use of the U.S. Constitution as “neutral, generic” body copy is actively offensive to foreign nationals.

    The article falls prey to an error I associate with American Web developers that “special” characters have to be entered using entities of one form or another. They don’t. Not even soft hyphens do.

    The entire concept is *completely wrong* and this article’s endless contortions make that even more apparent.

    What’s next? Instructing us to pollute our text with f-ligatures?

    The solution to full-justified text on electronic displays is not to use it.

    Is this the least factually defensible article ever to run in A List Apart?

  5. First may I say that I found your article informative (I didn’t know that anyone had tried to implement hyphenation in HTML before) and timely, because of the dubious decision to use HTML for electronic books, and the generally appalling typography that has resulted from this. I fully support what you are trying to achieve.

    However the ‘gotchas’ with find, copy and paste would seem to rule this out for the web without the goodwill of the major browser vendors, but I don’t see why it shouldn’t be used for iPhone and iPad ebooks where there may be no need to allow the user to manipulate the text. I must say I’m thinking of going into that area and will consider it.

    A few niggles:
    1. The link to the hyphenator example doesn’t work. The correct url is:
    http://hyphenator.googlecode.com/svn/tags/Version 2.2.0/WorkingExample.html
    2. Hyphenating ‘equal’ as you have done is not good typographic practice – it looks terrible. Throw your hyphenation dictionary away and get a better one. Any reader who does not know about automatic hyphenation will hardly be convinced by this. Leave the US Constitution be, and choose a text that illustrates your case better.
    3. It’s easy to make the mistake, but looks bad in an article on typography.
    “The soft hyphen is a character in the font with it’s own Unicode designation.” Spot the incorrect apostrophe.
    4. And please let your writing justify good typography. I can’t believe that educated people can write “going forward” when they mean “in future”.

  6. I’m not sure the hyphen was introduced into press printing to preserve the overall look of the text. It was introduced to save space & time.

    Inserting the characters into the frame and discovering the last word was too long to fit the line just took too much time to undo (or continually counting ahead to avoid it). Adding spacers to fit the printing frame would require more spaces (the typesetter would probably run-out) and also waste more space.

    This quickly became standard practice and ‘readability’ only resulted because these days everyone learns to read text in this style, although modern press isn’t following this approach as strictly: This weeks TV guide magazine (What’s On TV) is completely non-hyphenated, today’s Metro newspaper uses a mix. Bulletproof Web Design has a ragged edge AND hyphenation (uh?). The Design Of Everyday Things is ‘properly’ hyphenated. – That’s all I’ve got at my desk…

  7. Very interesting article philosophically, but from a pragmatic point of view, I doubt it’s worth it. Yes, it makes text a bit more readable (which is debatable, as you can easily see from the comments) and a bit more beautiful, but with the cost of around 80 extra KB on the page, which is not trivial (ok, it might get down to 50-60 when minified, but still…)

  8. My question is why are we trying to set type standards on the web based off of print principles? Many ideas do translate from print to web, but from what I read justification of text is not one of them. As Liam Cromar and Mactonex say, most advice on the web typography seems to explicitly state that justification of text lowers readability and accessibility. I do not want to attack anyone, but as a beginning student in web design I’d like some clarity in advice on typography. Really appreciate an answer on this one…

  9. > The article’s use of the U.S. Constitution as “neutral, generic” body copy is actively offensive to foreign nationals.

    Not to this foreign national. You can’t prove that for the simple reason that you haven’t asked “foreign nationals”.

  10. @everybody who’s commented:

    I’m in transit, headed to Dublin, Ireland for the ATYPI conference today and ridiculously pressed for time. But I want very much to respond. And I will be back to do just that. I was hoping this article would generate strong debate. So far, so good!

    One quickie before I hit the road:

    @david leader
    > “However the ‘gotchas’ with find, copy and paste would seem to rule this out for the web without the goodwill of the major browser vendors.”

    Remember, they are two separate issues:
    1) The copy and paste issue *was* a deal-killer for me. However, when I saw what Tynt was doing with manipulating text on copy-paste (BTW – Daring Fireball’s John Gruber’s take on Tynt is a riot – check it out) I realized that at least *they* had developed the JavaScript techniques necessary to handle the problem. So I took a fresh look, and it turned out that Carlos Bueno had included a fix in Sweet Justice, and then Mathias Nater included it in Hyphenator.js.
    Since the soft hyphens are being applied via script anyway, I personally don’t have a problem with them being removed via script. What browsers should do natively – well, it seems pretty clear they should be stripping them automatically. But I intend to put the question to browser makers directly and ask if they see any gotchas attached to that.
    2) As to the “Find On This Page” problem, every browser tested except Firefox is nonconformant with the spec. Even though they support the soft-hyphen, the developers might never have thought about the “search and sort” issue. But it’s time that they did. Even if soft-hyphens are inserted directly into the HTML occasionally, there simply shouldn’t be this problem. I don’t think anyone’s going to argue that soft-hyphen support should be left broken halfway like this.
    However, personally, for me, this one’s *not* a deal killer. It wouldn’t necessarily prevent me from using Hyphenator.js in the interim until there’s native browser support with built-in hyphenation dictionaries.

  11. A tool used in the editing process converted forward slashes to Unicode characters, breaking the link to the hyphenator. The link has now been corrected. Our thanks to those readers who noticed and called it to our attention.

    Also, a shameful misuse of “it’s” has been corrected.

  12. This proposed solution does not address the complexity of hyphenation rules. Generally, only the latter portion of a word should be hyphenated, and you don’t want to create very short hyphenation stubs. “Ad-vertising” should not be hyphenated– better to bump the two “ad” characters to the next line. But “Adver-tising” is an acceptable division of the word at the end of the line. (See rule 4 at http://englishplus.com/grammar/00000129.htm). Blindly hyphenating on syllable divisions will result in a poorly-hyphenated document. Rather than make the document appear professionally typeset, it will instead appear to be h+j by an amateur who’s trying too hard to be sophisticated, sort of like someone who pretentiously uses “I” when “me” would, in fact, be the correct pronoun.

    I agree with many posters that h+j is of questionable benefit in many situations. However, there are times where I would like to use it, so I would like to see a browser-based solution in the future. Perhaps if CSS text-justify automatically invoked an intelligent hyphenation engine, that would work; or perhaps we propose adding a CSS hyphenation property to the CSSx spec.

    I also agree that this proposed solution is too kludgy. Too many dependencies; our applications and pages are already becoming overladen with JavaScript and CSS.

  13. The first thing that came to my mind for practical use was not to hyphenate body text (jagged right is just fine for me, personally). Instead, I could see this being useful for really long names and titles for objects that appear in a tight grid.

    Rather than a long name or word breaking out of the grid and crashing into the object beside it, it would be much better if a browser could just apply a hyphen somewhere between “St. Petersburg/Clearwater” if needed, hiding the hyphen if there is enough space.

    Server side applications could handle this too, but perhaps a server side application could intelligently apply &shy; to long titles and words, while leaving other critical information alone. This is just the thought process I have when it comes to imagining how to implement this where I work.

    Cheers.

  14. This proposed solution does not address the complexity of hyphenation rules.

    It does a better job of doing so than you might think. Hyphenator.js doesn’t just hyphenate on syllable divisions, it is based on an algorithm derived from extensive study of the hyphenation points listed in dictionaries, which, according to its author (Frank Liang), finds 89% of the allowed hyphenation points, with no false positives.

  15. pakjeem said:
    > our applications and pages are already
    > becoming overladen with JavaScript and CSS.

    thank you! i was hoping someone would say that.

    combined with all the other gotchas involved here
    — copy problems, and search glitches (which _are_
    a “deal-killer” for me, as an end-user, richard, and
    tell me, does “deal killer” have a hyphen in it, or not?)
    — let’s just ignore these obsessive-compulsives, ok?

    my goodness, they’d have us jump through all kinds of
    hoops, just so the right-margin will be perfectly straight.

    besides, there’s a better solution. i won’t give it away in
    a comment, but if “a list apart” wants an article from me,
    i’d be happy to write it up.

    -bowerbird

  16. First, Richard, your example of hyphenated and justified text was really pretty awful – two hyphens on successive lines, quite awful word spacing on some lines. You did the case for hyphenation more harm than good.
    Second, justified text – with word-spacing on successive lines varying only within tight limits – is the best and most relaxing text to read. It suits the way the human visual system – eye, optic nerve, and brain – works. See some very early posts on my blog for much more detail: http://billhillsblog.blogspot.com
    and also my paper, The Magic of Reading, at:
    http://www.billhillsite.com/osprey.doc.
    Please excuse this broken website if you go there. It was optimized for IE when I worked at Microsoft and I haven’t updated it, now I work all the time on FF on the Mac.
    The only way to achieve that is through good hyphenation, dictionary-based not algorithmic.
    Throwing discretionary hyphens into all text is a pretty inelegant solution.
    Hyphenator.js works pretty well but does introduce lag into the system. And it has dictionaries for only three or four languages (or did the last time I looked). The browsers should implement dictionary-based hyphenation. Then they could hyphenate, space the lines properly and then render them.
    It’s not rocket-science. Desktop publishing apps have been doing this for decades.

  17. From Erik Spiekermann’s Typo Tips (http://fontfeed.com/archives/erik-spiekermanns-typo-tips/):

    “6. Not Justified
    Avoid flush settings! Most applications create justified text by hideously stretching and squishing words and spaces. Note that it takes many hours of tedious work to typeset justified text that is truly well-proportioned and legible. For this reason, professionals prefer to use ragged-right composition, either with or without hyphenation, depending on how much line-length variation they wish to allow. This gives the text a more harmonious appearance and makes it easier to read, since all wordspaces have the same width.”

    If ragged right is more legible for print, why wouldn’t it be more legible online, too? Justified text is nice looking in the abstract, as it’s a nice block, but it’s often a pain to read. And note that Spiekermann says that it takes hours to properly justify text. It can take hours to properly set ragged right text in a book. Why are we trying to replicate that workflow online? What user of a corporate CMS is going to take the time to properly hyphenate everything?

    And while I’m ranting, what is up with the ever-growing pile of javascript plug-ins we’re being told we should have on our sites? What happened to streamlined, simple development?

  18. One thing I noticed right away is that hyphenation really hampers the readability of a text the bigger the line-length gets. Stopping halfway a word to travel all the way back to the left and pick up is pretty bothersome. As it is hard to predict how wide a text area will be (and how far the eyes will travel), this sounds like a less than ideal solution for large chunks of electronic text.

  19. > Second, justified text – with word-spacing on successive lines varying only within tight limits – is the best and most relaxing text to read

    Bill, although I agree that going the Javascript route to do H&J is not A Good Thing, I tend to question the very principle of doing H&J on a systematic basis. Do you have any pointers (I mean, peer reviewed papers) that demonstrate that justified text is the “best and most relaxing” ?

    To me this seems counter-intuitive, except perhaps for specific widths, because I feel it harder to bring my gaze back to the correct line after an EOL when the text is justified. It is also the opinion of Edward Tufte, among others, if I am not mistaken. When the text is justified, it becomes a block of indistinguishable lines, and it takes more time to find which one you were reading, and which is the next, if your attention drops for an instant.

  20. H&J without carefully considered H&J rules (like those found in inDesign) seems a pointless endeavor. I shudder to think of the less typographic-savvy web designers who will interpret, “We can hyphenate!” as “We can justify!”, producing web pages full of rivers.

    *Let’s not forget*: Ragged text columns don’t have to be hard-set. They can hyphenate as well, to create a more even edge. _This_ is where the focus should be right now.

  21. Hi

    Thanks to Richard Fink and ALA for this article and for the pointer to my Hyphenator.js project. As its developer I’d like to share my thoughts, too.

    And special thanks to all those critic voices, too. I think those are the most important! I’m currently getting lots of mails and requests about my project.
    There are many good ideas about how to improve Hyphenator.js and I will implement most of them in the coming weeks. Hyphenator.js is a growing project that I’m maintaining in my spare time (@Bill Hill: There’s currently support for over 30 languages, more to come!).
    Just one appeal: RTFM! You’ll save my time and for many things there’s already a solution.

    @those who doubt the readability of hyphenated text
    IMO this depends heavily on the literacy of the reader. Reading lots of books means getting used to hyphenated words, since most books are set with H&J. (Want proof? Go to the library!) Personally I’m a fast, experienced reader and capturing a hyphenated word isn’t that difficult for me. Together with the context, the first syllables are enough to guess the whole word. So hyphenation gives me a valuable hint about being on the right line when my eyeballs jump to the next line (figuratively speaking 😉).
    True, hyphenation may not be that easy to read for less experienced or handicapped readers, and reading webpages is generally not the same as reading printed text. But what about eBooks? What about optional hyphenation?
    This article does not say you should do hyphenation in every text. It says: “it can be part of your work”. I.e. you can use it now, if you feel like it or your customers make you use it.

    @those saying unhyphenated flush-left text is a good alternative
    This may be true for English texts with short words on average. It isn’t true for other languages with longer words. In German there are lots of very long compound words:
    adjudication of the federal administrative court (en)
    Bundesverwaltungsgerichtsentscheid (de)
    I recently read this word in an article on the front page of nzz.ch; it didn’t fit the layout!

    @those complaining about the size of the hyphenator script
    I completely agree. It’s too large (the largest part is the hyphenation patterns). But let’s take this ALA article as an example. The JPEG on top of it (the one with the sliced carrots) is about 45KB. It looks nice but it is not relevant to the content, either…
    The script and the patterns are cached if you set it up correctly and can be reused on every page of your website.

    @those who don’t trust automatic hyphenation
    I’m using a quite old and sophisticated algorithm originating in TeX. To compute the patterns, a list of hyphenated words is used. The better this list, the better the patterns, and the better the results of hyphenation. There’s ongoing work on a list for German patterns (http://groups.google.de/group/trennmuster-opensource) but afaik not for English…

    @those who say “this should be done by the browser”
    Definitely. I hope that one day a browser will fix its text layout engine and do hyphenation (according to CSS 3). Until then Hyphenator.js is just a crutch.

    Finally: I was quite surprised by some very categorical declarations in the discussion above. Nobody is forcing you to use Hyphenator.js and pile up scripts on your website. We’re not as free in all our decisions as we may think, but in this case we are very free. Hyphenator.js is just an option, and it may be valuable for some cases and completely inappropriate for others. But I think it’s great to have this option.

    Kindly, Mathias

  22. I think we all can agree that when you have text automatically justified you should also have it hyphenated, even if it’s automatic and less than perfect. (That’s where the _current_ rule comes from: “don’t justify text on the Web!”) If you use (automatic) hyphenation you don’t necessarily need justification. The benefits are still there, especially in languages with average word length higher than in English. People also seem to forget or ignore that you can configure client- and server-side hyphenators to your liking, i.e. minimum characters to keep on a line and push to the next, maximum number of consecutive lines hyphenated, exceptional words etc.

    Javascript solutions, of course, are not solutions but hacks. They’re fine as user-side workarounds (bookmarklet, plugin/addon), but, in general, shouldn’t be provided by site owners, at least not enabled by default.

    One commenter suggested proposing additions to CSS. This has, unsurprisingly, been done long ago. It was even mentioned in the article, but only briefly (re Prince), and currently resides in “Generated Content and Paged Media”:http://dev.w3.org/csswg/css3-gcpm/#hyphenation but will be moved to a “more appropriate module”:http://dev.w3.org/csswg/css3-text/#hyphenate

  23. Just thought I’d mention that viewing the Heart of Darkness example in Opera on Windows, there’s an odd question-mark character that precedes every em dash.

  24. As an author of an article on “Antidisestablishmentarianism”:http://en.wikipedia.org/wiki/Antidisestablishmentarianism may have discovered, text benefits from hyphenation and justification — whether justified or ragged. Like using

    and float:, the solutions in the article are workarounds to fundamental layout problems better solved by browser developers working with good specifications.

    If reading on electronic devices is going to become as easy and elegant as print, browsers and e-readers need to have built-in sophisticated hyphenation and justification routines which are applied at the point of use, along with better handling of soft-hyphen and fixed codes so that they don’t pollute what the viewer receives.

    And once hyphenation is sorted, can we move onto good kerning and tracking and correct handling of ligatures, please?

    Reading on the web is like riding a unicycle or flying a biplane. We do it for the experience not the ride.

  25. For accessibility’s sake, there needs to be an “off” button from the site visitor’s end.

    I’ll also agree that too many Web pages suffer from JavaScript bloat. On a Windows embedded system, I see pages that take minutes to load, or even sometimes to scroll.

  26. @joe clark
     “The article’s use of the U.S. Constitution as ‘neutral, generic’ body copy is actively offensive to foreign nationals.”
    – Joe, as I wrote you in the email you referred to – did you not read my reply? – the “neutral text” quote is not from the US Constitution. It’s from the Declaration Of Independence and it was written by Thomas Jefferson to apply to all peoples at all times. It belongs to the world, not to citizens of the United States. It’s yours, too. And it beats Lorem Ipsum.

    @spinfuzz
     “My question is why are we trying to set type standards on the web based off of print principles?”
    – We shouldn’t and I didn’t write that we should. Screens are different than ink on paper. Even high-res screens like the Retina display.
    But – and I admit that I haven’t interviewed every literate human on the planet in reporting this as Joe Clark would have me do, it seems – the habits of readers today have been formed by print and its conventions. (This is changing, and changing rapidly but it’s still the case.) We should examine the conventions of print. What doesn’t work should be tossed out. Traditions should be examined and analyzed. And then purging can be an informed decision, not a knee jerk reaction to “old media”.

    @ritz
     “The first thing that came to my mind for practical use was not to hyphenate body text (jagged right is just fine for me, personally). Instead, I could see this being useful for really long names and titles for objects that appear in a tight grid.”
    – Thanks for thinking outside the box. This is not an either/all proposition. (Do those who object on principle to this technique also object to JavaScript being used to insert “smart quotes”? Smart quotes are OK, but hyphens not?)
    And your idea ties in, I think, with Mathias’s observation that word-length is not the same in all languages and that the problem of line wrapping has levels of urgency. With auto-translation becoming more and more prevalent, this is an issue that rates some thought. Hyphenation is one tool that can be in the toolbox.

    @christoph
     “Javascript solutions, of course, are not solutions but hacks.”
    – I’ve got no problem with, and I thank you for, the rest of your comment, but this broad-brush labeling of anything done with JavaScript as “hacks” is nonsense. And dismissing the work done in libraries like jQuery as “hacks” is offensive. (Although you most probably didn’t mean it that way.) JavaScript is the most widely used programming language in the world. It is embedded in many products besides browsers – Adobe Acrobat and InDesign, to name just two among the hundreds if not thousands of apps. JavaScript is and will remain an integral and irreplaceable part of the crafting of web pages now and into the future. There is nothing hacky about programmatic solutions to problems – they are perfectly appropriate. It is just another approach. Would it be nice if H&J were natively supported? Yes. But even then you would be dependent upon the included hyphenation dictionaries and you might have to turn to JavaScript to tweak the result. This stuff is hard to get right and there will never be a perfect world.

    @jon faulds
     “there’s an odd question-mark character that precedes every em dash.”
    Thanks for reporting this. It has a bearing on backward compatibility.
    Happens to me on XP, also. And Opera’s behavior is per-spec and technically correct. Here’s why: if the spacing character or the ZWS character that I’ve used to surround the em dash *isn’t in the font*, and Opera can’t find it in any of the fallback fonts, either, the browser should show a box, as Opera does. Other browsers will synthesize certain characters even if they’re not in the font. As of Vista, Microsoft began including these characters in the system fonts exactly because of complaints about “boxes”.

  27. One thing I wanted to add:
    Where hyphens are inserted is dependent upon hyphenation dictionaries. Whether or not the hyphens are inserted into the text using JavaScript or done with code built into the browser, this does not change.
    The result you see with Hyphenator.js is, theoretically, exactly the same result you would see with native support. For screen, the visual results would be indistinguishable.

  28. In no way does force justified hyphenated text make the text easier to read — in fact it is just the opposite.

    I regularly work with people who have visual difficulties, learning difficulties and who are reading text that is not in their first language. I can guarantee you that most of them find force justified hyphenated text more difficult to read and understand. It’s rarely necessary in print and totally unnecessary on the web.

    Setting text this way is an anachronism and it would be a terrible shame to see it spread on the web.

  29. I don’t know why I have to keep reminding Richard that shoving U.S. legislative documents down our throats as “neutral, generic” examples offends people who aren’t American. We never declared independence, hence don’t have a Declaration of Independence.

    I guess this is one of those times when it’s pointless to argue with Americans about their view that everybody fundamentally is one.

  30. Your comment shows that you haven’t yet understood the web. Whereas in print the content and its presentation is often one big unity (the book, the newspaper), there’s a three layered model on the web:

    1. content — well structured HTML (w/o any styling)
    2. presentation — default or user defined CSS that styles the HTML
    3. behaviour — how the user can interact with layer 1 and 2: JavaScript and server side languages

    If a web designer decides not to respect this model, it’s his fault and he hasn’t understood the web, either!

    When it comes to accessibility and the reception of text, a website is well done — among many other important things — when it remains readable with CSS and JavaScript (layers 2 and 3) turned off.
    H&J belongs to layer 2 only (it’s done by JavaScript in the case of Hyphenator.js because layer 3 is the place to change layer 2, but in case of native Hyphenation support layer 3 isn’t involved any more).

    There are interfaces for every user to change layer 2 (user defined stylesheets and extensions) and layer 3 (Bookmarklets and extensions). So if one doesn’t like how the content is presented he can change its presentation and its behaviour. (BTW: it’s exactly what I am doing with Hyphenator.js: it hyphenates every webpage for me, because I don’t like text layouts with ugly rags).

    It’s not a shame that there’s H&J for the web. If that were the case, it would also be a shame that there’s color (color-blindness) and sound (deafness) and many other things.

    I think that you’re wrong.

    (But it’s a shame that people still don’t know about the model described above and still don’t know how to use and adapt the web for their needs!)

  31. Practical use of hyphens on the web is still at an early stage. For example, we designers have to avoid word-break hyphens on 3 consecutive lines within the same paragraph. To be able to avoid this, we need a sultriness adjustment.

    We can’t wait to control content on the web the same way we do in print, though.

  32. I have implemented Hyphenator before on a website using the JS library. But personally I do not believe that this is the best way to implement hyphenation for the Web.

    Loading the JS library with every new web page creates quite a heavy load and takes time. And this also means that hyphenation is only available on websites that actively offer it.

    Using it as a bookmarklet makes hyphenation available on any website that the user desires to have it to improve readability by his own judgement. Much better because it gives users a choice. But still you actively need to click that button for each new page which renders again then. What a waste of time!

    My conclusion is that the browser should add such functionality – preferably as an add-on – and make it configurable. One option could be to hyphenate all web pages by default. Another option would be to only hyphenate the current page on demand by clicking a button. But then the add-on could ask if you want to remember that website and set its individual default to hyphenate every time. That would be choice plus ease of use.

    Now we would need to find someone able and willing to write such an add-on to make us happy.

  33. When discussing whether ragged or justified style is more readable, I would like to remind everybody that many languages other than English – especially German and Finnish – have extremely long words. Using narrow columns (often seen with image captions as well) without hyphenation you often end up with just one word per line and large holes.

  34. When I started reading this article, I thought, “Cool!” By the time I finished it and read the comments, I was swayed that this is an interesting tech-demo, but not good practice.

    I can see how there would be special cases that might merit it, like Heribert Wettels mentioned. However, I think the right answer is to avoid creating such tight spots in the design phase, thinking globally long before you’re putting in content.

  35. The author takes umbrage at javascript workarounds being labelled “hacks”, but that’s exactly what they are.

    hyphenator.js is a hack, because it uses javascript to provide a feature that should be implemented natively in the browser. It’s a stop-gap measure, just like using javascript to fix poor CSS support in old browsers (max-width, fixed positioning, etc.).

    The point about labelling something “hacky” is to draw attention to the costs of using it. In the case of hyphenator.js, you’re adding javascript to make H+J work. The cost is additional complexity (maintenance), the nasty bug in find-on-page, and performance.

    Remember that javascript is a blocking download, so the cost of javascript in kB cannot be directly compared to an image. Anything that interferes with the display of text content should be subjected to a harsh performance assessment, because a delay in supplying the text greatly affects the perceived responsiveness of the page.

    hyphenator.js is an impressive project. The typophile in me longs to use it. But the pragmatic website owner in me says that the objective cost greatly outweighs the small, subjective benefit.

    In other words, it’s just too hacky for my taste. Brilliant, but hacky.

  36. Wonderful article. There is a lot of annoyingly repetitive stuff in this thread, and some complete falsehoods.

    @Mike Hopley: JavaScript does not have to be a “blocking download.” This is why tools like YSlow’s analyzer recommend putting it at the end of your code. There are even tools out there that will compress, cache, and reposition your JS automatically.

    @Everybody who wants the world to know the web is not print: We are in a post-web-is-not-print world now. The web is the *new print*. If you don’t want to come along for the ride, you don’t have to use tools like those mentioned in the article.

    I used to be one of those “The web is not print” people, until I realized it made me instantly recognizable as someone who designed “like a web designer.” We need to push the body of web design work forward, not coddle it.

    I applaud Mathias for his work in pushing the cutting edge forward a bit more. I work next to a print designer of 25 years, and knowing about this sort of tool helps me bring her visual language to the web. It’s worth the effort.

  37. Sure. But if you put hyphenator.js at the end of your code, you will get a jarring “flash” when the hyphenation kicks in.

    You can’t have it both ways. Either you take the performance hit, or you live with the FOUC-like effect. Or you could just stick to ragged-right.

    It’s much the same problem as using @font-face: either you delay the text, or you get a flash of restyling when the font file arrives. For custom fonts, perhaps it’s worthwhile (at least for some designs). But for justified text?

  38. To all:
    Thanks for the frank, sometimes passionate comments.
    I wrote this article in the spirit of “hey, take a look at this, what do you think?” and you’ve certainly let me and ALA’s readers know your mind. (And if bowerbird wants to let us all in on his secret sauce for better H&J, I’m all ears.)
    One thing in particular that I’d like to point out is that HTML is not only the future of ebooks, it’s the future of print, too. At least that’s what I see with my binoculars on, and pretty clearly, too.
    If I may make a suggestion: try viewing the quick’n’dirty desktop browser example from the article in Print Preview. (However you might feel about IE, it happens to have a good Print Preview mode.)
    It looks like a book. And if I were to include a print style sheet, I wouldn’t be locked into the pixel grid and I could make use of the high res environment of print just as easily as any PDF. Add a dash of web fonts and anything InDesign can do, I can do in the browser. (In fact, I’ve issued a private face-off challenge to an experienced book designer of my acquaintance and I’m hoping he sends me a few pages by year’s end so I can try my hand at duplicating them – with every typographical nuance intact – in browser rendered HTML.)

    Simply put: as a web author, H&J is a design option I want. I don’t care if it’s unnecessary. I don’t care if some people like it or don’t like it. I want the *option*. I want my H&J.

    Personally, I happen to like H&J onscreen for long passages of text, especially narrative. I also like it on reading devices like the iPad where the viewing distance is more intimate.
    But that’s *my* druthers and it would take special circumstances for me to consider imposing H&J as the default.

    With regards to using the soft-hyphen and javascript to get H&J today:
     Everything has its advantages and disadvantages.
    Let me say that again:
     Everything has its advantages and disadvantages.

    Hyphenator.js *is* what it *is* and I haven’t heard anybody argue that it isn’t the best we can do for H&J for now.
    Hacky, shmacky, whacky or not.
    I certainly do admire Mathias Nater’s effort and will be using it on occasion, absolutely. I have no doubt the implementation will improve. Getting some more eyeballs on it was a spur, for sure, and – the way I see it – a part of what ALA is all about.
    ‘Til later…

  39. I happen to like justified text, even without hyphenation. I’m not sure how it harms readability, for English text, if the width of the text is sufficient. For narrow columns: yeah, it produces weird spacing between words. But for wide blocks of text, it just looks so much nicer at a glance, and you don’t really notice the spacing issue (rarely is it more than a few extra pixels per space) when reading.

    That said, I still don’t think automatic hyphenation is something I’m going to implement on my own site.

    Also, the Declaration of Independence is clearly not a “U.S. legislative document”. Even if it was, what could possibly be offensive about it? The quasi-religious mention of a creator? Or just the fact that it was written by Americans?
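Several comments above refer to the TeX-derived, pattern-based algorithm behind Hyphenator.js and to settings such as the minimum number of characters kept on a line. For readers curious how that kind of algorithm works, here is a minimal sketch in TypeScript. It is not Hyphenator.js's actual code or API: the function names, the `leftMin`/`rightMin` parameters, and the tiny sample pattern list are assumptions made for illustration, and a real hyphenator would load a full per-language pattern file.

```typescript
// Illustrative sample only (an assumption for this sketch, not a full pattern file).
// Digits mark break quality between letters: odd = allow a break, even = forbid.
const SAMPLE_PATTERNS: string[] = [
  "hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n",
];

interface ParsedPattern {
  letters: string;   // the pattern with digits removed
  weights: number[]; // weights[i] applies to the gap before letters[i]
}

function parsePattern(pattern: string): ParsedPattern {
  let letters = "";
  const weights: number[] = [0];
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern.charAt(i);
    if (ch >= "0" && ch <= "9") {
      weights[weights.length - 1] = Number(ch);
    } else {
      letters += ch;
      weights.push(0);
    }
  }
  return { letters, weights };
}

// Inserts hyphenChar (a soft hyphen by default) at every allowed break point.
// leftMin / rightMin: minimum characters to keep before / after a break.
function hyphenate(
  word: string,
  patterns: string[],
  leftMin = 2,
  rightMin = 2,
  hyphenChar = "\u00AD"
): string {
  // Dots mark the word boundaries, as in TeX-style patterns.
  const padded = "." + word.toLowerCase() + ".";
  const points: number[] = [];
  for (let i = 0; i <= padded.length; i++) points.push(0);

  for (let p = 0; p < patterns.length; p++) {
    const { letters, weights } = parsePattern(patterns[p]);
    if (letters.length === 0) continue;
    let at = padded.indexOf(letters);
    while (at !== -1) {
      // Where patterns overlap, the highest weight for a gap wins.
      for (let j = 0; j < weights.length; j++) {
        points[at + j] = Math.max(points[at + j], weights[j]);
      }
      at = padded.indexOf(letters, at + 1);
    }
  }

  let out = "";
  for (let k = 0; k < word.length; k++) {
    // points[k + 1] is the gap before word[k] (shifted by the leading dot).
    const allowed =
      k >= leftMin && word.length - k >= rightMin && points[k + 1] % 2 === 1;
    if (allowed) out += hyphenChar;
    out += word.charAt(k);
  }
  return out;
}
```

With this sample list, `hyphenate("hyphenation", SAMPLE_PATTERNS, 2, 2, "-")` gives `hy-phen-ation`, and raising `leftMin` to 3 suppresses the first break. Because the break points come from the pattern data rather than from the code that applies them, the same patterns should give the same breaks whether a script or the browser does the work.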
