Walking Backwards: Supporting Non-Western Languages on the Web

IBM has released a version of Netscape 4.61 with BiDi support, which means
absolutely nothing to most Westerners. But this new version can display
Hebrew and Arabic web pages natively, allowing developers to build more
advanced sites, at far less expense.

Since I work mainly with Hebrew web sites in the Hebrew market, this article will focus on the Hebrew side of things. In Arabic, matters are similar but not identical, with some additional problems unique to the Arabic language. I would love to hear comments from people who create Arabic language web sites.

Hebrew and Arabic are Bi-Directional (BiDi for short) languages, meaning that most of the text is written from right to left, while some of the text (like numbers) is written from left to right.
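As a rough illustration of what "bi-directional" means to software, the Unicode character database assigns every character a directional class. The sketch below is an assumption-laden aside rather than anything the browsers discussed here expose: Python and the helper name `bidi_classes` are my own choices, and the sample characters are written as escapes so the listing itself stays left to right.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, bidi class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Hebrew alef, a European digit, a Latin letter, an Arabic alef.
sample = "\u05d0" + "7" + "A" + "\u0627"

for ch, cls in bidi_classes(sample):
    # R = right-to-left, EN = European number, L = left-to-right,
    # AL = Arabic letter (also right-to-left).
    print(f"U+{ord(ch):04X} -> {cls}")
```

The Hebrew letter and the digit land in different classes, which is why a number embedded in a Hebrew sentence keeps its left-to-right order.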

Historically, since Netscape lacked any kind of Hebrew support, a kludgy workaround called “Visual Hebrew” was developed. In general, it has two parts:

On the Client side, the user must install a “Web View” font, which has a Western encoding but includes Hebrew glyphs (and most of the web view fonts are of very low quality).

On the Developer side, the developer must use certain techniques to have the page readable with the web view font:

  1. All Hebrew text must be reversed, while leaving any numbers or English text intact. For example, the first sentence below would be presented as displayed in the second sentence.
    I love Lucy and will meet with her on May 13

    13 yaM no reh htiw teem lliw dna ycuL evol I
  2. All line breaks must be hard coded into the HTML; you cannot let the browser wrap long lines, since, if you do, the words will get out of order.
  3. All the text must be manually aligned to the right – either with <p align="right"> or with tables.
  4. You cannot use lists (<ol> or <ul>), since they would be indented to the left instead of to the right.
  5. You cannot define font faces (either via CSS or via the <font> tag), since the Hebrew fonts on the system are logical fonts, and would not work with web pages.
  6. Some elements, like forms and page titles, are displayed by the browser directly through the OS, which means that they have to be written differently – since the OS uses logical Hebrew. (In logical Hebrew, the data is stored in the order it was entered, with a flag marking the directionality. When the data is processed and displayed, the OS uses that flag to keep the correct direction of the element.)
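To make points 1 and 2 concrete, here is a minimal sketch of the kind of "flipping" function a visual Hebrew site needs. It is written in Python purely for illustration; the names `to_visual` and `to_visual_lines` are hypothetical, and it follows the example above in letting English letters stand in for Hebrew. It only protects numbers; a real converter would also have to keep runs of Latin text intact, which this sketch does not attempt.

```python
import re
import textwrap

# A run of digits, optionally joined by common numeric separators (3.14, 13/5).
_NUMBER = re.compile(r"\d+(?:[.,:/]\d+)*")

def to_visual(line):
    """Reverse one line of text, then restore each number to reading order."""
    flipped = line[::-1]
    return _NUMBER.sub(lambda m: m.group()[::-1], flipped)

def to_visual_lines(text, width=60):
    """Wrap first, while the text is still in logical order, then flip each line.

    The result is one string per hard-coded line, so the browser never has to
    wrap (and scramble) the reversed text.
    """
    return [to_visual(line) for line in textwrap.wrap(text, width)]

print(to_visual("I love Lucy and will meet with her on May 13"))
# prints: 13 yaM no reh htiw teem lliw dna ycuL evol I
```

Note the order of operations in `to_visual_lines`: wrapping after flipping would put the end of the sentence on the first line.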

It is rather obvious that the visual method has huge shortcomings, both on the user side (you cannot copy and paste directly from web pages; the browser search function is useless) and on the developer side (the extra cost of converting existing documents to the visual encoding; the limitations of design; and a need to add an extra Hebrew flipping function to any data that is going in or out of a database, or being accepted from the user).

Microsoft, with version 3 of Internet Explorer, introduced a separate “Hebrew Enabled” version. On Hebrew operating systems it uses the Unicode BiDi algorithm to display visually encoded web pages with any system font, and it adds support for “Logical” web pages, which work similarly to the OS in allowing authors to “flag” the directionality of elements, and which render both Right-To-Left (RTL) and Left-To-Right (LTR) elements properly.

In version 5 of Internet Explorer, Microsoft went one step further, allowing anyone, on any language Windows system, to view Hebrew web pages – both logically and visually encoded. (Unfortunately, Mac IE5 has no Hebrew support.)

However, to be able to write in Hebrew (for example, in web forms) the user still needs to have a Hebrew-supporting OS (or Windows 2000 with the Hebrew language pack installed).

The W3C, in its HTML4 spec, also included the Unicode BiDi algorithm, introducing among others the DIR (direction) attribute, which can go with any element and mark its directionality (RTL or LTR), and the &lrm; (Left to Right Mark) and &rlm; (Right to Left Mark) named entities, which can control the directionality of single characters.
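For comparison with the visual method, a logical page needs no flipping at all: the text stays in typing order and the markup carries the direction. The sketch below, again in Python and with a hypothetical helper name (`rtl_paragraph`), builds such a paragraph and checks what the two named entities decode to. It shows only the attribute and the entities; it makes no claim about how any particular browser renders them.

```python
import html

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, written &lrm; in HTML
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, written &rlm; in HTML

def rtl_paragraph(text):
    """Wrap logical-order text in a paragraph flagged as right-to-left."""
    return f'<p dir="rtl">{html.escape(text)}</p>'

# The named entities decode to the two invisible directional marks.
assert html.unescape("&lrm;") == LRM
assert html.unescape("&rlm;") == RLM

# Hebrew "shalom" in logical (typing) order, written with escapes.
print(rtl_paragraph("\u05e9\u05dc\u05d5\u05dd"))
```

Compared with the visual workaround, the alignment, line wrapping, and list indentation all follow from the single `dir` attribute instead of being hard coded.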

All this time, the Netscape browser continued to lack any BiDi support whatsoever.

This caused an interesting chicken-and-egg problem. Since only about 80% of the users were using IE, web sites did not want to lose the other 20% of their users, so they continued to use visual Hebrew encoding for their pages. (Even Microsoft Israel’s own web site used visual Hebrew for its pages for a surprisingly long time.)

Of course, the fact that most web pages were written visually, and were therefore viewable with Netscape, did not give end users any real reason to clamor for BiDi support in their browser. The problem of copying and pasting to and from web pages was solved by a booming market of utilities and applications that did just that.

Until last week.

Last week IBM released a version of Netscape 4.61 which they had licensed from Netscape and to which they have added BiDi support.

The IBM version, Netscape 4.61i, includes the full Communicator suite, but only the browser has BiDi support. (By contrast, in IE, the full package, including the browser, Front Page Express, and Outlook Express, supports Hebrew.)

The Netscape user interface has no Hebrew option (again, unlike IE, which has a Hebrew interface available for users of localized Hebrew Windows), but is finally aware of BiDi.

No more need to define a special web view font in order to view Hebrew web pages – any Hebrew font installed on the system will do.

The fonts for Hebrew are defined independently from fonts for other languages. The user can, for example, define Trebuchet MS (which has no Hebrew glyphs) as his/her default Latin1 font, and Arial Hebrew as his/her default Hebrew font.

There is a whole new section in the preferences for defining BiDi options such as the default direction of a web page (LTR or RTL), the default user encoding, and so on.

Sites that have no encoding defined, or have an incorrect encoding defined, can be viewed by switching to the correct character set from the new encoding menu. This time, it has all four Hebrew character sets:

  1. Hebrew logical (Windows-1255)
  2. Hebrew implicit (ISO-8859-8-I, similar but not identical to the one above)
  3. Hebrew visual
  4. Hebrew DOS (which is almost totally out of use)
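To see how close these character sets are at the byte level, here is a small Python sketch. The codec names are Python's, not Netscape's, and I am assuming that the "Hebrew visual" entry uses the ISO-8859-8 byte table and that "Hebrew DOS" means code page 862; the menu itself does not say. ISO-8859-8 and ISO-8859-8-I share one byte table and differ in how direction is handled, so a single codec stands in for both.

```python
# Hebrew "shalom" in logical order: shin, lamed, vav, final mem.
word = "\u05e9\u05dc\u05d5\u05dd"

windows = word.encode("cp1255")   # Windows-1255
iso = word.encode("iso8859_8")    # ISO-8859-8 byte table (assumed for visual and implicit)
dos = word.encode("cp862")        # DOS Hebrew (assumed to be code page 862)

# The letters share the same bytes in Windows-1255 and ISO-8859-8,
# while the DOS code page places them elsewhere.
print(windows.hex(), iso.hex(), dos.hex())

# One difference: Windows-1255 can encode vowel points, ISO-8859-8 cannot.
sheva = "\u05b0"  # HEBREW POINT SHEVA
print(sheva.encode("cp1255").hex())
try:
    sheva.encode("iso8859_8")
except UnicodeEncodeError:
    print("sheva is not encodable in ISO-8859-8")
```

Because the letters overlap byte for byte, a page labelled with the wrong one of the first three sets can still show recognizable Hebrew, just in the wrong order, which is exactly the case the encoding menu is there to fix.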

Yes, logical Hebrew is finally here in Netscape. It still suffers some bugs, but it works well with most of my test pages.

I should note, though, that the MSN Israel web site (http://www.msn.co.il/homepage.asp), the only major web site written in logical Hebrew, caused Hebrew Netscape to crash consistently. Is it the web site? Is it something in logical Hebrew? Is it the browser? At the moment I haven’t done enough testing to know for sure.

One issue I did find, though, is that the User Agent string of this browser is identical to that of any Netscape 4.6 international browser; therefore there is no way to tell from standard server logs how many of the Netscape visitors to a site actually have Hebrew support.

IBM will apparently be basing their work on Hebrew support in the Mozilla project upon the work they have done here, but AOL/Netscape has as yet not said a word about their plans, if any, for including the BiDi support code in the upcoming Netscape 6.

About the Author

Shoshannah L. Forbes

Shoshannah L. Forbes is a web developer in Israel.
