World Grows Small: Open Standards for the Global Web

by Molly E. Holzschlag

14 Reader Comments

  1. Please, for declaring the character encoding you really want to use HTTP. More specifically, the charset parameter of the media type. See also http://www.w3.org/International/O-charset.html Also, you really only want to use and to promote UTF-8, in my humble opinion. Apart from those nitpicks, nice article!
  2. I’m creating a site that presents English, Hebrew and Arabic on the same page and I’m having real difficulty creating a page that has cross-browser compatibility AND validates. Can anyone recommend a good online resource?
  3. Some hints on better naming of classes and ids can be found in Richard Ishida’s “Locali[z]ation Considerations in DTD Design”: http://xml.coverpages.org/IshidaDTD-Paper.html Although actually we could use more documentation about that. If we aren’t supposed to use anything like class=“red” or class=“left”, preferring class=“sidebar”, in some other language it isn’t gonna be called a “sidebar.”
  4. I agree with Joe that “in some other language it isn’t gonna be called a ‘sidebar’”. True, of course, but don’t we have the same problem with “class”, “id”, not to mention “<strong>”, “<em>”, or “document.getElementsByTagName” for that matter? Maybe an interim solution is to adopt some kind of “standardized” tags (see for instance “What’s in a Name” at Malarkey http://www.stuffandnonsense.co.uk/archives/whats_in_a_name.html). Designers may not be using id and div names in their own language, but at least there might be some agreement as to what they are. Better than nothing?
  5. If it is presenting three languages in abundance then usually UTF-8 will cover quite a lot of evils. However, how the text displays will also depend on the fonts installed on the user’s computer and on text direction, i.e. bidirectional text. Furthermore you’d have to consider both the CSS and the X(HT)ML ‘lang’ attributes.
  6. Re Robert S. “More on MULTI-Language support”: Use UTF-8, as mentioned by Robert W. Arabic, Farsi and Hebrew (not sure) are written right to left. If you are using a combination of them in a page, dir=“rtl” (direction: right to left) can be helpful. I have used it for English/Farsi on rokni.net.
  7. Molly, thanks for bringing to light a subject that really needs more focus from web developers. Over the last year, I have been developing multi-lingual websites (http://www.engvocab.com, for example), and in the process learning a lot more about what is needed in these cases. It’s amazing how easy it is to simply underestimate the amount of thought that needs to go into developing truly multi-lingual, multi-cultural websites. I’m glad that you brought out: 1. Character encoding. UTF-8 is the safest bet, but support for it in some editors can be off. TopStyle doesn’t support it. 2. Language declaration. lang=“en” xml:lang=“en” for example. I think a bit more information on this may be needed. These are the ISO 639-1 codes. I believe that ISO 639-2 offers three-letter alternatives also. As well as specifying languages, the standards stipulate that you can also specify varieties, ‘fr-ca’ for Canadian French, for example. 3. The semantic implications. I learnt quite early on that Japanese has no bold and italics, for example. The idea of emphasising words with a background colour is a good one, but has that been confirmed by a Japanese person to be an accepted method? 4. LTR (Left to Right) and RTL (Right to Left) text direction. This is something that still seems a little vague to me. More attention needs to be brought to this, and to the best way of implementing it. You can specify text direction in CSS, but this seems like a bad idea, since it is not a presentational aspect of web development. It is in fact very semantic. So a block of text in an RTL language should be specified as dir=“rtl”. All the best Molly. Thanks for bringing this out. Hope to hear more about it on ALA!
  8. Sorry for the double post. I just found this good page on Joe Clark’s site regarding codes. http://joeclark.org/book/sashay/serialization/AppendixB.html Worth a read.
  9. Molly, un grand merci for your continuous efforts in spreading the good word (OT: and for trying to make I.E. less of a wreck, an important task since most IT managers lack the spine to try anything else). ALA, uh, guess what… The version of Textile used in your commenting system does not allow us to use the ‘lang’ attribute: %[fr]Quel dommage,% after such a nice article dealing with localization!
  10. Joe and Marc, localization is not translation. The article suggests using semantics-driven names and elements, not language-independent names (is there such a thing?). Clearly it does not matter what language you pick for your names. Just pick a language that you and your coworkers are familiar with, or have to work with. If you pick English, ‘sidebar’ is a perfectly good name as long as it is used in the meaning of its Webster definition, http://www.webster.com/cgi-bin/dictionary?va=sidebar (hint: it is not “a bar on the side”). Regardless of which human language is displayed within your sidebar, regardless of its location (top, side, bottom…), regardless of its appearance, it is still a sidebar. When, 6 months from now, after two site redesigns, you come back to your code and find this class named ‘sidebar’, you will know right away what it is about.
  11. The only true method for writing international markup would be to define it in a database using XML. Why? Because it would allow each country to write the elements in their own language. E.g. <cat>yes</cat> for English, <chat>oui</chat> for French. This would then be converted to HTML, which of course is in American-English. Though I’m not sure if XML allows for kanji to be used in tags! As for emphasis in Japanese, they may use katakana, as explained here: http://en.wikipedia.org/wiki/Katakana#Usage “Katakana are also used for emphasis, especially on signs, advertisements, and hoardings… and words to be emphasized in a sentence are also sometimes written in katakana.”
  12. For those looking for help developing content in right-to-left scripts, the following may help: Creating (X)HTML Pages in Arabic & Hebrew. With regard to language codes, note that RFC 3066 used to tell you how to create values for xml:lang and lang attributes. There is now a successor: it is approved by the IETF, but we are still awaiting an RFC number. The new IANA registry of codes is already available. See a recent article by co-author Addison Phillips on the W3C i18n site, Understanding the New Language Tags. Hope that helps.
  13. Well done Molly. Internationalisation is often a very tricky field, and one which I suspect the vast majority of designers/developers are not used to dealing with, even though any site on the web is available to all. I imagine that if HTML had been invented by a Frenchman (for example) and all tags were in French, then we would all have a very different perspective on building sites and be much more aware of language choices.
  14. “The only true method for writing international markup would be to define it in a database using XML. Why? Because it would allow each country to write the elements in their own language. Eg: <cat>yes</cat> for English, <chat>oui</chat> for French. This would then be converted to HTML, which of course is in American-English. Though I’m not sure if XML allows for kanji to be used in tags!” I thought internationalization applied to the content of the page, not the markup?
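Several comments above touch on the same three mechanics: declaring the encoding through the charset parameter of the HTTP media type, declaring the language with lang and xml:lang, and declaring direction with dir in the markup rather than in CSS. As a rough illustration only, here is a minimal Python sketch of those three pieces. The function names and the small RTL_LANGUAGES set are assumptions made up for this example, not anything from the article or from a particular framework, and the set is far from a complete list of right-to-left languages.

```python
from email.message import Message
from html import escape

# Hypothetical, deliberately incomplete set of ISO 639-1 codes for
# languages written right to left (Arabic, Farsi, Hebrew).
RTL_LANGUAGES = {"ar", "fa", "he"}


def content_type_header(charset="utf-8"):
    """Build a Content-Type value that carries the charset parameter."""
    return f"text/html; charset={charset}"


def charset_from_header(value):
    """Read the charset parameter back out of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def direction_for(lang_tag):
    """Pick "rtl" or "ltr" from the primary subtag of a tag like "fr-ca"."""
    primary = lang_tag.split("-")[0].lower()
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


def localized_paragraph(text, lang_tag):
    """Wrap text in a <p> that declares language and direction in markup."""
    return (
        f'<p lang="{lang_tag}" xml:lang="{lang_tag}" '
        f'dir="{direction_for(lang_tag)}">{escape(text)}</p>'
    )


if __name__ == "__main__":
    print(content_type_header())
    print(localized_paragraph("Canadian French text", "fr-ca"))
    print(localized_paragraph("שלום", "he"))
```

Guessing direction from the language code is a shortcut taken for the sketch. On a real mixed English, Hebrew and Arabic page the author would normally set dir explicitly on each block, since the direction is part of the content's meaning.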