23 Reader Comments
jeanlee
Therefore, any technique like this is really only applicable to English and possibly a few other languages, unless the code gets a lot more complicated.
International House Toronto took its first registration in 1996 and has hosted over 2000 students from around the world. The school’s founding philosophy was to offer high quality English programs in a warm and comfortable atmosphere for our traveling students. Although we have grown in size, our philosophy has stayed the same.
yfcteam
Right, Facebook seems to try to implement this in its search function.
neil.leathers
I must admit this article makes me cringe. When comparing texts, it is important to account for whether various characters and character sequences should be considered identical. I am glad that this article highlights this and gives some examples of its importance.
However, the area is more complex than the article suggests, and its suggestions are in some cases dangerously wrong. The discussion of normalization is important, since some character sequences are alternative representations of the same text, or may be treated as such depending on the rules one follows. After normalization comes the question of comparison where, as mentioned, certain sequences are considered equivalent. However, these equivalence classes depend on the locale of the user, the locale of the text data, and the desired usage (for example, whether the comparison is case-insensitive).
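To make the normalization point concrete, here is a minimal Java sketch using the JDK's java.text.Normalizer (the class and method names are hypothetical). It assumes NFC covers the "identical alternative representations" case (canonical equivalence) and NFKC the "depending on the rules" case (compatibility equivalence, such as the "fi" ligature versus the two letters "fi").

```java
import java.text.Normalizer;

// Hypothetical helper class illustrating Unicode normalization before comparison.
class NormalizationDemo {

    // Canonical equivalence: different code point sequences for the same character.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    // Compatibility equivalence: looser rules that also fold ligatures,
    // full-width forms and similar variants onto their plain counterparts.
    static boolean compatiblyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFKC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFKC));
    }

    public static void main(String[] args) {
        String precomposed = "\u00E9";  // e with acute accent as one code point
        String decomposed = "e\u0301";  // "e" followed by a combining acute accent
        String ligature = "\uFB01";     // the single "fi" ligature character

        System.out.println(precomposed.equals(decomposed));             // false
        System.out.println(canonicallyEqual(precomposed, decomposed));  // true
        System.out.println(canonicallyEqual(ligature, "fi"));           // false
        System.out.println(compatiblyEqual(ligature, "fi"));            // true
    }
}
```

Which form to normalize to is itself a choice that depends on the intended use: NFKC discards distinctions that NFC preserves.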
These definitions of equivalence are collation sequences. Whenever comparing text, one should not use a simple “string == string” idiom but something along the lines of “currentCollator = I18nLibrary.GetCollation; currentCollator.Compare(string, string);”.
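In Java, the pseudocode above maps roughly onto java.text.Collator. The following is a minimal sketch, not a complete solution: Locale.ENGLISH is an assumed locale (in practice it would come from the user and the data), and the class and method names are hypothetical. It shows that String.equals distinguishes two canonically equivalent spellings of "café" while a Collator with canonical decomposition treats them as equal. It also shows that sorting with a Collator avoids the raw UTF-16 order, in which every uppercase letter precedes every lowercase one.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class: locale-aware comparison via java.text.Collator.
class CollatorCompareDemo {

    // Build a collator for the given locale that also decomposes
    // precomposed characters, so canonically equivalent strings compare equal.
    static Collator collatorFor(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator;
    }

    // Equality according to the collator rather than exact code units.
    static boolean sameText(String a, String b, Locale locale) {
        return collatorFor(locale).compare(a, b) == 0;
    }

    // Sort a copy of the list using the collator's ordering.
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(collatorFor(locale));
        return copy;
    }

    public static void main(String[] args) {
        Locale locale = Locale.ENGLISH;  // assumption for this sketch
        String a = "caf\u00E9";          // precomposed e-acute
        String b = "cafe\u0301";         // e followed by a combining acute accent

        System.out.println(a.equals(b));             // false
        System.out.println(sameText(a, b, locale));  // true

        List<String> words = List.of("cherry", "Banana", "apple");
        List<String> byCodeUnit = new ArrayList<>(words);
        byCodeUnit.sort(Comparator.naturalOrder());  // String.compareTo order
        System.out.println(byCodeUnit);              // [Banana, apple, cherry]
        System.out.println(sorted(words, locale));   // [apple, Banana, cherry]
    }
}
```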
I would urge people to look at the Java documentation for java.text.Collator, since it is one of the nicer starting references. I do not know of a JavaScript implementation.
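As a starting point with that documentation, the sketch below shows Collator strength, which is one way to express the "desired usage" mentioned earlier, such as case-insensitive or accent-insensitive comparison. It is a minimal sketch: Locale.US is an assumed locale, the class and method names are hypothetical, and which differences count as primary, secondary or tertiary is locale dependent, so other locales may behave differently.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper class: how Collator strength changes what counts as "equal".
class CollatorStrengthDemo {

    // Compare a and b with a US English collator at the given strength.
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String plain = "role";
        String capitalized = "Role";
        String accented = "r\u00F4le";  // o with circumflex

        // TERTIARY (the default): case differences are significant.
        System.out.println(equalAt(Collator.TERTIARY, plain, capitalized));
        // SECONDARY: case is ignored, but accents still matter.
        System.out.println(equalAt(Collator.SECONDARY, plain, capitalized));
        System.out.println(equalAt(Collator.SECONDARY, plain, accented));
        // PRIMARY: both case and accents are ignored.
        System.out.println(equalAt(Collator.PRIMARY, "ROLE", accented));
    }
}
```

Under these assumptions the four lines print false, true, false, true. A case-insensitive search would use SECONDARY, while one that should also ignore accents would use PRIMARY.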