A common assumption about internationalization is that every user fits into a single locale like “English, United States” or “French, France.” It’s a hangover from the PC days when just getting the computer to display the right squiggly bits was a big deal. One byte equaled one character, no exceptions, and you could only load one language’s alphabet at a time. This was fine because it was better than nothing, and because users spent most of their time with documents they or their coworkers produced themselves.

Today users deal with data from everywhere, in multiple languages and locales, all the time. The locale I prefer is only loosely correlated with the locales I expect applications to process.

If I compose a new message and type “lo” in the To: field, what should happen? In many applications only Lorena will show up. These applications “support Unicode,” in the sense that they don’t corrupt or barf on it, but that’s all.

This problem is not just in address books. Think about inboxes, social bookmarks, comment feeds, users who speak multiple languages, users in internet cafés in foreign countries, even URLs. Look at the journalist Ryszard Kapuściński and how different websites handle his name:

Wikipedia: Ryszard Kapuściński (canonical URL).
Wikipedia: Ryszard Kapuscinski (hand-coded alternate).
Wikipedia: Ryszard Kapusciński (not found).
Wikipedia: Rÿszarḋ Kåpuścińsḳi (not found).
Spock: Rÿszarḋ Kåpuścińsḳi (accent-folded, redirects to canonical URL).

There is no excuse for your software to play dumb when the user types “cafe” instead of “café.”

Áçčềñṭ-Ḟøłɖǐṅg

In specific applications of search that favor recall over precision, such as our address book example, á, a, å, and â can be treated as equivalent. Accents (a.k.a. diacritical marks) are pronunciation hints that don’t affect the textual meaning. Entering them can be cumbersome, especially on mobile devices.

Fig 2. Accent-folding in an autosuggest widget

An accent-folding function essentially maps Unicode characters to ASCII equivalents. Anywhere you apply case-folding, you should consider accent-folding, and for exactly the same reasons. With accent-folding, it doesn’t matter whether users search for cafe, café or even çåFé; the results will be the same.

Be aware that there are a million caveats to accent rules. You will almost certainly get it wrong for somebody, somewhere. Nearly every alphabet has a few extra-special marks that do affect meaning, and, of course, non-Western alphabets have completely different rules.

A minor gotcha is the Unicode “fullwidth” Roman alphabet. These are fixed-width versions of plain ASCII characters designed to line up with Chinese/Japanese/Korean characters (e.g., “１９７９年８月１５日”).
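The accent-folding idea described in this article can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the article's own implementation: it relies on Unicode NFKD normalization from the standard `unicodedata` module rather than a hand-built character map, and the names `accent_fold`, `suggest`, and `EXTRA_FOLDS`, along with the sample address book, are hypothetical. NFKD also happens to map fullwidth forms to their ASCII counterparts, which covers the fullwidth gotcha.

```python
import unicodedata

# Hypothetical overrides for letters with no canonical decomposition:
# the stroke or hook is part of the glyph, not a separate combining mark.
EXTRA_FOLDS = {"ø": "o", "ł": "l", "ɖ": "d", "đ": "d", "æ": "ae"}


def accent_fold(text: str) -> str:
    """Fold case, accents, and fullwidth forms into a plain search key."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    folded = text.casefold()
    # NFKD splits accented letters into base letter + combining marks and
    # replaces compatibility characters (such as fullwidth digits) with
    # their ordinary equivalents.
    decomposed = unicodedata.normalize("NFKD", folded)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Apply the manual overrides for letters NFKD leaves untouched.
    return "".join(EXTRA_FOLDS.get(ch, ch) for ch in stripped)


def suggest(prefix: str, names: list[str]) -> list[str]:
    """Return the names whose folded form starts with the folded prefix."""
    key = accent_fold(prefix)
    return [name for name in names if accent_fold(name).startswith(key)]


# Hypothetical address book, for illustration only.
contacts = ["Lorena", "Lorène", "Lóránt", "Bob"]
matches = suggest("lo", contacts)
```

With this sketch, `matches` contains all three names that begin with “Lo” regardless of accents. Fold both the query and the stored text, but keep the original strings for display, since the folded key is lossy by design.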