Have you ever thought about what goes into making text appear on your screen?
How an app that’s created in Estonia can be used by someone in Canada? How a blog post written on a MacBook can be read on a Samsung Galaxy S7?
Or how the Wingdings fonts (the most fascinating thing to hit my family computer before Full Tilt! Pinball) worked, and, more importantly, why we needed three of them?
I hadn’t. Until I was investigating the impact of one country not using the international standard for character encoding.
Which really should be the case study for why international standards are a really big deal. And why understanding the local culture and user behaviour is, in many cases, more important than just providing language translations.
What Happened in Myanmar
When I think about Myanmar I think about the iconic hot-air balloons rising over the Bagan temples. About the views from Mandalay Hill as my desktop background. And about the longest-running civil war in the world.
It was, and in some areas still is, a series of insurgencies that began in 1948, shortly after the country gained independence from the United Kingdom.
But even with the insurgencies, Myanmar of the early 1960s was the largest exporter of rice in the world. With an educated workforce and well-run economic and legal systems.
In Asia’s high-school yearbook of 1961, Myanmar was voted most likely to become fully industrialised.
So how did Asia’s rising star end up with decades of economic stagnation?
A coup. A coup in 1962. And economic sanctions by the US and EU. And borders that were closed to mass immigration and emigration.
This left Myanmar under-developed, far behind its neighbouring nations. SIM cards cost as much as $3,000 USD. And with per capita income at $800 USD, mobile phones, landlines and the internet were limited to the wealthy.
This started to change in 2010, when the elected president and the succeeding ruling party gradually sought to reverse decades of mismanagement. That eventually led to more open borders, influences from the outside world, the introduction of $7 USD SIM cards and reduced censorship of the internet. Which in turn led to widespread connectivity through mobile phones and the internet.
But while Myanmar was enclosed within its own borders, fighting off insurgencies, the outside world was discovering antibiotics. Looking for life on Mars. Reeling from the debut and shocking disbandment of the Beatles. And using the internet to facilitate communication by eliminating geographic, cultural and linguistic barriers.
Insert Unicode
The international standard for encoding, representing and handling text, developed in 1991. Basically, if you’re creating or using any app or website that involves showing text on a screen, Unicode is most likely involved. It defines the way in which text characters should be stored and processed. So that they are correctly rendered by any Unicode-compliant font anywhere in the world.
It covers all the characters for all the writing systems of the world, modern and ancient. And yes, this includes Burmese.
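To make “every character gets its own number” concrete, here is a small sketch using Python’s standard `unicodedata` module. The helper name `describe` is our own; it simply looks up each character’s code point and official Unicode name, showing Burmese sitting alongside Latin, Cyrillic and Chinese in the same standard.

```python
import unicodedata


def describe(text: str) -> list[tuple[str, str]]:
    """Return (code point, official Unicode name) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# Latin, accented Latin, Cyrillic, Burmese and Chinese in a single string.
for code, name in describe("A\u00e9\u0416\u1019\u4e2d"):
    print(code, name)
# U+0041 LATIN CAPITAL LETTER A
# U+00E9 LATIN SMALL LETTER E WITH ACUTE
# U+0416 CYRILLIC CAPITAL LETTER ZHE
# U+1019 MYANMAR LETTER MA
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D
```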
I hope you’ve been paying attention to the dates. And everything else I’ve been saying. If you have, you’ll notice that this international standard was developed while Myanmar was closed off to the world. So while the rest of the world nodded in agreement to use this standard, Myanmar could barely hear the conversation. And no one could see Myanmar furiously shaking its head.
Insert Zawgyi
The de facto Burmese standard for encoding, representing and handling Burmese text. The concept is similar to Unicode, except that Zawgyi only covers Burmese characters in the Burmese language.
Both encodings only display correctly with fonts that are compliant with them. With Unicode, this is easy, as it’s an international standard. For Zawgyi, this is restricted to specific phones and one specific country: Myanmar.
Sadly, as Unicode and Zawgyi use the same range of code points but assign different characters to many of them, there will never be one universal font that can render both correctly.
What this Means for App & Website Developers
In case you’re wondering, this is what it would look like:
Table 1: How text is rendered depending on its encoding and font (rows: Unicode text, Zawgyi-encoded text; columns: with a Unicode (Padauk) font, with the ZawgyiOne font)
Today this presents a unique and challenging problem. Apps and websites developed outside of Myanmar assume that users have devices - phones, laptops, tablets, smart watches, car displays, etc. - that use Unicode.
Once tech companies around the world realised the internet was becoming more widespread in Myanmar they offered Burmese translations. And they patted themselves on the back. And then sat back as their data on Myanmar barely changed.
They didn’t realise that users with Zawgyi devices can’t accurately view the UI or any content created using Unicode. That could be anything: articles, blog posts, social media posts or even comments.
And over 90% of devices in Myanmar use Zawgyi.
Any features involving data validation - I’m mainly thinking of names, phone numbers and email addresses at registration - are also at risk if the user enters Zawgyi characters where Unicode was expected or vice versa.
If you’re hoping people will find your website through a search engine, sit up and take note: it’s unlikely that search engines are built to understand every text encoding, especially when there’s a widely accepted international standard. A search typed on a Zawgyi device is a different sequence of characters from the same words in Unicode, so the Unicode text on your site can be consistently misinterpreted and excluded from those users’ search results.
If you’ve got an internal search feature, think about what would happen if a user enters the search text in Zawgyi but all your content is in Unicode - again I’m expecting misinterpretation and exclusion.
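To make the internal-search problem concrete, here is a minimal sketch of normalising a Zawgyi query before matching it against Unicode content. It handles only one well-known difference (where the vowel sign E, U+1031, is stored), assumes the query has already been flagged as Zawgyi by some detector, and uses hypothetical function names. A real converter needs far more rules than this.

```python
import re

# One reordering rule used when converting Zawgyi to Unicode, and only that one:
# Zawgyi stores the vowel sign E (U+1031) *before* its consonant (visual order),
# Unicode stores it *after*. Real converters apply many more rules than this.
ZAWGYI_E_THEN_CONSONANT = re.compile("\u1031([\u1000-\u1021])")


def reorder_vowel_e(text: str) -> str:
    """Move each U+1031 after the consonant that follows it."""
    return ZAWGYI_E_THEN_CONSONANT.sub("\\1\u1031", text)


def find_matches(query: str, documents: list[str], query_is_zawgyi: bool) -> list[str]:
    """Return the Unicode documents that contain the query.

    query_is_zawgyi is assumed to come from a detector. The reordering must not
    be applied to Unicode text, where U+1031 followed by a consonant is valid
    (the end of one syllable followed by the start of the next).
    """
    if query_is_zawgyi:
        query = reorder_vowel_e(query)
    return [doc for doc in documents if query in doc]
```

Without the reordering step, the Zawgyi query and the Unicode document never match, even though a reader would see the same word.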
And think about your users having to constantly switch between your app and a converter, copying and pasting text just to understand what the UI is saying. I’m not sure that’s the user flow you’re aiming for. Or whether your app (or their phones) can handle that constant switching.
And if it were me, I’m not sure I’d stick around long enough to find out the answer. Also, as we’re talking about what I’d do, I’m pretty sure I wouldn’t recommend your app to friends or family; there goes any hope you had of gaining users by word of mouth.
Even if only 60% of Burmese speakers had access to the internet, that’s 20 million people. Even if only 50% of those had devices that worked with your website/app, that’s 10 million. Even if only 10% of those would use your website/app, that’s 1 million people. 1 million users you could have. But don’t.
So what can you do about that?
I don’t have all the answers, but I have some solutions for you to consider. And I also don’t have the details on how you’d implement these solutions, but a quick Google would probably point you in the right direction.
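- Detecting the encoding and converting to Unicode. Now’s a good time to tell you that correctly detecting the encoding every time is impossible. But there are tools that make educated guesses, using statistical models trained on large volumes of text. Note that general-purpose detectors like chardet only guess a text’s byte encoding, and Zawgyi text is stored with the same Unicode code points (usually as UTF-8) as Unicode text, so you’ll need a detector built specifically for Zawgyi. Once detected, convert the text to Unicode and display it with a Unicode-compliant font.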
- Use webfonts (only for websites). For websites, once the encoding is detected, you can use webfonts so that each block of text is displayed with the right font. Because webfonts are delivered along with the page instead of having to be installed on the device, they get around device limitations. For your UI, this is easy, as you don’t even need to detect the encoding: you know which one you used. It is slightly trickier for user-generated content, as you’ll need to detect the encoding first.
- Use bundled fonts (only for mobile devices). Some devices and operating systems don’t allow users to change or replace fonts; I’m looking at you, Apple. You can bundle the font within the app, but it can only be used by that one app. A better solution for you than it is for the users. But a solution nonetheless.
- Let users switch. This adds complexity to the client and the server. But you could give the users the option to switch themselves, trust them to know which one they need. And from what I understand of Myanmar, the majority of users are aware that there’s a text display issue, and will likely know what to do.
Think of it like switching languages with a language selector: if the label of each language is written in that language, users know which one to choose. You just need to make it obvious where the selector is. Hint: don’t bury it in settings where the user would have to go through multiple pages of text they can’t read.
- Don’t do anything. The final solution will limit your usability and reach. You can completely ignore this problem. And let the users decide whether they want to use your website or app.
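The detect-then-display options above could be wired together roughly as follows. This is a minimal sketch, not a production detector: it relies on two crude heuristics (described in the comments) rather than a trained model such as the one in Google’s open-source myanmar-tools, and the CSS class names are hypothetical. An explicit choice from a user-facing selector always wins over the guess.

```python
import re
from html import escape
from typing import Optional

# Heuristic 1 (an assumption of this sketch): Zawgyi stores the vowel sign E
# (U+1031) in visual order, before its consonant, so it can appear at the very
# start of a word. In well-formed Unicode it follows a consonant (and medials).
# Burmese rarely puts spaces between words, so this only catches some cases.
E_AT_WORD_START = re.compile("(?:^|[^\u1000-\u109f])\u1031")

# Heuristic 2 (also an assumption): Zawgyi reuses U+1060..U+1097 for glyph
# variants, while Unicode assigns most of that range to Mon, Karen and Shan
# characters. Genuine Shan or Karen text will trigger this too.
VARIANT_RANGE = re.compile("[\u1060-\u1097]")

# Hypothetical CSS classes; each would be tied to a webfont with @font-face
# in your stylesheet (a Unicode font such as Padauk, or a Zawgyi font).
FONT_CLASS = {"unicode": "my-unicode", "zawgyi": "my-zawgyi"}


def looks_like_zawgyi(text: str) -> bool:
    """Crude guess: True if any Zawgyi-style pattern appears in the text."""
    return bool(E_AT_WORD_START.search(text) or VARIANT_RANGE.search(text))


def render_block(text: str, user_choice: Optional[str] = None) -> str:
    """Wrap one block of text in a span whose class selects the matching webfont.

    An explicit choice from a user-facing selector overrides the detector.
    """
    encoding = user_choice or ("zawgyi" if looks_like_zawgyi(text) else "unicode")
    return f'<span class="{FONT_CLASS[encoding]}">{escape(text)}</span>'
```

In practice you would likely swap `looks_like_zawgyi` for a proper statistical detector; the rest of the flow (detect, pick a font, let the user override) stays the same.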
Conclusion
People in Myanmar are used to this by now, as they, and the rest of the world, slowly caught on to the implications of Zawgyi and Unicode. And they’ve found ways to deal with it:
- Using converters. The flow of copying and pasting and switching between apps is not the most user-friendly but for a very long time, it was the only way to understand what international apps were saying.
- User-loaded fonts. Once the tech-savvy caught on to what was happening, they’d install either the Unicode fonts or the Zawgyi fonts, depending on which device they bought. The main downside was that they could then only use international apps/websites (Unicode) or Burmese apps/websites (Zawgyi), not both. Users would also have to download the font using precious wifi, which still isn’t what you’d consider affordable to the masses in Myanmar.
- Store-loaded fonts. Once the phone shops picked up on this issue, they began offering to load the fonts in store when a device was bought. This got around the wifi issue, as shops could copy the same files onto every device. This type of service is similar to phone shops in Nigeria and India that pre-load specific applications, with the same goal of saving internet allowances.
- Factory-loaded fonts. Once the manufacturing companies, most notably Chinese device manufacturers and Samsung, found out about Zawgyi and the market for phones with Zawgyi and Unicode, they started pre-loading the devices with both fonts. They looked at the numbers, the ones that I was telling you about earlier, and realised that even a small percentage of a 55 million population is significant.
- Recognising symbols. As the Burmese people started using international apps and websites on their Zawgyi phones, they began to recognise the distorted characters. It’s like when archaeologists and historians began to decode hieroglyphs. Which, when you think about it, is really amazing. While some characters are unreadable, or replaced with empty space, others allowed users to understand enough to get to specific pages, decide which text to paste into a converter or figure out how to change the settings.
So if you do choose to leave the problem alone, you’ll need to consider these user behaviours.
From colonisation to independence to insurgencies to the brink of industrialisation to economic mismanagement and diplomatic isolation to democratic reforms, Myanmar has had a tumultuous time over the last few decades.
As they emerge from their civil war and reform their country and economy, the internet will do far more than lend a helping hand. As they introduce new technologies and develop their own, it’s important to sit up and take notice of these 55 million people.
And it’s important to recognise the technological and cultural differences that you’ll need to address to serve this user-base.
Battle of the fonts
The boom in smartphone use has fuelled a long-simmering controversy in the IT community over adopting Unicode as the standard for character encoding in Myanmar, instead of Zawgyi.
By GRIFFIN HOTCHKISS | FRONTIER
A passionate Michael Suantak was speaking in the Phandeeyar tech hub’s downtown Yangon office.
“They should not cheat the people; they should not cheat the future,” Mr Suantak said. “At the moment they will be very popular...but when the people realise [they have been cheated], after five or ten years, they will be very angry. So we cannot compromise,” he said.
You might be forgiven for thinking that he is discussing the national ceasefire, constitutional reform, or any other of the myriad challenges that await Myanmar’s new government when it takes power on April 1. But Mr Suantak is talking about computer fonts. Well, not fonts, exactly – about character encoding in Myanmar languages, and the grand battle for the minds of the country’s new smartphone-only generation of internet users.
Mr Suantak is the author of BIT font, one of the early solutions for Myanmar text encoding in modern operating systems. To understand the gravity of his comments, we need to understand character encoding and how it applies to languages such as Burmese.
What follows is a quick primer for the non-Burmese speaker on Unicode and Zawgyi, a history of the dispute over Myanmar fonts, and how the resolution (or non-resolution) of these issues will affect the future of Myanmar’s languages in silico.
Burmese character encoding 101
These words were written on my keyboard and saved to a small file on my computer. The file, like everything else on a computer, is nothing but a series of numbers. In a text file such as this article, each letter, punctuation mark, space, and paragraph break gets its own unique number, called a code point, so that when another computer reads the file, it has enough information to reproduce the sequence of letters exactly as I typed them. The conversion from written letters to a long string of numbers is called encoding.
Going the opposite way, from a long string of numbers to images of letters for printing, is called decoding. The process of encoding and decoding only works if both computers agree on the same letters corresponding to the same numbers. That is to say, there needs to be an encoding scheme so that all text is handled in the same way by all computers. In other words, character encoding must be standardised. For most of the world’s writing systems, the Unicode standard has been adopted so that alphabetisation, sorting, and encoding remain consistent across all operating systems and applications.
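A quick Python illustration of both directions, assuming UTF-8 as the storage format (the most common way to store Unicode text), and of what happens when the two sides disagree on the scheme:

```python
text = "Hi \u1019"  # "Hi " followed by မ (MYANMAR LETTER MA)

# Encoding, step 1: each character maps to one code point.
code_points = [ord(ch) for ch in text]
print(code_points)       # [72, 105, 32, 4121]

# Encoding, step 2: the code points are stored as bytes (here with UTF-8).
encoded = text.encode("utf-8")
print(encoded)           # b'Hi \xe1\x80\x99'

# Decoding with the same scheme reproduces the text exactly.
assert encoded.decode("utf-8") == text

# Decoding with a different scheme still "works", but yields the wrong characters.
garbled = encoded.decode("latin-1")
print(repr(garbled))     # 'Hi á\x80\x99'
```

The Latin letters survive the mismatch only because both schemes happen to agree on them; the Burmese letter does not.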
For English and any language that uses the familiar Latin alphabet, following the Unicode standard is a relatively simple task of assigning each letter in the alphabet to a unique code point. But for many languages with more complicated writing systems, choosing the best coding scheme has been difficult. Burmese, in particular, involves many modifications to a single written character.
Usually these modifications appear above, below, or to the left of the base consonant character. The challenge for encoding, then, is finding a way to assign a unique code point to each of the component parts so that when combined together, a computer can render the desired character.
Writing ‘Myo’
For example, the word “myo” is a single complex character made up of five simple character elements. “Ma” is the base consonant, “ya-yit” adds a “ya” sound to become “mya”, “loungji-tin” and “tachaun-ngin” combined modify the vowel sound to become a tight “myoh”, and the final “auka-myit” signifies a creaky tone at the end of the word. For encoding, each simple element must be assigned a code point.
Additionally, each element in a character might change how another element should be rendered. In this example, the “ya-yit” must be cut off slightly so as not to cover up the “loungji-tin”, and the “auka-myit” must be placed further to the right to make space for the “tachaun-ngin”. Unicode handles all this by relying on intelligent rendering: each element has one and only one code point, and the rendering engine adjusts the shape and width of each element automatically depending on which other elements are present.
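In Unicode, those five elements become exactly five code points, stored in logical order. A short sketch, assuming the word is written မြို့ (“myo”, as in city) and using Python’s standard `unicodedata` module to print each code point’s official name; the pairing with the Burmese element names in the comments is ours.

```python
import unicodedata

# "myo" in Unicode: one code point per element, in logical (not visual) order.
MYO = "\u1019\u103c\u102d\u102f\u1037"

for ch in MYO:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+1019  MYANMAR LETTER MA                  (ma, the base consonant)
# U+103C  MYANMAR CONSONANT SIGN MEDIAL RA   (ya-yit)
# U+102D  MYANMAR VOWEL SIGN I               (loungji-tin)
# U+102F  MYANMAR VOWEL SIGN U               (tachaun-ngin)
# U+1037  MYANMAR SIGN DOT BELOW             (auka-myit)
```

Note that the ya-yit is stored after the consonant even though it is drawn to its left; placing it correctly is the rendering engine’s job, not the typist’s.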
The different ways 'Myo' can be rendered in Unicode or Zawgyi. (Credit Griffin Hotchkiss & Soe Lwin)
The birth of Burmese fonts
In the early days of Burmese fonts, getting a computer to display all the possible shapings for a character was difficult, because Windows did not support intelligent rendering of fonts. Ko Ngwe Tun, the author of one of the first Burmese fonts, Myazedi, devised a workaround still used by Zawgyi to this day: He mapped each individual variation of a character element to its own unique code-point. To write our example word, myo, users of Myazedi had to find the correct “ya-yit” manually from eight possible variations, and the correct “auka-myit” from three possible variations.
Myazedi, BIT, and later Zawgyi, circumvented the rendering problem by using extra code points that were reserved for Myanmar’s ethnic languages. Not only does the re-mapping prevent future ethnic language support, it also results in a typing system that can be confusing and inefficient, even for experienced users.
In Zawgyi, there are six different ways to write the word “myo” that render a superficially “correct” character, and many more if you allow for “incorrect” variations that look strange but are still intelligible to a reader. A computer, however, sees these variations as completely different words. Modern Unicode, by contrast, has only one code point per element, and will only render if the characters are encoded in the correct sequence, meaning that for each word there is one and only one encoding.
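The “completely different words” problem is easy to see even within Unicode. This sketch uses two spellings of “myo” that differ only in the order of the two vowel signs; the first follows the storage order recommended in Unicode Technical Note #11, the second does not, yet a lenient font may draw both identically. These are Unicode sequences chosen for illustration, not actual Zawgyi byte sequences.

```python
import unicodedata

correct = "\u1019\u103c\u102d\u102f\u1037"  # upper vowel (U+102D) before lower (U+102F)
swapped = "\u1019\u103c\u102f\u102d\u1037"  # same elements, vowel signs reversed

# To a computer, these are different strings, so search and sorting treat them
# as different words.
assert correct != swapped

# Unicode normalisation does not reconcile them: these vowel signs have
# canonical combining class 0, so NFC never reorders them.
assert unicodedata.combining("\u102d") == 0
assert unicodedata.normalize("NFC", swapped) != correct
print("same word on screen, different data")
```

This is why a single, enforced encoding order matters: once every word has only one valid sequence, plain string comparison is enough.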
“We knew that it was going to be just a temporary solution, because eventually Microsoft and others would support a standard,” said Ko Ngwe Tun. “Once the standard was developed, we informed our customers that we could no longer support [them].”
Non-standardisation was not the only problem for Myazedi, though. It was also very expensive. A user licence was US$100, and a developer licence – needed for content producers such as online media – was $1,000. In 2002, this price was well beyond the means of most companies. Just like any other piece of expensive software, there was an incentive for piracy.
The rise of Zawgyi
The Zawgyi-One font was released in Mandalay as freeware in 2006, and it bore a striking resemblance to Myazedi: the first version even contained some of Ko Ngwe Tun’s copyright messages, left intact and unnoticed before release. That did little to dissuade people from downloading it. Soon many of the largest software companies in Myanmar were using or were planning to use Zawgyi instead of Myazedi.
Ko Ngwe Tun’s company, Solveware Solution, published a legal notice threatening to sue any company using a pirated Myazedi font, as well as the developers of Zawgyi. This angered many in the software community (especially the implicated companies), and in response Zawgyi was modified – the change brought Zawgyi even further from the Unicode standard – to make it harder to prove intellectual property theft.
Meanwhile, internationalisation efforts continued for Unicode. With the release of Windows XP Service Pack 2, complex scripts were supported, which made it possible for Windows to render a Unicode-compliant Burmese font such as Myanmar1 (released in 2005).
Getting a Unicode font to work with Windows, however, still required a bit of technical knowledge and some configuration. Ravi Chhabra, a Unicode researcher and the author of the first Zawgyi/Unicode detection engine, says this was the first advantage Zawgyi had over Unicode.
The second, he says, was the adoption of the internet as a source of information. Starting with the monk-led protests in 2007 called the Saffron Revolution and continuing with Cyclone Nargis in 2008, an increasing number of people in Myanmar began to realise the power of the internet as a medium, and the demand for Burmese content rose dramatically.
“The reason for Zawgyi’s huge success is planet.com.mm,” said Ko Ravi. “It was the web portal for news, and they used Zawgyi. When people went [to the site], there was a link to download the font, because font embedding didn’t work back then,” he said.
“The third thing is blogging. Nyi Linn Sat, one of the proto-bloggers, used Zawgyi and wrote instructions on how to use Zawgyi to set up blogs. People loved it, and they started blogging with Zawgyi.”
Ko Ngwe Tun eventually backed down from his legal threats, but the damage was done. Galvanized by the dispute with Solveware Solution, the developers of Zawgyi continued promoting their own product online as the best font for Burmese.
Subsequent Zawgyi releases made the font easier to install for an average user, and much harder to migrate away from. In some cases it forced the user’s whole system to default to Zawgyi, either by injecting it into the default Microsoft Arial font, or by installing as an Internet Explorer plugin with no option for uninstall.
“If you ask me,” said Ko Ravi, “these things did not happen in good faith. If they did, we wouldn’t be where we are today.”
Faults in the rendering of Zawgyi on Facebook. (Griffin Hotchkiss / Frontier)
Where are we?
Unicode for Myanmar languages has been refined and updated continuously since those early days. Complex characters and intelligent rendering are built into Unicode, and the standard has been endorsed by Google, Apple, and Facebook as the future of Myanmar language support. Many of Myanmar’s ethnic languages such as Shan, Mon, Kayah, and Karen, are also supported within the Unicode Myanmar codespace.
If you access Facebook to find a post written in Burmese, however, it will almost certainly be written in Zawgyi. Some news media websites offer Unicode as an option, but most do not. Huawei and Samsung, the two most popular smartphone brands in Myanmar, are motivated only by capturing the largest market share, which means they support Zawgyi out of the box.
More Myanmar people are going online for the first time, and when they open their new smartphone, they are unknowingly being inducted into Zawgyi’s massive userbase. Zawgyi is the font of the layman and that’s the crux of the problem.
The network effect is the only thing keeping Zawgyi alive. Switching to Unicode is a risk. Content producers such as media websites risk losing readers, phone makers risk losing customers, and ordinary folks risk simply alienating themselves from their friends. Zawgyi will have to die eventually – Unicode is the de facto standard worldwide, and for good reason.
Until that happens, however, all of the digital content produced by Myanmar’s rapidly growing population of internet users will be flawed. Searching, ordering, and manipulating content written in Zawgyi is a nightmare for developers, who must account for all of the redundancies of Zawgyi’s inefficient coding scheme, and who cannot make use of any software built for Unicode-compliant content.
“In the future, there will be machine translation, there will be optical character recognition, there will be text-to-voice, and all [the Zawgyi content] will not be usable.”
Mr Suantak believes this will result in a lot of orphaned information, but he remains optimistic for those who choose to migrate their content. “BBC uses a fully Unicode-compliant font, and people still follow the BBC. If the information is good enough, the people will change to get the information.”
The great migration to Unicode might seem like an arduous undertaking, but it’s less challenging than some of the other obstacles Myanmar will face in its transformation. Ko Thura Hlaing, an avid supporter of Unicode and author of a Zawgyi/Unicode conversion script, believes that a clear directive from the government is all that is needed to push people to change.
“They also had this [problem] in Cambodia, and the government stood up and declared ‘If you want to sell or distribute in this country, you must use Unicode’. And the issue was solved,” Ko Thura Hlaing said.
Hopefully, it’ll be just as easy in Myanmar.