The online language gap in the U.S. is an urgent issue for Asian Americans

Chen said that although content moderation policies at Facebook, Twitter, and other companies have filtered out some of the most obvious false information in English, the systems often miss such content when it appears in other languages. Instead, the work has had to be done by volunteers like her team, who look for false information and are trained to defuse it and minimize its spread. She said: “The mechanisms used to catch certain words and content will not necessarily catch that dis- and misinformation when it is in another language.”

Google’s translation services and technologies, such as Translatotron and real-time translation headphones, use artificial intelligence to convert between languages. But Xiong found these tools inadequate for Hmong, an extremely complex language in which context is very important. She said: “I think we have become very complacent and dependent on advanced systems like Google. They claim to be ‘language accessible,’ and then I read it and it says something completely different.”

(A Google spokesperson acknowledged that smaller languages “pose a more daunting translation task,” but said that the company has “invested in research that particularly benefits low-resource language translation,” using machine learning and community feedback.)

All the way down

The challenges of language online are not limited to the United States; they also reach, quite literally, down to the underlying code. Yudhanjaya Wijeratne is a researcher and data scientist at the Sri Lankan think tank LIRNEasia. In 2018, he began tracking bot networks whose social media activity encouraged violence against Muslims: in February and March of that year, a series of riots by Sinhalese Buddhists targeted Muslims and mosques in the cities of Ampara and Kandy. His team documented the “search logic” of these bots, catalogued thousands of Sinhalese social media posts, and brought the findings to Twitter and Facebook. He said: “They will say all kinds of nice and well-meaning things, basically canned statements.” (Twitter said in a statement that it uses manual review and automated systems to “apply our rules fairly to everyone in the service, regardless of background, ideology or political position.”)

When contacted by MIT Technology Review, a Facebook spokesperson said that the company had commissioned an independent human rights assessment of the platform’s role in the violence in Sri Lanka, which was released in May 2020, and that it made changes after the attacks, including hiring dozens of content moderators who speak Sinhala and Tamil. They said: “We have deployed proactive hate speech detection technology in Sinhala to help us identify potentially violating content faster and more effectively.”

“What I can do with three lines of Python code in English took me two years of research on 28 million Sinhala words”

Yudhanjaya Wijeratne, LIRNEasia

As the bots’ activity continued, Wijeratne grew skeptical of the platitudes. He decided to look at the code libraries and software tools that the two companies were using, and found that mechanisms for monitoring hate speech in most non-English languages had not yet been built.

Wijeratne said: “In fact, much of the research for many languages like ours simply has not been done yet. What I can do with three lines of Python code in English took me two years of looking at 28 million Sinhala words to build the core corpus, build the core tools, and then bring things up to the level where I could potentially do that kind of text analysis.”

After a suicide bomber attacked a church in Colombo, the capital of Sri Lanka, in April 2019, Wijeratne built a tool to analyze hate speech and misinformation in Sinhala and Tamil. The system, called Watchdog, is a free mobile application that aggregates news and attaches warnings to false stories. The warnings come from volunteers who are trained in fact-checking.

Wijeratne emphasized that this work goes far beyond translation.

“Many of the algorithms that we take for granted and often cite in research, especially in natural language processing, show excellent results for English,” he said. “Yet many of those same algorithms, even when applied to languages that are only a few degrees apart, whether West Germanic or Romance languages, may return completely different results.”

Natural language processing is the foundation of automated content moderation systems. Wijeratne published a paper in 2019 that examined the differences in these algorithms’ accuracy across languages. He argues that the more computational resources a language has, such as data sets and web pages, the better the algorithms can work. Languages from poorer countries or communities are at a disadvantage.

“If you want to build the Empire State Building in English, then you have a blueprint. You have the materials,” he said. “You have everything on hand, and all you have to do is put them together. For every other language, you don’t have a blueprint.

“You don’t know where the concrete will come from. You don’t have steel, and you don’t have workers. So you will sit there laying one brick at a time, hoping that your grandson or granddaughter might complete the project.”

Deep-rooted problems

The movement to provide these blueprints is called language justice, and it is not new. The American Bar Association describes language justice as a “framework” that preserves people’s rights “to communicate, understand and be understood in the language they prefer and in which they feel most articulate and powerful.”
