Understanding the Issue
The decline of language diversity is a pressing concern: some predictions hold that up to half of the world’s languages may vanish by 2100. Generative AI is seen as a potential accelerator of this decline, especially for indigenous and low-resource languages. Most AI training data is currently in English, leaving many other languages underrepresented. This imbalance raises questions about language equity in the AI landscape.
Key Points to Note
- Most of the world’s roughly 7,000 languages lack sufficient resources for AI training.
- AI models learn primarily from English-language data, which marginalizes speakers of less common languages.
- Major tech companies are launching initiatives to develop multilingual models, but challenges remain.
- Advances in speech technology could help preserve oral languages, yet speech systems still lag behind text-based ones.
The Bigger Picture
The future of language diversity is not entirely bleak. While the dominance of English in AI is concerning, there is growing awareness of the need for inclusivity. Multilingual models can transfer knowledge across languages, so a model may need less training data for each individual language. Innovative projects, like the Fon-French translator and Meta’s speech-to-speech translation for Hokkien, illustrate the potential for technology to preserve and promote underrepresented languages. As society pushes for more equitable AI, the hope is that no language will be left behind.











