Google DeepMind’s India unit is leading an artificial intelligence (AI) project named Morni (Multimodal Representation for India). The initiative aims to cover 125 Indic languages and dialects, including 73 that currently have no digital representation. The effort is part of Google’s broader mission to preserve and promote the linguistic diversity of India, a country whose vast population speaks a wide range of languages.
The Need for Digital Representation
Hindi, one of India’s most widely spoken languages, is spoken by nearly 10% of the global population, yet it accounts for a mere 0.1% of online text. This gap highlights the urgent need for digital resources in Indic languages. In response, Google DeepMind is working to create digital corpora for these languages so that they are adequately represented in the digital age.
Project Vaani: Building a Language Database
To address the lack of digital data, Google has launched Project Vaani in collaboration with the Indian Institute of Science and ARTPARK (Artificial Intelligence & Robotics Technology Park). This initiative has made significant progress, completing its first phase with the creation of an open-source database containing over 14,000 hours of speech data from 58 languages. This data, contributed by 80,000 speakers across 80 districts, is a valuable resource for AI development. The project is now in its second phase, aiming to collect 154,000 hours of anonymized speech data from all Indian districts.
Expansion of Google Translate
In addition to Project Morni, Google has recently expanded the language coverage of Google Translate by adding 110 new languages, including five Indian languages. The expansion is powered by the PaLM 2 language model; with it, Google Translate now supports over 1,500 global languages and serves more than 600 million people worldwide. This development underscores Google’s commitment to making digital tools accessible to a broader audience, including speakers of lesser-known languages.
Digital Agri-Stack Initiative
Beyond language preservation, Google is also focusing on modernizing India’s agricultural sector. The company is developing a digital agri-stack, a data-driven initiative aimed at enhancing agricultural practices. This platform will facilitate loans to farmers, provide affordable crop insurance, and improve subsidy programs. By leveraging digital tools, Google aims to support government programs and improve the livelihoods of farmers across the country.
Key Summary of Google DeepMind’s Project Morni
Google DeepMind’s Project Morni is a significant step towards preserving and promoting India’s linguistic diversity in the digital age. By creating digital resources for 125 Indic languages, including those with no existing digital corpus, Google aims to ensure that these languages are not left behind. Together with the expansion of Google Translate and the development of a digital agri-stack, the project reflects Google’s broader mission to empower communities through technology.
Project Morni is an AI initiative by Google DeepMind’s India unit, aiming to encompass 125 Indic languages and dialects, including 73 with no digital representation.
Project Vaani is a collaboration between Google, the Indian Institute of Science, and ARTPARK to create a database of speech data for Indic languages. It has already collected over 14,000 hours of data and is aiming for 154,000 hours in its second phase.
Google Translate has added 110 new languages, including five Indian languages, now supporting over 1,500 global languages and reaching more than 600 million people.
The digital agri-stack is a data-driven platform being developed by Google to enhance agricultural practices in India, offering loans, affordable crop insurance, and improved subsidy programs for farmers.
Digital representation ensures that Indic languages are preserved, promoted, and accessible in the digital world, preventing them from becoming obsolete as technology advances.