Microsoft’s Corpus to support Multi-Lingual Digital Literacy in India

The American writer Will Durant once said, “India was the motherland of our race, and Sanskrit the mother of Europe’s languages…” Beyond its varied cultures and traditions, India is also a land of many languages. An interesting fact: the Constitution of India does not declare any national language. However, Hindi written in the Devanagari script and English are designated as the official languages of the Government of India.

Microsoft India Private Limited recently announced the availability of the Microsoft Indian Language Speech Corpus, which provides speech training and test data for three Indian languages: Gujarati, Tamil, and Telugu. The corpus is released through the Microsoft Research Open Data initiative, a collection of independent datasets from Microsoft Research intended to advance state-of-the-art research in fields such as natural language processing, computer vision, and domain-specific sciences. The company states that it is the largest openly accessible Indian language speech dataset that includes both audio and the corresponding transcripts. The aim is to help academics and researchers build Indian language speech recognition for any application where speech is used.

Voice-Based Computing to Reduce Language Barriers in Knowledge Acquisition

The Indian Language Speech Corpus was put to the test at Interspeech 2018, described as the world’s largest and most comprehensive conference on the science and technology of spoken language processing. Participants in its Low Resource Speech Recognition Challenge built Automatic Speech Recognition (ASR) systems using the Microsoft Indian Language Speech Corpus.

The General Manager of Artificial Intelligence at Microsoft India states that the purpose of the release is to reduce language barriers and enable Indians to use the full potential of the internet. Using the Speech Corpus developed by the company, researchers and academics may bring new innovations to voice-based computing in India. With the support of Deep Neural Networks and Artificial Intelligence, Microsoft has been working to improve real-time language translation for Bengali, Tamil, and Hindi, and has now extended this work to Telugu as well. In addition, Microsoft India announced email address support for various Indian languages across most of its email services and applications. India is thus one step closer to broader access to knowledge and education.
