Government Of India

Indic Trans2

AI4Bharat's Indic-Trans-v2 is a multilingual Transformer NMT model (~1.1B parameters) trained on the Samanantar v2 dataset, which was the largest publicly available collection of parallel corpora for the languages of India at the time of writing (23 March 2023). We currently release two models, Indic-to-English and English-to-Indic, which support all 22 scheduled languages of India.
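To make the two translation directions concrete, the sketch below routes a language pair to either the Indic-to-English or the English-to-Indic model. It rests on stated assumptions: the FLORES-200-style tags (for example `hin_Deva`, `eng_Latn`) are taken to be the codes listed in the IndicTrans2 GitHub README and should be checked there, and `pick_direction` and the labels `indic-en` / `en-indic` are hypothetical names, not part of any released API.

```python
# Hypothetical sketch: route a request to one of the two released directions.
# ASSUMPTION: the FLORES-200-style tags below match those in the IndicTrans2
# GitHub README; verify against the repository before relying on them.

ENGLISH = "eng_Latn"

# 22 scheduled languages -> accepted tags. Kashmiri, Manipuri and Sindhi are
# listed with two scripts each.
SCHEDULED_LANGUAGES = {
    "Assamese": ["asm_Beng"],
    "Bengali": ["ben_Beng"],
    "Bodo": ["brx_Deva"],
    "Dogri": ["doi_Deva"],
    "Gujarati": ["guj_Gujr"],
    "Hindi": ["hin_Deva"],
    "Kannada": ["kan_Knda"],
    "Kashmiri": ["kas_Arab", "kas_Deva"],
    "Konkani": ["gom_Deva"],
    "Maithili": ["mai_Deva"],
    "Malayalam": ["mal_Mlym"],
    "Manipuri": ["mni_Beng", "mni_Mtei"],
    "Marathi": ["mar_Deva"],
    "Nepali": ["npi_Deva"],
    "Odia": ["ory_Orya"],
    "Punjabi": ["pan_Guru"],
    "Sanskrit": ["san_Deva"],
    "Santali": ["sat_Olck"],
    "Sindhi": ["snd_Arab", "snd_Deva"],
    "Tamil": ["tam_Taml"],
    "Telugu": ["tel_Telu"],
    "Urdu": ["urd_Arab"],
}

# Flat set of every Indic tag, used for membership checks.
INDIC_TAGS = {tag for tags in SCHEDULED_LANGUAGES.values() for tag in tags}


def pick_direction(src: str, tgt: str) -> str:
    """Return a hypothetical model label for the (src, tgt) pair."""
    if src in INDIC_TAGS and tgt == ENGLISH:
        return "indic-en"
    if src == ENGLISH and tgt in INDIC_TAGS:
        return "en-indic"
    # Only the two English-centric directions are described on this page.
    raise ValueError(f"unsupported pair: {src} -> {tgt}")


if __name__ == "__main__":
    print(pick_direction("hin_Deva", "eng_Latn"))  # indic-en
    print(pick_direction("eng_Latn", "sat_Olck"))  # en-indic
```

Pairs between two Indic languages raise `ValueError` here simply because only the Indic-to-English and English-to-Indic models are described on this page.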

  • Digital India BHASHINI Division
  • BHASHINI_shailendra

About Model

Bhashini - IndicTrans2 is the first open-source transformer-based multilingual NMT model that supports high-quality translations across all 22 scheduled Indic languages, including multiple scripts for low-resource languages such as Kashmiri, Manipuri and Sindhi. It adopts script unification wherever feasible to leverage transfer learning through lexical sharing between languages. Overall, the model supports five scripts: Perso-Arabic (Kashmiri, Sindhi, Urdu), Ol Chiki (Santali), Meitei (Manipuri), Latin (English), and Devanagari (used for all the remaining languages).
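Script unification is feasible for many of these languages because the Unicode blocks of the major Brahmi-derived scripts (Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam) follow the same parallel, ISCII-derived layout as Devanagari, so most characters can be mapped by a fixed code-point offset. The following is a minimal, hypothetical sketch of that idea, not IndicTrans2's actual preprocessing; `to_devanagari`, `from_devanagari` and `SCRIPT_BASES` are illustrative names, and a production pipeline would special-case the few positions whose meaning differs between blocks.

```python
# Hypothetical sketch of offset-based script unification (not IndicTrans2's code).
# Each Indic Unicode block is 0x80 code points wide, with letters at the same
# relative positions, so most characters map to Devanagari by a fixed offset.

DEVANAGARI_BASE = 0x0900
BLOCK_SIZE = 0x80
# Danda and double danda live in the Devanagari block but are shared by
# other scripts, so they are never remapped.
SHARED_PUNCTUATION = {0x0964, 0x0965}

# Start code points of the Unicode blocks handled by this sketch.
SCRIPT_BASES = {
    "Bengali": 0x0980,  # also used for Assamese and Bengali-script Manipuri
    "Gurmukhi": 0x0A00,
    "Gujarati": 0x0A80,
    "Oriya": 0x0B00,
    "Tamil": 0x0B80,
    "Telugu": 0x0C00,
    "Kannada": 0x0C80,
    "Malayalam": 0x0D00,
}


def to_devanagari(text: str, script: str) -> str:
    """Map characters of `script` to Devanagari by code-point offset.

    Characters outside the script's block (ASCII, spaces, shared danda) are
    kept unchanged. Approximate: e.g. Assamese ra (U+09F0) lands on a
    Devanagari position (U+0970) with a different meaning.
    """
    base = SCRIPT_BASES[script]
    out = []
    for ch in text:
        cp = ord(ch)
        if base <= cp < base + BLOCK_SIZE:
            out.append(chr(cp - base + DEVANAGARI_BASE))
        else:
            out.append(ch)
    return "".join(out)


def from_devanagari(text: str, script: str) -> str:
    """Inverse mapping, e.g. to restore the target script after translation.

    May yield unassigned code points if the target script lacks a letter.
    """
    base = SCRIPT_BASES[script]
    out = []
    for ch in text:
        cp = ord(ch)
        in_block = DEVANAGARI_BASE <= cp < DEVANAGARI_BASE + BLOCK_SIZE
        if in_block and cp not in SHARED_PUNCTUATION:
            out.append(chr(cp - DEVANAGARI_BASE + base))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(to_devanagari("નમસ્તે", "Gujarati"))  # नमस्ते
    print(to_devanagari("ভারত", "Bengali"))  # भारत
```

Mapping into a single script lets words that are cognate across languages share subword units in the vocabulary, which is the lexical sharing the paragraph above refers to.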

We open-source all our training data (BPCC), back-translation data (BPCC-BT), the final IndicTrans2 models, evaluation benchmarks (IN22, which includes IN22-Gen and IN22-Conv), and training and inference scripts for easier use and adoption within the research community. We hope this will foster further research in low-resource Indic languages and, through contributions from the research community, lead to further improvements in low-resource translation quality.

This code repository contains instructions for downloading the artifacts associated with IndicTrans2, as well as the code for training/fine-tuning the multilingual NMT models.

For more details on using the model, refer to GitHub: https://github.com/AI4Bharat/IndicTrans2/tree/main


Metadata

MIT

AI4Bharat

Machine Translation Model

Open

Digital India BHASHINI Division

Sector Agnostic

05/03/25 15:24:29

Admin

214.60 KB

Activity Overview

  • Downloads 16
  • Views 175
  • File Size 214.60 KB

Tags

  • Machine Translation
  • Language Modeling
  • Bilingual Translation
  • Multilingual Translation
  • Regional Languages
  • Indian Languages
  • Indic-TransV2
  • NLP
  • Computational Linguistics

License Control

MIT

Version Control

Folder: Version 1 (214.60 KB)
  • admin·1 month(s) ago
    • zip
      IndicTrans2-main.zip