Tamil to Sindhi Translation Benchmark Dataset

Bhashini's Tamil-Sindhi Translation Benchmark is a text dataset for evaluating machine translation quality. It includes document-level information and is intended to help researchers build better multilingual translation systems.

About Dataset

The NTREX_ta_sd_benchmark dataset provides news test references for evaluating Machine Translation (MT) from Tamil to Sindhi. It is part of a larger collection that supports translation into 128 target languages, and it includes document-level information, which makes it useful for multilingual MT benchmarking. Designed for the news domain, it supports the evaluation of translation quality and the development of robust translation systems. The dataset was submitted by Microsoft and is aimed at researchers and developers working on Tamil-to-Sindhi translation tasks.
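As an illustration of how reference translations like these are typically used, the sketch below scores system output against references with a simplified character n-gram F-score. It is a minimal example under stated assumptions, not the scoring method prescribed by the dataset: the function names (`char_ngrams`, `simple_chrf`, `score_corpus`) are hypothetical, the metric is a simplified stand-in for chrF rather than an official implementation, and the sketch assumes the Sindhi references and the system outputs have already been loaded as two aligned lists of strings, since the file layout is not described here.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of a string, ignoring whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=3, beta=2.0):
    """Simplified character n-gram F-score in [0, 1] (not the official chrF)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one side is too short to have n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


def score_corpus(hypotheses, references):
    """Average the segment-level score over aligned hypothesis/reference pairs."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must be the same length")
    if not references:
        return 0.0
    pairs = zip(hypotheses, references)
    return sum(simple_chrf(h, r) for h, r in pairs) / len(references)


# Placeholder strings only; real use would pass system output and Sindhi references.
print(score_corpus(["abc", "abc"], ["abc", "xyz"]))
```

Character-level scoring is used here because it needs no tokenizer for the target language. Note that Python strings are indexed by code point, so combining marks count as separate characters. For reportable results, an established metric implementation is the safer choice.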