The role of synthetic data in Multilingual, Multi-cultural AI systems: Lessons from Indic Languages Paper • 2509.21294 • Published Sep 25 • 4
Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval Paper • 2509.16442 • Published Sep 19
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark Paper • 2406.05967 • Published Jun 10, 2024 • 6
Towards Inducing Document-Level Abilities in Standard Multilingual Neural Machine Translation Models Paper • 2408.11382 • Published Aug 21, 2024
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 19 • 41
IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages Paper • 2305.16307 • Published May 25, 2023
An Empirical Study of In-context Learning in LLMs for Machine Translation Paper • 2401.12097 • Published Jan 22, 2024
IndicTrans2 Collection Models (En-Indic, Indic-En, Indic-Indic) in 2 variants (base and dist) and Benchmarks (IN22-Gen and IN22-Conv) released as a part of IndicTrans2. • 10 items • Updated Sep 5 • 22