Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress Paper • 2408.14960 • Published Aug 27, 2024
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning Paper • 2410.10801 • Published Oct 14, 2024 • 3
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge Paper • 2411.19799 • Published Nov 29, 2024 • 16
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation Paper • 2412.03304 • Published Dec 4, 2024 • 19
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models Paper • 2406.03368 • Published Jun 5, 2024
Bridging the Data Provenance Gap Across Text, Speech and Video Paper • 2412.17847 • Published Dec 19, 2024 • 10
Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier Paper • 2412.04261 • Published Dec 5, 2024 • 6
M-RewardBench: Evaluating Reward Models in Multilingual Settings Paper • 2410.15522 • Published Oct 20, 2024 • 12
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm Paper • 2406.18682 • Published Jun 26, 2024
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives Paper • 2407.01490 • Published Jul 1, 2024 • 1
On the Limitations of Compute Thresholds as a Governance Strategy Paper • 2407.05694 • Published Jul 8, 2024 • 2
Consent in Crisis: The Rapid Decline of the AI Data Commons Paper • 2407.14933 • Published Jul 20, 2024 • 14
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20, 2024 • 45
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024 • 18