Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Translation Paper • 2511.17290 • Published Nov 21
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28