Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 18
Distribution-Level Feature Distancing for Machine Unlearning: Towards a Better Trade-off Between Model Utility and Forgetting Paper • 2409.14747 • Published Sep 23, 2024
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs Paper • 2601.01836 • Published 5 days ago • 6
AIM-Intelligence/COMPASS-Policy-Alignment-Testbed-Dataset Viewer • Updated 4 days ago • 5.92k • 56 • 9
COMPASS Collection A Framework for Evaluating Organization-Specific Policy Alignment in LLMs • 5 items • Updated 4 days ago • 4
Better Safe Than Sorry? Overreaction Problem of Vision Language Models in Visual Emergency Recognition Paper • 2505.15367 • Published May 21, 2025 • 2
PIQA: Reasoning about Physical Commonsense in Natural Language Paper • 1911.11641 • Published Nov 26, 2019 • 5
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought Paper • 2510.04230 • Published Oct 5, 2025 • 26