World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models Paper • 2511.22787 • Published 4 days ago • 5 • 1
Are they lovers or friends? Evaluating LLMs' Social Reasoning in English and Korean Dialogues Paper • 2510.19028 • Published Oct 21 • 7 • 2
BenchHub: A Unified Benchmark Suite for Holistic and Customizable LLM Evaluation Paper • 2506.00482 • Published May 31 • 8 • 2