τ-Knowledge: Evaluating Conversational Agents over Unstructured Knowledge Paper • 2603.04370 • Published 11 days ago • 2
When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration Paper • 2506.05579 • Published Jun 5, 2025 • 4
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval Paper • 2407.12883 • Published Jul 16, 2024 • 13
IMPersona: Evaluating Individual Level LM Impersonation Paper • 2504.04332 • Published Apr 6, 2025 • 2
Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs Article • +5 • Published Apr 16, 2024 • 16