DSGym: A Holistic Framework for Evaluating and Training Data Science Agents Paper • 2601.16344 • Published 8 days ago • 10
ReasonIF: Large Reasoning Models Fail to Follow Instructions During Reasoning Paper • 2510.15211 • Published Oct 17, 2025 • 2
Article Back to The Future: Evaluating AI Agents on Predicting Future Events +5 Jul 17, 2025 • 50
the-real-gabagool/qwen-s1-fede-7b-dynamic-cheatsheet-shuffled-v2-checkpoint-142 8B • Updated May 30, 2025 • 1