AutoBench Leaderboard: Multi-run AutoBench leaderboard with historical navigation
AutoBench 1.0 is live. The Collective-LLM-as-a-Judge model benchmark: https://huggingface.co/blog/PeterKruger/autobench