SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 5 days ago • 25
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition Paper • 2410.05603 • Published Oct 8, 2024 • 11