SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks Paper • 2603.24755 • Published 5 days ago • 24
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published 9 days ago • 75
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published 13 days ago • 134
TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas Paper • 2603.16448 • Published 13 days ago • 58
InCoder-32B: Code Foundation Model for Industrial Scenarios Paper • 2603.16790 • Published 13 days ago • 304
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published 14 days ago • 183
SWE-Skills-Bench: Do Agent Skills Actually Help in Real-World Software Engineering? Paper • 2603.15401 • Published 14 days ago • 18
In-Context Reinforcement Learning for Tool Use in Large Language Models Paper • 2603.08068 • Published 21 days ago • 42
Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams Paper • 2603.07392 • Published 23 days ago • 18