THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning Paper • 2509.13761 • Published 29 days ago • 16
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation Paper • 2509.25849 • Published 16 days ago • 45
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models Paper • 2510.03561 • Published 12 days ago • 23
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published 10 days ago • 385
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning Paper • 2510.03259 • Published 20 days ago • 53
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published 7 days ago • 27