salmannyu/Llama-3B-Nemotron-Math-Mid-Train-Full-non-think-nopack-lr1.5e5-ep3 3B • Updated 13 days ago • 12
salmannyu/Llama-3B-Nemotron-Math-Mid-Train-Full-non-think-nopack-lr1.5e5-ep3 3B • Updated 13 days ago • 12
WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment Paper • 2512.12692 • Published Dec 14, 2025 • 14
On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models Paper • 2512.07783 • Published Dec 8, 2025 • 39
SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning Paper • 2512.03244 • Published Dec 2, 2025 • 17
SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning Paper • 2512.03244 • Published Dec 2, 2025 • 17