Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025 • 59
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment Article • Feb 11, 2025 • 95
The Unreasonable Effectiveness of Scaling Agents for Computer Use Paper • 2510.02250 • Published Oct 2, 2025 • 24