Diversity or Precision? A Deep Dive into Next Token Prediction • Paper • 2512.22955 • Published 8 days ago • 3
One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient • Paper • 2509.26313 • Published Sep 30, 2025 • 4