Precision-RL Collection Defeating the Training-Inference Mismatch via FP16 • 2 items • Updated Nov 14, 2025
Defeating the Training-Inference Mismatch via FP16 Paper • 2510.26788 • Published Oct 30, 2025 • 29 • 1
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey Paper • 2509.02547 • Published Sep 2, 2025 • 228
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1, 2025 • 76
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2, 2025 • 83
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published Jul 2, 2025 • 56