PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning • Paper • 2508.21104 • Published Aug 28, 2025 • 37
Qwen/Qwen2.5-VL-32B-Instruct • Image-Text-to-Text • 33B • Updated Apr 14, 2025 • 2.38M • 474