Grad2Reward: From Sparse Judgment to Dense Rewards for Improving Open-Ended LLM Reasoning Paper • 2602.01791 • Published Feb 2 • 1
Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs Article • 5 days ago • 20
How we OCR'ed 30,000 papers using Codex, open OCR models and Jobs Article • 6 days ago • 41
ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning Paper • 2603.05863 • Published Mar 6 • 6
InCoder-32B-Thinking: Industrial Code World Model for Thinking Paper • 2604.03144 • Published 11 days ago • 228
Token Warping Helps MLLMs Look from Nearby Viewpoints Paper • 2604.02870 • Published 11 days ago • 33
VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors Paper • 2604.02486 • Published 12 days ago • 9
Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence? Paper • 2604.03016 • Published 11 days ago • 36
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook Paper • 2604.02029 • Published 12 days ago • 137
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization Paper • 2604.02268 • Published 12 days ago • 93