HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published 13 days ago • 106
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents Paper • 2603.22386 • Published 7 days ago • 54
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published 5 days ago • 118
PixelSmile: Toward Fine-Grained Facial Expression Editing Paper • 2603.25728 • Published 4 days ago • 114
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published 7 days ago • 130
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis Paper • 2603.20278 • Published 13 days ago • 91
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published 9 days ago • 75
MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild Paper • 2603.17187 • Published 13 days ago • 134
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation Paper • 2602.24286 • Published Feb 27 • 97
UniG2U-Bench: Do Unified Models Advance Multimodal Understanding? Paper • 2603.03241 • Published 27 days ago • 86
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published 27 days ago • 102
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning Paper • 2603.03790 • Published 27 days ago • 121
Heterogeneous Agent Collaborative Reinforcement Learning Paper • 2603.02604 • Published 28 days ago • 191
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 24 days ago • 117
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs Paper • 2603.05890 • Published 25 days ago • 91