GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents Paper • 2604.26752 • Published 3 days ago • 88
Video Analysis and Generation via a Semantic Progress Function Paper • 2604.22554 • Published 8 days ago • 63
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond Paper • 2604.22748 • Published 8 days ago • 218
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published 10 days ago • 238
Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items Paper • 2604.19748 • Published 11 days ago • 249
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation Paper • 2604.18486 • Published 12 days ago • 90
KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance Paper • 2604.12627 • Published 18 days ago • 100
Qwen Image Edit 2511 Fast: Fast 4-step inference of Qwen Image Edit 2511 Space • Running on Zero • MCP • Featured • 256
WildDet3D: Scaling Promptable 3D Detection in the Wild Paper • 2604.08626 • Published 23 days ago • 245
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability Paper • 2604.06628 • Published 24 days ago • 323
Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding Paper • 2604.05015 • Published 26 days ago • 235