Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning • Paper • 2512.20605 • Published 5 days ago • 47
Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models • Paper • 2512.21337 • Published 4 days ago • 24
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models • Paper • 2512.20557 • Published 5 days ago • 46
LongVideoAgent: Multi-Agent Reasoning with Long Videos • Paper • 2512.20618 • Published 5 days ago • 49
Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs • Paper • 2512.17206 • Published 10 days ago • 17
Reinforcement Learning for Self-Improving Agent with Skill Library • Paper • 2512.17102 • Published 10 days ago • 26
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion • Paper • 2512.19535 • Published 6 days ago • 10
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies • Paper • 2512.19673 • Published 6 days ago • 59
QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation • Paper • 2512.19134 • Published 7 days ago • 31
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion • Paper • 2512.19678 • Published 6 days ago • 27
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers • Paper • 2512.17351 • Published 10 days ago • 22
Next-Embedding Prediction Makes Strong Vision Learners • Paper • 2512.16922 • Published 10 days ago • 80
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation • Paper • 2512.17012 • Published 10 days ago • 42
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing • Paper • 2512.17909 • Published 9 days ago • 36
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs • Paper • 2512.17008 • Published 10 days ago • 10
Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification • Paper • 2512.16921 • Published 10 days ago • 7
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices • Paper • 2512.14052 • Published 13 days ago • 39