Submitted by lastdefiance20 125 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI · 10 authors 33 3
Submitted by KangLiao 110 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation MMLab@NTU 148 2
Submitted by YuminChoi 45 Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs KAIST AI 12 5
Submitted by hyeoncho01 45 TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling · 6 authors 10 3
Submitted by yqi19 44 BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities · 20 authors 18 2
Submitted by taesiri 42 StreamingVLM: Real-Time Understanding for Infinite Video Streams · 7 authors 464 2
Submitted by weirayao 30 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels Salesforce 58 2
Submitted by taesiri 28 BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution BigCode 50 3
Submitted by lulululuyi 23 R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth? LongCat 14 2
Submitted by Kurt232 21 Which Heads Matter for Reasoning? RL-Guided KV Cache Compression · 5 authors 2 2
Submitted by taesiri 17 SpaceVista: All-Scale Visual Spatial Reasoning from mm to km · 11 authors 29 3
Submitted by arubique 14 DISCO: Diversifying Sample Condensation for Efficient Model Evaluation Eberhard Karls Universität Tübingen 2
Submitted by Yunzhen 13 Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting · 5 authors 3
Submitted by JoeYing 12 ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping · 10 authors 2
Submitted by taesiri 9 PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs · 9 authors 19 2
Submitted by arashmarioriyad 9 Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization · 6 authors 2
Submitted by yanchi3dv 9 Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction · 2 authors 15 2
Submitted by cmhungsteve 7 TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control · 7 authors 2
Submitted by siyue 7 MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval · 8 authors 2
Submitted by Rbin 7 LightReasoner: Can Small Language Models Teach Large Language Models Reasoning? Data Intelligence Lab@HKU 309 2
Submitted by jasonyux 6 Dyna-Mind: Learning to Simulate from Experience for Better AI Agents · 9 authors 2
Submitted by Leo-Dai 6 StatEval: A Comprehensive Benchmark for Large Language Models in Statistics Shanghai University of Finance and Economics 2
Submitted by jacksukk 6 Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition · 7 authors 2
Submitted by kotekjedi 5 Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols CLAIRE Lab @EPFL 2
Submitted by dd101bb 5 Parallel Test-Time Scaling for Latent Reasoning Models The Hong Kong Polytechnic University 3 2
Submitted by taesiri 4 Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models · 12 authors 11 2
Submitted by demfier 4 ReviewerToo: Should AI Join The Program Committee? A Look At The Future of Peer Review · 4 authors 2
Submitted by nielsr 4 Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models Massachusetts Institute of Technology 37 1
Submitted by Ruggero1912 4 One Patch to Caption Them All: A Unified Zero-Shot Captioning Framework · 6 authors 8 2
Submitted by tytyt 3 Speculative Jacobi-Denoising Decoding for Accelerating Autoregressive Text-to-image Generation · 10 authors 2
Submitted by ssz1111 3 A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks · 8 authors 2
Submitted by zsqzz 2 GTAlign: Game-Theoretic Alignment of LLM Assistants for Mutual Welfare University of Illinois at Urbana-Champaign 0 3
Submitted by Jessemel 2 How to Teach Large Multimodal Models New Skills University of Illinois at Urbana-Champaign 20 2
Submitted by Sajib-006 2 LLM4Cell: A Survey of Large Language and Agentic Models for Single-Cell Biology Virginia Polytechnic Institute and State University 3
Submitted by cmhungsteve 2 Temporal Prompting Matters: Rethinking Referring Video Object Segmentation · 6 authors 2
Submitted by avanturist 2 ELMUR: External Layer Memory with Update/Rewrite for Long-Horizon RL · 3 authors 14 2
Submitted by LawrenceLiu 2 ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization University of California, Los Angeles 2
Submitted by WenyaoZhang 1 Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation · 10 authors 6 2
Submitted by EasonFan 1 ACE: Attribution-Controlled Knowledge Editing for Multi-hop Factual Recall · 8 authors 2
Submitted by jlbaker361 1 MONKEY: Masking ON KEY-Value Activation Adapter for Personalization · 1 author 2