BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration Paper • 2510.00438 • Published 18 days ago • 4
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training Paper • 2509.23661 • Published 21 days ago • 44
Advancing Speech Understanding in Speech-Aware Language Models with GRPO Paper • 2509.16990 • Published 27 days ago • 18
T2R-bench: A Benchmark for Generating Article-Level Reports from Real World Industrial Tables Paper • 2508.19813 • Published Aug 27 • 25
PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning Paper • 2508.21104 • Published Aug 28 • 32
Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-View Scenes Paper • 2509.06266 • Published Sep 8 • 10
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model Paper • 2509.09372 • Published Sep 11 • 230
Reasoning over Boundaries: Enhancing Specification Alignment via Test-time Delibration Paper • 2509.14760 • Published about 1 month ago • 52
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision Paper • 2505.13427 • Published May 19 • 26
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks Paper • 2505.00234 • Published May 1 • 26
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models Paper • 2505.02847 • Published May 1 • 28
LightLab: Controlling Light Sources in Images with Diffusion Models Paper • 2505.09608 • Published May 14 • 36
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency Paper • 2505.01658 • Published May 3 • 39
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models Paper • 2505.16707 • Published May 22 • 45