AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration • Paper • 2510.10395 • Published Oct 12
RealUnify: Do Unified Models Truly Benefit from Unification? A Comprehensive Benchmark • Paper • 2509.24897 • Published Sep 29
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction • Paper • 2502.17239 • Published Feb 24
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies • Paper • 2503.14324 • Published Mar 18
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning • Paper • 2503.19470 • Published Mar 25
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis • Paper • 2503.22420 • Published Mar 28
Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning • Paper • 2410.12952 • Published Oct 16, 2024