InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing
Paper
• 2603.09877 • Published
• 25
Computer Vision
RIVER: A Real-Time Interaction Benchmark for Video LLMs
InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision