OpenGVLab

community

https://github.com/opengvlab

AI & ML interests

Computer Vision

Recent Activity

yuezhengrong authored a paper about 6 hours ago

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

yuezhengrong submitted a paper 2 days ago

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

yuezhengrong authored a paper 3 days ago

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

Papers

RIVER: A Real-Time Interaction Benchmark for Video LLMs

InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision

authored a paper about 6 hours ago

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

Paper • 2605.07915 • Published 6 days ago • 7

submitted a paper to Daily Papers 2 days ago

What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion

Paper • 2605.07915 • Published 6 days ago • 7

authored 9 papers 3 days ago

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

Paper • 2408.10605 • Published Aug 20, 2024 • 2

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

Paper • 2410.19702 • Published Oct 25, 2024 • 1

VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning

Paper • 2506.06097 • Published Jun 6, 2025 • 1

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

Paper • 2503.10200 • Published Mar 13, 2025

VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception

Paper • 2509.21100 • Published Sep 25, 2025 • 1

UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation

Paper • 2510.10575 • Published Oct 12, 2025 • 2

Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing

Paper • 2510.08157 • Published Oct 9, 2025

VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning

Paper • 2511.19524 • Published Nov 24, 2025

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

Paper • 2605.06376 • Published 7 days ago • 25

authored 2 papers 16 days ago

OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

Paper • 2604.15093 • Published 28 days ago • 28

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published Mar 26 • 132

authored a paper about 1 month ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published Apr 6 • 235

authored a paper about 1 month ago

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

Paper • 2603.25730 • Published Mar 26 • 53

authored a paper about 2 months ago

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published Mar 26 • 132

authored 2 papers about 2 months ago

MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites

Paper • 2510.12126 • Published Oct 14, 2025 • 2

ScaleEdit-12M: Scaling Open-Source Image Editing Data Generation via Multi-Agent Framework

Paper • 2603.20644 • Published Mar 21 • 5
