RynnVLA-002: A Unified Vision-Language-Action and World Model • arXiv:2511.17502 • Nov 2025 • 23 upvotes
MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent • arXiv:2502.03207 • Feb 5, 2025
In-Context Learning with Unpaired Clips for Instruction-based Video Editing • arXiv:2510.14648 • Oct 16, 2025
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources • arXiv:2509.21268 • Sep 25, 2025 • 101 upvotes
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation • arXiv:2509.15212 • Sep 18, 2025 • 21 upvotes
VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? • arXiv:2505.23359 • May 29, 2025 • 39 upvotes
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding • arXiv:2501.13106 • Jan 22, 2025 • 90 upvotes
Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare • arXiv:2405.19298 • May 29, 2024
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation • arXiv:2411.13281 • Nov 20, 2024 • 21 upvotes
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models • arXiv:2411.00492 • Nov 1, 2024 • 6 upvotes
Aria: An Open Multimodal Native Mixture-of-Experts Model • arXiv:2410.05993 • Oct 8, 2024 • 111 upvotes
Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation • arXiv:2408.03588 • Aug 7, 2024 • 8 upvotes
Latte: Cross-framework Python Package for Evaluation of Latent-Based Generative Models • arXiv:2112.10638 • Dec 20, 2021
ARAUS: A Large-Scale Dataset and Baseline Models of Affective Responses to Augmented Urban Soundscapes • arXiv:2207.01078 • Jul 3, 2022
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding • arXiv:2407.15754 • Jul 22, 2024 • 20 upvotes
Automating Urban Soundscape Enhancements with AI: In-situ Assessment of Quality and Restorativeness in Traffic-Exposed Residential Areas • arXiv:2407.05744 • Jul 8, 2024