Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models • Paper • 2602.07026 • Published 24 days ago • 136
FastVLM: Efficient Vision Encoding for Vision Language Models • Paper • 2412.13303 • Published Dec 17, 2024 • 75
Open ASR Leaderboard 🏆 • 1.23k • Explore ASR model performance across languages and datasets
VBench Leaderboard 📊 • 347 • Upload video model evaluation data to update the VBench leaderboard