Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis Paper • 2601.14417 • Published 5 days ago • 5
HeartMuLa: A Family of Open Sourced Music Foundation Models Paper • 2601.10547 • Published 10 days ago • 36
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published 19 days ago • 46
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published 14 days ago • 206
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published 4 days ago • 41
Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe Paper • 2508.01691 • Published Aug 3, 2025 • 10
tiantiaf/whisper-large-v3-msp-podcast-emotion Audio Classification • 2B • Updated Aug 10, 2025 • 3.2k • 5