End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions • Paper • 2601.17640 • Published 28 days ago • 5
daVinci-Dev: Agent-native Mid-training for Software Engineering • Paper • 2601.18418 • Published 27 days ago • 124
Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis • Paper • 2601.14417 • Published Jan 20 • 5
HeartMuLa: A Family of Open Sourced Music Foundation Models • Paper • 2601.10547 • Published Jan 15 • 44
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision • Paper • 2601.03193 • Published Jan 6 • 47
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning • Paper • 2601.06943 • Published Jan 11 • 212
Voxlect: A Speech Foundation Model Benchmark for Modeling Dialects and Regional Languages Around the Globe • Paper • 2508.01691 • Published Aug 3, 2025 • 10