video-SALMONN 2

video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions.

tsinghua-ee/video-SALMONN-2_plus_72B • Updated Sep 28, 2025 • 7 • 2
tsinghua-ee/video_SALMONN2plus_72B_audioAlign • Updated 7 days ago • 1
tsinghua-ee/video-SALMONN-2_plus_7B • Updated Sep 28, 2025 • 720 • 6
tsinghua-ee/video_SALMONN2plus_7B_audioAlign • 9B • Updated Dec 18, 2025 • 404