4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation • Paper • 2512.17012 • Published 9 days ago • 42
Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale • Paper • 2511.05705 • Published Nov 7 • 6
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM • Paper • 2510.15870 • Published Oct 17 • 89
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts • Paper • 2411.05945 • Published Nov 8, 2024 • 4
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models • Paper • 2309.15701 • Published Sep 27, 2023 • 2
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition • Paper • 2310.06434 • Published Oct 10, 2023 • 4
Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition • Paper • 2309.15223 • Published Sep 26, 2023 • 22