Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 11 days ago • 827
Rethinking Training Targets, Architectures and Data Quality for Universal Speech Enhancement Paper • 2603.02641 • Published Mar 3 • 5
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated Aug 7, 2025 • 428
FAMA Collection The First Large-Scale Open-Science Speech Foundation Model for English and Italian • 5 items • Updated May 30, 2025 • 10
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction Paper • 2501.12979 • Published Jan 22, 2025 • 1
Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions Paper • 2406.16128 • Published Jun 23, 2024 • 1
voc2vec: A Foundation Model for Non-Verbal Vocalization Paper • 2502.16298 • Published Feb 22, 2025 • 1
Text Style Transfer Collection Model checkpoints of the paper "Self-supervised Text Style Transfer Using Cycle-Consistent Adversarial Networks" • 33 items • Updated Dec 1, 2024 • 2
SEAHORSE release Collection The SEAHORSE metrics (as described in https://arxiv.org/abs/2305.13194). • 12 items • Updated Mar 12 • 21
MT5 release Collection The MT5 release follows the T5 family, but is pretrained on multilingual data. The updated UMT5 models are pretrained on an updated corpus. • 10 items • Updated Mar 12 • 24
Health AI Developer Foundations (HAI-DEF) Collection Groups models released for use in health AI by Google. Read more about HAI-DEF at http://goo.gle/hai-def • 22 items • Updated Mar 12 • 209
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 25 items • Updated Mar 2 • 580
mHuBERT-147 models Collection Compact yet powerful multilingual speech representation models based on the HuBERT architecture. • 3 items • Updated Jun 4, 2024 • 8
LLaVa-NeXT Collection LLaVa-NeXT (also known as LLaVa-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets. • 8 items • Updated Jul 19, 2024 • 33
Benchmarking Representations for Speech, Music, and Acoustic Events Paper • 2405.00934 • Published May 2, 2024 • 1
XLSR Collection A collection of multilingual Wav2Vec 2.0 checkpoints pre-trained on 53 languages and fine-tuned for CTC speech recognition. • 12 items • Updated Jan 16, 2024 • 9
Understanding LLMs: A Comprehensive Overview from Training to Inference Paper • 2401.02038 • Published Jan 4, 2024 • 65