🎙️ Parakeet-TDT Fine-Tuning: 4 New ASR Models
Four fine-tuned versions of NVIDIA's Parakeet-TDT-0.6B-v3 for Dutch, Portuguese, Estonian, and Slovenian, among the first community fine-tunes of this architecture for these languages.
Results on Common Voice 17 test sets (WER before → after fine-tuning, relative change in parentheses):
🇸🇮 Slovenian: 50.49% → 11.56% WER (-77%)
🇵🇹 Portuguese: 15.86% → 10.71% WER (-32%)
🇪🇪 Estonian: 27.15% → 21.03% WER (-23%)
🇳🇱 Dutch: 5.99% → 5.33% WER (-11%)
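For readers checking the arithmetic, the percentages in parentheses are consistent with relative WER change, (after - before) / before. The sketch below recomputes them from the figures above and includes a plain word-level edit-distance WER for reference. The function names are illustrative, and splitting on whitespace without any text normalization is an assumption; the post does not say how the reported scores were computed.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length.

    Assumption: tokens are whitespace-separated and compared as-is
    (case and punctuation count as differences).
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def relative_change(before: float, after: float) -> float:
    """Relative change in percent; negative values mean WER went down."""
    return (after - before) / before * 100


# WER figures (in percent) copied from the results list above
REPORTED = {
    "Slovenian": (50.49, 11.56),
    "Portuguese": (15.86, 10.71),
    "Estonian": (27.15, 21.03),
    "Dutch": (5.99, 5.33),
}

for language, (before, after) in REPORTED.items():
    print(f"{language}: {relative_change(before, after):.0f}%")
```

Because this version compares raw tokens, a hypothesis that differs from the reference only in casing or punctuation is still counted as an error.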
All models output cased text with punctuation.
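```python
import nemo.collections.asr as nemo_asr

# Download the fine-tuned Dutch checkpoint from the Hugging Face Hub
model = nemo_asr.models.ASRModel.from_pretrained(
    "yuriyvnv/parakeet-tdt-0.6b-dutch"
)

# transcribe() takes a list of audio file paths and returns one result per file
output = model.transcribe(["audio.wav"])
print(output[0].text)
```

Models: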
🇳🇱 yuriyvnv/parakeet-tdt-0.6b-dutch
🇵🇹 yuriyvnv/parakeet-tdt-0.6b-portuguese
🇪🇪 yuriyvnv/parakeet-tdt-0.6b-estonian
🇸🇮 yuriyvnv/parakeet-tdt-0.6b-slovenian
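A minimal sketch for picking one of the four checkpoints by language code. The `CHECKPOINTS` table, `checkpoint_for`, and `transcribe_file` are hypothetical helpers, not part of NeMo. The `timestamps=True` argument follows the usage documented for the base Parakeet-TDT models and is assumed to carry over to these fine-tunes.

```python
# Checkpoint IDs copied from the list above, keyed by ISO 639-1 language code
CHECKPOINTS = {
    "nl": "yuriyvnv/parakeet-tdt-0.6b-dutch",
    "pt": "yuriyvnv/parakeet-tdt-0.6b-portuguese",
    "et": "yuriyvnv/parakeet-tdt-0.6b-estonian",
    "sl": "yuriyvnv/parakeet-tdt-0.6b-slovenian",
}


def checkpoint_for(lang: str) -> str:
    """Return the Hub ID for a language code, or raise ValueError if unknown."""
    try:
        return CHECKPOINTS[lang.lower()]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"no checkpoint for {lang!r}; known codes: {known}") from None


def transcribe_file(path: str, lang: str, timestamps: bool = False):
    """Load the checkpoint for `lang` and transcribe one audio file.

    NeMo is imported lazily so the lookup helper works without it installed.
    """
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(checkpoint_for(lang))
    # Assumption: timestamps=True behaves as in the base model's documented usage
    result = model.transcribe([path], timestamps=timestamps)[0]
    return result
```

Calling `transcribe_file("audio.wav", "pt").text` would return the Portuguese transcript. With `timestamps=True`, the base model's documentation exposes segment timings under `result.timestamp["segment"]`, which is expected but not verified here for the fine-tunes.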
Training: Common Voice 17 + synthetic speech (OpenAI TTS), filtered for quality with WAVe (yuriyvnv/WAVe-1B-Multimodal-PT). AdamW with cosine annealing, bf16-mixed precision, and early stopping on validation WER. Timestamps and long-form audio are supported.
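To illustrate two pieces of the recipe named above, here is a framework-free sketch of a cosine-annealed learning rate and early stopping on validation WER. It is not the author's training code: the learning rate, step count, patience, and the `EarlyStopping` class are all hypothetical, since the post gives no hyperparameters.

```python
import math


def cosine_annealing_lr(step: int, total_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Learning rate decaying from lr_max to lr_min along half a cosine period."""
    step = min(max(step, 0), total_steps)  # clamp to the schedule's range
    cosine = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + cosine)


class EarlyStopping:
    """Signal a stop once validation WER has not improved for `patience` checks."""

    def __init__(self, patience: int, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_checks = 0

    def should_stop(self, val_wer: float) -> bool:
        if val_wer < self.best - self.min_delta:
            # Lower WER is better: record it and reset the counter
            self.best = val_wer
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


# Hypothetical values, for illustration only
stopper = EarlyStopping(patience=2)
for epoch, val_wer in enumerate([0.20, 0.18, 0.19, 0.185]):
    lr = cosine_annealing_lr(step=epoch, total_steps=10, lr_max=1e-4)
    if stopper.should_stop(val_wer):
        print(f"stopping at epoch {epoch}, best val WER {stopper.best:.3f}")
        break
```

In a real run these roles would typically be filled by the training framework's own scheduler and early-stopping callback rather than hand-written helpers.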
@hf-audio @NVIDIADev
#asr #speech #parakeet #nvidia #nemo #multilingual #fine-tuning #commonvoice