Yuriy Perezhohin

yuriyvnv

https://scholar.google.com/citations?user=I5uzFtwAAAAJ&hl=en

AI & ML interests

Automatic Speech Recognition, Embeddings, Code Generation, Synthetic Data Generation and Filtering

Recent Activity

posted an update about 1 hour ago

🎙️ Parakeet-TDT Fine Tuning: 4 New ASR Models

Four fine-tuned versions of NVIDIA's Parakeet-TDT-0.6B-v3 for Dutch, Portuguese, Estonian, and Slovenian, among the first community fine-tunes of this architecture for the aforementioned languages.

📊 Results on Common Voice 17 test sets:
🇸🇮 Slovenian: 50.49% → 11.56% WER (-77%)
🇵🇹 Portuguese: 15.86% → 10.71% WER (-32%)
🇪🇪 Estonian: 27.15% → 21.03% WER (-23%)
🇳🇱 Dutch: 5.99% → 5.33% WER (-11%)

All models output cased text with punctuation.

```
import nemo.collections.asr as nemo_asr

model = nemo_asr.models.ASRModel.from_pretrained(
    "yuriyvnv/parakeet-tdt-0.6b-dutch"
)
output = model.transcribe(["audio.wav"])
print(output[0].text)
```

🔗 Models:
🇳🇱 yuriyvnv/parakeet-tdt-0.6b-dutch
🇵🇹 yuriyvnv/parakeet-tdt-0.6b-portuguese
🇪🇪 yuriyvnv/parakeet-tdt-0.6b-estonian
🇸🇮 yuriyvnv/parakeet-tdt-0.6b-slovenian

🏗️ Training: Common Voice 17 + synthetic speech (OpenAI TTS), filtered with WAVe (yuriyvnv/WAVe-1B-Multimodal-PT) for quality. AdamW + cosine annealing, bf16-mixed precision, early stopping on val WER. Timestamps and long-form audio supported.

@hf-audio @NVIDIADev

#asr #speech #parakeet #nvidia #nemo #multilingual #fine-tuning #commonvoice
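
The percentages in parentheses in the results read as relative WER reductions, not absolute differences. A minimal sketch of that arithmetic follows, using the before/after figures quoted in the post. The function name `relative_wer_reduction` is made up for illustration and is not part of NeMo or the released models.

```python
def relative_wer_reduction(baseline_wer: float, finetuned_wer: float) -> float:
    """Return the relative WER reduction in percent (positive means improvement)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - finetuned_wer) / baseline_wer * 100.0


# (baseline WER %, fine-tuned WER %) as quoted in the post
results = {
    "Slovenian": (50.49, 11.56),
    "Portuguese": (15.86, 10.71),
    "Estonian": (27.15, 21.03),
    "Dutch": (5.99, 5.33),
}

for language, (before, after) in results.items():
    reduction = relative_wer_reduction(before, after)
    print(f"{language}: {before:.2f}% -> {after:.2f}% WER (-{reduction:.0f}%)")
```

Rounded to whole percentages, this gives the same -77%, -32%, -23%, and -11% as listed. The absolute gains are 38.93, 5.15, 6.12, and 0.66 WER points respectively.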

updated a model about 2 hours ago

yuriyvnv/parakeet-tdt-0.6b-polish

published a model about 2 hours ago

yuriyvnv/parakeet-tdt-0.6b-polish



Posts 3

Post


🎯 WAVe-1B-Multimodal-NL: Word-Level Speech Quality Assessment for Dutch

Following the release of the Portuguese model, we're releasing the Dutch variant of WAVe — a 1B multimodal embedding model that assesses synthetic speech quality at the word level, thereby improving the quality of synthetically augmented datasets for training ASR models.

Trained on Common Voice 16.1 Dutch with 5 corruption strategies, this model catches mispronunciations, timing errors, and prosody issues in synthetic data that sentence-level embeddings miss entirely.

Resources

- Dutch model: yuriyvnv/WAVe-1B-Multimodal-NL
- Portuguese model: yuriyvnv/WAVe-1B-Multimodal-PT
- Code: https://github.com/yuriyvnv/WAVe
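
As a toy illustration of why word-level assessment can catch what a single per-utterance score hides, the sketch below compares two filtering rules on made-up per-word quality scores. Everything here is an assumption for illustration: the scores, the 0.7 threshold, and the function names are hypothetical, and this is not WAVe's API or scoring method (see the repository above for the actual code). The mean is only a stand-in for any sentence-level aggregate.

```python
from statistics import mean


def sentence_level_keep(word_scores: list[float], threshold: float = 0.7) -> bool:
    """Keep the sample if the average score over all words passes the threshold."""
    return mean(word_scores) >= threshold


def word_level_keep(word_scores: list[float], threshold: float = 0.7) -> bool:
    """Keep the sample only if every individual word passes the threshold."""
    return min(word_scores) >= threshold


# Hypothetical per-word scores in [0, 1]; higher means the audio matches the word better.
clean_sample = [0.93, 0.91, 0.95, 0.90]
one_bad_word = [0.95, 0.94, 0.20, 0.96, 0.95]  # third word is mispronounced

for name, scores in [("clean", clean_sample), ("one bad word", one_bad_word)]:
    print(
        f"{name}: sentence-level keep={sentence_level_keep(scores)}, "
        f"word-level keep={word_level_keep(scores)}"
    )
```

With these numbers the sample containing one badly synthesized word averages 0.80 and passes the sentence-level rule, while the word-level rule rejects it because of the single 0.20 score.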

This model builds on Common Voice Dutch data. Thanks to @mozilla and the Common Voice community for making multilingual speech data accessible.

Would be great to hear from the Dutch NLP community — @BramVanroy @GroNLP — especially if you're working on Dutch ASR or TTS pipelines where quality filtering could help. Also tagging @hf-audio as this sits at the intersection of speech processing and data curation.