aquiffoo

https://aquiffoo.is-a.dev/

AI & ML interests

thanks for everything.

Recent Activity

liked a model about 3 hours ago

stepfun-ai/Step-3.5-Flash

reacted to yuriyvnv's post with 👍 about 7 hours ago

🎯 WAVe: 1B Multimodal Embedding Model for Word-Level Speech Quality

Multimodal embeddings for speech + transcript that verify quality at the word level, not just sentence level. Catches mispronunciations, timing errors, and prosody issues that sentence-level filters miss.

📊 Impact on Portuguese ASR:
• 34% reduction in training steps
• 50% better cross-domain generalization
• 30% less synthetic data needed
• Word-aligned attention finds errors other methods miss

🏗️ Architecture:
• Text: XLM-RoBERTa (278M params)
• Audio: Wav2Vec2-BERT 2.0 (581M params)
• Word Alignment: Multi-head attention + GLU (14M params)
• Total: 1B parameters

```
from transformers import AutoModel, AutoProcessor

processor = AutoProcessor.from_pretrained(
    "yuriyvnv/WAVe-1B-Multimodal-PT",
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "yuriyvnv/WAVe-1B-Multimodal-PT",
    trust_remote_code=True
)
```

```
# Assess speech-transcript alignment
inputs = processor(text="Olá, como está?", audio=audio_array, sampling_rate=16000, return_tensors="pt")
quality = model(**inputs).quality_score.item()
```

Perfect for filtering synthetic speech datasets before ASR training.

Model: https://huggingface.co/yuriyvnv/WAVe-1B-Multimodal-PT
Code to create WAVe: https://github.com/yuriyvnv/WAVe

#multimodal #speech #embeddings #asr #syntheticdata #qualityassessment
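The filtering use case mentioned in the post could be sketched roughly as follows. This is only an illustration under stated assumptions: `filter_by_quality`, `dummy_score`, and the `0.5` threshold are hypothetical names and values not taken from the post, and the stand-in scorer is not the WAVe model. In practice the scorer would presumably wrap the `processor` and `model` calls shown in the post.

```python
from typing import Callable, List, Sequence, Tuple

# A sample is a (transcript, audio) pair; audio is a plain list of floats
# here purely for illustration.
Sample = Tuple[str, List[float]]


def filter_by_quality(
    samples: Sequence[Sample],
    score_fn: Callable[[str, List[float]], float],
    threshold: float = 0.5,  # hypothetical cutoff, not a value from the post
) -> List[Sample]:
    """Keep only the samples whose score is at least `threshold`."""
    return [s for s in samples if score_fn(s[0], s[1]) >= threshold]


def dummy_score(text: str, audio: List[float]) -> float:
    """Stand-in scorer (assumption): 1.0 if any audio is present, else 0.0.

    A real scorer would compute the model's quality score instead.
    """
    return 1.0 if audio else 0.0


samples = [("Olá, como está?", [0.1, 0.2, 0.3]), ("sem áudio", [])]
kept = filter_by_quality(samples, dummy_score)
```

Passing the scorer in as a callable keeps the filtering logic independent of how the score is produced, so the stub can be swapped for a model-backed function without changing the filter.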

liked a model 3 days ago

Qwen/Qwen3-ASR-1.7B


Organizations

liked a model about 3 hours ago

stepfun-ai/Step-3.5-Flash

199B • Updated about 3 hours ago • 77


liked a model 3 days ago

Qwen/Qwen3-ASR-1.7B

Automatic Speech Recognition • 2B • Updated 3 days ago • 25.7k • 325

New activity in aquiffoo/neo-3-1B-A90M-Base 4 days ago

Instruct model update

#2 opened 4 days ago by iwr-redmond

reacted to AdinaY's post with 🔥 4 days ago


Big day in open source AI!!

✨ DeepSeek released OCR2 💥
deepseek-ai/DeepSeek-OCR-2

✨ Kimi K2.5 just landed 🔥
moonshotai/Kimi-K2.5

With the Chinese Spring Festival 3 weeks away, what’s coming next?👀