---
license: mit
datasets:
- mozilla-foundation/common_voice_11_0
language:
- fa
metrics:
- wer
base_model:
- openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
library_name: transformers
---
This model was trained on the validation split of the dataset for one epoch, reaching a training loss of 0.05, and evaluated on the test split with a loss of 0.07.
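The metadata above lists WER as the evaluation metric but no score is reported here. Below is a minimal sketch of how a WER number could be computed with the `evaluate` library (plus `jiwer` as its backend); the reference and prediction strings are placeholders for illustration, not outputs of this model:

```python
# Minimal WER sketch using the `evaluate` library (requires: pip install evaluate jiwer).
# The strings below are placeholders, not measured outputs of this model.
import evaluate

wer_metric = evaluate.load("wer")
references = ["<reference transcript>"]   # hypothetical ground-truth text
predictions = ["<model transcript>"]      # hypothetical model output
print("WER:", wer_metric.compute(references=references, predictions=predictions))
```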
How to use the model in Colab:
Install the required packages (in a Colab cell, the leading `!` runs the command in the shell):

```python
!pip install torch torchaudio transformers librosa gradio
```
```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the fine-tuned Whisper model and processor
model_name = "hackergeek98/tinyyyy_whisper"
processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)

# Force the model to transcribe in Persian
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="fa", task="transcribe")

# Move the model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
```
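As an aside, if you only need a quick test rather than the step-by-step approach below, the high-level `pipeline` API can handle loading, preprocessing, and decoding in one call. A minimal sketch (the file name `sample.wav` is a placeholder):

```python
from transformers import pipeline

# High-level alternative: the ASR pipeline wraps preprocessing, generation, and decoding.
asr = pipeline(
    "automatic-speech-recognition",
    model="hackergeek98/tinyyyy_whisper",
    generate_kwargs={"language": "fa", "task": "transcribe"},
)
print(asr("sample.wav")["text"])  # "sample.wav" is a placeholder audio path
```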
Define a helper that loads an audio file, resamples it to 16 kHz, and returns the transcription:

```python
import librosa

def transcribe_audio(audio_file):
    # Load the audio file with librosa (supports many formats) and resample to 16 kHz
    audio_data, sampling_rate = librosa.load(audio_file, sr=16000)

    # Preprocess the audio into model input features
    inputs = processor(audio_data, sampling_rate=sampling_rate, return_tensors="pt").input_features.to(device)

    # Generate the transcription
    with torch.no_grad():
        predicted_ids = model.generate(inputs)

    # Decode the predicted token IDs into text
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    return transcription
```
Finally, upload an audio file in Colab and transcribe it:

```python
from google.colab import files

# Upload an audio file
uploaded = files.upload()
audio_file = list(uploaded.keys())[0]

# Transcribe the audio
transcription = transcribe_audio(audio_file)
print("Transcription:", transcription)
```