---
license: mit
datasets:
- mozilla-foundation/common_voice_11_0
language:
- fa
metrics:
- wer
base_model:
- openai/whisper-tiny
pipeline_tag: automatic-speech-recognition
library_name: transformers
---
This model was trained on the validation split of the dataset for one epoch, reaching a training loss of 0.05, and evaluated on the test split with a loss of 0.07.
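The metadata above lists WER as the evaluation metric but no score is reported here. Below is a minimal sketch of how a WER number could be computed with the `evaluate` library (plus `jiwer` as its backend); the reference and prediction strings are placeholders for illustration, not outputs of this model:

```python
# Minimal WER sketch using the `evaluate` library (requires: pip install evaluate jiwer).
# The strings below are placeholders, not measured outputs of this model.
import evaluate

wer_metric = evaluate.load("wer")
references = ["<reference transcript>"]   # hypothetical ground-truth text
predictions = ["<model transcript>"]      # hypothetical model output
print("WER:", wer_metric.compute(references=references, predictions=predictions))
```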
How to use the model in Colab:
Install the required packages (in a Colab cell, the leading `!` runs the command in the shell):

```python
!pip install torch torchaudio transformers librosa gradio
```
```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the fine-tuned Whisper model and processor
model_name = "hackergeek98/tinyyyy_whisper"
processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)

# Force the model to transcribe in Persian
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="fa", task="transcribe")

# Move the model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
```
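As an aside, if you only need a quick test rather than the step-by-step approach below, the high-level `pipeline` API can handle loading, preprocessing, and decoding in one call. A minimal sketch (the file name `sample.wav` is a placeholder):

```python
from transformers import pipeline

# High-level alternative: the ASR pipeline wraps preprocessing, generation, and decoding.
asr = pipeline(
    "automatic-speech-recognition",
    model="hackergeek98/tinyyyy_whisper",
    generate_kwargs={"language": "fa", "task": "transcribe"},
)
print(asr("sample.wav")["text"])  # "sample.wav" is a placeholder audio path
```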
Define a helper that loads an audio file, resamples it to 16 kHz, and returns the transcription:

```python
import librosa

def transcribe_audio(audio_file):
    # Load the audio file with librosa (supports many formats) and resample to 16 kHz
    audio_data, sampling_rate = librosa.load(audio_file, sr=16000)

    # Preprocess the audio into model input features
    inputs = processor(audio_data, sampling_rate=sampling_rate, return_tensors="pt").input_features.to(device)

    # Generate the transcription
    with torch.no_grad():
        predicted_ids = model.generate(inputs)

    # Decode the predicted token IDs into text
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    return transcription
```
Finally, upload an audio file in Colab and transcribe it:

```python
from google.colab import files

# Upload an audio file
uploaded = files.upload()
audio_file = list(uploaded.keys())[0]

# Transcribe the audio
transcription = transcribe_audio(audio_file)
print("Transcription:", transcription)
```