Upload fine-tuned Whisper Medium Egyptian model: whisper-medium-egy

Files changed (8)

README.md +150 -0
model/CKPT.yaml +4 -0
model/brain.ckpt +3 -0
model/counter.ckpt +3 -0
model/dataloader-TRAIN.ckpt +3 -0
model/model.ckpt +3 -0
model/optimizer.ckpt +3 -0
model/scheduler.ckpt +3 -0

README.md ADDED

	@@ -0,0 +1,150 @@

+---
+language: ar
+license: apache-2.0
+tags:
+- whisper
+- automatic-speech-recognition
+- asr
+- audio
+- arabic
+- egyptian-arabic
+datasets:
+- MAdel121/arabic-egy-cleaned
+metrics:
+- wer
+- cer
+base_model: openai/whisper-medium
+pipeline_tag: automatic-speech-recognition
+library_name: transformers
+model-index:
+- name: whisper-medium-egy
+  results:
+  - task:
+      type: automatic-speech-recognition
+      name: Speech Recognition
+    dataset:
+      name: MAdel121/arabic-egy-cleaned (validation split)
+      type: MAdel121/arabic-egy-cleaned
+      config: ar
+      split: validation
+    metrics:
+    - name: WER
+      type: wer
+      value: 18.029990439289488
+    - name: CER
+      type: cer
+      value: 13.375029793807732
+---
+# Whisper Medium Egyptian Arabic (whisper-medium-egy)
+This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium), trained on approximately 72 hours of Egyptian Arabic speech from a custom dataset (`MAdel121/arabic-egy-cleaned`). It is designed for automatic speech recognition (ASR) of the Egyptian Arabic dialect.
+## Model Description
+*   **Base Model:** `openai/whisper-medium`
+*   **Language:** Arabic (ar), specifically the Egyptian dialect (arz)
+*   **Fine-tuning Dataset:** `MAdel121/arabic-egy-cleaned` (approx. 72 hours)
+*   **Total Training Steps:** 7299
+*   **Epochs:** 10
+## Intended Uses & Limitations
+This model is intended for transcribing Egyptian Arabic speech.
+
+**Intended Use:**
+*   Automatic transcription of audio recordings and live speech in Egyptian Arabic.
+*   Assisting with content creation, subtitling, and voice-controlled applications for Egyptian Arabic speakers.
+
+**Limitations:**
+*   Performance may degrade in highly noisy environments or on speech with strong non-Egyptian accents.
+*   The model was fine-tuned on a specific dataset; its performance on significantly different domains or audio characteristics might vary.
+*   The training data primarily consists of [describe your dataset sources/domains if possible, e.g., "YouTube videos", "audiobooks", "scripted conversations"]. Performance might be better on similar types of audio.
+## How to Use
+You can use this model with the `transformers` library and the `pipeline` interface for ease of use.
+```python
+from transformers import pipeline
+import torch
+device = "cuda:0" if torch.cuda.is_available() else "cpu"
+pipe = pipeline(
+  "automatic-speech-recognition",
+  model="YOUR_HF_USERNAME/whisper-medium-egy", # Replace YOUR_HF_USERNAME with your Hugging Face username
+  device=device
+)
+# Example with a local audio file
+# audio_file = "path/to/your/egyptian_arabic_audio.wav"
+# transcription = pipe(audio_file, generate_kwargs={"language": "arabic"})["text"]
+# print(transcription)
+# Example with a Hugging Face dataset audio sample
+# from datasets import load_dataset
+# ds = load_dataset("MAdel121/arabic-egy-cleaned", "ar", split="validation") # Or your test split
+# sample = ds[0]["audio"] # Make sure your dataset has an "audio" column
+# result = pipe(sample.copy(), generate_kwargs={"language": "arabic"})
+# print(result["text"])
+```
+Replace `"YOUR_HF_USERNAME/whisper-medium-egy"` with the actual model ID. Passing `generate_kwargs={"language": "arabic"}` forces Whisper to decode with its Arabic language token instead of relying on automatic language detection, so the output stays in Arabic.
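
As a rough illustration of what fixing the language does: Whisper's decoder starts from a short prefix of special tokens that names the language and the task, and setting `language` pins the language slot of that prefix. The sketch below builds the prefix as a plain string for readability; the real pipeline works with token IDs, and the helper name `whisper_decoder_prefix` is hypothetical.

```python
# Hypothetical helper for illustration only: builds Whisper's decoder prefix as text.
# The actual model consumes the token IDs of these special tokens, not the string.
def whisper_decoder_prefix(language_code: str = "ar", task: str = "transcribe",
                           timestamps: bool = False) -> str:
    tokens = ["<|startoftranscript|>", f"<|{language_code}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")  # disables timestamp prediction
    return "".join(tokens)

# language="arabic" corresponds to the <|ar|> slot; this card's task is "transcribe".
print(whisper_decoder_prefix())  # <|startoftranscript|><|ar|><|transcribe|><|notimestamps|>
```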
+## Training Data
+The model was fine-tuned on the `MAdel121/arabic-egy-cleaned` dataset available on the Hugging Face Hub. This dataset contains approximately 72 hours of Egyptian Arabic audio paired with transcripts.
+## Training Procedure
+The model was fine-tuned using the `transformers` library together with SpeechBrain audio augmentations (see the training code linked below). Key hyperparameters:
+*   **Base Model:** `openai/whisper-medium`
+*   **Optimizer:** AdamW
+*   **Learning Rate:** 1e-5 (0.00001)
+*   **Warmup Steps:** 1000
+*   **Weight Decay:** 0.05
+*   **Gradient Accumulation Factor:** 2
+*   **Batch Size (loader_batch_size):** 8 (effective batch size with gradient accumulation: 8 × 2 = 16)
+*   **Number of Epochs:** 10
+*   **Max Grad Norm:** 5
+*   **Augmentations Used:**
+    *   `use_drop_freq`: true
+    *   `use_drop_chunk`: true
+    *   `use_drop_bit_resolution`: true
+    *   Other augmentations like `use_add_noise`, `use_speed_perturb`, `use_pitch_shift`, `use_add_reverb`, `use_codec_augment`, `use_gain` were set to `false`
+*   **Task:** transcribe
+*   **Language:** ar
+*   **Seed:** 1986
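
The three enabled augmentation flags presumably map to SpeechBrain's `DropFreq`, `DropChunk`, and `DropBitResolution` augmentations; that mapping is an assumption based on the flag names and the training repository's title. The simplified pure-Python sketch below shows the idea behind the two time-domain ones. `drop_chunk` and `drop_bit_resolution` are hypothetical helpers, not SpeechBrain's implementations, which operate on batched tensors with their own defaults. Frequency dropping needs band-stop filtering and is omitted here.

```python
import math
import random

def drop_chunk(wave, n_chunks=2, min_len=100, max_len=1000, rng=None):
    """Return a copy of `wave` with `n_chunks` randomly placed segments zeroed out."""
    rng = rng or random.Random()
    out = list(wave)
    upper = min(max_len, len(out))   # never pick a chunk longer than the signal
    lower = min(min_len, upper)
    for _ in range(n_chunks):
        length = rng.randint(lower, upper)
        start = rng.randint(0, len(out) - length)
        out[start:start + length] = [0.0] * length  # silence this chunk
    return out

def drop_bit_resolution(wave, bits=8):
    """Quantize samples in [-1, 1] to a signed `bits`-bit grid and map them back to floats."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8-bit audio
    return [round(max(-1.0, min(1.0, x)) * levels) / levels for x in wave]

# Usage on a synthetic 1-second, 16 kHz sine wave (Whisper's input sample rate).
wave = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
augmented = drop_bit_resolution(drop_chunk(wave, rng=random.Random(0)), bits=8)
```

Both helpers keep the signal length unchanged, so they can be chained in any order before feature extraction.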
+
+Training was tracked with Weights & Biases in the project `whisper-medium-egyptian-arabic` (resume ID `r3sz4v27`).
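
To make the optimizer settings above concrete, here is a small pure-Python sketch of three of them: linear learning-rate warmup to 1e-5 over 1000 steps, gradient accumulation over 2 micro-batches of 8, and global-norm gradient clipping at 5. The post-warmup schedule is not stated in this card, so holding the rate constant is an assumption, as is averaging (rather than summing) the accumulated gradients. The function names are hypothetical; the clipping coefficient follows the same formula as PyTorch's `torch.nn.utils.clip_grad_norm_`.

```python
import math

PEAK_LR = 1e-5
WARMUP_STEPS = 1000
MICRO_BATCH = 8
ACCUMULATION = 2
MAX_GRAD_NORM = 5.0

def learning_rate(step):
    """Linear warmup to PEAK_LR, then held constant (assumed post-warmup behavior)."""
    return PEAK_LR * min(1.0, step / WARMUP_STEPS)

def accumulate(micro_batch_grads, factor=ACCUMULATION):
    """Average each group of `factor` micro-batch gradients into one optimizer-step gradient.
    A trailing incomplete group is dropped."""
    steps = []
    for i in range(0, len(micro_batch_grads) - factor + 1, factor):
        group = micro_batch_grads[i:i + factor]
        steps.append([sum(vals) / factor for vals in zip(*group)])
    return steps

def clip_coefficient(grad, max_norm=MAX_GRAD_NORM):
    """Factor applied to all gradients so their global L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grad))
    return min(1.0, max_norm / (total_norm + 1e-6))

print("effective batch size:", MICRO_BATCH * ACCUMULATION)  # 16
print("lr at step 500:", learning_rate(500))                 # 5e-06
```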
+## Training Code
+The training script is available on [GitHub](https://github.com/moadel321/Fine-tuning-whisper-on-Modal-Labs-with-speech-brain-augmentations-/blob/c85312785faa2b927cbc217fe43acb8ed660d2ee/train_whisper_modal.py).
+## Weights & Biases
+The project's runs are available at https://wandb.ai/m-adelomar1/whisper-medium-egyptian-arabic/
+## Evaluation Results
+The model was evaluated on the `validation` split of the `MAdel121/arabic-egy-cleaned` dataset.
+*   **Word Error Rate (WER):** 18.03%
+*   **Character Error Rate (CER):** 13.38%
+
+Both metrics are reported as percentages on the validation split; lower is better.
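
For reference, WER and CER are the word- and character-level edit distances (substitutions + deletions + insertions) divided by the number of reference tokens. The minimal corpus-level sketch below illustrates the definition. The text normalization used for the reported numbers (punctuation, diacritics, alef variants) is not stated, so scores from this sketch may not match them exactly; counting spaces as characters for CER is one common convention, not necessarily the one used here. The Arabic sentence pair is an illustrative example, not data from the dataset.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (unit-cost sub/del/ins)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution, free on a match
        prev = curr
    return prev[-1]

def error_rate(references, hypotheses, tokenize):
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be paired")
    edits = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = tokenize(ref)
        edits += edit_distance(ref_tokens, tokenize(hyp))
        total += len(ref_tokens)
    return 100.0 * edits / total

def wer(references, hypotheses):
    return error_rate(references, hypotheses, str.split)  # whitespace-separated words

def cer(references, hypotheses):
    return error_rate(references, hypotheses, list)       # characters, spaces included

refs = ["انا رايح البيت"]
hyps = ["انا رايح للبيت"]
example_wer = wer(refs, hyps)  # 1 substituted word out of 3
example_cer = cer(refs, hyps)  # 1 substituted character out of 14
print(f"WER {example_wer:.2f}%, CER {example_cer:.2f}%")  # WER 33.33%, CER 7.14%
```

Summing edits over the whole split before dividing (rather than averaging per-utterance rates) keeps long utterances from being under-weighted.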
+## BibTeX Citation
+```bibtex
+@misc{madel_2025_whisper_medium_egy,
+  author       = {Madel},
+  title        = {Whisper Medium Fine-tuned for Egyptian Arabic},
+  year         = {2025},
+  publisher    = {Hugging Face},
+  journal      = {Hugging Face Hub},
+  howpublished = {\url{https://huggingface.co/MAdel121/whisper-medium-egy}}
+}
+```

model/CKPT.yaml ADDED

	@@ -0,0 +1,4 @@

+# yamllint disable
+brain_intra_epoch_ckpt: true
+end-of-epoch: false
+unixtime: 1746494038.3237214

model/brain.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64af57c5b2b2982bda94205f9340a6e14b9fa13e472b89793fbd36575371282b
+size 65

model/counter.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4a44dc15364204a80fe80e9039455cc1608281820fe2b24f1e5233ade6af1dd5
+size 2

model/dataloader-TRAIN.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b24bdc2fb415e6a7038f442fd99a7144f3cfe358086a1ba9cfb1ac0a44ed7bb2
+size 4

model/model.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d792ab272f5fb4d0d48b7b6836d79b1ebed948b7872aa0c9f827c25f6d956e25
+size 3055793114

model/optimizer.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:852d2cad94668a6e9b2f1ca78a9d792f5430ec87fed36adbc9ae04a1783b043f
+size 6111664039

model/scheduler.ckpt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:000d9d4bec2874c99cd692c4431560aab31f77ae0d6b007244172cda4ac86c42
+size 936