Whisper Medium Arabic - Quran Fine-tuned (Full Fine-tuning)
Model Description
This model is a fine-tuned version of openai/whisper-medium, optimized specifically for transcribing Arabic Quran recitation.
The model was trained with full fine-tuning on a dataset of professional and non-professional Quran recitations from MP3Quran and Tarteel AI, making it well suited to transcribing Quranic Arabic speech.
- Developed by: Yousif H A (yousifgamalo)
- Model type: Automatic Speech Recognition (ASR)
- Language: Arabic (ar)
- License: Apache 2.0
- Base model: openai/whisper-medium
- Fine-tuning method: Full Fine-tuning
Training Details
Training Data
- Source Dataset: yousifgamalo/mp3quran
- Processed Dataset: yousifgamalo/quran-cleaned-nonprofessional
- Training samples: 1,250,527
- Validation samples: 12,760
- Test samples: 12,761
- Total samples: 1,276,048
The dataset consists of Quran recitations by professional reciters from MP3Quran, preprocessed as follows (a short sketch appears after this list):
- Audio normalized to 16kHz mono
- Text without diacritics (tashkeel removed)
- Log-mel spectrograms extracted
- Shuffled to ensure diverse train/val/test splits
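A minimal sketch of the audio side of this preprocessing, assuming librosa for resampling and the stock Whisper feature extractor (the file name is illustrative):

```python
import librosa
from transformers import WhisperFeatureExtractor

# Load the recording as 16 kHz mono, the input format Whisper expects
audio, sr = librosa.load("recitation.mp3", sr=16000, mono=True)

# Extract log-mel spectrogram features with Whisper's own feature extractor
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-medium")
features = feature_extractor(audio, sampling_rate=sr, return_tensors="pt")
print(features.input_features.shape)  # (1, 80, 3000) for a padded 30-second window
```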
Training Hyperparameters
Training Arguments:
- Batch size per device: 32
- Gradient accumulation steps: 1
- Effective batch size: 32
- Learning rate: 1e-06
- Warmup steps: 500
- Number of epochs: 0.01
- Precision: bf16
- Optimizer: AdamW (default)
- Learning rate scheduler: linear with warmup
- Max generation length: 256
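These settings correspond roughly to the Seq2SeqTrainingArguments below. This is an illustrative reconstruction, not the exact training script; the output directory is an assumption, and the bf16/gradient-checkpointing flags anticipate the Training Infrastructure section further down.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative reconstruction of the reported hyperparameters
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-quran",  # assumption, not the original path
    per_device_train_batch_size=32,
    gradient_accumulation_steps=1,        # effective batch size 32
    learning_rate=1e-6,
    warmup_steps=500,
    num_train_epochs=0.01,
    bf16=True,
    gradient_checkpointing=True,
    lr_scheduler_type="linear",
    predict_with_generate=True,
    generation_max_length=256,
)
```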
Generation Configuration:
- Task: Transcription
- Language: Arabic (forced)
- No repeat n-gram size: 3
- Repetition penalty: 2.0
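At inference time these decoding settings can be applied through the model's generation config; the exact wiring used during training may differ, so treat this as a sketch:

```python
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("yousifgamalo/quran-s-finetuned")

# Force Arabic transcription and apply the repetition controls listed above
model.generation_config.language = "arabic"
model.generation_config.task = "transcribe"
model.generation_config.no_repeat_ngram_size = 3
model.generation_config.repetition_penalty = 2.0
```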
Training Infrastructure
- Gradient checkpointing: Enabled
- Mixed precision training: bf16
- Early stopping: WER threshold 0.03
Evaluation
Test Set Metrics
| Metric | Value |
|---|---|
| Word Error Rate (WER) | 0.1162 |
| Test Loss | 0.0317 |
| Runtime (seconds) | 1300.45 |
| Samples per second | 9.81 |
Evaluation Data
The model was evaluated on a held-out test set of 12,761 samples drawn from the same distribution as the training data (professional Quran recitations from MP3Quran).
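The reported WER can be reproduced with the evaluate library (which uses jiwer under the hood); a minimal sketch with placeholder transcripts:

```python
import evaluate  # pip install evaluate jiwer

# Word error rate between reference transcripts and model predictions
wer_metric = evaluate.load("wer")

references = ["بسم الله الرحمن الرحيم"]   # ground-truth transcript, diacritics removed
predictions = ["بسم الله الرحمن الرحيم"]  # model output

wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.4f}")
```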
Use Limitations and License
- Commercial use is not permitted; the model may only be used for non-profit purposes.
Installation
```bash
pip install transformers torch torchaudio librosa
```
Inference Example
```python
import torch
import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load model and processor
model_id = "yousifgamalo/quran-s-finetuned"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Load and preprocess audio from a file, resampled to 16 kHz
audio_array, sampling_rate = librosa.load("quran_recitation.wav", sr=16000)

# Extract log-mel input features
input_features = processor(
    audio_array,
    sampling_rate=16000,
    return_tensors="pt"
).input_features.to(device)

# Generate transcription (the model is configured to output Arabic text automatically)
predicted_ids = model.generate(input_features)

# Decode prediction
transcription = processor.batch_decode(
    predicted_ids,
    skip_special_tokens=True
)[0]

print(f"Transcription: {transcription}")
```
Using with Pipeline
```python
import torch
from transformers import pipeline

# Create ASR pipeline
pipe = pipeline(
    "automatic-speech-recognition",
    model="yousifgamalo/quran-s-finetuned",
    device=0 if torch.cuda.is_available() else -1
)

# Transcribe audio
result = pipe("quran_recitation.wav")
print(result["text"])
```
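Whisper operates on 30-second windows, so for full-surah recordings the pipeline's chunked long-form decoding is usually more practical; the chunk length and batch size below are illustrative starting points:

```python
import torch
from transformers import pipeline

# Chunked long-form transcription for recordings longer than 30 seconds
pipe = pipeline(
    "automatic-speech-recognition",
    model="yousifgamalo/quran-s-finetuned",
    chunk_length_s=30,  # split long audio into 30-second chunks
    batch_size=8,       # illustrative; tune for available GPU memory
    device=0 if torch.cuda.is_available() else -1,
)

result = pipe("full_surah_recitation.wav")
print(result["text"])
```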
Limitations and Bias
- Domain-specific: This model is optimized for Quran recitation and may not perform well on general Arabic speech
- Professional recordings: Trained on professional reciters from MP3Quran; performance may vary on non-professional recordings
- No diacritics: The model outputs Arabic text without diacritical marks (tashkeel)
- Classical Arabic: Optimized for Classical/Quranic Arabic, not Modern Standard Arabic or dialects
Training Procedure Details
Preprocessing
- Audio files resampled to 16kHz mono
- Log-mel spectrograms extracted using Whisper's feature extractor
- Text normalized (Arabic diacritics removed)
- Dataset shuffled before splitting to ensure representative distributions
- Train/validation/test split: 98%/1%/1%
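A minimal sketch of the text normalization and split described above; the diacritics regex covers the common tashkeel code points, and the split name and seed are assumptions:

```python
import re
from datasets import load_dataset

# Strip Arabic diacritics (tashkeel) from a transcript
TASHKEEL = re.compile(r"[\u0610-\u061A\u064B-\u065F\u0670\u06D6-\u06ED]")

def remove_diacritics(text: str) -> str:
    return TASHKEEL.sub("", text)

# Shuffle, then split 98% train / 1% validation / 1% test (seed is illustrative)
dataset = load_dataset("yousifgamalo/mp3quran", split="train").shuffle(seed=42)
split = dataset.train_test_split(test_size=0.02, seed=42)
val_test = split["test"].train_test_split(test_size=0.5, seed=42)
train_ds, val_ds, test_ds = split["train"], val_test["train"], val_test["test"]
```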
Full Fine-tuning
This model was trained using full fine-tuning, where all model parameters are updated during training. This provides maximum flexibility but requires more memory and compute resources.
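A quick way to confirm that a run is full fine-tuning rather than an adapter method is to compare trainable and total parameter counts; a small sketch:

```python
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-medium")

# Under full fine-tuning every parameter keeps requires_grad=True,
# so both counts should match (~769M parameters for whisper-medium)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / Total: {total:,}")
```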
Citation
If you use this model, please cite:
```bibtex
@misc{yousifgamalo_quran_whisper_medium,
  author       = {Yousif H A},
  title        = {Whisper Medium - Quran Fine-tuned},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/yousifgamalo/quran-s-finetuned}}
}
```
Also cite the original Whisper paper:
```bibtex
@article{radford2022whisper,
  title   = {Robust Speech Recognition via Large-Scale Weak Supervision},
  author  = {Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  journal = {arXiv preprint arXiv:2212.04356},
  year    = {2022}
}
```
Model Card Contact
For questions or issues, please open an issue in the model repository.