---
license: apache-2.0
language:
- fr
library_name: peft
base_model: openai/whisper-base
tags:
- whisper
- speech-recognition
- asr
- lora
- french
- whisperlivekit
- peft
datasets:
- mozilla-foundation/common_voice_23_0
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
model-index:
- name: whisper-base-french-lora
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Common Voice 23.0 French
      type: mozilla-foundation/common_voice_23_0
      config: fr
      split: test
    metrics:
    - type: wer
      value: 39.30
      name: Test WER
    - type: cer
      value: 17.39
      name: Test CER
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Common Voice 23.0 French
      type: mozilla-foundation/common_voice_23_0
      config: fr
      split: validation
    metrics:
    - type: wer
      value: 28.06
      name: Validation WER
    - type: cer
      value: 10.06
      name: Validation CER
---

# Whisper Base French LoRA

A LoRA (Low-Rank Adaptation) fine-tuned adapter for [openai/whisper-base](https://huggingface.co/openai/whisper-base), optimized for French speech recognition.

This adapter was designed specifically for use with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit), providing ultra-low-latency French transcription.

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `openai/whisper-base` (74M params) |
| **Adapter Type** | LoRA (PEFT) |
| **Trainable Parameters** | ~2.4M (~3.2% of base) |
| **Language** | French (fr) |
| **Task** | Transcription |

### LoRA Configuration

```python
from peft import LoraConfig

LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)
```

## Performance

### Comparison with Baseline

| Split | Model | WER ↓ | CER ↓ |
|-------|-------|-------|-------|
| **Validation** | Whisper Base (baseline) | 36.94% | 15.62% |
| **Validation** | **+ This LoRA** | **28.06%** | **10.06%** |
| **Test** | Whisper Base (baseline) | 60.47% | 31.63% |
| **Test** | **+ This LoRA** | **39.30%** | **17.39%** |

### Improvement Summary

| Split | WER Reduction | CER Reduction |
|-------|---------------|---------------|
| Validation | **-8.88 pts** (24% relative) | **-5.56 pts** (36% relative) |
| Test | **-21.17 pts** (35% relative) | **-14.24 pts** (45% relative) |

A sketch for recomputing these metrics is included at the end of this card.

## Usage

### With WhisperLiveKit (Recommended)

The easiest way to use this model is with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) for real-time French transcription:

```bash
pip install whisperlivekit

# Start the server with the French LoRA (auto-downloads from the Hugging Face Hub)
wlk --model base --language fr --lora-path qfuxa/whisper-base-french-lora
```

The adapter is automatically downloaded and cached from the Hugging Face Hub on first use.
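If you want to fetch the adapter ahead of time (for example, when building a container image or preparing a machine that will run offline), the minimal sketch below uses `snapshot_download` from `huggingface_hub`; it only assumes the repository ID above is reachable.

```python
# Minimal sketch: pre-download the adapter into the local Hugging Face cache
# so that later runs do not need network access.
from huggingface_hub import snapshot_download

adapter_dir = snapshot_download("qfuxa/whisper-base-french-lora")
print(adapter_dir)  # local directory containing the adapter weights and config
```

The printed directory can then be supplied wherever a LoRA path is expected (as in the native Whisper example below), assuming your WhisperLiveKit version accepts local paths in addition to Hub IDs.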
### With Transformers + PEFT

```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel

# Load the base model and processor
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
processor = WhisperProcessor.from_pretrained("openai/whisper-base", language="fr", task="transcribe")

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "qfuxa/whisper-base-french-lora")
model = model.merge_and_unload()  # Optional: merge weights for faster inference

# Transcribe (audio_array is a 16 kHz mono waveform as a NumPy array)
audio = processor.feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(audio.input_features, language="fr", task="transcribe")
transcription = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

### With Native Whisper (WhisperLiveKit Backend)

```python
from whisperlivekit.whisper import load_model

# Load Whisper base with the French LoRA adapter
model = load_model(
    "base",
    lora_path="path/to/whisper-base-french-lora",
)

# Transcribe
result = model.transcribe(audio, language="fr")
```

## Training Details

### Dataset

- **Source**: [Mozilla Common Voice](https://commonvoice.mozilla.org/) v23.0 French
- **Training samples**: 100,000
- **Validation samples**: 2,000
- **Test samples**: 2,000

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Epochs | 5 |
| Effective batch size | 128 (batch size 16 × 8 gradient accumulation steps) |
| Learning rate | 3e-4 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Early stopping | patience of 5 evaluations |

### Hardware

- Trained on Apple Silicon (MPS)

## Limitations

- Optimized specifically for French; may not generalize well to other languages
- Based on `whisper-base` (74M params) — consider larger models for higher accuracy
- Performance may vary on domain-specific audio (medical, legal, technical)
- Trained on crowd-sourced Common Voice data; may carry biases toward certain accents

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-base-french-lora,
  author = {Quentin Fuxa},
  title = {Whisper Base French LoRA},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/qfuxa/whisper-base-french-lora}
}

@misc{whisperlivekit,
  author = {Quentin Fuxa},
  title = {WhisperLiveKit: Ultra-low-latency speech-to-text},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/QuentinFuxa/WhisperLiveKit}
}
```

## License

Apache 2.0 — same as the base Whisper model.

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) for the base model
- [Mozilla Common Voice](https://commonvoice.mozilla.org/) for the French dataset
- [Hugging Face PEFT](https://github.com/huggingface/peft) for the LoRA implementation
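
## Evaluation Sketch

For reference, the sketch below outlines how WER/CER comparable to the numbers in the Performance section could be recomputed with the `evaluate` library (WER requires `jiwer` to be installed). It assumes the Common Voice dataset ID from the metadata above resolves on the Hub and that you have accepted its terms and logged in; exact loading arguments and scores depend on your `datasets`/`transformers` versions and on text normalization, so treat this as a starting point rather than the exact pipeline used for this card.

```python
# Minimal sketch: score the merged model on Common Voice French with evaluate.
import torch
import evaluate
from datasets import Audio, load_dataset
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load and merge the adapter (same steps as the PEFT example above)
processor = WhisperProcessor.from_pretrained("openai/whisper-base", language="fr", task="transcribe")
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
model = PeftModel.from_pretrained(base, "qfuxa/whisper-base-french-lora").merge_and_unload()
model.eval()

# Common Voice is gated on the Hub: accept its terms and authenticate first.
ds = load_dataset("mozilla-foundation/common_voice_23_0", "fr", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions, references = [], []
for sample in ds.take(200):  # small subset for illustration; the card's scores use the 2,000-sample test set
    inputs = processor.feature_extractor(
        sample["audio"]["array"], sampling_rate=16000, return_tensors="pt"
    )
    with torch.no_grad():
        ids = model.generate(inputs.input_features, language="fr", task="transcribe")
    predictions.append(processor.tokenizer.batch_decode(ids, skip_special_tokens=True)[0])
    references.append(sample["sentence"])

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```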