---
license: apache-2.0
language:
- fr
library_name: peft
base_model: openai/whisper-base
tags:
- whisper
- speech-recognition
- asr
- lora
- french
- whisperlivekit
- peft
datasets:
- mozilla-foundation/common_voice_23_0
metrics:
- wer
- cer
pipeline_tag: automatic-speech-recognition
model-index:
- name: whisper-base-french-lora
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Common Voice 23.0 French
      type: mozilla-foundation/common_voice_23_0
      config: fr
      split: test
    metrics:
    - type: wer
      value: 39.30
      name: Test WER
    - type: cer
      value: 17.39
      name: Test CER
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      name: Common Voice 23.0 French
      type: mozilla-foundation/common_voice_23_0
      config: fr
      split: validation
    metrics:
    - type: wer
      value: 28.06
      name: Validation WER
    - type: cer
      value: 10.06
      name: Validation CER
---

# Whisper Base French LoRA

A LoRA (Low-Rank Adaptation) fine-tuned adapter for [openai/whisper-base](https://huggingface.co/openai/whisper-base), optimized for French speech recognition.

This adapter was designed specifically for use with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit), providing ultra-low-latency French transcription.

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | `openai/whisper-base` (74M params) |
| **Adapter Type** | LoRA (PEFT) |
| **Trainable Parameters** | ~2.4M (~3.2% of base) |
| **Language** | French (fr) |
| **Task** | Transcription |

### LoRA Configuration

```python
from peft import LoraConfig

LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
)
```

## Performance

### Comparison with Baseline

| Split | Model | WER ↓ | CER ↓ |
|-------|-------|-------|-------|
| **Validation** | Whisper Base (baseline) | 36.94% | 15.62% |
| **Validation** | **+ This LoRA** | **28.06%** | **10.06%** |
| **Test** | Whisper Base (baseline) | 60.47% | 31.63% |
| **Test** | **+ This LoRA** | **39.30%** | **17.39%** |

### Improvement Summary

| Split | WER Reduction | CER Reduction |
|-------|---------------|---------------|
| Validation | **-8.88 pts** (24% relative) | **-5.56 pts** (36% relative) |
| Test | **-21.17 pts** (35% relative) | **-14.24 pts** (45% relative) |

A sketch for recomputing these metrics is included at the end of this card.

## Usage

### With WhisperLiveKit (Recommended)

The easiest way to use this model is with [WhisperLiveKit](https://github.com/QuentinFuxa/WhisperLiveKit) for real-time French transcription:

```bash
pip install whisperlivekit

# Start the server with the French LoRA (auto-downloads from the Hugging Face Hub)
wlk --model base --language fr --lora-path qfuxa/whisper-base-french-lora
```

The adapter is automatically downloaded and cached from the Hugging Face Hub on first use.
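If you want to fetch the adapter ahead of time (for example, when building a container image or preparing a machine that will run offline), the minimal sketch below uses `snapshot_download` from `huggingface_hub`; it only assumes the repository ID above is reachable.

```python
# Minimal sketch: pre-download the adapter into the local Hugging Face cache
# so that later runs do not need network access.
from huggingface_hub import snapshot_download

adapter_dir = snapshot_download("qfuxa/whisper-base-french-lora")
print(adapter_dir)  # local directory containing the adapter weights and config
```

The printed directory can then be supplied wherever a LoRA path is expected (as in the native Whisper example below), assuming your WhisperLiveKit version accepts local paths in addition to Hub IDs.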
### With Transformers + PEFT

```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor
from peft import PeftModel

# Load the base model and processor
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
processor = WhisperProcessor.from_pretrained("openai/whisper-base", language="fr", task="transcribe")

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, "qfuxa/whisper-base-french-lora")
model = model.merge_and_unload()  # Optional: merge weights for faster inference

# Transcribe (audio_array is a 16 kHz mono waveform as a NumPy array)
audio = processor.feature_extractor(audio_array, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(audio.input_features, language="fr", task="transcribe")
transcription = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

### With Native Whisper (WhisperLiveKit Backend)

```python
from whisperlivekit.whisper import load_model

# Load Whisper base with the French LoRA adapter
model = load_model(
    "base",
    lora_path="path/to/whisper-base-french-lora",
)

# Transcribe
result = model.transcribe(audio, language="fr")
```

## Training Details

### Dataset

- **Source**: [Mozilla Common Voice](https://commonvoice.mozilla.org/) v23.0 French
- **Training samples**: 100,000
- **Validation samples**: 2,000
- **Test samples**: 2,000

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Epochs | 5 |
| Effective batch size | 128 (batch size 16 × 8 gradient accumulation steps) |
| Learning rate | 3e-4 |
| Warmup steps | 100 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Early stopping | patience of 5 evaluations |

### Hardware

- Trained on Apple Silicon (MPS)

## Limitations

- Optimized specifically for French; may not generalize well to other languages
- Based on `whisper-base` (74M params) — consider larger models for higher accuracy
- Performance may vary on domain-specific audio (medical, legal, technical)
- Trained on crowd-sourced Common Voice data; may carry biases toward certain accents

## Citation

If you use this model, please cite:

```bibtex
@misc{whisper-base-french-lora,
  author = {Quentin Fuxa},
  title = {Whisper Base French LoRA},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/qfuxa/whisper-base-french-lora}
}

@misc{whisperlivekit,
  author = {Quentin Fuxa},
  title = {WhisperLiveKit: Ultra-low-latency speech-to-text},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/QuentinFuxa/WhisperLiveKit}
}
```

## License

Apache 2.0 — same as the base Whisper model.

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper) for the base model
- [Mozilla Common Voice](https://commonvoice.mozilla.org/) for the French dataset
- [Hugging Face PEFT](https://github.com/huggingface/peft) for the LoRA implementation
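
## Evaluation Sketch

For reference, the sketch below outlines how WER/CER comparable to the numbers in the Performance section could be recomputed with the `evaluate` library (WER requires `jiwer` to be installed). It assumes the Common Voice dataset ID from the metadata above resolves on the Hub and that you have accepted its terms and logged in; exact loading arguments and scores depend on your `datasets`/`transformers` versions and on text normalization, so treat this as a starting point rather than the exact pipeline used for this card.

```python
# Minimal sketch: score the merged model on Common Voice French with evaluate.
import torch
import evaluate
from datasets import Audio, load_dataset
from peft import PeftModel
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load and merge the adapter (same steps as the PEFT example above)
processor = WhisperProcessor.from_pretrained("openai/whisper-base", language="fr", task="transcribe")
base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
model = PeftModel.from_pretrained(base, "qfuxa/whisper-base-french-lora").merge_and_unload()
model.eval()

# Common Voice is gated on the Hub: accept its terms and authenticate first.
ds = load_dataset("mozilla-foundation/common_voice_23_0", "fr", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16000))

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions, references = [], []
for sample in ds.take(200):  # small subset for illustration; the card's scores use the 2,000-sample test set
    inputs = processor.feature_extractor(
        sample["audio"]["array"], sampling_rate=16000, return_tensors="pt"
    )
    with torch.no_grad():
        ids = model.generate(inputs.input_features, language="fr", task="transcribe")
    predictions.append(processor.tokenizer.batch_decode(ids, skip_special_tokens=True)[0])
    references.append(sample["sentence"])

print("WER:", wer_metric.compute(predictions=predictions, references=references))
print("CER:", cer_metric.compute(predictions=predictions, references=references))
```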