# Common Accent ASR Model
This is an ASR model fine-tuned from [espnet/owsm_v3.1_ebf_base](https://huggingface.co/espnet/owsm_v3.1_ebf_base) on the [DTU54DL/common-accent](https://huggingface.co/datasets/DTU54DL/common-accent) dataset.
## Model details
- Base model: espnet/owsm_v3.1_ebf_base
- Language: English
- Task: Automatic Speech Recognition
## Usage
```python
import torch
import numpy as np
from espnet2.bin.s2t_inference import Speech2Text

# Load the model
model = Speech2Text.from_pretrained(
    "reecursion/accent-adaptive-owsm_v3.1_ebf_base",
    lang_sym="<eng>",
    beam_size=1,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# Example inference
waveform = ...  # Load your audio as a numpy array
transcription = model(waveform)
print(transcription[0][0])  # Print the transcription
```
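
The `waveform = ...` placeholder above expects audio as a numpy array. One possible way to produce it is sketched below, using only the standard-library `wave` module and numpy. The helper name `load_wav_mono` is hypothetical and not part of ESPnet. The sketch assumes a 16-bit PCM WAV file and does not resample.

```python
import wave

import numpy as np


def load_wav_mono(path):
    """Read a 16-bit PCM WAV file into a float32 mono array scaled to [-1, 1).

    Hypothetical helper for illustration; it is not part of ESPnet.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        sample_rate = wf.getframerate()
        n_channels = wf.getnchannels()
        frames = wf.readframes(wf.getnframes())

    # Interpret the raw bytes as little-endian int16 and scale to floats.
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0

    # Average interleaved channels down to a single mono channel.
    if n_channels > 1:
        samples = samples.reshape(-1, n_channels).mean(axis=1)

    return samples, sample_rate
```

With this helper, the placeholder could become `waveform, sample_rate = load_wav_mono("speech.wav")`, where `speech.wav` is a stand-in path. OWSM models are generally used with 16 kHz audio, so check `sample_rate` and resample first if your file differs.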