---
library_name: transformers
tags:
- asr
- arabic
license: cc-by-nc-4.0
datasets:
- rsalshalan/MGB2
language:
- ar
pipeline_tag: automatic-speech-recognition
---
# ArTST_v2 (ASR task)
ArTST model finetuned for automatic speech recognition (speech-to-text) on MGB2.
### Model Description
This is the model card of a 🤗 Transformers model that has been pushed to the Hub.
- **Developed by:** Speech Lab, MBZUAI
- **Model type:** SpeechT5
- **Language:** Arabic
- **Finetuned from:** [ArTST pretrained](https://github.com/mbzuai-nlp/ArTST)
## How to Get Started with the Model
```python
import soundfile as sf
import torch
from transformers import (
    SpeechT5ForSpeechToText,
    SpeechT5Processor,
    SpeechT5Tokenizer,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = SpeechT5Tokenizer.from_pretrained("mbzuai/artst_asr_v2")
processor = SpeechT5Processor.from_pretrained("mbzuai/artst_asr_v2", tokenizer=tokenizer)
model = SpeechT5ForSpeechToText.from_pretrained("mbzuai/artst_asr_v2").to(device)

# The model expects 16 kHz mono audio.
audio, sr = sf.read("audio.wav")
inputs = processor(audio=audio, sampling_rate=sr, return_tensors="pt")

predicted_ids = model.generate(**inputs.to(device), max_length=250)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
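To sanity-check transcriptions against a reference text, word error rate (WER) is the standard ASR metric. A minimal pure-Python sketch is below (the `jiwer` library is the usual choice in practice; the example strings are illustrative only):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over word tokens,
    divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution
            )
    return dp[-1][-1] / len(ref)

# One dropped word out of four reference words -> WER 0.25
print(wer("مرحبا بكم في المختبر", "مرحبا بكم المختبر"))  # 0.25
```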
## Usage with Pipeline
```python
import librosa
import soundfile as sf
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "MBZUAI/artst_asr_v2"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id, torch_dtype=torch_dtype).to(device)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
)

audio, sr = sf.read("path/to/audio/file")
if sr != 16000:  # the model expects 16 kHz input
    audio = librosa.resample(audio, orig_sr=sr, target_sr=16000)

result = pipe(audio)
print(result["text"])
```
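The resampling step above uses `librosa.resample`, which is the recommended route. If `librosa` is unavailable, a rough linear-interpolation fallback can be sketched with NumPy alone (a hypothetical helper, lower quality than librosa's filters):

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Naive linear-interpolation resampler; prefer librosa.resample when available."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)

# One second of 48 kHz audio resampled down to 16 kHz
x = np.zeros(48000)
y = resample_linear(x, 48000, 16000)
print(len(y))  # 16000
```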
### Model Sources
- **Repository:** [GitHub](https://github.com/mbzuai-nlp/ArTST)
- **Paper:** [arXiv](https://arxiv.org/abs/2411.05872)
## Citation
**BibTeX:**
```
@misc{djanibekov2024dialectalcoveragegeneralizationarabic,
title={Dialectal Coverage And Generalization in Arabic Speech Recognition},
author={Amirbek Djanibekov and Hawau Olamide Toyin and Raghad Alshalan and Abdullah Alitr and Hanan Aldarmaki},
year={2024},
eprint={2411.05872},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.05872},
}
@inproceedings{toyin-etal-2023-artst,
title = "{A}r{TST}: {A}rabic Text and Speech Transformer",
author = "Toyin, Hawau and
Djanibekov, Amirbek and
Kulkarni, Ajinkya and
Aldarmaki, Hanan",
booktitle = "Proceedings of ArabicNLP 2023",
month = dec,
year = "2023",
address = "Singapore (Hybrid)",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.arabicnlp-1.5",
doi = "10.18653/v1/2023.arabicnlp-1.5",
pages = "41--51",
}
```