diarray committed on
Commit
db59d95
·
verified ·
1 Parent(s): 772e332

Push model using huggingface_hub.

Files changed (3)
  1. .gitattributes +1 -0
  2. README.md +144 -0
  3. soloba-ctc-0.6b-v3.nemo +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ soloba-ctc-0.6b-v3.nemo filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,144 @@
1
+ ---
2
+ language:
3
+ - bm
4
+ library_name: nemo
5
+ datasets:
6
+ - RobotsMali/kunkado
7
+
8
+ thumbnail: null
9
+ tags:
10
+ - automatic-speech-recognition
11
+ - speech
12
+ - audio
13
+ - CTC
14
+ - FastConformer
15
+ - Conformer
16
+ - pytorch
17
+ - Bambara
18
+ - NeMo
19
+ license: cc-by-4.0
20
+ base_model: RobotsMali/soloba-ctc-0.6b-v2
21
+ model-index:
22
+ - name: soloba-ctc-0.6b-v3
23
+ results:
24
+ - task:
25
+ name: Automatic Speech Recognition
26
+ type: automatic-speech-recognition
27
+ dataset:
28
+ name: Kunkado
29
+ type: RobotsMali/kunkado
30
+ split: test
31
+ args:
32
+ language: bm
33
+ metrics:
34
+ - name: Test WER
35
+ type: wer
36
+ value: 38.8708581779757
37
+ - name: Test CER
38
+ type: cer
39
+ value: 21.648218306746136
40
+ - task:
41
+ name: Automatic Speech Recognition
42
+ type: automatic-speech-recognition
43
+ dataset:
44
+ name: Nyana Eval
45
+ type: RobotsMali/nyana-eval
46
+ split: test
47
+ args:
48
+ language: bm
49
+ metrics:
50
+ - name: Test WER
51
+ type: wer
52
+ value: XX.XXX
53
+ - name: Test CER
54
+ type: cer
55
+ value: YY.YYY
56
+
57
+ metrics:
58
+ - wer
59
+ - cer
60
+ pipeline_tag: automatic-speech-recognition
61
+ ---
62
+
63
+ # Soloba-CTC-600M Series
64
+
65
+ <style>
66
+ img {
67
+ display: inline;
68
+ }
69
+ </style>
70
+
71
+ [![Model architecture](https://img.shields.io/badge/Model_Arch-FastConformer--CTC-blue#model-badge)](#model-architecture)
72
+ | [![Model size](https://img.shields.io/badge/Params-0.6B-green#model-badge)](#model-architecture)
73
+ | [![Language](https://img.shields.io/badge/Language-bm-orange#model-badge)](#datasets)
74
+
75
+ `soloba-ctc-0.6b-v3` is a fine-tuned version of [`RobotsMali/soloba-ctc-0.6b-v2`](https://huggingface.co/RobotsMali/soloba-ctc-0.6b-v2) on RobotsMali/kunkado. This model does not consistently produce capitalization or punctuation, and it cannot produce acoustic event tags like those found in Kunkado's transcriptions. It was fine-tuned using **NVIDIA NeMo**.
76
+
77
+ ## **🚨 Important Note**
78
+ This model, along with its associated resources, is part of an **ongoing research effort**; improvements and refinements are expected in future versions. A human evaluation report of the model is coming soon. Users should be aware that:
79
+
80
+ - **The model may not generalize well across all speaking conditions and dialects.**
81
+ - **Community feedback is welcome, and contributions are encouraged to refine the model further.**
82
+
83
+ ## NVIDIA NeMo: Training
84
+
85
+ To fine-tune or experiment with the model, you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend installing it after you have installed the latest PyTorch version.
86
+
87
+ ```bash
88
+ pip install "nemo-toolkit[asr]"
89
+ ```
90
+
91
+ ## How to Use This Model
92
+
93
+ Note that this model has been released primarily for research purposes.
94
+
95
+ ### Load Model with NeMo
96
+ ```python
97
+ import nemo.collections.asr as nemo_asr
98
+ asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name="RobotsMali/soloba-ctc-0.6b-v3")
99
+ ```
100
+
101
+ ### Transcribe Audio
102
+ ```python
103
+ asr_model.eval()  # asr_model is the object loaded in the previous snippet
104
+ # Assuming you have a test audio file named sample_audio.wav
105
+ asr_model.transcribe(['sample_audio.wav'])
106
+ ```
107
+
108
+ ### Input
109
+
110
+ This model accepts **mono-channel audio (WAV files)** as input and resamples it to a *16 kHz sample rate* before performing the forward pass.
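A minimal sketch of inspecting a file before passing it to the model, assuming a PCM WAV file that Python's standard `wave` module can read. The helper names `describe_wav` and `is_mono` are hypothetical and not part of NeMo; the code only reports the channel count and sample rate and does not resample anything.

```python
import wave

# Sample rate the model card says inputs are resampled to.
TARGET_SAMPLE_RATE = 16_000


def describe_wav(path):
    """Return (channels, sample_rate, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate
    return channels, sample_rate, duration


def is_mono(path):
    """True when the file has exactly one channel, as the model expects."""
    channels, _, _ = describe_wav(path)
    return channels == 1
```

A file that is not mono would need to be downmixed with an audio tool of your choice before transcription; that step is outside this sketch.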
111
+
112
+ ### Output
113
+
114
+ This model returns the transcription as a hypothesis object whose `text` attribute contains the transcription string for a given speech sample (NeMo >= 2.3).
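A small sketch of reading the transcription out of a result, assuming each result is either an object exposing a `text` attribute (as described above) or, in other NeMo versions, possibly a plain string. The helper `hypothesis_text` and the stand-in class `FakeHypothesis` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FakeHypothesis:
    """Stand-in for a NeMo hypothesis object; only the `text` field is modeled."""
    text: str


def hypothesis_text(result):
    """Return the transcription string from a str or an object with `.text`."""
    if isinstance(result, str):
        return result
    return result.text


def texts_from_results(results):
    """Map a list of transcription results to a list of plain strings."""
    return [hypothesis_text(result) for result in results]
```

With a loaded model, the intended use would be along the lines of `texts_from_results(asr_model.transcribe(['sample_audio.wav']))`.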
115
+
116
+ ## Model Architecture
117
+
118
+ This model uses a FastConformer encoder and a convolutional decoder trained with CTC loss. FastConformer is an optimized version of the Conformer model with 8x depthwise-separable convolutional downsampling. More details on FastConformer are available here: [Fast-Conformer Model](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/asr/models.html#fast-conformer).
119
+
120
+ ## Training
121
+
122
+ The NeMo toolkit was used to fine-tune this model for **39,000 steps** starting from the `RobotsMali/soloba-ctc-0.6b-v2` model, with a batch size of 32. The fine-tuning code and configurations can be found at [RobotsMali-AI/bambara-asr](https://github.com/RobotsMali-AI/bambara-asr/).
123
+
124
+ The tokenizer for this model was trained on the text transcripts of the train set of RobotsMali/kunkado using this [script](https://github.com/NVIDIA/NeMo/blob/main/scripts/tokenizers/process_asr_text_tokenizer.py).
125
+
126
+ ## Dataset
127
+ This model was fine-tuned on the human-reviewed subset of the [kunkado](https://huggingface.co/datasets/RobotsMali/kunkado) dataset, which consists of **~40 hours of transcribed Bambara speech data**. The text was normalized with the [bambara-normalizer](https://pypi.org/project/bambara-normalizer/) prior to training: numbers were normalized, and punctuation and tags were removed.
128
+
129
+
130
+ ## Performance
131
+
132
+ We report the Word Error Rate (WER) and Character Error Rate (CER) for this model:
133
+
134
+ | Benchmark | Decoding | WER (%) &darr; | CER (%) &darr; |
135
+ |---------------|----------|-----------------|-----------------|
136
+ | Kunkado | CTC | 38.87 | 21.65 |
137
+ | Nyana Eval | CTC | XX.XX | YY.YY |
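For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of WER and CER as edit distance divided by reference length. It is an illustration only: the model card does not state how the reported scores were aggregated or which text normalization was applied at evaluation time, and the function names here are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate in percent; the reference must contain at least one word."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent; the reference must be non-empty."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

Both rates can exceed 100% when the hypothesis contains many insertions, since the denominator is the reference length only.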
138
+
139
+ ## License
140
+ This model is released under the **CC-BY-4.0** license. By using this model, you agree to the terms of the license.
141
+
142
+ ---
143
+
144
+ Feel free to open a discussion on Hugging Face or [file an issue](https://github.com/RobotsMali-AI/bambara-asr/issues) on GitHub for help or contributions.
soloba-ctc-0.6b-v3.nemo ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3102e2764326558058de0a4683d0c317ee9a4b599964ea044ed1094b9658de89
3
+ size 2434027520
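The three added lines are a Git LFS pointer, not the model weights themselves. As a rough sketch, the pointer can be parsed as space-separated key/value lines; the function name `parse_lfs_pointer` is hypothetical, and the parser assumes the simple three-field layout shown in this diff.

```python
def parse_lfs_pointer(pointer_text):
    """Parse a Git LFS pointer file into version, hash algorithm, digest, size."""
    fields = {}
    for line in pointer_text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3102e2764326558058de0a4683d0c317ee9a4b599964ea044ed1094b9658de89
size 2434027520
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['size_bytes'] / 1e9:.2f} GB, {info['hash_algorithm']}")
```

The `size` field is the byte size of the real `.nemo` archive, here roughly 2.4 GB.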