---
license: apple-amlr
language:
- en
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- asr
- mixture-of-experts
- speech
---
# Model Card for Omni-router Transformer

Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing decisions across layers using a shared router to learn strong, specialized experts. Omni-router's routing decisions appear to form consistent temporal segments and structured usage across model depth, suggesting meaningful coordination between layers.
Please refer to the [paper](https://arxiv.org/abs/2507.05724) for details.
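
To make the shared-router idea concrete, here is a minimal, illustrative PyTorch sketch: one router module is reused by every MoE layer, so expert selection is coupled across depth. This is a top-1 (Switch-style) toy example, not the released implementation; all module names and sizes are hypothetical.

```python
# Illustrative sketch only: an MoE feed-forward block where ALL layers share
# one router (the Omni-router idea), instead of each layer owning its own.
# Names and dimensions are hypothetical, not taken from the released code.
import torch
import torch.nn as nn


class SharedRouterMoE(nn.Module):
    """Top-1 (Switch-style) MoE feed-forward block with an external router."""

    def __init__(self, dim: int, hidden: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor, router: nn.Linear) -> torch.Tensor:
        # x: (tokens, dim). The router is passed in, so every layer can reuse
        # the same one and routing decisions are coupled across model depth.
        gates = torch.softmax(router(x), dim=-1)   # (tokens, num_experts)
        top_gate, top_idx = gates.max(dim=-1)      # top-1 routing per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                out[mask] = top_gate[mask].unsqueeze(-1) * expert(x[mask])
        return out


# One router instance shared by every MoE layer in the stack.
dim, num_experts = 256, 8
shared_router = nn.Linear(dim, num_experts)
layers = nn.ModuleList(SharedRouterMoE(dim, 4 * dim, num_experts) for _ in range(4))

x = torch.randn(10, dim)  # 10 tokens
for layer in layers:
    x = x + layer(x, shared_router)  # residual connection, same router everywhere
```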

## Model Details

### Model Description

This model is a dense model with 84M parameters.

- **Developed by:** Apple Machine Learning Research
- **Model type:** ASR
- **Language(s):** English
- **License:** apple-amlr

## Uses

This model is intended for automatic speech recognition (ASR) of English audio.

## How to Get Started with the Model

Please refer to the [GitHub](https://github.com/apple/ml-omni-router-moe-asr) page for detailed usage.

## Training Details

### Training Data

The model is trained on the [Libriheavy](https://github.com/k2-fsa/libriheavy) dataset.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

This model is evaluated on the [Librispeech](https://www.openslr.org/12) dev/test sets.

#### Metrics

Word Error Rate (WER): the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript.
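
For reference, WER can be computed with a standard word-level Levenshtein alignment. The snippet below is a minimal illustration of the metric, not the evaluation code behind the reported numbers.

```python
# Minimal reference implementation of WER:
# WER = (substitutions + deletions + insertions) / reference word count,
# computed via word-level edit distance with dynamic programming.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)
    return dp[-1][-1] / len(ref)


print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1/6 ≈ 0.167
```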

### Results

WER (%) on Librispeech:

| | Dense | Switch | Omni-router |
|---|---|---|---|
| Params | 84M | 8 x 84M | 8 x 84M |
| dev-clean | 2.1 | 1.9 | 1.8 |
| dev-other | 6.7 | 6.1 | 5.4 |
| test-clean | 2.3 | 2.2 | 2.0 |
| test-other | 6.2 | 5.8 | 5.2 |

## Citation

If you find this work useful, please cite our paper:
```bibtex
@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}
```

## Model Card Contact

Contact zijin@apple.com for any issues.