---
license: apple-amlr
language:
- en
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- asr
- mixture-of-experts
- speech
---
# Model Card for Omni-router Transformer

Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing decisions across layers using a shared router to learn strong, specialized experts. Omni-router's routing decisions appear to form consistent temporal segments and structured usage across model depth, suggesting meaningful coordination between layers.
Please refer to the [paper](https://arxiv.org/abs/2507.05724) for details.
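
To make the shared-router idea concrete, here is a minimal, illustrative PyTorch sketch: one router module is reused by every MoE layer, so expert selection is coupled across depth. This is a top-1 (Switch-style) toy example, not the released implementation; all module names and sizes are hypothetical.

```python
# Illustrative sketch only: an MoE feed-forward block where ALL layers share
# one router (the Omni-router idea), instead of each layer owning its own.
# Names and dimensions are hypothetical, not taken from the released code.
import torch
import torch.nn as nn


class SharedRouterMoE(nn.Module):
    """Top-1 (Switch-style) MoE feed-forward block with an external router."""

    def __init__(self, dim: int, hidden: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor, router: nn.Linear) -> torch.Tensor:
        # x: (tokens, dim). The router is passed in, so every layer can reuse
        # the same one and routing decisions are coupled across model depth.
        gates = torch.softmax(router(x), dim=-1)   # (tokens, num_experts)
        top_gate, top_idx = gates.max(dim=-1)      # top-1 routing per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e
            if mask.any():
                out[mask] = top_gate[mask].unsqueeze(-1) * expert(x[mask])
        return out


# One router instance shared by every MoE layer in the stack.
dim, num_experts = 256, 8
shared_router = nn.Linear(dim, num_experts)
layers = nn.ModuleList(SharedRouterMoE(dim, 4 * dim, num_experts) for _ in range(4))

x = torch.randn(10, dim)  # 10 tokens
for layer in layers:
    x = x + layer(x, shared_router)  # residual connection, same router everywhere
```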

## Model Details

### Model Description

This model is a dense model with 84M parameters.

- **Developed by:** Apple Machine Learning Research
- **Model type:** ASR
- **Language(s):** English
- **License:** apple-amlr

## Uses

This model is intended for automatic speech recognition (ASR) of English audio.

## How to Get Started with the Model

Please refer to the [GitHub](https://github.com/apple/ml-omni-router-moe-asr) page for detailed usage.

## Training Details

### Training Data

The model is trained on the [Libriheavy](https://github.com/k2-fsa/libriheavy) dataset.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

This model is evaluated on the [Librispeech](https://www.openslr.org/12) dev/test sets.

#### Metrics

Word Error Rate (WER): the number of substituted, deleted, and inserted words divided by the number of words in the reference transcript.
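
For reference, WER can be computed with a standard word-level Levenshtein alignment. The snippet below is a minimal illustration of the metric, not the evaluation code behind the reported numbers.

```python
# Minimal reference implementation of WER:
# WER = (substitutions + deletions + insertions) / reference word count,
# computed via word-level edit distance with dynamic programming.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1  # substitution cost
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)
    return dp[-1][-1] / len(ref)


print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1/6 ≈ 0.167
```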

### Results

WER (%) on Librispeech:

| | Dense | Switch | Omni-router |
|---|---|---|---|
| Params | 84M | 8 x 84M | 8 x 84M |
| dev-clean | 2.1 | 1.9 | 1.8 |
| dev-other | 6.7 | 6.1 | 5.4 |
| test-clean | 2.3 | 2.2 | 2.0 |
| test-other | 6.2 | 5.8 | 5.2 |

## Citation

If you find this work useful, please cite our paper:
```bibtex
@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}
```

## Model Card Contact

Contact zijin@apple.com for any issues.