---
license: apple-amlr
language:
- en
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- asr
- mixture-of-experts
- speech
---

# Model Card for Omni-router Transformer

Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing across layers using a shared router to learn strong, specialized experts. Omni-router's routing decisions appear to form consistent temporal segments and structured usage across model depth, suggesting meaningful coordination between layers (an illustrative sketch of the shared-router idea appears in the appendix below). Please refer to the [paper](https://arxiv.org/abs/2507.05724) for details.

## Model Details

### Model Description

This model is a dense baseline model with 84M parameters.

- **Developed by:** Apple Machine Learning Research
- **Model type:** ASR
- **Language(s):** English
- **License:** apple-amlr

## Uses

This model is a speech recognition model.

## How to Get Started with the Model

Please refer to the [GitHub](https://github.com/apple/ml-omni-router-moe-asr) repository for detailed usage.

## Training Details

### Training Data

The model is trained on the [Libriheavy](https://github.com/k2-fsa/libriheavy) dataset.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model is evaluated on the [LibriSpeech](https://www.openslr.org/12) dev and test sets.

#### Metrics

Word Error Rate (WER); a minimal reference implementation appears in the appendix below.

### Results

WER (%) on LibriSpeech:

| | Dense (84M) | Switch (8 × 84M) | Omni-router (8 × 84M) |
|---|---|---|---|
| dev-clean | 2.1 | 1.9 | 1.8 |
| dev-other | 6.7 | 6.1 | 5.4 |
| test-clean | 2.3 | 2.2 | 2.0 |
| test-other | 6.2 | 5.8 | 5.2 |

## Citation

If you find this work useful, please cite our paper:

```
@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}
```

## Model Card Contact

Contact zijin@apple.com for any issues.
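
## Appendix: Illustrative Sketches

### Shared routing across layers

The sketch below illustrates the core idea of a shared router: a single routing module whose parameters are reused by every MoE layer, so routing decisions at all depths are coupled through one set of weights. This is a minimal PyTorch sketch for intuition only, not the released implementation; all class and variable names are illustrative, and top-1 (Switch-style) dispatch is used here as a simplifying assumption. See the GitHub repository for the actual code.

```python
# Illustrative only: names and structure are NOT from the released code.
import torch
import torch.nn as nn


class SharedRouter(nn.Module):
    """One router whose weights are reused by every MoE layer."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.proj = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) -> routing logits: (batch, time, num_experts)
        return self.proj(x)


class MoELayer(nn.Module):
    """Top-1 MoE layer that consults a router owned outside the layer."""

    def __init__(self, d_model: int, num_experts: int, router: SharedRouter):
        super().__init__()
        self.router = router  # shared object, not a per-layer copy
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model),
                           nn.ReLU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.router(x)          # (B, T, E), shared parameters
        weights = logits.softmax(dim=-1)
        top1 = logits.argmax(dim=-1)     # (B, T): expert index per frame
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top1 == e             # frames routed to expert e
            if mask.any():
                # Scale each expert's output by its gating weight.
                out[mask] = weights[mask][:, e:e + 1] * expert(x[mask])
        return out


d_model, num_experts, num_layers = 256, 8, 4
router = SharedRouter(d_model, num_experts)
layers = nn.ModuleList([MoELayer(d_model, num_experts, router)
                        for _ in range(num_layers)])

x = torch.randn(2, 50, d_model)  # (batch, time, d_model)
for layer in layers:
    x = x + layer(x)             # residual connection around each MoE block
```

Because every layer queries the same `router` object, gradients from all depths update one set of routing weights, which is what couples routing decisions across layers.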
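
### Computing Word Error Rate

WER is the word-level edit (Levenshtein) distance between a reference transcript and a hypothesis, divided by the number of reference words. A minimal self-contained implementation in standard-library Python:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,               # substitution (or match)
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2/6 ≈ 0.333
```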