Model Card for Omni-router Transformer
Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing decisions across layers by sharing a single router, encouraging strong and specialized experts. Omni-router's routing decisions appear to form consistent temporal segments and structured expert usage across model depth, suggesting meaningful coordination between layers. Please refer to the paper for details.
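As a rough illustration of the shared-router idea, the sketch below (plain PyTorch, not the released implementation) applies one router with shared weights at every MoE layer while each layer keeps its own experts. All module names and hyperparameters here are illustrative assumptions; the paper is the authoritative reference.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedRouterMoE(nn.Module):
    """Toy stack of MoE feed-forward layers that all share one router."""

    def __init__(self, dim=512, hidden=2048, num_experts=4, num_layers=2, top_k=1):
        super().__init__()
        # A single router, shared by every layer: this is the coupling
        # across depth that the omni-router idea refers to.
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.top_k = top_k
        # Each layer still has its own set of experts.
        self.layers = nn.ModuleList([
            nn.ModuleList([
                nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
                for _ in range(num_experts)
            ])
            for _ in range(num_layers)
        ])

    def forward(self, x):  # x: (batch, time, dim)
        for experts in self.layers:
            # Routing is computed from the current layer's input, but with
            # router weights shared across all layers.
            gates = F.softmax(self.router(x), dim=-1)      # (B, T, E)
            weight, idx = gates.topk(self.top_k, dim=-1)   # (B, T, k)
            out = torch.zeros_like(x)
            for e, expert in enumerate(experts):
                # Dense-for-clarity gating: unselected tokens get weight 0.
                # A real MoE implementation dispatches only selected tokens.
                gate = (weight * (idx == e)).sum(-1, keepdim=True)  # (B, T, 1)
                out = out + gate * expert(x)
            x = x + out  # residual connection
        return x
```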
Model Details
Model Description
This ASR model is a 4-expert MoE model (613M total parameters, 200M active). The model is streaming: it transcribes speech conditioned only on past and current audio.
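To make the streaming constraint concrete, the toy example below builds a causal attention mask in which each frame can attend only to itself and earlier frames. This is only an illustration of the causality constraint, not the model's actual attention implementation (which may, for instance, use chunked or limited-context attention).

```python
import torch

def causal_mask(t: int) -> torch.Tensor:
    # True marks positions a frame may attend to: itself and the past.
    # This one-sided conditioning is what makes the model streamable.
    return torch.tril(torch.ones(t, t, dtype=torch.bool))

print(causal_mask(4))
# tensor([[ True, False, False, False],
#         [ True,  True, False, False],
#         [ True,  True,  True, False],
#         [ True,  True,  True,  True]])
```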
- Developed by: Apple Machine Learning Research
- Model type: ASR
- Language(s): English
- License: apple-amlr
Uses
This model can be used for automatic speech recognition (ASR) of English speech.
How to Get Started with the Model
Please refer to the GitHub repository for detailed usage instructions.
Training Details
Training Data
The model was trained on SpeechCrawl, a large-scale conversational audio dataset collected from publicly accessible sources. Please refer to the paper for details.
Citation
If you find this work useful, please cite our paper:
```bibtex
@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}
```
Model Card Contact
Contact zijin@apple.com for any issues.