Model Card for omni-router-speechcrawl-streaming-asr-0.6b-v1

The Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing across layers by sharing a single router, encouraging the model to learn strong, specialized experts. The Omni-router's routing decisions appear to form consistent temporal segments and structured expert usage across model depth, suggesting meaningful coordination between layers. Please refer to the paper for details.
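
To make the shared-router idea concrete, below is a minimal PyTorch sketch, assuming top-1 routing; the module names, dimensions, and layer structure are illustrative and do not reflect the released implementation.

import torch
import torch.nn as nn

class MoELayer(nn.Module):
    # One MoE feed-forward block. Note that the router is not owned by the
    # layer: the same nn.Linear instance is handed to every layer in the
    # stack, which is what couples routing decisions across depth.
    def __init__(self, d_model, d_ff, num_experts, shared_router):
        super().__init__()
        self.router = shared_router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, time, d_model)
        logits = self.router(x)                    # (batch, time, num_experts)
        weight, idx = logits.softmax(-1).max(-1)   # top-1 expert per frame
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                        # frames routed to expert e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out

class OmniRouterStack(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=4, num_layers=2):
        super().__init__()
        shared_router = nn.Linear(d_model, num_experts)  # one router, shared
        self.layers = nn.ModuleList(
            MoELayer(d_model, d_ff, num_experts, shared_router)
            for _ in range(num_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = x + layer(x)  # residual connection around each MoE block
        return x

Because every layer consults the same router parameters, token-to-expert assignments are aligned across depth, which is consistent with the structured expert usage described above.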

Model Details

Model Description

This ASR model is a 4-expert MoE model (613M total parameters, of which 200M are active). The model is streaming: it transcribes speech conditioned only on past and current audio (a minimal sketch of this follows the list below).

  • Developed by: Apple Machine Learning Research
  • Model type: ASR
  • Language(s): English
  • License: apple-amlr
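
The streaming behavior described above is typically achieved with a causal attention mask, so each frame attends only to itself and earlier frames. A minimal sketch with standard PyTorch attention follows; the masking scheme in the released model may differ.

import torch

def causal_mask(num_frames: int) -> torch.Tensor:
    # Boolean mask for torch.nn.MultiheadAttention: True entries are blocked,
    # so frame i can attend only to frames j <= i (past and current speech).
    return torch.triu(torch.ones(num_frames, num_frames), diagonal=1).bool()

attn = torch.nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(1, 100, 512)                      # (batch, frames, features)
y, _ = attn(x, x, x, attn_mask=causal_mask(100))  # streaming-style self-attention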

Uses

This model performs automatic speech recognition (ASR), transcribing English speech to text.

How to Get Started with the Model

Please refer to the GitHub page for detailed usage.
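
In the meantime, the snippet below is only a hypothetical sketch of what transcription could look like; load_model and transcribe are placeholder names, not the actual API, and the audio is assumed to be a 16 kHz mono WAV file.

import torchaudio

# Hypothetical usage sketch; consult the GitHub page for the real entry points.
waveform, sample_rate = torchaudio.load("speech.wav")  # real torchaudio call
model = load_model("omni-router-speechcrawl-streaming-asr-0.6b-v1")  # placeholder loader
print(model.transcribe(waveform, sample_rate))  # placeholder method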

Training Details

Training Data

The training data is SpeechCrawl, a large-scale conversational audio dataset collected from publicly accessible sources. Please refer to the paper for details.

Citation

If you find this work useful, please cite our paper:

@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}

Model Card Contact

Contact zijin@apple.com for any issues.
