---
license: apple-amlr
language:
- en
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- asr
- mixture-of-experts
- speech
---
# Model Card for Omni-router Transformer
The Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing across layers through a shared router, encouraging strong, specialized experts. Omni-router's routing decisions appear to form consistent temporal segments and structured expert usage across model depth, suggesting meaningful coordination between layers. Please refer to the [paper](https://arxiv.org/abs/2507.05724) for details.
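A minimal PyTorch sketch of the shared-router idea is given below. It is illustrative only and not the released implementation; the module structure, top-1 (Switch-style) routing, and all dimensions are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedRouterMoELayer(nn.Module):
    """One MoE feed-forward block whose router weights are shared across layers."""

    def __init__(self, d_model: int, num_experts: int, shared_router: nn.Linear):
        super().__init__()
        self.router = shared_router  # the SAME nn.Linear instance in every layer
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model). Top-1 (Switch-style) routing per frame.
        probs = F.softmax(self.router(x), dim=-1)   # (batch, time, num_experts)
        gate, expert_idx = probs.max(dim=-1)        # (batch, time)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                out[mask] = gate[mask].unsqueeze(-1) * expert(x[mask])
        return out


# A single router is created once and handed to every MoE layer, so all layers
# make routing decisions with the same (jointly trained) routing weights.
d_model, num_experts, num_layers = 512, 8, 12
shared_router = nn.Linear(d_model, num_experts, bias=False)
moe_layers = nn.ModuleList(
    SharedRouterMoELayer(d_model, num_experts, shared_router) for _ in range(num_layers)
)

x = torch.randn(2, 100, d_model)      # (batch=2, 100 frames, d_model)
for layer in moe_layers:
    x = x + layer(x)                  # residual connection around each MoE block
print(x.shape)                        # torch.Size([2, 100, 512])
```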
## Model Details
### Model Description
This is a dense model with 84M parameters.
- Developed by: Apple Machine Learning Research
- Model type: ASR
- Language(s): English
- License: apple-amlr
## Uses
This model is intended for automatic speech recognition of English speech.
## How to Get Started with the Model
Please refer to the GitHub repository for detailed usage.
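The repository documents the actual loading and inference entry points; as a generic, hypothetical illustration of a typical ASR input pipeline (an assumption, not the repository's API), 16 kHz audio can be converted to log-mel features with torchaudio:

```python
# Generic ASR front-end sketch (assumption only; see the GitHub repository
# for the model's actual loading and inference API).
import torch
import torchaudio

waveform, sample_rate = torchaudio.load("example.flac")   # LibriSpeech-style audio: 16 kHz mono
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

# 80-dim log-mel filterbank features, a common ASR input representation.
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=80)(waveform)
log_mel = torch.log(mel + 1e-6)                           # (channels, n_mels, frames)
print(log_mel.shape)
```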
## Training Details
### Training Data
The model was trained on the Libriheavy dataset.
## Evaluation
### Testing Data, Factors & Metrics
#### Testing Data
The model is evaluated on the LibriSpeech dev and test sets.
#### Metrics
Word Error Rate (WER).
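WER is the word-level edit distance (substitutions, insertions, and deletions) between a hypothesis transcript and the reference, normalized by the number of reference words. A minimal self-contained sketch of the computation (illustrative only, not the evaluation script used for the numbers below):

```python
# Word Error Rate: word-level Levenshtein distance divided by the number of
# reference words. Standard toolkits compute the same quantity.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words and the
    # first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 substitution / 6 words ≈ 0.167
```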
### Results

WER (%) on LibriSpeech:

| | Dense | Switch | Omni-router |
|---|---|---|---|
| Parameters | 84M | 8 x 84M | 8 x 84M |
| dev-clean | 2.1 | 1.9 | 1.8 |
| dev-other | 6.7 | 6.1 | 5.4 |
| test-clean | 2.3 | 2.2 | 2.0 |
| test-other | 6.2 | 5.8 | 5.2 |
## Citation

If you find this work useful, please cite our paper:

```bibtex
@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}
```
## Model Card Contact
Please contact zijin@apple.com with any issues.