The Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing across layers using a shared router, encouraging the model to learn strong, specialized experts. Omni-router's routing decisions appear to form consistent temporal segments and structured expert usage across model depth, suggesting meaningful coordination between layers. Please refer to the paper for details.
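For intuition, here is a minimal PyTorch sketch of the shared-router idea, not the released implementation: a single router whose parameters are reused at every MoE layer, in contrast to per-layer routers such as Switch. The dimensions, top-1 routing, and the `SharedRouterMoE` name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedRouterMoE(nn.Module):
    """Stack of MoE feed-forward layers that all reuse one router (a sketch)."""

    def __init__(self, d_model=256, n_experts=8, n_layers=4):
        super().__init__()
        # A single router shared by every layer: this coupling is what
        # distinguishes the omni-router idea from per-layer routers.
        self.router = nn.Linear(d_model, n_experts)
        self.layers = nn.ModuleList([
            nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts)
            ])
            for _ in range(n_layers)
        ])

    def forward(self, x):  # x: (batch, time, d_model)
        for experts in self.layers:
            # The same router parameters score tokens at every depth, so
            # expert assignments stay coordinated across layers.
            gates = self.router(x).softmax(dim=-1)   # (B, T, n_experts)
            top_gate, top_idx = gates.max(dim=-1)    # top-1 routing
            out = torch.zeros_like(x)
            for e, expert in enumerate(experts):
                mask = top_idx == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] = expert(x[mask])
            x = x + top_gate.unsqueeze(-1) * out     # gated residual update
        return x

# Usage: y = SharedRouterMoE()(torch.randn(2, 100, 256))
```

The only change relative to a standard per-layer-routed MoE stack is that `self.router` is defined once and called at every depth.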
This model is a dense speech recognition model (84M parameters), trained on the Libriheavy dataset. Please refer to the GitHub page for detailed usage.
The model is evaluated on the LibriSpeech dev/test sets; word error rate (WER, %) is reported below.
| WER (%) | Dense (84M) | Switch (8 x 84M) | Omni-router (8 x 84M) |
|---|---|---|---|
| dev-clean | 2.1 | 1.9 | 1.8 |
| dev-other | 6.7 | 6.1 | 5.4 |
| test-clean | 2.3 | 2.2 | 2.0 |
| test-other | 6.2 | 5.8 | 5.2 |
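For reference, WER is the word-level edit (Levenshtein) distance between the hypothesis and reference transcripts divided by the number of reference words. A minimal, self-contained sketch (illustrative only; the `wer` helper below is not from the released code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words into
    # the first j hypothesis words (substitution/insertion/deletion cost 1).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)

print(wer("the cat sat down", "the cat sit"))  # 0.5: one substitution + one deletion
```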
If you find this work useful, please cite our paper:
```bibtex
@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}
```
Please contact zijin@apple.com with any issues.