The Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing across layers using a shared router, encouraging the model to learn strong, specialized experts. Omni-router's routing decisions appear to form consistent temporal segments and structured expert usage across model depth, suggesting meaningful coordination between layers. Please refer to the paper for details.
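For intuition, here is a minimal PyTorch sketch of the shared-router idea, not the released implementation: a single router whose parameters are reused at every MoE layer, in contrast to per-layer routers such as Switch. The dimensions, top-1 routing, and the `SharedRouterMoE` name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SharedRouterMoE(nn.Module):
    """Stack of MoE feed-forward layers that all reuse one router (a sketch)."""

    def __init__(self, d_model=256, n_experts=8, n_layers=4):
        super().__init__()
        # A single router shared by every layer: this coupling is what
        # distinguishes the omni-router idea from per-layer routers.
        self.router = nn.Linear(d_model, n_experts)
        self.layers = nn.ModuleList([
            nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts)
            ])
            for _ in range(n_layers)
        ])

    def forward(self, x):  # x: (batch, time, d_model)
        for experts in self.layers:
            # The same router parameters score tokens at every depth, so
            # expert assignments stay coordinated across layers.
            gates = self.router(x).softmax(dim=-1)   # (B, T, n_experts)
            top_gate, top_idx = gates.max(dim=-1)    # top-1 routing
            out = torch.zeros_like(x)
            for e, expert in enumerate(experts):
                mask = top_idx == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] = expert(x[mask])
            x = x + top_gate.unsqueeze(-1) * out     # gated residual update
        return x

# Usage: y = SharedRouterMoE()(torch.randn(2, 100, 256))
```

The only change relative to a standard per-layer-routed MoE stack is that `self.router` is defined once and called at every depth.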
This model is a dense speech recognition model (84M parameters), trained on the Libriheavy dataset. Please refer to the GitHub page for detailed usage.
The model is evaluated on the LibriSpeech dev/test sets; word error rate (WER, %) is reported below.
| WER (%) | Dense (84M) | Switch (8 x 84M) | Omni-router (8 x 84M) |
|---|---|---|---|
| dev-clean | 2.1 | 1.9 | 1.8 |
| dev-other | 6.7 | 6.1 | 5.4 |
| test-clean | 2.3 | 2.2 | 2.0 |
| test-other | 6.2 | 5.8 | 5.2 |
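For reference, WER is the word-level edit (Levenshtein) distance between the hypothesis and reference transcripts divided by the number of reference words. A minimal, self-contained sketch (illustrative only; the `wer` helper below is not from the released code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn the first i reference words into
    # the first j hypothesis words (substitution/insertion/deletion cost 1).
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)

print(wer("the cat sat down", "the cat sit"))  # 0.5: one substitution + one deletion
```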
If you find this work useful, please cite our paper:
```bibtex
@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}
```
Please contact zijin@apple.com with any issues.