---
license: apple-amlr
language:
- en
metrics:
- wer
pipeline_tag: automatic-speech-recognition
tags:
- asr
- mixture-of-experts
- speech
---

# Model Card for Omni-router Transformer

Omni-router Transformer is a new Mixture-of-Experts (MoE) architecture that explicitly couples routing across layers using a shared router to learn strong, specialized experts. Omni-router's routing decisions appear to form consistent temporal segments and structured usage across model depth, suggesting meaningful coordination between layers (an illustrative sketch of the shared-router idea appears in the appendix below). Please refer to the [paper](https://arxiv.org/abs/2507.05724) for details.

## Model Details

### Model Description

This model is a dense baseline model with 84M parameters.

- **Developed by:** Apple Machine Learning Research
- **Model type:** ASR
- **Language(s):** English
- **License:** apple-amlr

## Uses

This model is a speech recognition model.

## How to Get Started with the Model

Please refer to the [GitHub](https://github.com/apple/ml-omni-router-moe-asr) repository for detailed usage.

## Training Details

### Training Data

The model is trained on the [Libriheavy](https://github.com/k2-fsa/libriheavy) dataset.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model is evaluated on the [LibriSpeech](https://www.openslr.org/12) dev and test sets.

#### Metrics

Word Error Rate (WER); a minimal reference implementation appears in the appendix below.

### Results

WER (%) on LibriSpeech:

| | Dense (84M) | Switch (8 × 84M) | Omni-router (8 × 84M) |
|---|---|---|---|
| dev-clean | 2.1 | 1.9 | 1.8 |
| dev-other | 6.7 | 6.1 | 5.4 |
| test-clean | 2.3 | 2.2 | 2.0 |
| test-other | 6.2 | 5.8 | 5.2 |

## Citation

If you find this work useful, please cite our paper:

```
@article{gu2025omnirouter,
  title={Omni-router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition},
  author={Gu, Zijin and Likhomanenko, Tatiana and Jaitly, Navdeep},
  journal={arXiv preprint arXiv:2507.05724},
  year={2025}
}
```

## Model Card Contact

Contact zijin@apple.com for any issues.
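
## Appendix: Illustrative Sketches

### Shared routing across layers

The sketch below illustrates the core idea of a shared router: a single routing module whose parameters are reused by every MoE layer, so routing decisions at all depths are coupled through one set of weights. This is a minimal PyTorch sketch for intuition only, not the released implementation; all class and variable names are illustrative, and top-1 (Switch-style) dispatch is used here as a simplifying assumption. See the GitHub repository for the actual code.

```python
# Illustrative only: names and structure are NOT from the released code.
import torch
import torch.nn as nn


class SharedRouter(nn.Module):
    """One router whose weights are reused by every MoE layer."""

    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.proj = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) -> routing logits: (batch, time, num_experts)
        return self.proj(x)


class MoELayer(nn.Module):
    """Top-1 MoE layer that consults a router owned outside the layer."""

    def __init__(self, d_model: int, num_experts: int, router: SharedRouter):
        super().__init__()
        self.router = router  # shared object, not a per-layer copy
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model),
                           nn.ReLU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.router(x)          # (B, T, E), shared parameters
        weights = logits.softmax(dim=-1)
        top1 = logits.argmax(dim=-1)     # (B, T): expert index per frame
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top1 == e             # frames routed to expert e
            if mask.any():
                # Scale each expert's output by its gating weight.
                out[mask] = weights[mask][:, e:e + 1] * expert(x[mask])
        return out


d_model, num_experts, num_layers = 256, 8, 4
router = SharedRouter(d_model, num_experts)
layers = nn.ModuleList([MoELayer(d_model, num_experts, router)
                        for _ in range(num_layers)])

x = torch.randn(2, 50, d_model)  # (batch, time, d_model)
for layer in layers:
    x = x + layer(x)             # residual connection around each MoE block
```

Because every layer queries the same `router` object, gradients from all depths update one set of routing weights, which is what couples routing decisions across layers.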
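
### Computing Word Error Rate

WER is the word-level edit (Levenshtein) distance between a reference transcript and a hypothesis, divided by the number of reference words. A minimal self-contained implementation in standard-library Python:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,               # substitution (or match)
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2/6 ≈ 0.333
```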