---
pipeline_tag: image-to-text
license: apache-2.0
tags:
- Non-Autoregressive
- Masked-Generative-Transformer
- Discrete-Diffusion
- Unified-Model
language:
- en
---
# Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
[Paper](https://arxiv.org/abs/2505.23606) | [Model](https://huggingface.co/MeissonFlow/Muddit) | [Code](https://github.com/M-E-AGI-Lab/Muddit) | [Demo](https://huggingface.co/spaces/MeissonFlow/muddit)

## Introduction
Welcome to the official repository of **Muddit**, a foundation model in the Meissonic family that uses discrete diffusion for unified and efficient multimodal generation.
Unlike traditional autoregressive methods, **Muddit** uses discrete diffusion (a.k.a. MaskGIT-style masking) as its core mechanism, which enables fast, parallel decoding across modalities.
While most unified models are still rooted in language priors, **Muddit** is developed from a visual-first perspective for scalable and flexible generation.
Muddit (512) and Muddit Plus (1024) aim to handle diverse tasks across modalities, such as text generation, image generation, and vision-language reasoning, within a single architecture and decoding paradigm.
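
To make the idea of MaskGIT-style parallel decoding concrete, here is a minimal, self-contained sketch. It is not Muddit's implementation: the `toy_predictor` function is a hypothetical stand-in for the transformer that returns random tokens and confidences, and the cosine schedule is an assumption borrowed from the MaskGIT line of work. The loop starts from a fully masked sequence, predicts all positions at once, commits the most confident predictions, and leaves the rest masked for the next step.

```python
import math
import random

MASK = -1  # hypothetical id for the mask token (not Muddit's actual id)


def toy_predictor(tokens, vocab_size, rng):
    """Stand-in for the model: one (token, confidence) guess per position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def parallel_decode(seq_len=16, vocab_size=8, steps=4, seed=0):
    """Iteratively unmask a sequence, committing many tokens per step."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len  # start fully masked

    for step in range(1, steps + 1):
        # Cosine schedule (assumed): fraction of positions still masked
        # once this step finishes; it reaches 0 at the final step.
        mask_ratio = math.cos(math.pi / 2 * step / steps)
        n_keep_masked = int(math.floor(mask_ratio * seq_len))

        # Predict every position in parallel.
        preds = toy_predictor(tokens, vocab_size, rng)

        # Rank the still-masked positions by confidence, highest first.
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)

        # Commit the most confident predictions; the rest stay masked.
        n_commit = max(0, len(masked) - n_keep_masked)
        for i in masked[:n_commit]:
            tokens[i] = preds[i][0]

    return tokens


if __name__ == "__main__":
    print(parallel_decode())
```

With these defaults the sequence of 16 tokens is filled in 4 steps rather than 16, which is the source of the speed advantage over token-by-token autoregressive decoding.
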
## Usage
For usage instructions, see the [GitHub repository](https://github.com/M-E-AGI-Lab/Muddit).
## Citation
If you find this work helpful, please consider citing:
```bibtex
@article{shi2025muddit,
title={Muddit: Liberating generation beyond text-to-image with a unified discrete diffusion model},
author={Shi, Qingyu and Bai, Jinbin and Zhao, Zhuoran and Chai, Wenhao and Yu, Kaidong and Wu, Jianzong and Song, Shuangyong and Tong, Yunhai and Li, Xiangtai and Li, Xuelong and others},
journal={arXiv preprint arXiv:2505.23606},
year={2025}
}
```