---
title: README
emoji: 🏃
colorFrom: yellow
colorTo: indigo
sdk: static
pinned: false
---
# MeissonFlow Research [[Join us]](mailto:jinbin5bai@gmail.com)
**MeissonFlow Research** is a non-commercial research group dedicated to advancing generative modeling techniques for structured visual and multimodal content creation.
We aim to design models and algorithms that help creators produce high-quality content with greater efficiency and control.
Our journey began with [**MaskGIT**](https://arxiv.org/abs/2202.04200), a pioneering work by [**Huiwen Chang**](https://scholar.google.com/citations?hl=en&user=eZQNcvcAAAAJ), which introduced a bidirectional transformer decoder for image synthesis—outperforming traditional raster-scan autoregressive (AR) generation.
This paradigm was later extended to text-to-image synthesis in [**MUSE**](https://arxiv.org/abs/2301.00704).
Building upon these foundations, we scaled masked generative modeling with the latest architectural designs and sampling strategies, culminating in [**Monetico** and **Meissonic**](https://github.com/viiika/Meissonic), trained from scratch, which are on par with leading diffusion models such as SDXL while maintaining greater efficiency.
Having verified the effectiveness of this approach, we began to ask a deeper question — one that reaches beyond performance benchmarks: **what foundations are required for general-purpose generative intelligence**?
Through discussions with researchers at Safe Superintelligence (SSI) Club, University of Illinois Urbana-Champaign (UIUC) and Riot Video Games, we converged on the vision of a **visual-centric world model** — a generative and interactive system capable of simulating, interacting with, and reasoning about multimodal environments.
> We believe that **masking** is a fundamental abstraction for building such controllable, efficient, and generalizable intelligence.
A similar vision was shared by [**Stefano Ermon**](https://cs.stanford.edu/~ermon/) at ICLR 2025, where he described *Diffusion as a unified paradigm for a multi-modal world model*. That message echoes and strengthens our belief that unified generative modeling is the path toward general-purpose superintelligence.
To pursue this vision, we introduced [**Muddit** and **Muddit Plus**](https://github.com/M-E-AGI-Lab/Muddit), unified generative models built upon visual priors (Meissonic) and capable of generating both text and images within a single architecture and paradigm.
We look forward to releasing more models and algorithms in this direction.
We thank our amazing teammates — and you, the reader — for your interest in our work.
Special thanks to [**Style2Paints Research**](https://lllyasviel.github.io/Style2PaintsResearch/), which helped shape our taste and research direction in the early days.