---
title: README
emoji: 🏃
colorFrom: yellow
colorTo: indigo
sdk: static
pinned: false
---

# MeissonFlow Research [[Join us]](mailto:jinbin5bai@gmail.com)

MeissonFlow Research is a non-commercial research group dedicated to advancing generative modeling techniques for structured visual and multimodal content creation.
We aim to design models and algorithms that help creators produce high-quality content with greater efficiency and control.

Our journey began with [MaskGIT](https://arxiv.org/abs/2202.04200), a pioneering work by [Huiwen Chang](https://scholar.google.com/citations?hl=en&user=eZQNcvcAAAAJ), which introduced a bidirectional transformer decoder for image synthesis that outperformed traditional raster-scan autoregressive (AR) generation.
This paradigm was later extended to text-to-image synthesis in [MUSE](https://arxiv.org/abs/2301.00704).
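To make the contrast with raster-scan AR generation concrete, here is a minimal sketch of MaskGIT-style iterative parallel decoding: every position starts masked, a bidirectional model predicts all positions at once, and at each step the most confident predictions are kept while a cosine schedule decides how many tokens stay masked. This is an illustrative simplification, not our implementation: `toy_predict` is a hypothetical random stand-in for a trained transformer (it ignores its input), `MASK = -1` is an assumed sentinel, and the temperature-annealed noise MaskGIT adds to confidences is omitted.

```python
import math
import random

MASK = -1  # hypothetical sentinel id for a masked token (assumption)


def cosine_mask_ratio(t: float) -> float:
    """Fraction of tokens that remain masked at progress t in [0, 1] (cosine schedule)."""
    return math.cos(math.pi / 2 * t)


def toy_predict(tokens, vocab_size, rng):
    """Hypothetical stand-in for a bidirectional transformer.

    Returns a (token_id, confidence) pair for every position; a real model
    would condition on the currently unmasked tokens, this one does not.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def masked_parallel_decode(num_tokens, vocab_size, num_steps, seed=0):
    """Decode all tokens in num_steps parallel iterations instead of num_tokens AR steps."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    revealed_per_step = []
    for step in range(1, num_steps + 1):
        preds = toy_predict(tokens, vocab_size, rng)
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        # How many tokens should still be masked after this step.
        keep_masked = math.floor(num_tokens * cosine_mask_ratio(step / num_steps))
        num_to_reveal = len(masked) - keep_masked
        # Fix the most confident predictions among the still-masked positions.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:num_to_reveal]:
            tokens[i] = preds[i][0]
        revealed_per_step.append(num_to_reveal)
    return tokens, revealed_per_step
```

For example, `masked_parallel_decode(256, 1024, 8)` fills a 16×16 token grid with 8 model calls, whereas raster-scan AR decoding would need 256 sequential calls. The cosine schedule reveals only a few tokens early, when predictions are least certain, and many more in later steps.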

Building upon these foundations, we scaled masked generative modeling with the latest architectural designs and sampling strategies, culminating in [Monetico and Meissonic](https://github.com/viiika/Meissonic). Trained from scratch, these models perform on par with leading diffusion models such as SDXL while being more efficient.

Having verified the effectiveness of this approach, we began to ask a deeper question, one that reaches beyond performance benchmarks: what foundations are required for general-purpose generative intelligence?
Through discussions with researchers at the Safe Superintelligence (SSI) Club, the University of Illinois Urbana-Champaign (UIUC), and Riot Video Games, we converged on the vision of a visual-centric world model: a generative and interactive system capable of simulating, interacting with, and reasoning about multimodal environments.

> We believe that masking is a fundamental abstraction for building such controllable, efficient, and generalizable intelligence.

A similar vision was shared by [Stefano Ermon](https://cs.stanford.edu/~ermon/) at ICLR 2025, where he described diffusion as a unified paradigm for a multimodal world model. His message echoes and strengthens our belief that unified generative modeling is the path toward general-purpose superintelligence.

To pursue this vision, we introduced [Muddit and Muddit Plus](https://github.com/M-E-AGI-Lab/Muddit), unified generative models that build on visual priors (Meissonic) and generate both text and images within a single architecture and paradigm.

We look forward to releasing more models and algorithms in this direction.
We thank our amazing teammates for their contributions, and you, the reader, for your interest in our work.

Special thanks to [Style2Paints Research](https://lllyasviel.github.io/Style2PaintsResearch/), which helped shape our taste and research direction in the early days.