LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization
Paper • 2503.08619
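import torch
from diffusers import DiffusionPipeline
# Load the released checkpoint in bfloat16 on a CUDA GPU.
# Use device_map="mps" instead on Apple silicon devices.
pipe = DiffusionPipeline.from_pretrained("Beckham808/LightGen", dtype=torch.bfloat16, device_map="cuda")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
# The pipeline output holds a list of images; take the first one.
image = pipe(prompt).images[0]
# Save the result (the file name is only an example).
image.save("lightgen_sample.png")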
LightGen introduces a novel pre-training pipeline for text-to-image models. It combines knowledge distillation (KD) and Direct Preference Optimization (DPO) to achieve efficient image generation. Drawing inspiration from data KD techniques, LightGen distills knowledge from state-of-the-art text-to-image models into a compact Masked Autoregressive (MAR) architecture with only 0.7B parameters.
LightGen is based on the paper listed above; the code is released in the project's GitHub repository.
Currently, we have released only checkpoints trained without DPO.
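For readers unfamiliar with the DPO stage named in the description, here is a minimal sketch of the standard DPO objective for a single preference pair. It is the generic formulation, not necessarily the exact loss used to train LightGen. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Generic DPO loss for one (chosen, rejected) pair of log-likelihoods.

    Illustrative only: all names and the default beta are assumptions,
    not taken from the LightGen code base.
    """
    # How much more the policy favors each sample than the frozen reference.
    chosen_gain = policy_logp_chosen - ref_logp_chosen
    rejected_gain = policy_logp_rejected - ref_logp_rejected
    z = beta * (chosen_gain - rejected_gain)
    # Loss is -log(sigmoid(z)), computed in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and the reference agree on both samples the margin is zero and the loss equals log 2. The loss decreases as the policy assigns relatively more likelihood to the preferred sample than the reference does.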