Collections
Collections including paper arxiv:2403.03206
-
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Paper • 2506.07977 • Published • 41
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Paper • 2506.07986 • Published • 19
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Paper • 2506.06276 • Published • 22
Aligning Latent Spaces with Flow Priors
Paper • 2506.05240 • Published • 26
-
aMUSEd: An Open MUSE Reproduction
Paper • 2401.01808 • Published • 31
black-forest-labs/FLUX.1-dev
Text-to-Image • Updated • 1.52M • 11.1k
Qwen/Qwen2-VL-7B-Instruct
Image-Text-to-Text • 8B • Updated • 526k • 1.22k
zer0int/CLIP-GmP-ViT-L-14
Zero-Shot Image Classification • 0.4B • Updated • 11.8k • 479
-
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
Paper • 2408.14176 • Published • 63
Diffusion Models Are Real-Time Game Engines
Paper • 2408.14837 • Published • 127
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 63
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
Paper • 2409.01199 • Published • 14