---
tags:
- text-to-image
- lora
- diffusers
- template:diffusion-lora
widget:
- text: a photo of a laptop above a dog
  output:
    url: images/laptop-above-dog.jpg
- text: a photo of a bird below a skateboard
  output:
    url: images/bird-below-skateboard.jpg
- text: a photo of a horse to the left of a bottle
  output:
    url: images/horse-left-bottle.jpg
base_model: black-forest-labs/FLUX.1-dev
instance_prompt: null
license: other
license_name: compass-lora-weights-nc-license
license_link: LICENSE
---
# CoMPaSS-FLUX.1
\[[Project Page]\]
\[[code]\]
\[[arXiv]\]
<Gallery />
## Model description
A LoRA adapter that improves the spatial understanding of the FLUX.1 text-to-image
diffusion model. It follows stated spatial relationships between objects far more reliably
than the base model (for example, GenEval Position rises from 0.26 to 0.60).
## Model Details
- **Base Model**: FLUX.1-dev
- **LoRA Rank**: 16
- **Training Data**: SCOP dataset (curated from COCO)
- **File Size**: ~50MiB
- **Framework**: Diffusers
- **License**: Non-Commercial (see [./LICENSE])
## ComfyUI Support
We provide a custom node with examples in the
[CoMPaSS-FLUX.1-dev-ComfyUI repository][comfyui-node-impl]. To get started, use the
ComfyUI-compatible LoRA checkpoint, [`CoMPaSS-FLUX.1-comfyui.safetensors`][comfyui-checkpoint].
## Intended Use
- Generating images with accurate spatial relationships between objects
- Creating compositions that require specific spatial arrangements
- Enhancing the base model's spatial understanding while maintaining its other capabilities
## Performance
### Key Improvements
- VISOR benchmark: +98% relative improvement
- T2I-CompBench Spatial: +67% relative improvement
- GenEval Position: +131% relative improvement
- Maintains or improves base model's image fidelity (lower FID and CMMD scores than base model)
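
The relative improvements above can be reproduced from the absolute scores in the Evaluation Results table of this README. The snippet below is a minimal sketch of that arithmetic; the function name `relative_improvement` is only illustrative.

```python
# Baseline (FLUX.1) and +CoMPaSS scores, copied from the Evaluation Results table.
scores = {
    "VISOR uncond": (37.96, 75.17),
    "T2I-CompBench Spatial": (0.18, 0.30),
    "GenEval Position": (0.26, 0.60),
}


def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain over the baseline as a percentage of the baseline."""
    return (improved - baseline) / baseline * 100.0


for name, (baseline, improved) in scores.items():
    print(f"{name}: +{relative_improvement(baseline, improved):.0f}%")
```

This prints +98%, +67%, and +131% for the three benchmarks, matching the figures listed above.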
## Using the Model
See our [GitHub repository][code] to get started.
### Effective Prompting
The model works well with:
- Clear spatial relationship descriptors (left, right, above, below)
- Pairs of distinct objects
- Explicit spatial relationships (e.g., "a photo of A to the right of B")
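
As an illustration of the prompt pattern above, here is a small hypothetical helper that assembles prompts of the form "a photo of A to the right of B". It is not part of the CoMPaSS codebase; the names `RELATIONS`, `with_article`, and `spatial_prompt` are assumptions made for this sketch, and the article choice is a simple vowel heuristic.

```python
# Relation phrases for the four descriptors listed above.
RELATIONS = {
    "left": "to the left of",
    "right": "to the right of",
    "above": "above",
    "below": "below",
}


def with_article(noun: str) -> str:
    """Prefix a noun with "a" or "an" using a simple first-letter heuristic."""
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{article} {noun}"


def spatial_prompt(subject: str, relation: str, reference: str) -> str:
    """Build a prompt that places `subject` relative to `reference`."""
    if relation not in RELATIONS:
        raise ValueError(f"unsupported relation: {relation!r}")
    if subject == reference:
        raise ValueError("subject and reference should be distinct objects")
    return (
        f"a photo of {with_article(subject)} "
        f"{RELATIONS[relation]} {with_article(reference)}"
    )


print(spatial_prompt("horse", "left", "bottle"))
# a photo of a horse to the left of a bottle
```

The resulting strings follow the same wording as the example prompts in this model card's widget.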
## Training Details
### Training Data
- Built using the SCOP (Spatial Constraints-Oriented Pairing) data engine
- ~28,000 curated object pairs from COCO
- Enforces criteria for:
  - Visual significance
  - Semantic distinction
  - Spatial clarity
  - Object relationships
  - Visual balance
### Training Process
- Trained for 24,000 steps
- Batch size of 4
- Learning rate: 1e-4
- Optimizer: AdamW with β₁=0.9, β₂=0.999
- Weight decay: 1e-2
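
For reference, the hyperparameters above can be collected into a single configuration object. This is a sketch, not the training code from the repository: the class and field names are hypothetical. The derived numbers assume that the batch size of 4 is the global batch size, that there is no gradient accumulation, and that each of the ~28,000 curated pairs is one training sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters listed in this README (field names are illustrative)."""

    steps: int = 24_000
    batch_size: int = 4
    learning_rate: float = 1e-4
    adam_betas: tuple = (0.9, 0.999)
    weight_decay: float = 1e-2
    lora_rank: int = 16


cfg = TrainingConfig()

# Assumption: every step consumes exactly `batch_size` samples.
samples_seen = cfg.steps * cfg.batch_size

# Assumption: ~28,000 curated pairs, one training sample each.
approx_dataset_size = 28_000
approx_epochs = samples_seen / approx_dataset_size

print(f"samples seen: {samples_seen}")
print(f"approximate passes over the data: {approx_epochs:.1f}")
```

Under those assumptions, training sees 96,000 samples, or roughly 3.4 passes over the curated pairs.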
## Evaluation Results
| Metric | FLUX.1 | +CoMPaSS |
|--------|-------------|-----------|
| VISOR uncond (⬆️) | 37.96% | **75.17%** |
| T2I-CompBench Spatial (⬆️) | 0.18 | **0.30** |
| GenEval Position (⬆️) | 0.26 | **0.60** |
| FID (⬇️) | 27.96 | **26.40** |
| CMMD (⬇️) | 0.8737 | **0.6859** |
## Citation
If you use this model in your research, please cite:
```bibtex
@inproceedings{zhang2025compass,
title={CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models},
author={Zhang, Gaoyang and Fu, Bingtao and Fan, Qingnan and Zhang, Qi and Liu, Runxing and Gu, Hong and Zhang, Huaqi and Liu, Xinguo},
booktitle={ICCV},
year={2025}
}
```
## Contact
For questions about the model, please contact <blurgy@zju.edu.cn>.
## Download model
Weights for this model are available in Safetensors format.
[Download](/blurgy/CoMPaSS-FLUX.1/tree/main) them in the Files & versions tab.

[comfyui-node-impl]: <https://github.com/blurgyy/CoMPaSS-FLUX.1-dev-ComfyUI>
[comfyui-checkpoint]: <./CoMPaSS-FLUX.1-comfyui.safetensors>
[./LICENSE]: <./LICENSE>
[Project Page]: <https://compass.blurgy.xyz>
[code]: <https://github.com/blurgyy/CoMPaSS>
[arXiv]: <https://arxiv.org/abs/2412.13195>