# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

USO FLUX is a unified style-subject optimized customization model for generating images with combined subjects and styles. This is a Gradio-based web application that provides a user interface for the USO (Unified Style-subject Optimization) image generation pipeline.

## Development Commands

### Running the Application

```bash
python app.py --name flux-dev --device cuda --port 7860
```
### Common Parameters for app.py

- `--name`: Model type (`flux-dev`, `flux-dev-fp8`, `flux-schnell`, `flux-krea-dev`)
- `--device`: Device to run on (`cuda` or `cpu`)
- `--offload`: Enable sequential CPU offloading for memory efficiency (see the example after this list)
- `--port`: Server port (default: 7860)
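For example, a lower-memory run can combine the fp8 checkpoint with CPU offloading. This is a sketch using only the flags documented above and assumes `--offload` is a boolean switch:

```bash
# Lower GPU memory usage: fp8 weights plus sequential CPU offloading
python app.py --name flux-dev-fp8 --device cuda --offload --port 7860
```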
### Installing Dependencies

```bash
pip install -r requirements.txt
```

## Architecture Overview

### Core Components
1. **USO Pipeline** (`uso/flux/pipeline.py`)
   - Main inference pipeline integrating the FLUX diffusion model with USO customization
   - Handles image preprocessing, encoding, and generation
   - Supports multiple reference images (content + style references); see the sketch after this list
2. **FLUX Model** (`uso/flux/model.py`)
   - Transformer-based diffusion model implementation
   - Uses double-stream and single-stream attention blocks
   - Integrates SigLIP vision encoder for image understanding
3. **Gradio Interface** (`app.py`)
   - Web UI with support for text prompts and multiple image inputs
   - Configurable generation parameters (steps, guidance, dimensions)
   - Example gallery with pre-configured use cases
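For orientation, programmatic use of the pipeline might look roughly like the sketch below. The class name `USOPipeline` and all argument names are assumptions inferred from the CLI flags and config fields described in this document, not the verified API of `uso/flux/pipeline.py`:

```python
# Hypothetical usage sketch: class and argument names are assumptions,
# not the verified interface of uso/flux/pipeline.py.
from uso.flux.pipeline import USOPipeline  # assumed class name

pipe = USOPipeline(
    model_type="flux-dev",  # mirrors the --name CLI flag
    device="cuda",          # mirrors --device
    offload=False,          # mirrors --offload
)

# Content reference (subject) plus optional style reference, as in the Gradio UI.
images = pipe(
    prompt="a plush robot reading a book, watercolor style",
    image_ref1="assets/gradio_examples/robot.png",       # illustrative content reference
    image_ref2="assets/gradio_examples/watercolor.png",  # illustrative style reference
    width=1024,
    height=1024,
    num_steps=25,
    guidance=4.0,
    seed=42,
)
```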
### Module Structure

- `uso/flux/modules/`: Core neural network components
  - `layers.py`: Attention blocks, embeddings, LoRA processors
  - `autoencoder.py`: VAE for image encoding/decoding
  - `conditioner.py`: Text and image conditioning
- `uso/flux/sampling.py`: Diffusion sampling and denoising
- `uso/flux/util.py`: Model loading utilities and checkpoints
## Key Features

### Usage Modes

1. **Content-only**: Subject/identity-driven generation or style editing
2. **Style-only**: Generate anything following a style reference
3. **Content + Style**: Combine specific subjects with desired styles
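Roughly speaking, the three modes differ only in which reference images are supplied. The mapping below is an assumption based on the descriptions above and the config fields in the next section, not something taken from the code:

```python
# Hypothetical mapping of usage modes to reference inputs (an assumption,
# not derived from the codebase); paths are illustrative.
MODES = {
    "content_only":  {"image_ref1": "subject.png", "image_ref2": None},
    "style_only":    {"image_ref1": None,          "image_ref2": "style.png"},
    "content_style": {"image_ref1": "subject.png", "image_ref2": "style.png"},
}
```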
### Model Capabilities

- Supports 1024x1024 resolution generation
- Multi-style reference support (beta)
- Layout preservation and layout-shifting modes
- LoRA-based efficient fine-tuning (rank 128)
## Configuration

### Example Configurations

Examples are stored in `assets/gradio_examples/` with JSON configs containing the following fields (see the sketch after this list):

- `prompt`: Text description
- `image_ref1`: Content reference image path
- `image_ref2`: Style reference image path
- `image_ref3`: Additional style reference (beta)
- `seed`: Random seed for reproducibility
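A config following this schema might be written as shown below. The prompt, paths, and seed are illustrative only and do not correspond to an actual file in `assets/gradio_examples/`:

```python
import json

# Illustrative example config: field names follow the schema above,
# values are made up for demonstration.
example = {
    "prompt": "a ceramic mug on a wooden desk, ink-wash painting style",
    "image_ref1": "assets/gradio_examples/mug.png",       # content reference
    "image_ref2": "assets/gradio_examples/ink_wash.png",  # style reference
    "image_ref3": None,                                    # optional extra style (beta)
    "seed": 1234,
}

with open("assets/gradio_examples/custom_example.json", "w") as f:
    json.dump(example, f, indent=2)
```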
### Model Loading

The application automatically downloads models from Hugging Face:

- Main USO model: `bytedance-research/USO`
- SigLIP vision encoder: `google/siglip-so400m-patch14-384`
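Downloads happen on first launch, but the checkpoints can also be prefetched into the local Hugging Face cache. A minimal sketch, assuming `huggingface_hub` is available in the environment (it is pulled in by `transformers` and `diffusers`):

```python
from huggingface_hub import snapshot_download

# Prefetch the USO weights and the SigLIP encoder so the first app launch
# does not block on downloads.
snapshot_download(repo_id="bytedance-research/USO")
snapshot_download(repo_id="google/siglip-so400m-patch14-384")
```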
## Dependencies

Key dependencies include:

- PyTorch 2.4.0 with CUDA 12.4 support
- Transformers 4.43.3 for model components
- Diffusers 0.30.1 for diffusion utilities
- Gradio 5.22.0 for web interface
- Accelerate 1.1.1 and DeepSpeed 0.14.4 for optimization
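Before launching the app, the installed versions and CUDA availability can be checked against these pins with a one-liner (a sketch; it only reads version strings):

```bash
python -c "import torch, transformers, diffusers, gradio; print(torch.__version__, torch.cuda.is_available(), transformers.__version__, diffusers.__version__, gradio.__version__)"
```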