# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

USO FLUX is a unified style-subject customization model that generates images combining a given subject with a given style. This repository is a Gradio-based web application that provides a user interface for the USO (Unified Style-subject Optimization) image generation pipeline.

## Development Commands

### Running the Application

```bash
python app.py --name flux-dev --device cuda --port 7860
```

### Common Parameters for `app.py`

- `--name`: Model type (`flux-dev`, `flux-dev-fp8`, `flux-schnell`, or `flux-krea-dev`)
- `--device`: Device to run on (`cuda` or `cpu`)
- `--offload`: Enable sequential CPU offloading for memory efficiency
- `--port`: Server port (default: 7860)

### Installing Dependencies

```bash
pip install -r requirements.txt
```

## Architecture Overview

### Core Components

**USO Pipeline** (`uso/flux/pipeline.py`)

- Main inference pipeline integrating the FLUX diffusion model with USO customization
- Handles image preprocessing, encoding, and generation
- Supports multiple reference images (content + style references); see the sketch below
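
The pipeline is typically driven along these lines. This is a minimal sketch only: the class name, constructor arguments, and keyword names are assumptions (chosen to mirror the CLI flags and example-config keys documented below); check `uso/flux/pipeline.py` and `app.py` for the real signature.

```python
from PIL import Image

# Assumed import and class name; the real entry point is defined in
# uso/flux/pipeline.py and may differ.
from uso.flux.pipeline import USOPipeline

pipeline = USOPipeline(
    model_type="flux-dev",   # mirrors the --name CLI flag
    device="cuda",
    offload=False,           # True trades speed for lower GPU memory use
)

# Content + style generation: one subject reference plus one style reference.
result = pipeline(
    prompt="the subject rendered as a watercolor illustration",
    image_ref1=Image.open("content.png"),  # content/subject reference (hypothetical path)
    image_ref2=Image.open("style.png"),    # style reference (hypothetical path)
    width=1024,
    height=1024,
    num_steps=25,
    guidance=4.0,
    seed=42,
)
result.save("output.png")  # assuming the pipeline returns a PIL image
```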

**FLUX Model** (`uso/flux/model.py`)

- Transformer-based diffusion model implementation
- Uses double-stream and single-stream attention blocks
- Integrates the SigLIP vision encoder for image understanding

**Gradio Interface** (`app.py`)

- Web UI with support for text prompts and multiple image inputs
- Configurable generation parameters (steps, guidance, dimensions)
- Example gallery with pre-configured use cases; a minimal UI sketch follows below
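
The snippet below is not a copy of `app.py`; it is a minimal Gradio sketch of the same wiring (text prompt plus reference images in, one image out), with the actual generation call left as a stub to be connected to the USO pipeline.

```python
import gradio as gr

def generate(prompt, content_ref, style_ref, steps, guidance, seed):
    # In app.py this would call the USO pipeline; here it is only a stub.
    raise NotImplementedError("wire this to the USO pipeline")

with gr.Blocks(title="USO FLUX (sketch)") as demo:
    prompt = gr.Textbox(label="Prompt")
    with gr.Row():
        content_ref = gr.Image(label="Content reference", type="pil")
        style_ref = gr.Image(label="Style reference", type="pil")
    steps = gr.Slider(1, 50, value=25, step=1, label="Steps")
    guidance = gr.Slider(1.0, 10.0, value=4.0, label="Guidance")
    seed = gr.Number(value=42, precision=0, label="Seed")
    output = gr.Image(label="Result")
    gr.Button("Generate").click(
        generate,
        inputs=[prompt, content_ref, style_ref, steps, guidance, seed],
        outputs=output,
    )

if __name__ == "__main__":
    demo.launch(server_port=7860)
```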

### Module Structure

- `uso/flux/modules/`: Core neural network components
  - `layers.py`: Attention blocks, embeddings, LoRA processors
  - `autoencoder.py`: VAE for image encoding/decoding
  - `conditioner.py`: Text and image conditioning
- `uso/flux/sampling.py`: Diffusion sampling and denoising (see the sampler sketch below)
- `uso/flux/util.py`: Model loading utilities and checkpoint handling
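
The real scheduler and denoising loop live in `uso/flux/sampling.py`. The sketch below is only a generic flow-matching Euler loop to show the shape of the computation; the model call signature and timestep convention are assumptions, not the repository's API.

```python
import torch

@torch.no_grad()
def euler_sample(model, latents, conditioning, timesteps):
    """Generic flow-matching Euler sampler (sketch, not the repo's implementation).

    Assumes `model(latents, t, **conditioning)` predicts a velocity field and
    `timesteps` is a 1-D descending schedule from 1.0 to 0.0.
    """
    for t_curr, t_next in zip(timesteps[:-1], timesteps[1:]):
        velocity = model(latents, t_curr, **conditioning)
        latents = latents + (t_next - t_curr) * velocity
    return latents
```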

## Key Features

### Usage Modes

- Content-only: Subject/identity-driven generation or style editing
- Style-only: Generate anything following style reference
- Content + Style: Combine specific subjects with desired styles

### Model Capabilities

- Supports 1024x1024 resolution generation
- Multi-style reference support (beta)
- Layout preservation and layout-shifting modes
- LoRA-based efficient fine-tuning (rank 128); see the adapter sketch below
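
The project's own LoRA processors are implemented in `uso/flux/modules/layers.py` and applied inside the attention blocks. Purely as a reference point for what a rank-128 adapter means, here is a plain-PyTorch sketch of a LoRA update around a frozen linear layer (not the repository's code).

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal rank-r LoRA adapter around a frozen linear layer (illustrative only)."""

    def __init__(self, base: nn.Linear, rank: int = 128, alpha: float = 128.0):
        super().__init__()
        self.base = base
        self.base.requires_grad_(False)       # freeze the pretrained weight
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # adapter starts as a zero update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```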

## Configuration

### Example Configurations

Examples are stored in `assets/gradio_examples/` with JSON configs containing:

- `prompt`: Text description
- `image_ref1`: Content reference image path
- `image_ref2`: Style reference image path
- `image_ref3`: Additional style reference path (beta)
- `seed`: Random seed for reproducibility
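
A hedged snippet for reading one of these configs; the file name is hypothetical, and only the keys listed above are assumed.

```python
import json
from pathlib import Path

# Hypothetical file name; the real examples live in assets/gradio_examples/.
config = json.loads(Path("assets/gradio_examples/example.json").read_text())

prompt = config["prompt"]
content_ref = config.get("image_ref1")  # content reference path, may be absent
style_ref = config.get("image_ref2")    # style reference path, may be absent
extra_style = config.get("image_ref3")  # additional style reference (beta)
seed = config.get("seed", 0)            # fall back to a fixed seed if missing
```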

### Model Loading

The application automatically downloads models from Hugging Face:

- Main USO model: `bytedance-research/USO`
- SigLIP vision encoder: `google/siglip-so400m-patch14-384`
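
A sketch of the equivalent manual downloads using standard `huggingface_hub` and `transformers` calls; the app's own loading logic is in `uso/flux/util.py` and may differ.

```python
from huggingface_hub import snapshot_download
from transformers import SiglipImageProcessor, SiglipVisionModel

# Fetch the USO checkpoints into the local Hugging Face cache.
uso_dir = snapshot_download("bytedance-research/USO")

# Load the SigLIP vision encoder used for image understanding.
vision_encoder = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")
image_processor = SiglipImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")
```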

## Dependencies

Key dependencies include:

- PyTorch 2.4.0 with CUDA 12.4 support
- Transformers 4.43.3 for model components
- Diffusers 0.30.1 for diffusion utilities
- Gradio 5.22.0 for the web interface
- Accelerate 1.1.1 and DeepSpeed 0.14.4 for optimization