# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

USO (Unified Style-subject Optimization) FLUX is a customization model for generating images that combine specific subjects with reference styles. This repository wraps the USO image generation pipeline in a Gradio-based web application.

## Development Commands

### Running the Application
```bash
python app.py --name flux-dev --device cuda --port 7860
```

### Common Parameters for app.py
- `--name`: Model type (`flux-dev`, `flux-dev-fp8`, `flux-schnell`, `flux-krea-dev`)
- `--device`: Device to run on (`cuda` or `cpu`)
- `--offload`: Enable sequential CPU offloading for memory efficiency
- `--port`: Server port (default: 7860)
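
These flags can be combined, e.g. `--name flux-dev-fp8 --offload` for constrained-VRAM setups. As a sketch, the CLI could be parsed as below (a hypothetical mirror of `app.py`'s argument handling, not the actual code):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of app.py's CLI flags; the real script may differ.
    parser = argparse.ArgumentParser(description="USO FLUX Gradio app")
    parser.add_argument(
        "--name", default="flux-dev",
        choices=["flux-dev", "flux-dev-fp8", "flux-schnell", "flux-krea-dev"],
        help="Model type to load",
    )
    parser.add_argument("--device", default="cuda", choices=["cuda", "cpu"],
                        help="Device to run inference on")
    parser.add_argument("--offload", action="store_true",
                        help="Enable sequential CPU offloading")
    parser.add_argument("--port", type=int, default=7860, help="Server port")
    return parser

args = build_parser().parse_args(["--name", "flux-dev-fp8", "--offload"])
```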

### Installing Dependencies
```bash
pip install -r requirements.txt
```

## Architecture Overview

### Core Components

1. **USO Pipeline** (`uso/flux/pipeline.py`)
   - Main inference pipeline integrating FLUX diffusion model with USO customization
   - Handles image preprocessing, encoding, and generation
   - Supports multiple reference images (content + style references)

2. **FLUX Model** (`uso/flux/model.py`)
   - Transformer-based diffusion model implementation
   - Uses double-stream and single-stream attention blocks
   - Integrates SigLIP vision encoder for image understanding

3. **Gradio Interface** (`app.py`)
   - Web UI with support for text prompts and multiple image inputs
   - Configurable generation parameters (steps, guidance, dimensions)
   - Example gallery with pre-configured use cases
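
These components cooperate in a preprocess → encode → sample flow. A stubbed skeleton of that flow is sketched below; the names and structure are illustrative assumptions, not the repository's actual `USOPipeline` API:

```python
# Hypothetical skeleton of the inference flow described above; each stage
# is stubbed out. The real pipeline in uso/flux/pipeline.py differs in detail.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GenerationRequest:
    prompt: str
    content_ref: Optional[str] = None                 # subject reference path
    style_refs: List[str] = field(default_factory=list)  # style reference paths

def run_pipeline(req: GenerationRequest) -> dict:
    # 1. Preprocess: collect and normalize the provided reference images.
    refs = [p for p in [req.content_ref, *req.style_refs] if p]
    # 2. Encode: text via the conditioner, images via VAE/SigLIP (stubbed).
    conditioning = {"text": req.prompt, "images": refs}
    # 3. Generate: diffusion sampling loop (stubbed).
    return {"conditioning": conditioning, "num_refs": len(refs)}
```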

### Module Structure
- `uso/flux/modules/`: Core neural network components
  - `layers.py`: Attention blocks, embeddings, LoRA processors
  - `autoencoder.py`: VAE for image encoding/decoding
  - `conditioner.py`: Text and image conditioning
- `uso/flux/sampling.py`: Diffusion sampling and denoising
- `uso/flux/util.py`: Model loading and checkpoint utilities

## Key Features

### Usage Modes
1. **Content-only**: Subject/identity-driven generation or style editing
2. **Style-only**: Generate arbitrary content following a style reference
3. **Content + Style**: Combine specific subjects with desired styles
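
A small hypothetical helper illustrates how the presence of each reference image could map to these three modes (illustrative only; `app.py`'s actual dispatch may differ):

```python
# Map provided reference images to the three usage modes described above.
# This helper is an illustration, not code from the repository.
def select_mode(content_ref=None, style_ref=None) -> str:
    if content_ref and style_ref:
        return "content+style"
    if content_ref:
        return "content-only"
    if style_ref:
        return "style-only"
    raise ValueError("At least one reference image is required")
```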

### Model Capabilities
- Supports 1024x1024 resolution generation
- Multi-style reference support (beta)
- Layout preservation and layout-shifting modes
- LoRA-based efficient fine-tuning (rank 128)
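
To see why rank-128 LoRA is parameter-efficient, compare adapter size against full fine-tuning for a single square projection; `d = 3072` is an assumed hidden size for illustration:

```python
# Parameter count for full fine-tuning vs. a rank-r LoRA adapter on one
# d x d weight matrix. d = 3072 is assumed purely for illustration.
d, r = 3072, 128
full = d * d          # all entries of the weight matrix
lora = d * r + r * d  # low-rank factors A (d x r) and B (r x d)
ratio = lora / full   # fraction of parameters that LoRA trains
```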

## Configuration

### Example Configurations
Examples are stored in `assets/gradio_examples/` with JSON configs containing:
- `prompt`: Text description
- `image_ref1`: Content reference image path
- `image_ref2`: Style reference image path  
- `image_ref3`: Additional style reference (beta)
- `seed`: Random seed for reproducibility
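
A hypothetical example config built from the fields above, round-tripped through JSON (the actual files in `assets/gradio_examples/` may differ in detail):

```python
import json

# Hypothetical example config using the documented fields; values are
# made up for illustration and do not come from the repository.
example = {
    "prompt": "A plush toy on a wooden desk, watercolor style",
    "image_ref1": "assets/gradio_examples/content.png",
    "image_ref2": "assets/gradio_examples/style.png",
    "image_ref3": None,   # additional style reference is optional (beta)
    "seed": 42,
}
loaded = json.loads(json.dumps(example))  # verify it is valid JSON
```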

### Model Loading
The application automatically downloads models from Hugging Face:
- Main USO model: `bytedance-research/USO`
- SigLIP vision encoder: `google/siglip-so400m-patch14-384`
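
The download step can be sketched with `huggingface_hub.snapshot_download`, which resolves a repo ID to a local cache directory; how `app.py` actually wires this up is an assumption:

```python
# Sketch of the model download step using huggingface_hub's real
# snapshot_download API; the app may perform this differently.
USO_REPO = "bytedance-research/USO"
SIGLIP_REPO = "google/siglip-so400m-patch14-384"

def download_models():
    # Deferred import: huggingface_hub is an optional runtime dependency.
    from huggingface_hub import snapshot_download
    uso_dir = snapshot_download(repo_id=USO_REPO)
    siglip_dir = snapshot_download(repo_id=SIGLIP_REPO)
    return uso_dir, siglip_dir
```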

## Dependencies

Key dependencies include:
- PyTorch 2.4.0 with CUDA 12.4 support
- Transformers 4.43.3 for model components
- Diffusers 0.30.1 for diffusion utilities
- Gradio 5.22.0 for web interface
- Accelerate 1.1.1 and DeepSpeed 0.14.4 for optimization
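
A small stdlib helper can check whether the pinned versions above are actually installed (a hypothetical utility, not part of the repository):

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name: str):
    # Return the installed version string, or None if the package is missing.
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Pins taken from the dependency list above.
pins = {"torch": "2.4.0", "transformers": "4.43.3", "diffusers": "0.30.1",
        "gradio": "5.22.0", "accelerate": "1.1.1", "deepspeed": "0.14.4"}
mismatches = {name: (want, installed_version(name))
              for name, want in pins.items()
              if installed_version(name) != want}
```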