
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

USO FLUX is a unified style-subject-optimized customization model for generating images that combine a given subject with a given style. This repository is a Gradio-based web application that provides a user interface for the USO (Unified Style-subject Optimization) image generation pipeline, built on the FLUX diffusion model.

Development Commands

Running the Application

python app.py --name flux-dev --device cuda --port 7860

Common Parameters for app.py

  • --name: Model type (flux-dev, flux-dev-fp8, flux-schnell, flux-krea-dev)
  • --device: Device to run on (cuda or cpu)
  • --offload: Enable sequential CPU offloading for memory efficiency
  • --port: Server port (default: 7860)
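A minimal sketch of how these flags might be parsed with argparse (the actual app.py may differ; defaults and choices mirror the options listed above and are otherwise assumptions):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the CLI options listed above; exact defaults are assumptions.
    parser = argparse.ArgumentParser(description="USO FLUX Gradio app")
    parser.add_argument("--name", default="flux-dev",
                        choices=["flux-dev", "flux-dev-fp8",
                                 "flux-schnell", "flux-krea-dev"],
                        help="Model variant to load")
    parser.add_argument("--device", default="cuda",
                        help="Device to run on (cuda or cpu)")
    parser.add_argument("--offload", action="store_true",
                        help="Enable sequential CPU offloading")
    parser.add_argument("--port", type=int, default=7860,
                        help="Gradio server port")
    return parser

args = build_parser().parse_args(["--name", "flux-dev", "--device", "cuda"])
print(args.port)  # falls back to the default, 7860
```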

Installing Dependencies

pip install -r requirements.txt

Architecture Overview

Core Components

  1. USO Pipeline (uso/flux/pipeline.py)

    • Main inference pipeline integrating FLUX diffusion model with USO customization
    • Handles image preprocessing, encoding, and generation
    • Supports multiple reference images (content + style references)
  2. FLUX Model (uso/flux/model.py)

    • Transformer-based diffusion model implementation
    • Uses double-stream and single-stream attention blocks
    • Integrates SigLIP vision encoder for image understanding
  3. Gradio Interface (app.py)

    • Web UI with support for text prompts and multiple image inputs
    • Configurable generation parameters (steps, guidance, dimensions)
    • Example gallery with pre-configured use cases

Module Structure

  • uso/flux/modules/: Core neural network components
    • layers.py: Attention blocks, embeddings, LoRA processors
    • autoencoder.py: VAE for image encoding/decoding
    • conditioner.py: Text and image conditioning
  • uso/flux/sampling.py: Diffusion sampling and denoising
  • uso/flux/util.py: Model loading and checkpoint utilities

Key Features

Usage Modes

  1. Content-only: Subject/identity-driven generation or style editing
  2. Style-only: Generate arbitrary content that follows a style reference
  3. Content + Style: Combine a specific subject with a desired style
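The three modes correspond to which reference images are supplied. A hypothetical helper illustrating that dispatch (the function and argument names are assumptions, not the repo's API; the real app infers the mode from its Gradio inputs):

```python
def select_mode(content_ref=None, style_ref=None):
    """Pick a USO usage mode from the provided reference images.

    Illustrative only: mode names follow the list above.
    """
    if content_ref and style_ref:
        return "content+style"   # combine subject with style
    if content_ref:
        return "content-only"    # subject-driven generation / style editing
    if style_ref:
        return "style-only"      # follow the style reference
    return "text-only"           # plain text-to-image

print(select_mode(content_ref="dog.png"))  # content-only
```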

Model Capabilities

  • Supports 1024x1024 resolution generation
  • Multi-style reference support (beta)
  • Layout preservation and layout-shifting modes
  • LoRA-based efficient fine-tuning (rank 128)
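LoRA replaces a full weight update with a low-rank product: at rank r = 128, the trainable parameters for a d_out x d_in layer shrink from d_out*d_in to r*(d_in + d_out). A minimal NumPy illustration of the idea (not the repo's implementation; the layer dimensions are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 3072, 3072, 128   # rank 128 as stated above; dims assumed

W = rng.standard_normal((d_out, d_in))          # frozen base weight
A = rng.standard_normal((rank, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, rank))                     # trainable up-projection (zero init)

x = rng.standard_normal(d_in)
y = W @ x + B @ (A @ x)   # LoRA forward: base path plus low-rank path

full_params = d_out * d_in
lora_params = rank * (d_in + d_out)
print(f"trainable params: {lora_params:,} vs full {full_params:,}")
```

Because B starts at zero, the low-rank path contributes nothing at initialization, so fine-tuning begins exactly at the frozen model's behavior.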

Configuration

Example Configurations

Examples are stored in assets/gradio_examples/ with JSON configs containing:

  • prompt: Text description
  • image_ref1: Content reference image path
  • image_ref2: Style reference image path
  • image_ref3: Additional style reference (beta)
  • seed: Random seed for reproducibility
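A sketch of writing and reading one such config (the values and exact schema beyond the keys listed above are assumptions):

```python
import json
from pathlib import Path

# Illustrative config using the keys described above; paths are made up.
example = {
    "prompt": "A cat in watercolor style",
    "image_ref1": "assets/gradio_examples/cat.png",    # content reference
    "image_ref2": "assets/gradio_examples/style.png",  # style reference
    "image_ref3": None,                                # optional extra style ref (beta)
    "seed": 42,
}

path = Path("example_config.json")
path.write_text(json.dumps(example, indent=2))

config = json.loads(path.read_text())
print(config["prompt"], config["seed"])
```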

Model Loading

The application automatically downloads models from Hugging Face:

  • Main USO model: bytedance-research/USO
  • SigLIP vision encoder: google/siglip-so400m-patch14-384
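To prefetch those checkpoints manually, huggingface_hub's snapshot_download can be used; a sketch (the cache directory and helper are assumptions — the application handles this download automatically):

```python
from huggingface_hub import snapshot_download

USO_REPO = "bytedance-research/USO"
SIGLIP_REPO = "google/siglip-so400m-patch14-384"

def fetch_models(cache_dir: str = "./checkpoints") -> dict:
    """Download both repos into cache_dir and return their local paths."""
    return {
        "uso": snapshot_download(USO_REPO, cache_dir=cache_dir),
        "siglip": snapshot_download(SIGLIP_REPO, cache_dir=cache_dir),
    }

if __name__ == "__main__":
    paths = fetch_models()
    print(paths["uso"])
```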

Dependencies

Key dependencies include:

  • PyTorch 2.4.0 with CUDA 12.4 support
  • Transformers 4.43.3 for model components
  • Diffusers 0.30.1 for diffusion utilities
  • Gradio 5.22.0 for web interface
  • Accelerate 1.1.1 and DeepSpeed 0.14.4 for optimization