# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
USO (Unified Style-subject Optimization) is a style-subject customization model built on FLUX that generates images combining a given subject with a reference style. This repository is a Gradio-based web application providing a user interface for the USO image generation pipeline.
## Development Commands
### Running the Application
```bash
python app.py --name flux-dev --device cuda --port 7860
```
### Common Parameters for app.py
- `--name`: Model type (`flux-dev`, `flux-dev-fp8`, `flux-schnell`, `flux-krea-dev`)
- `--device`: Device to run on (`cuda` or `cpu`)
- `--offload`: Enable sequential CPU offloading for memory efficiency
- `--port`: Server port (default: 7860)
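The flags above are standard `argparse` options; a minimal sketch of equivalent parsing (defaults and `choices` are assumptions based on the list above, not a copy of `app.py`):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the CLI flags documented above; exact help text is illustrative.
    p = argparse.ArgumentParser(description="USO FLUX Gradio app")
    p.add_argument("--name", default="flux-dev",
                   choices=["flux-dev", "flux-dev-fp8", "flux-schnell", "flux-krea-dev"])
    p.add_argument("--device", default="cuda", choices=["cuda", "cpu"])
    p.add_argument("--offload", action="store_true",
                   help="sequential CPU offloading for lower VRAM use")
    p.add_argument("--port", type=int, default=7860)
    return p
```

On memory-constrained GPUs, combining `--name flux-dev-fp8` with `--offload` is the usual way to trade speed for VRAM.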
### Installing Dependencies
```bash
pip install -r requirements.txt
```
## Architecture Overview
### Core Components
1. **USO Pipeline** (`uso/flux/pipeline.py`)
- Main inference pipeline integrating FLUX diffusion model with USO customization
- Handles image preprocessing, encoding, and generation
- Supports multiple reference images (content + style references)
2. **FLUX Model** (`uso/flux/model.py`)
- Transformer-based diffusion model implementation
- Uses double-stream and single-stream attention blocks
- Integrates SigLIP vision encoder for image understanding
3. **Gradio Interface** (`app.py`)
- Web UI with support for text prompts and multiple image inputs
- Configurable generation parameters (steps, guidance, dimensions)
- Example gallery with pre-configured use cases
### Module Structure
- `uso/flux/modules/`: Core neural network components
- `layers.py`: Attention blocks, embeddings, LoRA processors
- `autoencoder.py`: VAE for image encoding/decoding
- `conditioner.py`: Text and image conditioning
- `uso/flux/sampling.py`: Diffusion sampling and denoising
- `uso/flux/util.py`: Model loading utilities and checkpoints
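The modules above divide the generation flow into encode, denoise, and decode stages. A pure-Python sketch of that flow (function names mirror the module responsibilities; the real code in `uso/flux/` operates on torch tensors, so every body here is a placeholder):

```python
import random

def encode_text(prompt):                  # conditioner.py: text conditioning
    return {"prompt": prompt}

def encode_image(path):                   # conditioner.py + SigLIP features
    return {"ref": path}

def init_noise(seed):                     # sampling.py: initial latent noise
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(4)]

def denoise_step(latent, cond, refs):     # model.py: one transformer denoise step
    return [0.9 * x for x in latent]      # stand-in for the predicted update

def decode(latent):                       # autoencoder.py: VAE latent -> image
    return {"image": latent}

def generate(prompt, refs=(), steps=25, seed=0):
    cond = encode_text(prompt)
    ref_feats = [encode_image(r) for r in refs]
    latent = init_noise(seed)
    for _ in range(steps):
        latent = denoise_step(latent, cond, ref_feats)
    return decode(latent)
```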
## Key Features
### Usage Modes
1. **Content-only**: Subject/identity-driven generation or style editing
2. **Style-only**: Generate arbitrary content following a style reference
3. **Content + Style**: Combine specific subjects with desired styles
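The three modes follow directly from which reference slots are filled. A sketch of that dispatch (names are illustrative, not `app.py`'s actual logic):

```python
def usage_mode(content_ref=None, style_refs=()):
    """Classify the usage mode from the supplied reference images."""
    if content_ref and style_refs:
        return "content+style"   # combine a specific subject with a style
    if style_refs:
        return "style-only"      # free generation in the reference style
    return "content-only"        # subject-driven generation / style editing
```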
### Model Capabilities
- Supports 1024x1024 resolution generation
- Multi-style reference support (beta)
- Layout preservation and layout-shifting modes
- LoRA-based efficient fine-tuning (rank 128)
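A back-of-envelope view of why rank-128 LoRA is parameter-efficient: instead of a full `d_out x d_in` weight update, LoRA learns two low-rank factors. (The 3072-wide projection below is illustrative, not the model's actual width.)

```python
def lora_params(d_in, d_out, rank=128):
    # LoRA learns B (d_out x rank) and A (rank x d_in) in place of a full
    # d_out x d_in update, i.e. rank * (d_in + d_out) trainable parameters.
    return rank * (d_in + d_out)
```

For a hypothetical 3072x3072 projection, that is 786,432 trainable parameters versus 9,437,184 for the full matrix, about 8.3%.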
## Configuration
### Example Configurations
Examples are stored in `assets/gradio_examples/` with JSON configs containing:
- `prompt`: Text description
- `image_ref1`: Content reference image path
- `image_ref2`: Style reference image path
- `image_ref3`: Additional style reference (beta)
- `seed`: Random seed for reproducibility
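A config following the fields above might look like this (paths and values are illustrative, not copied from an actual file in `assets/gradio_examples/`):

```json
{
  "prompt": "A plush toy on a wooden desk, watercolor style",
  "image_ref1": "assets/gradio_examples/content.png",
  "image_ref2": "assets/gradio_examples/style.png",
  "image_ref3": null,
  "seed": 42
}
```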
### Model Loading
The application automatically downloads models from Hugging Face:
- Main USO model: `bytedance-research/USO`
- SigLIP vision encoder: `google/siglip-so400m-patch14-384`
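To warm the cache before first launch, the same repos can be fetched manually with `huggingface_hub`'s `snapshot_download` (a sketch; the app performs this download itself on startup, and the helper name is illustrative):

```python
# Repos the app pulls from Hugging Face (ids from this document).
REQUIRED_REPOS = {
    "uso": "bytedance-research/USO",
    "siglip": "google/siglip-so400m-patch14-384",
}

def prefetch(cache_dir=None):
    """Download both repos up front so app startup doesn't block on them."""
    from huggingface_hub import snapshot_download  # deferred: needs network
    return {name: snapshot_download(repo_id=repo, cache_dir=cache_dir)
            for name, repo in REQUIRED_REPOS.items()}
```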
## Dependencies
Key dependencies include:
- PyTorch 2.4.0 with CUDA 12.4 support
- Transformers 4.43.3 for model components
- Diffusers 0.30.1 for diffusion utilities
- Gradio 5.22.0 for web interface
- Accelerate 1.1.1 and DeepSpeed 0.14.4 for optimization