---
title: EditP23
emoji: 🎨
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.38.2
app_file: app.py
pinned: false
---
# EditP23: 3D Editing via Propagation of Image Prompts to Multi-View
[Project Page](https://editp23.github.io/) | [arXiv](https://arxiv.org/abs/2506.20652)
This repository contains the official implementation of **EditP23**, a method for fast, mask-free 3D editing that propagates 2D image edits to multi-view representations in a 3D-consistent manner.
The edit is guided by an image pair, allowing users to leverage any preferred 2D editing tool, from manual painting to generative pipelines.
### Installation

<details>
<summary>Click to expand installation instructions</summary>

This project was tested on a Linux system with Python 3.11 and CUDA 12.6.

**1. Clone the Repository**

```bash
git clone --recurse-submodules https://github.com/editp23/EditP23.git
cd EditP23
```

**2. Install Dependencies**

```bash
conda create -n editp23 python=3.11 -y
conda activate editp23
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126  # Ensure compatibility with your CUDA version (tested with torch 2.6, CUDA 12.6).
pip install diffusers==0.30.1 transformers accelerate pillow huggingface_hub numpy tqdm
```

</details>
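
Optionally, you can verify the environment before moving on. A minimal sanity check (not part of the official setup):

```python
# Optional sanity check: confirm PyTorch, diffusers, and GPU visibility.
import torch
import diffusers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers:", diffusers.__version__)
```
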
### Quick Start

**1. Prepare Your Experiment Directory**

Create a directory for your experiment. Inside this directory, you must place three specific PNG files:

* `src.png`: The original, unedited view of your object.
* `edited.png`: The same view after you have applied your desired 2D edit.
* `src_mv.png`: The multi-view grid of the original object, which will be edited.

Your directory structure should look like this:

```text
examples/
└── robot_sunglasses/
    ├── src.png
    ├── edited.png
    └── src_mv.png
```
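
The file names are fixed, so a quick pre-flight check can catch a missing input before a run. A minimal sketch of such a check (hypothetical helper, not part of the repository):

```python
# Hypothetical pre-flight check (not part of the repository): verify that an
# experiment directory contains the three required PNGs before running main.py.
from pathlib import Path

def check_experiment_dir(exp_dir: str) -> None:
    exp = Path(exp_dir)
    missing = [name for name in ("src.png", "edited.png", "src_mv.png")
               if not (exp / name).is_file()]
    if missing:
        raise FileNotFoundError(f"{exp} is missing: {', '.join(missing)}")
    print(f"{exp} looks ready.")

check_experiment_dir("examples/robot_sunglasses")
```
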
**2. Run the Editing Script**

Execute the `src/main.py` script, pointing it to your experiment directory. You can adjust the guidance parameters based on the complexity of your edit.

#### Execution Examples

* **Mild Edit (Appearance Change):**

```bash
python src/main.py --exp_dir examples/robot_sunglasses --tar_guidance_scale 5.0 --n_max 31
```

* **Hard Edit (Large Geometry Change):**

```bash
python src/main.py --exp_dir examples/deer_wings --tar_guidance_scale 21.0 --n_max 39
```

The output will be saved in the `output/` subdirectory within your experiment folder.
### Command-Line Arguments

* `--exp_dir`: (Required) Path to the experiment directory.
* `--T_steps`: Total number of denoising steps. Default: `50`.
* `--n_max`: Number of denoising steps during which edit-aware guidance is applied. Higher values can help with more complex edits; this value should not exceed `T_steps`. Default: `31`.
* `--src_guidance_scale`: CFG scale for the source condition. Can typically remain constant. Default: `3.5`.
* `--tar_guidance_scale`: CFG scale for the target (edited) condition. Higher values apply the edit more strongly. Default: `5.0`.
* `--seed`: Random seed for reproducibility. Default: `18`.
# Results in Multi-View

### Deer - Pixar style & Wings

| | Cond. View | View 1 | View 2 | View 3 |
| :--- | :---: | :---: | :---: | :---: |
| **Original** |  |  |  |  |
| **Pixar style** |  |  |  |  |
| **Wings** |  |  |  |  |

<br>

### Person - Old & Zombie

| | Cond. View | View 1 | View 2 | View 3 |
| :--- | :---: | :---: | :---: | :---: |
| **Original** |  |  |  |  |
| **Old** |  |  |  |  |
| **Zombie** |  |  |  |  |
# Project Structure

The repository is organized as follows:

```text
EditP23/
├── examples/            # Example assets for quick testing
│   ├── deer_wings/
│   │   ├── src.png
│   │   ├── edited.png
│   │   └── src_mv.png
│   └── robot_sunglasses/
│       └── ...
├── assets/              # Raw asset files
│   └── stormtrooper.glb
├── scripts/             # Helper scripts for data preparation
│   ├── render_mesh.py
│   └── img2mv.py
├── src/                 # Main source code
│   ├── __init__.py
│   ├── edit_mv.py
│   ├── main.py
│   ├── pipeline.py
│   └── utils.py
├── .gitignore
└── README.md
```
# Utilities

## Setup

This guide shows how to prepare inputs for **EditP23** and run an edit.
These helper scripts create the three PNG files every experiment needs:

| File | Purpose |
|--------------|------------------------------------------------|
| `src.png` | Original single view (the one you will edit). |
| `edited.png` | Your 2D edit of `src.png`. |
| `src_mv.png` | 6-view grid of the original object. |
### 1. Generate `src.png` and `src_mv.png`

**EditP23** needs a **source view** (`src.png`) and a **multi-view grid** (`src_mv.png`).
The grid contains six additional views at fixed (azimuth, elevation) pairs: `(30°, 20°) (90°, -10°) (150°, 20°) (210°, -10°) (270°, 20°) (330°, -10°)`; the prompt view uses `(0°, 20°)`.
We provide two methods to generate these inputs; both render the source view and the multi-view grid from these angles on a clean, white background.
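
For reference, each (azimuth, elevation) pair corresponds to a camera position on a sphere around the object. A minimal sketch of that mapping (the radius and axis conventions here are illustrative assumptions, not necessarily those used by the scripts):

```python
# Map (azimuth, elevation) pairs to camera positions on a sphere (Z-up).
# The radius and axis conventions are assumptions for illustration.
import math

ANGLES = [(30, 20), (90, -10), (150, 20), (210, -10), (270, 20), (330, -10)]
PROMPT_VIEW = (0, 20)

def camera_position(azimuth_deg: float, elevation_deg: float, radius: float = 2.5):
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    x = radius * math.cos(el) * math.cos(az)
    y = radius * math.cos(el) * math.sin(az)
    z = radius * math.sin(el)
    return (x, y, z)

for az, el in [PROMPT_VIEW, *ANGLES]:
    print(f"az={az:>3}°, el={el:>3}° -> {camera_position(az, el)}")
```
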
#### Method A: From a Single Image

You can generate the multi-view grid from a single image of an object using our `img2mv.py` script. This script leverages the Zero123++ pipeline with a checkpoint from InstantMesh, which is fine-tuned to produce white backgrounds.

```bash
# Takes a single input image and generates the corresponding multi-view grid.
python scripts/img2mv.py \
    --input_image "examples/robot_sunglasses/src.png" \
    --output_dir "examples/robot_sunglasses/"
```

**Note:** In this case, `src.png` serves as the source view for EditP23.
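
For orientation, this is roughly the kind of call `img2mv.py` wraps, based on the public Zero123++ diffusers usage; the exact checkpoint handling and step count in the script may differ, and the InstantMesh UNet file name below is an assumption from that project's release:

```python
# Rough sketch (not the shipped script): load Zero123++ via diffusers, swap in the
# InstantMesh fine-tuned UNet for white backgrounds, and generate the 6-view grid.
import torch
from PIL import Image
from diffusers import DiffusionPipeline, EulerAncestralDiscreteScheduler
from huggingface_hub import hf_hub_download

pipe = DiffusionPipeline.from_pretrained(
    "sudo-ai/zero123plus-v1.2",
    custom_pipeline="sudo-ai/zero123plus-pipeline",
    torch_dtype=torch.float16,
)
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(
    pipe.scheduler.config, timestep_spacing="trailing"
)
# InstantMesh UNet checkpoint (file name assumed from the InstantMesh release).
unet_ckpt = hf_hub_download("TencentARC/InstantMesh", "diffusion_pytorch_model.bin")
pipe.unet.load_state_dict(torch.load(unet_ckpt, map_location="cpu"), strict=True)
pipe.to("cuda")

cond = Image.open("examples/robot_sunglasses/src.png")
grid = pipe(cond, num_inference_steps=75).images[0]  # grid of six views
grid.save("examples/robot_sunglasses/src_mv.png")
```
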
#### Method B: From a 3D Mesh

If you have a 3D model, you can use our Blender script to render both the source view and the multi-view grid.

**Prerequisite:** This script requires Blender's Python module (`pip install bpy`).

```bash
# Renders a source view and a multi-view grid from a 3D mesh.
python scripts/render_mesh.py \
    --mesh_path "assets/stormtrooper.glb" \
    --output_dir "examples/stormtrooper/"
```
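
If you want to adapt the rendering yourself, the core of such a Blender script looks roughly like the sketch below. This is an illustration, not the shipped `render_mesh.py`; the camera radius is an assumption, and lighting, the white background, resolution, and grid assembly are omitted:

```python
# Bare-bones Blender (bpy) render loop: import a mesh, place a camera at a given
# azimuth/elevation on a sphere around the origin, and render a still.
import math
import bpy
from mathutils import Vector

bpy.ops.wm.read_factory_settings(use_empty=True)               # start from an empty scene
bpy.ops.import_scene.gltf(filepath="assets/stormtrooper.glb")  # uses Blender's bundled glTF importer

cam_data = bpy.data.cameras.new("cam")
cam = bpy.data.objects.new("cam", cam_data)
bpy.context.scene.collection.objects.link(cam)
bpy.context.scene.camera = cam

def render_view(azimuth_deg: float, elevation_deg: float, out_path: str, radius: float = 2.5) -> None:
    """Place the camera on a sphere around the origin and render a still image."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    cam.location = (radius * math.cos(el) * math.cos(az),
                    radius * math.cos(el) * math.sin(az),
                    radius * math.sin(el))
    look_dir = Vector((0.0, 0.0, 0.0)) - cam.location           # aim the camera at the origin
    cam.rotation_euler = look_dir.to_track_quat("-Z", "Y").to_euler()
    bpy.context.scene.render.filepath = out_path
    bpy.ops.render.render(write_still=True)

render_view(0, 20, "examples/stormtrooper/src.png")             # prompt view
```
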
### 2. Generate `edited.png`

Once you have your **source view**, you can use any 2D image editor to make your desired changes. This user-provided edit guides the 3D modification.

For quick edits, you can use readily available online tools, such as the following Hugging Face Spaces:

- [FlowEdit](https://huggingface.co/spaces/fallenshock/FlowEdit): Excellent for global, structural edits.
- [Flux-Inpainting](https://huggingface.co/spaces/black-forest-labs/FLUX.1-Fill-dev): Great for local modifications and inpainting.
## Reconstruction

After generating an edited multi-view image (`edited_mv.png`) with our main script, you can reconstruct it into a 3D model. We provide a helper script that uses the [InstantMesh](https://github.com/TencentARC/InstantMesh) framework to produce a textured `.obj` file and a turntable video.

### Additional Dependencies

First, you'll need to install several libraries required for the reconstruction process.

<details>
<summary>Click to expand installation instructions</summary>

```bash
# Install general dependencies
pip install opencv-python einops xatlas imageio[ffmpeg]

# Install NVIDIA's nvdiffrast library
pip install git+https://github.com/NVlabs/nvdiffrast/

# For video export, ensure ffmpeg is installed. With conda, you can run:
conda install ffmpeg
```

</details>
### Running the Reconstruction

The reconstruction script takes the multi-view PNG as input and generates the 3D assets. The required model config file (`instant-mesh-large.yaml`) is included in the `configs/` directory of the InstantMesh repository.

#### Example Command

```bash
python scripts/recon.py \
    external/instant-mesh/configs/instant-mesh-large.yaml \
    --input_file "examples/robot_sunglasses/output/edited_mv.png" \
    --output_dir "examples/robot_sunglasses/output/recon/"
```
### Command-Line Arguments

Here are the arguments for the `recon.py` script:

| Argument | Description | Default |
| :------------ | :----------------------------------------------------------------- | :----------- |
| `config` | **(Required)** Path to the InstantMesh model config file. | |
| `--input_file`| **(Required)** Path to the multi-view PNG file you want to reconstruct. | |
| `--output_dir`| Directory where the output `.obj` and `.mp4` files will be saved. | `"outputs/"` |
| `--scale` | Scale of the input cameras. | `1.0` |
| `--distance` | Camera distance for rendering the output video. | `4.5` |
| `--no_video` | A flag to disable saving the `.mp4` video. | `False` | |