¹ Nanyang Technological University&nbsp;&nbsp; ² StepFun
## Updates
- [10/2025] Model checkpoints are released!
- [10/2025] Codebase is released!
## Overview
ICVE proposes a low-cost pretraining strategy for instruction-based video editing via in-context learning from unpaired clips. Built on HunyuanVideo T2V, it first learns editing concepts from about 1M unpaired videos, then fine-tunes on fewer than 150K paired editing examples for improved instruction alignment and visual quality, enabling general editing operations guided by natural language.
## Video Demo
Click the image above to watch the full video on YouTube.
## Dependencies and Installation
Begin by cloning the repository:
```bash
git clone https://github.com/leoisufa/ICVE.git
cd ICVE
```
We recommend CUDA versions 12.4 or 11.8 for the manual installation.
```bash
# 1. Create conda environment
conda create -n icve python==3.10.9

# 2. Activate the environment
conda activate icve

# 3. Install PyTorch and other dependencies using conda
# For CUDA 11.8
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# For CUDA 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# 4. Install pip dependencies
python -m pip install -r requirements.txt

# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
```
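As a quick sanity check (not part of the official setup), the following snippet confirms that the expected PyTorch build sees the GPU and that flash-attention imports cleanly:

```python
# Verify the icve environment: PyTorch version, CUDA visibility, flash-attn import.
import torch

print(torch.__version__)          # expected: 2.4.0
print(torch.version.cuda)         # expected: 11.8 or 12.4
print(torch.cuda.is_available())  # expected: True

import flash_attn                 # should import without errors after step 5
print(flash_attn.__version__)     # expected: 2.6.3
```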
## Download Models
- **HunyuanVideo Pretrained Weights**
  Follow the official HunyuanVideo "Download Pretrained Models" instructions and place the downloaded weights into the `ckpts/` directory as shown below.
- **ICVE Checkpoint**
  Download our model weights from Hugging Face and place them in the `checkpoint/` directory.
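If you prefer fetching the ICVE checkpoint programmatically, a minimal sketch using `huggingface_hub` follows; the repo id below is a placeholder, so substitute the actual Hugging Face repository linked above:

```python
# Minimal sketch: download the ICVE checkpoint into the expected directory.
# NOTE: "<org>/<icve-repo>" is a placeholder; use the repository linked above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<org>/<icve-repo>",
    local_dir="checkpoint",
)
```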
The folder structure of this project should look like this after setup:
```
ICVE/
├── assets/
├── checkpoint/               # Our model checkpoint
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── ckpts/                    # Pretrained weights from HunyuanVideo
│   ├── hunyuan-video-t2v-720p
│   ├── text_encoder
│   └── text_encoder_2
├── hyvideo/
├── scripts/
├── requirements.txt
├── sample_video.py
└── README.md
```
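Before running inference, a small unofficial helper like the one below can confirm that your layout matches the tree above:

```python
# Hypothetical pre-flight check: ensure the required weights are where
# sample_video.py expects them, based on the folder structure above.
from pathlib import Path

required = [
    "checkpoint/config.json",
    "checkpoint/diffusion_pytorch_model.safetensors",
    "ckpts/hunyuan-video-t2v-720p",
    "ckpts/text_encoder",
    "ckpts/text_encoder_2",
]
missing = [p for p in required if not Path(p).exists()]
if missing:
    raise FileNotFoundError(f"Missing after setup: {missing}")
print("Model layout looks good.")
```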
## Running the Demos
You can directly run the provided demo scripts under the `scripts/` directory.
Alternatively, you can manually run the example command below:
```bash
python sample_video.py \
    --dit-weight checkpoint/diffusion_pytorch_model.safetensors \
    --video-size 384 240 \
    --video-length 81 \
    --infer-steps 50 \
    --prompt "Add black glasses to the person's face." \
    --video "assets/glasses.mp4" \
    --seed 42 \
    --embedded-cfg-scale 1.0 \
    --cfg-scale 6.0 \
    --flow-shift 7.0 \
    --flow-reverse \
    --use-cpu-offload \
    --save-path ./results
```
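To edit several clips in one go, one option (not provided by the repo) is a small `subprocess` wrapper around `sample_video.py`, reusing the flags from the command above; the `(prompt, video)` pairs here are illustrative:

```python
# Sketch: batch several instruction edits by invoking sample_video.py per clip.
import subprocess

edits = [
    ("Add black glasses to the person's face.", "assets/glasses.mp4"),
    # (instruction, input_video) pairs ...
]

for prompt, video in edits:
    subprocess.run(
        [
            "python", "sample_video.py",
            "--dit-weight", "checkpoint/diffusion_pytorch_model.safetensors",
            "--video-size", "384", "240",
            "--video-length", "81",
            "--infer-steps", "50",
            "--prompt", prompt,
            "--video", video,
            "--seed", "42",
            "--embedded-cfg-scale", "1.0",
            "--cfg-scale", "6.0",
            "--flow-shift", "7.0",
            "--flow-reverse",
            "--use-cpu-offload",
            "--save-path", "./results",
        ],
        check=True,  # stop the batch if any single edit fails
    )
```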
## Acknowledgements
We thank the following projects for their excellent open-source work:
- [HunyuanVideo](https://github.com/Tencent/HunyuanVideo)
## BibTeX
If you find ICVE useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{liao2025incontext,
  title={In-Context Learning with Unpaired Clips for Instruction-based Video Editing},
  author={Xinyao Liao and Xianfang Zeng and Ziye Song and Zhoujie Fu and Gang Yu and Guosheng Lin},
  journal={arXiv preprint arXiv:2510.14648},
  year={2025}
}
```