
ICVE: In-Context Learning with Unpaired Clips for
Instruction-based Video Editing


Xinyao Liao1,2, Xianfang Zeng2, Ziye Song1, Zhoujie Fu1,2, Gang Yu2*, Guosheng Lin1*

1 Nanyang Technological University    2 StepFun

🎉 Updates

🧩 Overview

ICVE proposes a low-cost pretraining strategy for instruction-based video editing that uses in-context learning from unpaired clips. Built on HunyuanVideo-T2V, the model first learns editing concepts from about 1M unpaired videos, then is fine-tuned on fewer than 150K paired editing examples to improve instruction alignment and visual quality, enabling general editing operations guided by natural language.

🎥 Video Demo

ICVE Demo Video: watch the full demo video on YouTube 🎬

πŸ› οΈ Dependencies and Installation

Begin by cloning the repository:

git clone https://github.com/leoisufa/ICVE.git
cd ICVE

We recommend CUDA versions 12.4 or 11.8 for the manual installation.

# 1. Create conda environment
conda create -n icve python==3.10.9

# 2. Activate the environment
conda activate icve

# 3. Install PyTorch and other dependencies using conda
# For CUDA 11.8
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# For CUDA 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# 4. Install pip dependencies
python -m pip install -r requirements.txt

# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
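After installation, a quick sanity check can confirm that PyTorch sees your GPU and that flash attention imports correctly. This is an optional, minimal sketch assuming the steps above completed without errors:

# Verify PyTorch, CUDA availability, and flash attention
python -c "import torch; print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())"
python -c "import flash_attn; print('flash_attn', flash_attn.__version__)"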

🧱 Download Models

  1. HunyuanVideo Pretrained Weights
     Follow the official HunyuanVideo instructions here:
     👉 Download Pretrained Models
     and place the downloaded weights into the ckpts/ directory as shown below.
  2. ICVE Checkpoint
     Download our ICVE model weights from
     👉 Hugging Face
     and place them in the checkpoint/ directory (example download commands are sketched after this list).
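For reference, both downloads can also be done from the command line with huggingface-cli. This is a minimal sketch: the HunyuanVideo repo ID is taken from the official instructions linked above, <ICVE_HF_REPO> is a placeholder for the ICVE repository linked in step 2, and the text encoders still need to be prepared per the official HunyuanVideo guide.

# Base HunyuanVideo weights (text encoders are set up separately per the official guide)
huggingface-cli download tencent/HunyuanVideo --local-dir ./ckpts

# ICVE checkpoint (replace <ICVE_HF_REPO> with the repository from step 2)
huggingface-cli download <ICVE_HF_REPO> --local-dir ./checkpoint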

The folder structure of this project should look like this after setup:

ICVE/
├── assets/
├── checkpoint/ # Our model checkpoint
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── ckpts/  # Pretrained weights from HunyuanVideo
│   ├── hunyuan-video-t2v-720p
│   ├── text_encoder
│   └── text_encoder_2
├── hyvideo/
├── scripts/
├── requirements.txt
├── sample_video.py
└── README.md

🚀 Running the Demos

You can directly run the provided demo scripts under the scripts/ directory.

Alternatively, you can manually run the example command below:

python sample_video.py \
    --dit-weight checkpoint/diffusion_pytorch_model.safetensors \
    --video-size 384 240 \
    --video-length 81 \
    --infer-steps 50 \
    --prompt "Add black glasses to the person's face." \
    --video "assets/glasses.mp4" \
    --seed 42 \
    --embedded-cfg-scale 1.0 \
    --cfg-scale 6.0 \
    --flow-shift 7.0 \
    --flow-reverse \
    --use-cpu-offload \
    --save-path ./results
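The same flags can be reused for other clips and instructions. Below is a usage sketch that loops over several prompt/video pairs; the file names and prompts beyond the documented glasses example are purely illustrative:

# Edit several clips in one go (paths and prompts below are illustrative)
while IFS='|' read -r video prompt; do
    python sample_video.py \
        --dit-weight checkpoint/diffusion_pytorch_model.safetensors \
        --video-size 384 240 \
        --video-length 81 \
        --infer-steps 50 \
        --prompt "$prompt" \
        --video "$video" \
        --seed 42 \
        --embedded-cfg-scale 1.0 \
        --cfg-scale 6.0 \
        --flow-shift 7.0 \
        --flow-reverse \
        --use-cpu-offload \
        --save-path ./results
done <<'EOF'
assets/glasses.mp4|Add black glasses to the person's face.
assets/example.mp4|Replace the background with a snowy mountain.
EOF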

πŸ™ Acknowledgements

We thank the following projects for their excellent open-source work:

🔗 BibTeX

If you find ICVE useful for your research and applications, please cite it using this BibTeX:

@article{liao2025icve,
  title={In-Context Learning with Unpaired Clips for Instruction-based Video Editing},
  author={Xinyao Liao and Xianfang Zeng and Ziye Song and Zhoujie Fu and Gang Yu and Guosheng Lin},
  journal={arXiv preprint arXiv:2510.14648},
  year={2025}
}