metadata
license: apache-2.0
language:
- en
base_model:
- stable-diffusion-v1-5/stable-diffusion-v1-5
- liuhaotian/llava-llama-2-13b-chat-lightning-preview
tags:
- Image-to-Image
- Action-Generation
- HOI
- Egocentric-Vision
- Vision-Language-Model
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
ECCV 2024 (Oral, Best Paper Finalist)
Project Page | Paper | Dataset | Code
Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu

This repo is the model weights finetuned on Epic-Kitchens for our paper "LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning". More repos are available in this collection.
Please refer to the code on github for detailed instructions on how to use it.
If you find LEGO useful for your work, please cite using this BibTeX.
@inproceedings{lai2024lego,
title={Lego: Learning egocentric action frame generation via visual instruction tuning},
author={Lai, Bolin and Dai, Xiaoliang and Chen, Lawrence and Pang, Guan and Rehg, James M and Liu, Miao},
booktitle={European Conference on Computer Vision},
pages={135--155},
year={2024},
organization={Springer}
}