---
license: apache-2.0
language:
- en
base_model:
- stable-diffusion-v1-5/stable-diffusion-v1-5
- liuhaotian/llava-llama-2-13b-chat-lightning-preview
tags:
- Image-to-Image
- Action-Generation
- HOI
- Egocentric-Vision
- Vision-Language-Model
---
# LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

### ECCV 2024 (Oral, Best Paper Finalist)

[Project Page](https://bolinlai.github.io/Lego_EgoActGen/) | [Paper](https://arxiv.org/pdf/2312.03849) | [Dataset](https://huggingface.co/datasets/bolinlai/LEGO-Dataset) | [Code](https://github.com/BolinLai/LEGO)

[Bolin Lai](https://bolinlai.github.io/), [Xiaoliang Dai](https://sites.google.com/view/xiaoliangdai/), [Lawrence Chen](https://www.lawrencechen.me/), [Guan Pang](https://scholar.google.com/citations?user=7v1LZxUAAAAJ&hl=en), [James M. Rehg](https://rehg.org/), [Miao Liu](https://aptx4869lm.github.io/)

<img src='https://bolinlai.github.io/Lego_EgoActGen/figures/visualization_new_actions.png'/>

This repo contains the model weights fine-tuned on Epic-Kitchens for our paper "LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning". More repos are available in this [collection](https://huggingface.co/collections/bolinlai/lego-67b386cf642909c56776f754).

Please refer to the code on [GitHub](https://github.com/BolinLai/LEGO) for detailed usage instructions.

If you find LEGO useful for your work, please cite it using the following BibTeX entry.
```bibtex
@inproceedings{lai2024lego,
  title={Lego: Learning egocentric action frame generation via visual instruction tuning},
  author={Lai, Bolin and Dai, Xiaoliang and Chen, Lawrence and Pang, Guan and Rehg, James M and Liu, Miao},
  booktitle={European Conference on Computer Vision},
  pages={135--155},
  year={2024},
  organization={Springer}
}
```