---
license: apache-2.0
language:
- en
base_model:
- stable-diffusion-v1-5/stable-diffusion-v1-5
- liuhaotian/llava-llama-2-13b-chat-lightning-preview
tags:
- Image-to-Image
- Action-Generation
- HOI
- Egocentric-Vision
- Vision-Language-Model
---

# LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

### ECCV 2024 (Oral, Best Paper Finalist)

[Project Page](https://bolinlai.github.io/Lego_EgoActGen/) | [Paper](https://arxiv.org/pdf/2312.03849) | [Dataset](https://huggingface.co/datasets/bolinlai/LEGO-Dataset) | [Code](https://github.com/BolinLai/LEGO)

[Bolin Lai](https://bolinlai.github.io/), [Xiaoliang Dai](https://sites.google.com/view/xiaoliangdai/), [Lawrence Chen](https://www.lawrencechen.me/), [Guan Pang](https://scholar.google.com/citations?user=7v1LZxUAAAAJ&hl=en), [James M. Rehg](https://rehg.org/), [Miao Liu](https://aptx4869lm.github.io/)

<img src='https://bolinlai.github.io/Lego_EgoActGen/figures/visualization_new_actions.png'/>

This repo contains the model weights fine-tuned on Ego4D for our paper "LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning".

Please refer to the code on [GitHub](https://github.com/BolinLai/LEGO) for detailed instructions on how to use the weights. More repos are available in this [collection](https://huggingface.co/collections/bolinlai/lego-67b386cf642909c56776f754).

If you find LEGO useful for your work, please cite it using this BibTeX:

```bibtex
@inproceedings{lai2024lego,
  title={Lego: Learning egocentric action frame generation via visual instruction tuning},
  author={Lai, Bolin and Dai, Xiaoliang and Chen, Lawrence and Pang, Guan and Rehg, James M and Liu, Miao},
  booktitle={European Conference on Computer Vision},
  pages={135--155},
  year={2024},
  organization={Springer}
}
```