MSALab/LoomVideo

@@ -57,82 +57,39 @@ LoomVideo supports **four** unified video generation and editing tasks within a
 | **Instruction-Image Editing** | Video 🎬 + Image 🖼 + Text 📝 | Video 🎬 | Edit a video with a reference image as guidance |
 | **Multi-Image-to-Video** | Images 🖼 + Text 📝 | Video 🎬 | Compose multiple reference images into a coherent video |
-### 🎬 Text-to-Video
-<p align="center">
-  <img src="assets/results_1/t2v_demo.gif" width="480"/>
-</p>
-> **Prompt:** *Snow rocky mountains peaks canyon. Snow blanketed rocky mountains surround and shadow deep canyons. The canyons twist and bend through the high elevated mountain peaks.*
-<p align="center">
-  <img src="assets/results_2/t2v_demo.gif" width="480"/>
-</p>
-> **Prompt:** *Vampire makeup face of beautiful girl, red contact lenses.*
-### ✂️ Instruction Editing
-<p align="center">
-  <img src="assets/results_1/edit_input.gif" height="180"/>
-  <b> &nbsp; → &nbsp; </b>
-  <img src="assets/results_1/edit_demo.gif" height="180"/>
-</p>
-> **Prompt:** *Apply the Impressionist aesthetic to this video, ensuring seamless temporal consistency across all frames. The result should emulate the fluid brushstroke techniques and atmospheric focus of 19th-century Impressionist art, with each frame retaining the original motion, character actions, and camera movements.*
-<p align="center">
-  <img src="assets/results_2/edit_input.gif" height="180"/>
-  <b> &nbsp; → &nbsp; </b>
-  <img src="assets/results_2/edit_demo.gif" height="180"/>
-</p>
-> **Prompt:** *Replace the tree with a golden-leaved tree that shimmers softly, ensuring it maintains the same position and pose within the video scene.*
-### 🖼️ Instruction-Image Editing
-<p align="center">
-  <img src="assets/results_1/ref_edit_input.gif" height="180"/>
-  <img src="assets/results_1/ref_edit_reference.jpg" height="100"/>
-  <b> &nbsp; → &nbsp; </b>
-  <img src="assets/results_1/ref_edit_demo.gif" height="180"/>
-</p>
-> **Prompt:** *Replace the green t-shirt of the man with the suit in the image.*
-<p align="center">
-  <img src="assets/results_2/ref_edit_input.gif" height="180"/>
-  <img src="assets/results_2/ref_edit_reference.jpg" height="100"/>
-  <b> &nbsp; → &nbsp; </b>
-  <img src="assets/results_2/ref_edit_demo.gif" height="180"/>
-</p>
-> **Prompt:** *Replace the background with a Chinese ink painting, featuring a large golden mountain peak rising above swirling clouds, ensuring it appears in the same position and pose within the video scene.*
-### 🎞️ Multi-Image-to-Video
-<p align="center">
-  <img src="assets/results_1/mi2v_input_1.jpg" height="140"/>
-  <img src="assets/results_1/mi2v_input_2.jpg" height="140"/>
-  <img src="assets/results_1/mi2v_input_3.jpg" height="140"/>
-  <b> &nbsp; → &nbsp; </b>
-  <img src="assets/results_1/mi2v_demo.gif" height="180"/>
-</p>
-> **Prompt:** *The girl (@Image 2), wearing the denim jacket (@Image 3), black inner top, and black shorts, wearing sunglasses and carrying the handbag, walks down the street (@Image 1). Then, the girl (@Image 2) stops walking and turns her head to look to one side, followed by the girl (@Image 2) crossing her arms over her chest and striking a confident pose.*
-<p align="center">
-  <img src="assets/results_2/mi2v_input_1.jpg" height="140"/>
-  <img src="assets/results_2/mi2v_input_2.jpg" height="140"/>
-  <b> &nbsp; → &nbsp; </b>
-  <img src="assets/results_2/mi2v_demo.gif" height="180"/>
-</p>
-> **Prompt:** *The man wearing a Polo shirt (@Image 2), black casual pants, white sneakers, sunglasses, and a watch, striding forward on the lawn (@Image 1) with one hand in his pocket.*
-# 🔧 Preparation
+# 🔧 Preparation
+## Step 1: Clone the Repository
+```bash
+git clone TODO
+cd LoomVideo
+```
+## Step 2: Install Dependencies
+We recommend using [uv](https://github.com/astral-sh/uv) for a fast and fully reproducible environment setup.
+```bash
+uv sync
+source .venv/bin/activate
+# (Optional) Include evaluation dependencies
+uv sync --extra eval
+```
+Additionally, install [Flash Attention](https://github.com/Dao-AILab/flash-attention) for faster inference and lower GPU memory usage (for reference, our environment uses v2.7.4).
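To confirm that Flash Attention is visible to the active environment and to compare it with the v2.7.4 reference, a small standard-library check along these lines may help. This is a sketch, not part of LoomVideo: `flash_attn_version` and `matches_reference` are hypothetical helpers, and it assumes the package is imported as `flash_attn` and distributed on PyPI as `flash-attn`.

```python
import importlib.metadata
import importlib.util
from typing import Optional


def flash_attn_version() -> Optional[str]:
    """Hypothetical helper: installed flash-attn version, or None if not found."""
    # find_spec locates the package without importing it.
    if importlib.util.find_spec("flash_attn") is None:
        return None
    try:
        # The PyPI distribution is "flash-attn"; the import name is "flash_attn".
        return importlib.metadata.version("flash-attn")
    except importlib.metadata.PackageNotFoundError:
        return None


def matches_reference(version: str, reference: str = "2.7.4") -> bool:
    """Hypothetical helper: compare release segments, ignoring .postN and +local tags."""
    release = version.split("+", 1)[0].split(".post", 1)[0]
    return release == reference


if __name__ == "__main__":
    v = flash_attn_version()
    if v is None:
        print("flash-attn not found")
    else:
        print(f"flash-attn {v} (matches v2.7.4 reference: {matches_reference(v)})")
```

Stripping `.postN` and local `+...` tags means builds such as `2.7.4.post1` count as matching the v2.7.4 reference.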
+## Step 3: Download Model Weights
+Download the pretrained LoomVideo checkpoint from [Hugging Face](https://huggingface.co/MSALab/LoomVideo) and place it under `checkpoints/LoomVideo/`:
+```
+checkpoints/LoomVideo/
+└── gen_model.pth
+```
+You can also specify a custom path via the `--ckpt_path` argument at inference time.
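The download and placement steps can also be scripted. Below is a minimal sketch, assuming the checkpoint is stored as `gen_model.pth` at the root of the `MSALab/LoomVideo` Hugging Face repository (the file name comes from the layout above; its location inside the Hub repo is an assumption). It uses `hf_hub_download` from the `huggingface_hub` package. `download_checkpoint` and `resolve_ckpt_path` are hypothetical helpers that only mimic the "default path unless `--ckpt_path` is given" behavior described here; they are not LoomVideo's actual loading code.

```python
from pathlib import Path
from typing import Optional

# Default location from the layout above; --ckpt_path overrides it at inference time.
DEFAULT_CKPT = Path("checkpoints/LoomVideo/gen_model.pth")


def resolve_ckpt_path(ckpt_path: Optional[str] = None) -> Path:
    """Hypothetical helper: prefer an explicit path, else fall back to the default layout."""
    path = Path(ckpt_path) if ckpt_path else DEFAULT_CKPT
    if not path.is_file():
        raise FileNotFoundError(f"checkpoint not found: {path}")
    return path


def download_checkpoint(local_dir: str = "checkpoints/LoomVideo") -> Path:
    """Hypothetical helper: fetch gen_model.pth from the Hub into local_dir.

    Assumes the file sits at the root of the MSALab/LoomVideo repository.
    """
    # Imported lazily so resolve_ckpt_path works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(
        hf_hub_download(
            repo_id="MSALab/LoomVideo",
            filename="gen_model.pth",
            local_dir=local_dir,
        )
    )
```

For example, calling `download_checkpoint()` once and then `resolve_ckpt_path()` should yield `checkpoints/LoomVideo/gen_model.pth`; passing a string to `resolve_ckpt_path` mirrors supplying `--ckpt_path`.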
 # 🎬 Inference
 LoomVideo provides a unified inference script that supports **four generation tasks** through a single entry point. Each task is selected via the `--task` flag.