MSALab committed on
Commit 97a90b9 · verified · 1 Parent(s): 95ebdfd

Update README

Files changed (1)
  1. README.md +22 -65
README.md CHANGED
@@ -57,82 +57,39 @@ LoomVideo supports **four** unified video generation and editing tasks within a
  | **Instruction-Image Editing** | Video 🎬 + Image 🖼 + Text 📝 | Video 🎬 | Edit a video with a reference image as guidance |
  | **Multi-Image-to-Video** | Images 🖼 + Text 📝 | Video 🎬 | Compose multiple reference images into a coherent video |
 
- ### 🎬 Text-to-Video
-
- <p align="center">
- <img src="assets/results_1/t2v_demo.gif" width="480"/>
- </p>
-
- > **Prompt:** *Snow rocky mountains peaks canyon. Snow blanketed rocky mountains surround and shadow deep canyons. The canyons twist and bend through the high elevated mountain peaks.*
-
- <p align="center">
- <img src="assets/results_2/t2v_demo.gif" width="480"/>
- </p>
-
- > **Prompt:** *Vampire makeup face of beautiful girl, red contact lenses.*
-
- ### ✂️ Instruction Editing
-
- <p align="center">
- <img src="assets/results_1/edit_input.gif" height="180"/>
- <b> &nbsp; → &nbsp; </b>
- <img src="assets/results_1/edit_demo.gif" height="180"/>
- </p>
-
- > **Prompt:** *Apply the Impressionist aesthetic to this video, ensuring seamless temporal consistency across all frames. The result should emulate the fluid brushstroke techniques and atmospheric focus of 19th-century Impressionist art, with each frame retaining the original motion, character actions, and camera movements.*
-
- <p align="center">
- <img src="assets/results_2/edit_input.gif" height="180"/>
- <b> &nbsp; → &nbsp; </b>
- <img src="assets/results_2/edit_demo.gif" height="180"/>
- </p>
-
- > **Prompt:** *Replace the tree with a golden-leaved tree that shimmers softly, ensuring it maintains the same position and pose within the video scene.*
-
- ### 🖼️ Instruction-Image Editing
-
- <p align="center">
- <img src="assets/results_1/ref_edit_input.gif" height="180"/>
- <img src="assets/results_1/ref_edit_reference.jpg" height="100"/>
- <b> &nbsp; → &nbsp; </b>
- <img src="assets/results_1/ref_edit_demo.gif" height="180"/>
- </p>
 
- > **Prompt:** *Replace the green t-shirt of the man with the suit in the image.*
 
- <p align="center">
- <img src="assets/results_2/ref_edit_input.gif" height="180"/>
- <img src="assets/results_2/ref_edit_reference.jpg" height="100"/>
- <b> &nbsp; → &nbsp; </b>
- <img src="assets/results_2/ref_edit_demo.gif" height="180"/>
- </p>
 
- > **Prompt:** *Replace the background with a Chinese ink painting, featuring a large golden mountain peak rising above swirling clouds, ensuring it appears in the same position and pose within the video scene.*
 
- ### 🎞️ Multi-Image-to-Video
 
- <p align="center">
- <img src="assets/results_1/mi2v_input_1.jpg" height="140"/>
- <img src="assets/results_1/mi2v_input_2.jpg" height="140"/>
- <img src="assets/results_1/mi2v_input_3.jpg" height="140"/>
- <b> &nbsp; → &nbsp; </b>
- <img src="assets/results_1/mi2v_demo.gif" height="180"/>
- </p>
 
- > **Prompt:** *The girl (@Image 2), wearing the denim jacket (@Image 3), black inner top, and black shorts, wearing sunglasses and carrying the handbag, walks down the street (@Image 1). Then, the girl (@Image 2) stops walking and turns her head to look to one side, followed by the girl (@Image 2) crossing her arms over her chest and striking a confident pose.*
 
- <p align="center">
- <img src="assets/results_2/mi2v_input_1.jpg" height="140"/>
- <img src="assets/results_2/mi2v_input_2.jpg" height="140"/>
- <b> &nbsp; → &nbsp; </b>
- <img src="assets/results_2/mi2v_demo.gif" height="180"/>
- </p>
 
- > **Prompt:** *The man wearing a Polo shirt (@Image 2), black casual pants, white sneakers, sunglasses, and a watch, striding forward on the lawn (@Image 1) with one hand in his pocket.*
 
 
- # 🔧 Preparation
 
 
  # 🎬 Inference
  LoomVideo provides a unified inference script that supports **four generation tasks** through a single entry point. Each task is selected via the `--task` flag.
  | **Instruction-Image Editing** | Video 🎬 + Image 🖼 + Text 📝 | Video 🎬 | Edit a video with a reference image as guidance |
  | **Multi-Image-to-Video** | Images 🖼 + Text 📝 | Video 🎬 | Compose multiple reference images into a coherent video |
 
+ # 🔧 Preparation
 
+ ## Step 1: Clone the Repository
 
+ ```bash
+ git clone TODO
+ cd LoomVideo
+ ```
 
+ ## Step 2: Install Dependencies
 
+ We recommend using [uv](https://github.com/astral-sh/uv) for a fast and fully reproducible environment setup.
 
+ ```bash
+ uv sync
+ source .venv/bin/activate
 
+ # (Optional) Include evaluation dependencies
+ uv sync --extra eval
+ ```
 
+ Additionally, install [Flash Attention](https://github.com/Dao-AILab/flash-attention) for faster inference and reduced GPU memory consumption. (for reference, our environment uses v2.7.4)
 
+ ## Step 3: Download Model Weights
 
+ Download the pretrained LoomVideo checkpoint from [Hugging Face](https://huggingface.co/MSALab/LoomVideo) and place it under `checkpoints/LoomVideo/`:
 
+ ```
+ checkpoints/LoomVideo/
+ └── gen_model.pth
+ ```
 
+ You can also specify a custom path via the `--ckpt_path` argument at inference time.
 
  # 🎬 Inference
  LoomVideo provides a unified inference script that supports **four generation tasks** through a single entry point. Each task is selected via the `--task` flag.
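
Review note on the new Preparation steps: a minimal sketch, under stated assumptions, of how a setup could be sanity-checked before running inference. It looks for the checkpoint at the default location given in Step 3 (`checkpoints/LoomVideo/gen_model.pth`) or at a custom path, mirroring the `--ckpt_path` override, and reports the installed Flash Attention version. The helpers `resolve_checkpoint` and `flash_attn_version` are hypothetical and not part of LoomVideo; `flash-attn` is assumed to be the installed distribution name of Flash Attention.

```python
from importlib import metadata
from pathlib import Path
from typing import Optional

# Default location taken from Step 3 of the README.
DEFAULT_CKPT = Path("checkpoints/LoomVideo/gen_model.pth")


def resolve_checkpoint(ckpt_path: Optional[str] = None) -> Path:
    """Return the checkpoint path, raising if the file is missing.

    `ckpt_path` plays the role of the README's `--ckpt_path` override.
    Hypothetical helper: LoomVideo's own script may resolve paths differently.
    """
    path = Path(ckpt_path) if ckpt_path else DEFAULT_CKPT
    if not path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    return path


def flash_attn_version() -> Optional[str]:
    """Return the installed flash-attn version string, or None if absent."""
    try:
        return metadata.version("flash-attn")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    try:
        print("checkpoint:", resolve_checkpoint())
    except FileNotFoundError as err:
        print(err)
    print("flash-attn:", flash_attn_version() or "not installed")
```

Run from the repository root after Step 3, this either prints the resolved checkpoint path or explains which file is missing, which may be easier to act on than a failure partway through model loading.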