Add pipeline_tag: image-to-video
#1
by nielsr (HF Staff) · opened

README.md CHANGED
@@ -1,13 +1,15 @@
 ---
-license: apache-2.0
-datasets:
-- nkp37/OpenVid-1M
 base_model:
 - zai-org/CogVideoX-5b-I2V
 - Doubiiu/DynamiCrafter_1024
 - stabilityai/stable-video-diffusion-img2vid
+datasets:
+- nkp37/OpenVid-1M
 library_name: diffusers
+license: apache-2.0
+pipeline_tag: image-to-video
 ---
+
 # 🤖 MotionRAG Model Checkpoints
 
 <div align="left">
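For reference, a metadata change like the one in this hunk does not have to go through the web editor: `huggingface_hub` exposes `metadata_update`, which merges new keys into the YAML front matter of `README.md` and can open the commit as a pull request. A minimal sketch, assuming you are authenticated; the repo id below is a placeholder, not the actual repository name:

```python
# Minimal sketch: add a pipeline_tag to a model card's YAML front matter.
# Assumes `huggingface_hub` is installed and you are logged in;
# the repo id is a placeholder, not the real MotionRAG repository.
from huggingface_hub import metadata_update

metadata_update(
    "your-org/MotionRAG",                # placeholder repo id
    {"pipeline_tag": "image-to-video"},  # the key added by this PR
    repo_type="model",
    create_pr=True,                      # open a PR instead of committing directly
)
```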
@@ -26,37 +28,37 @@ MotionRAG is a retrieval-augmented framework for image-to-video generation that
 
 Our model checkpoints are organized into three key components for each base model:
 
-1.
+1. **Motion Projector (Resampler)**: Compresses high-dimensional motion features from the video encoder into compact token representations.
 
-2.
+2. **Motion Context Transformer**: Adapts motion patterns through in-context learning using a causal transformer architecture.
 
-3.
+3. **Motion-Adapter**: Injects the adapted motion features into the base image-to-video generation models.
 
 ## 📦 Checkpoint Files
 
 ### MotionRAG Enhanced Models
 
-| Model
-|
-| **CogVideoX**
-| **CogVideoX**
-| **CogVideoX**
-| **CogVideoX**
-| **DynamiCrafter** | Motion Projector
+| Model             | Component                  | File                                                 |
+|:------------------|:---------------------------|:-----------------------------------------------------|
+| **CogVideoX**     | CogVideoX-5B 17 frames     | `checkpoints/CogVideoX/17_frames.ckpt`               |
+| **CogVideoX**     | Motion Projector           | `checkpoints/CogVideoX/motion_proj.ckpt`             |
+| **CogVideoX**     | Motion Context Transformer | `checkpoints/CogVideoX/motion_transformer.ckpt`      |
+| **CogVideoX**     | Motion-Adapter             | `checkpoints/CogVideoX/Motion-Adapter.ckpt`          |
+| **DynamiCrafter** | Motion Projector           | `checkpoints/DynamiCrafter/motion_proj.ckpt`         |
 | **DynamiCrafter** | Motion Context Transformer | `checkpoints/DynamiCrafter/motion_transformer.ckpt` |
-| **DynamiCrafter** | Motion-Adapter
-| **SVD**
-| **SVD**
-| **SVD**
+| **DynamiCrafter** | Motion-Adapter             | `checkpoints/DynamiCrafter/Motion-Adapter.ckpt`      |
+| **SVD**           | Motion Projector           | `checkpoints/SVD/motion_proj.ckpt`                   |
+| **SVD**           | Motion Context Transformer | `checkpoints/SVD/motion_transformer.ckpt`            |
+| **SVD**           | Motion-Adapter             | `checkpoints/SVD/Motion-Adapter.ckpt`                |
 
 ### Datasets
 
 Our dataset differs from [OpenVid-1M](https://huggingface.co/datasets/nkp37/OpenVid-1M) datasets through curation and preprocessing. We use Llama3.1 to refine captions and extract motion-specific descriptions, which are stored in the `motion_caption` field. The data is then partitioned into non-overlapping training and test sets.
 
-| Dataset
-|
+| Dataset         | Description                             | File                                           |
+|:----------------|:----------------------------------------|:-----------------------------------------------|
 | **OpenVid-1M** | Large-scale video dataset for training | `datasets/OpenVid-1M/data/openvid-1m.parquet` |
-| **OpenVid-1K** | Test set sampled from OpenVid-1M
+| **OpenVid-1K**  | Test set sampled from OpenVid-1M        | `datasets/OpenVid-1M/data/openvid-1k.parquet`  |
 
 ## 🚀 Usage
 
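To make the component list in the hunk above concrete: a "resampler" in this sense is typically a small cross-attention module in which a fixed set of learned query tokens attends over a long sequence of encoder features. The sketch below is a generic PyTorch illustration of that pattern, not the released MotionRAG module; all dimensions and names are made up for the example:

```python
import torch
import torch.nn as nn

class MotionResampler(nn.Module):
    """Generic Perceiver-style resampler (illustrative only): learned
    query tokens cross-attend to variable-length motion features and
    compress them into a fixed number of compact motion tokens."""

    def __init__(self, dim: int = 1024, num_tokens: int = 16, num_heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, motion_feats: torch.Tensor) -> torch.Tensor:
        # motion_feats: (batch, seq_len, dim); seq_len may be large
        q = self.queries.unsqueeze(0).expand(motion_feats.size(0), -1, -1)
        out, _ = self.attn(q, motion_feats, motion_feats)  # cross-attention
        return self.norm(out)  # (batch, num_tokens, dim)

# Compress 256 frame-level features into 16 motion tokens.
tokens = MotionResampler()(torch.randn(2, 256, 1024))
print(tokens.shape)  # torch.Size([2, 16, 1024])
```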
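The `File` column in the checkpoint table gives repo-relative paths, so individual components can be fetched without cloning the whole repository. A sketch using `hf_hub_download`; the repo id is again a placeholder, and the internal layout of the `.ckpt` files is an assumption:

```python
# Sketch: fetch a single MotionRAG component and inspect it.
# The repo id is a placeholder; the filename comes from the table above.
import torch
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="your-org/MotionRAG",  # placeholder repo id
    filename="checkpoints/SVD/motion_proj.ckpt",
)
# .ckpt files are pickled, so weights_only=False is needed on torch>=2.6;
# only do this for checkpoints you trust.
state = torch.load(path, map_location="cpu", weights_only=False)
print(type(state))
```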
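Since the curated captions live in the `motion_caption` column of the parquet files listed under Datasets, they are easy to inspect once downloaded. A sketch assuming `pandas` with a parquet engine (e.g. `pyarrow`) and the directory layout from the table:

```python
# Sketch: peek at the motion-specific captions in the curated test split.
# Assumes the parquet file has been downloaded to the path in the table.
import pandas as pd

df = pd.read_parquet("datasets/OpenVid-1M/data/openvid-1k.parquet")
print(df.columns.tolist())           # expect a `motion_caption` column
print(df["motion_caption"].head(3))  # refined, motion-focused descriptions
```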