
Add pipeline_tag: image-to-video

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +22 -20
README.md CHANGED
```diff
@@ -1,13 +1,15 @@
 ---
-license: apache-2.0
-datasets:
-- nkp37/OpenVid-1M
 base_model:
 - zai-org/CogVideoX-5b-I2V
 - Doubiiu/DynamiCrafter_1024
 - stabilityai/stable-video-diffusion-img2vid
+datasets:
+- nkp37/OpenVid-1M
 library_name: diffusers
+license: apache-2.0
+pipeline_tag: image-to-video
 ---
+
 # πŸ€— MotionRAG Model Checkpoints
 
 <div align="left">
@@ -26,37 +28,37 @@ MotionRAG is a retrieval-augmented framework for image-to-video generation that
 
 Our model checkpoints are organized into three key components for each base model:
 
-1. **Motion Projector (Resampler)**: Compresses high-dimensional motion features from the video encoder into compact token representations.
+1. **Motion Projector (Resampler)**: Compresses high-dimensional motion features from the video encoder into compact token representations.
 
-2. **Motion Context Transformer**: Adapts motion patterns through in-context learning using a causal transformer architecture.
+2. **Motion Context Transformer**: Adapts motion patterns through in-context learning using a causal transformer architecture.
 
-3. **Motion-Adapter**: Injects the adapted motion features into the base image-to-video generation models.
+3. **Motion-Adapter**: Injects the adapted motion features into the base image-to-video generation models.
 
 ## πŸ“¦ Checkpoint Files
 
 ### MotionRAG Enhanced Models
 
-| Model             | Component                  | File                                                 |
-|-------------------|----------------------------|------------------------------------------------------|
-| **CogVideoX**     | CogVideoX-5B 17 frames     | `checkpoints/CogVideoX/17_frames.ckpt`               |
-| **CogVideoX**     | Motion Projector           | `checkpoints/CogVideoX/motion_proj.ckpt`             |
-| **CogVideoX**     | Motion Context Transformer | `checkpoints/CogVideoX/motion_transformer.ckpt`      |
-| **CogVideoX**     | Motion-Adapter             | `checkpoints/CogVideoX/Motion-Adapter.ckpt`          |
-| **DynamiCrafter** | Motion Projector           | `checkpoints/DynamiCrafter/motion_proj.ckpt`         |
+| Model             | Component                  | File                                                 |
+|:------------------|:---------------------------|:-----------------------------------------------------|
+| **CogVideoX**     | CogVideoX-5B 17 frames     | `checkpoints/CogVideoX/17_frames.ckpt`               |
+| **CogVideoX**     | Motion Projector           | `checkpoints/CogVideoX/motion_proj.ckpt`             |
+| **CogVideoX**     | Motion Context Transformer | `checkpoints/CogVideoX/motion_transformer.ckpt`      |
+| **CogVideoX**     | Motion-Adapter             | `checkpoints/CogVideoX/Motion-Adapter.ckpt`          |
+| **DynamiCrafter** | Motion Projector           | `checkpoints/DynamiCrafter/motion_proj.ckpt`         |
 | **DynamiCrafter** | Motion Context Transformer | `checkpoints/DynamiCrafter/motion_transformer.ckpt`  |
-| **DynamiCrafter** | Motion-Adapter             | `checkpoints/DynamiCrafter/Motion-Adapter.ckpt`      |
-| **SVD**           | Motion Projector           | `checkpoints/SVD/motion_proj.ckpt`                   |
-| **SVD**           | Motion Context Transformer | `checkpoints/SVD/motion_transformer.ckpt`            |
-| **SVD**           | Motion-Adapter             | `checkpoints/SVD/Motion-Adapter.ckpt`                |
+| **DynamiCrafter** | Motion-Adapter             | `checkpoints/DynamiCrafter/Motion-Adapter.ckpt`      |
+| **SVD**           | Motion Projector           | `checkpoints/SVD/motion_proj.ckpt`                   |
+| **SVD**           | Motion Context Transformer | `checkpoints/SVD/motion_transformer.ckpt`            |
+| **SVD**           | Motion-Adapter             | `checkpoints/SVD/Motion-Adapter.ckpt`                |
 
 ### Datasets
 
 Our dataset differs from [OpenVid-1M](https://huggingface.co/datasets/nkp37/OpenVid-1M) datasets through curation and preprocessing. We use Llama3.1 to refine captions and extract motion-specific descriptions, which are stored in the `motion_caption` field. The data is then partitioned into non-overlapping training and test sets.
 
-| Dataset        | Description                            | File                                          |
-|----------------|----------------------------------------|-----------------------------------------------|
+| Dataset        | Description                            | File                                          |
+|:---------------|:---------------------------------------|:----------------------------------------------|
 | **OpenVid-1M** | Large-scale video dataset for training | `datasets/OpenVid-1M/data/openvid-1m.parquet` |
-| **OpenVid-1K** | Test set sampled from OpenVid-1M       | `datasets/OpenVid-1M/data/openvid-1k.parquet` |
+| **OpenVid-1K** | Test set sampled from OpenVid-1M       | `datasets/OpenVid-1M/data/openvid-1k.parquet` |
 
 ## πŸš€ Usage
```
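The resulting front matter can be sanity-checked programmatically. A minimal sketch with PyYAML (not part of this repository; the `front_matter` string is copied from the diff's added side): the Hub reads this block to populate the model card, and `pipeline_tag: image-to-video` is what lists the model under the Image-to-Video task filter.

```python
import yaml  # PyYAML

# Model-card front matter as it reads after this PR.
front_matter = """\
base_model:
- zai-org/CogVideoX-5b-I2V
- Doubiiu/DynamiCrafter_1024
- stabilityai/stable-video-diffusion-img2vid
datasets:
- nkp37/OpenVid-1M
library_name: diffusers
license: apache-2.0
pipeline_tag: image-to-video
"""

meta = yaml.safe_load(front_matter)
print(meta["pipeline_tag"])  # image-to-video
print(len(meta["base_model"]))  # 3
```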