
Add pipeline_tag: image-to-video

#1 opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +22 -20
README.md CHANGED
```diff
@@ -1,13 +1,15 @@
 ---
-license: apache-2.0
-datasets:
-- nkp37/OpenVid-1M
 base_model:
 - zai-org/CogVideoX-5b-I2V
 - Doubiiu/DynamiCrafter_1024
 - stabilityai/stable-video-diffusion-img2vid
+datasets:
+- nkp37/OpenVid-1M
 library_name: diffusers
+license: apache-2.0
+pipeline_tag: image-to-video
 ---
+
 # πŸ€— MotionRAG Model Checkpoints
 
 <div align="left">
@@ -26,37 +28,37 @@ MotionRAG is a retrieval-augmented framework for image-to-video generation that
 
 Our model checkpoints are organized into three key components for each base model:
 
-1. **Motion Projector (Resampler)**: Compresses high-dimensional motion features from the video encoder into compact token representations.
+1. **Motion Projector (Resampler)**: Compresses high-dimensional motion features from the video encoder into compact token representations.
 
-2. **Motion Context Transformer**: Adapts motion patterns through in-context learning using a causal transformer architecture.
+2. **Motion Context Transformer**: Adapts motion patterns through in-context learning using a causal transformer architecture.
 
-3. **Motion-Adapter**: Injects the adapted motion features into the base image-to-video generation models.
+3. **Motion-Adapter**: Injects the adapted motion features into the base image-to-video generation models.
 
 ## πŸ“¦ Checkpoint Files
 
 ### MotionRAG Enhanced Models
 
-| Model             | Component                  | File                                                 |
-|-------------------|----------------------------|------------------------------------------------------|
-| **CogVideoX**     | CogVideoX-5B 17 frames     | `checkpoints/CogVideoX/17_frames.ckpt`               |
-| **CogVideoX**     | Motion Projector           | `checkpoints/CogVideoX/motion_proj.ckpt`             |
-| **CogVideoX**     | Motion Context Transformer | `checkpoints/CogVideoX/motion_transformer.ckpt`      |
-| **CogVideoX**     | Motion-Adapter             | `checkpoints/CogVideoX/Motion-Adapter.ckpt`          |
-| **DynamiCrafter** | Motion Projector           | `checkpoints/DynamiCrafter/motion_proj.ckpt`         |
+| Model             | Component                  | File                                                 |
+|:------------------|:---------------------------|:-----------------------------------------------------|
+| **CogVideoX**     | CogVideoX-5B 17 frames     | `checkpoints/CogVideoX/17_frames.ckpt`               |
+| **CogVideoX**     | Motion Projector           | `checkpoints/CogVideoX/motion_proj.ckpt`             |
+| **CogVideoX**     | Motion Context Transformer | `checkpoints/CogVideoX/motion_transformer.ckpt`      |
+| **CogVideoX**     | Motion-Adapter             | `checkpoints/CogVideoX/Motion-Adapter.ckpt`          |
+| **DynamiCrafter** | Motion Projector           | `checkpoints/DynamiCrafter/motion_proj.ckpt`         |
 | **DynamiCrafter** | Motion Context Transformer | `checkpoints/DynamiCrafter/motion_transformer.ckpt`  |
-| **DynamiCrafter** | Motion-Adapter             | `checkpoints/DynamiCrafter/Motion-Adapter.ckpt`      |
-| **SVD**           | Motion Projector           | `checkpoints/SVD/motion_proj.ckpt`                   |
-| **SVD**           | Motion Context Transformer | `checkpoints/SVD/motion_transformer.ckpt`            |
-| **SVD**           | Motion-Adapter             | `checkpoints/SVD/Motion-Adapter.ckpt`                |
+| **DynamiCrafter** | Motion-Adapter             | `checkpoints/DynamiCrafter/Motion-Adapter.ckpt`      |
+| **SVD**           | Motion Projector           | `checkpoints/SVD/motion_proj.ckpt`                   |
+| **SVD**           | Motion Context Transformer | `checkpoints/SVD/motion_transformer.ckpt`            |
+| **SVD**           | Motion-Adapter             | `checkpoints/SVD/Motion-Adapter.ckpt`                |
 
 ### Datasets
 
 Our dataset differs from [OpenVid-1M](https://huggingface.co/datasets/nkp37/OpenVid-1M) datasets through curation and preprocessing. We use Llama3.1 to refine captions and extract motion-specific descriptions, which are stored in the `motion_caption` field. The data is then partitioned into non-overlapping training and test sets.
 
-| Dataset        | Description                            | File                                          |
-|----------------|----------------------------------------|-----------------------------------------------|
+| Dataset        | Description                            | File                                          |
+|:---------------|:---------------------------------------|:----------------------------------------------|
 | **OpenVid-1M** | Large-scale video dataset for training | `datasets/OpenVid-1M/data/openvid-1m.parquet` |
-| **OpenVid-1K** | Test set sampled from OpenVid-1M       | `datasets/OpenVid-1M/data/openvid-1k.parquet` |
+| **OpenVid-1K** | Test set sampled from OpenVid-1M       | `datasets/OpenVid-1M/data/openvid-1k.parquet` |
 
 ## πŸš€ Usage
```
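The resulting front matter can be sanity-checked programmatically. A minimal sketch with PyYAML (not part of this repository; the `front_matter` string is copied from the diff's added side): the Hub reads this block to populate the model card, and `pipeline_tag: image-to-video` is what lists the model under the Image-to-Video task filter.

```python
import yaml  # PyYAML

# Model-card front matter as it reads after this PR.
front_matter = """\
base_model:
- zai-org/CogVideoX-5b-I2V
- Doubiiu/DynamiCrafter_1024
- stabilityai/stable-video-diffusion-img2vid
datasets:
- nkp37/OpenVid-1M
library_name: diffusers
license: apache-2.0
pipeline_tag: image-to-video
"""

meta = yaml.safe_load(front_matter)
print(meta["pipeline_tag"])  # image-to-video
print(len(meta["base_model"]))  # 3
```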