Update README.md

Updated the model card with details on the Video-to-Video generation pipeline.
README.md
- **Stage 2: Image-to-Video Model Pretraining**: We convert the text-to-video model from Stage 1 into an image-to-video model by adjusting the conv-in parameters. This new model is then pretrained on the same dataset used in Stage 1.
- **Stage 3: High-Quality Fine-Tuning**: We fine-tune the image-to-video model on a high-quality subset of the original dataset, ensuring superior performance and quality.
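The conv-in adjustment mentioned in Stage 2 is commonly implemented by widening the first convolution's input channels (to accept the extra image-conditioning latents) and zero-initializing the new channels, so the converted model initially behaves exactly like the text-to-video model. A minimal PyTorch sketch, assuming a 3D conv-in over video latents; the function name, channel counts, and zero-init choice are illustrative assumptions, not the project's released training code:

```python
import torch
import torch.nn as nn

def expand_conv_in(conv: nn.Conv3d, extra_in_channels: int) -> nn.Conv3d:
    """Return a copy of `conv` accepting `extra_in_channels` additional input
    channels. The new channels are zero-initialized, so the expanded layer
    produces the same output as the original when the extra inputs are zero."""
    new_conv = nn.Conv3d(
        conv.in_channels + extra_in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.zero_()
        new_conv.weight[:, : conv.in_channels] = conv.weight
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv

# Hypothetical example: a 4-channel latent conv-in widened to also take
# 4 image-latent channels (8 input channels total).
conv = nn.Conv3d(4, 320, kernel_size=3, padding=1)
expanded = expand_conv_in(conv, 4)
```

Because the added weights start at zero, pretraining in Stage 2 can resume from the Stage 1 checkpoint without an initial quality drop.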
### 4. Video-to-Video Generation Pipeline
The V1 model is a hybrid architecture that combines Tencent's HunyuanVideo model with Stability AI's Stable Video Diffusion (SVD). During inference, the model accepts a user prompt and an optional input video, which are preprocessed before generation. For Video-to-Video (V2V) generation, the system uses video interpolation techniques to extract frames from the input video; the frames are ordered by timestamp and passed, together with the user prompt, to the SVD model as image inputs to generate the final video.
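The frame-sampling step above can be sketched as choosing evenly spaced timestamps across the input clip and keeping them in temporal order before handing the corresponding frames to SVD. A minimal sketch; `select_frame_timestamps` and the even-spacing policy are illustrative assumptions, since the exact interpolation scheme is not specified here:

```python
from typing import List

def select_frame_timestamps(duration_s: float, num_frames: int) -> List[float]:
    """Evenly spaced timestamps (seconds) at which to sample frames from the
    input video. The returned list is already sorted by timestamp, matching
    the ordering the SVD image inputs expect."""
    if num_frames == 1:
        return [0.0]
    return [round(duration_s * i / (num_frames - 1), 3) for i in range(num_frames)]

# Hypothetical example: 14 conditioning frames from a 5-second clip
# (SVD checkpoints commonly generate 14 or 25 frames).
timestamps = select_frame_timestamps(5.0, 14)
```

The sampled frames would then be decoded at these timestamps (e.g. with a video-decoding library) and passed to the SVD model one per generation step.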
At inference time, the backend switches dynamically between the HunyuanVideo and SVD models based on the input file type. By default, V1 uses a fine-tuned version of the HunyuanVideo model; when a video file is detected in the user input, the system automatically switches to SVD, enabling the Video-to-Video generation workflow.
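The dispatch described above amounts to routing on the input file's type. A minimal sketch; the extension list, function name, and backend labels are illustrative assumptions, since the actual detection logic is not published here:

```python
from pathlib import Path
from typing import Optional

# Hypothetical set of extensions treated as video input.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".webm", ".mkv"}

def pick_backend(prompt: str, input_path: Optional[str]) -> str:
    """Route to SVD when the optional input is a video file; otherwise use
    the fine-tuned HunyuanVideo model (the default)."""
    if input_path and Path(input_path).suffix.lower() in VIDEO_EXTENSIONS:
        return "svd"          # Video-to-Video workflow
    return "hunyuanvideo"     # default generation workflow

print(pick_backend("a cat surfing", None))        # -> hunyuanvideo
print(pick_backend("a cat surfing", "clip.mp4"))  # -> svd
```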
## 📦 Model Introduction
| Model Name | Resolution | Video Length | FPS |
|-----------------|------------|--------------|-----|