NullVoider committed
Commit 69077d8 · verified · 1 parent: 91bfbef

Update README.md


Updated the model card with details of the Video-to-Video generation pipeline.

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -47,6 +47,12 @@ Our multi-stage pretraining pipeline, inspired by the <a href="https://huggingfa
  - **Stage 2: Image-to-Video Model Pretraining**: We convert the text-to-video model from Stage 1 into an image-to-video model by adjusting the conv-in parameters. This new model is then pretrained on the same dataset used in Stage 1.
  - **Stage 3: High-Quality Fine-Tuning**: We fine-tune the image-to-video model on a high-quality subset of the original dataset, ensuring superior performance and quality.
 
+ ### 4. Video-to-Video Generation Pipeline
+
+ The V1 model is a hybrid architecture combining Tencent's HunyuanVideo model and Stability AI's Stable Video Diffusion (SVD). During inference, the model accepts a user prompt and an optional video input, which are processed before video generation. For Video-to-Video (V2V) generation, the system extracts frames from the input video, orders them by timestamp, and feeds them as image inputs to SVD, alongside the user prompt, to generate the final video.
+
+ At inference time, the backend dynamically switches between HunyuanVideo and SVD based on the input file type. By default, V1 uses a fine-tuned version of HunyuanVideo; when a video file is detected in the user input, the system automatically switches to SVD, enabling the Video-to-Video generation workflow.
+
  ## 📦 Model Introduction
  | Model Name | Resolution | Video Length | FPS |
  |-----------------|------------|--------------|-----|
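
The backend-switching rule and timestamp-ordered frame sampling added in this diff can be sketched in Python. This is a minimal illustration of the described behavior, not the repository's actual code: the backend names, the extension-based video check, and the `frame_timestamps` helper are all assumptions introduced here.

```python
from pathlib import Path
from typing import List, Optional

# Assumed set of extensions treated as "video input"; the real
# detection logic in the pipeline is not specified in the card.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".webm", ".mkv"}


def select_backend(input_path: Optional[str]) -> str:
    """Mirror the card's rule: default to the fine-tuned HunyuanVideo
    model, switch to SVD when the user input is a video file."""
    if input_path is not None and Path(input_path).suffix.lower() in VIDEO_EXTENSIONS:
        return "svd"  # Video-to-Video path
    return "hunyuanvideo"  # default text/image-conditioned path


def frame_timestamps(duration_s: float, fps: float, stride: int) -> List[float]:
    """Timestamps (in seconds) at which to sample frames from the
    input video, one every `stride` frames, ordered by time as the
    card describes for the V2V conditioning images."""
    total_frames = int(duration_s * fps)
    return [i / fps for i in range(0, total_frames, stride)]
```

For example, a prompt-only request would route to `"hunyuanvideo"`, while a request carrying `clip.mp4` would route to `"svd"` with its conditioning frames sampled at the timestamps returned by `frame_timestamps`.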