---
base_model:
- VideoCrafter/VideoCrafter2
datasets:
- nkp37/OpenVid-1M
- TempoFunk/webvid-10M
license: gpl-3.0
pipeline_tag: text-to-video
library_name: diffusers
---
# Advanced Text-to-Video Diffusion Models

This repository contains the model from the paper *AMD-Hummingbird: Towards an Efficient Text-to-Video Model*. Hummingbird is a lightweight text-to-video (T2V) framework that prunes existing models (such as VideoCrafter2) and enhances visual quality through visual feedback learning. It aims to improve the efficiency of T2V generation, making it more suitable for deployment on resource-limited devices while preserving high-quality video generation.
## Table of Contents
- Advanced text-to-video Diffusion Models
- Key Features
- 8-Steps Results
- Checkpoint
- Installation
- Data Processing
- Training
- Inference
- License
## Key Features
⚡️ This repository provides training recipes for the AMD efficient text-to-video models, which are designed for high performance and efficiency. The training process includes two key steps:

1. **Distillation and Pruning**: We distill and prune the popular text-to-video model VideoCrafter2, reducing the parameter count to a compact 945M while maintaining competitive performance.
2. **Optimization with T2V-Turbo**: We apply the T2V-Turbo method to the distilled model to reduce inference steps and further enhance output quality.
This implementation is released to promote further research and innovation in the field of efficient text-to-video generation, optimized for AMD Instinct accelerators.
## 8-Steps Results
## Checkpoint

Our pretrained checkpoint can be downloaded from Hugging Face.
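As one way to fetch the weights programmatically, here is a minimal sketch using the `huggingface_hub` library; the repository id and local directory below are placeholders to substitute with this model's actual values:

```python
# Minimal sketch: download the pretrained checkpoint from the Hugging Face Hub.
# Requires `pip install huggingface_hub`. The repo id passed by the caller is
# a placeholder -- use this model card's actual repository id.
from huggingface_hub import snapshot_download


def download_checkpoint(repo_id: str, local_dir: str = "checkpoints/hummingbird") -> str:
    """Mirror every file in the model repository and return the local path."""
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

For example, `download_checkpoint("<org>/<model>")` places the checkpoint files under `checkpoints/hummingbird`.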
## Installation

We train both the 0.7B and 0.9B T2V models on AMD Instinct MI250 accelerators and evaluate them on the MI250, MI300, Radeon RX 7900 XT, and Radeon 880M (Ryzen AI 9 365) under Ubuntu (kernel 6.8.0-51-generic).
### Conda

```shell
conda create -n AMD_Hummingbird python=3.10
conda activate AMD_Hummingbird
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/rocm6.1
pip install -r requirements.txt
```
For the ROCm build of flash-attn, you can install it from the ROCm flash-attention repository:

```shell
git clone https://github.com/ROCm/flash-attention.git
cd flash-attention
python setup.py install
```

The build takes about 1.5 hours.
### Docker

First, pull the image with `docker pull`:

```shell
docker pull rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
```
Second, run the image with `docker run`, for example:

```shell
docker run \
  -v "$(pwd):/workspace" \
  --device=/dev/kfd \
  --device=/dev/dri \
  -it \
  --network=host \
  --name hummingbird \
  rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
```
Once inside the container, install the remaining dependencies with pip:

```shell
pip install -r requirements.txt
```
## Data Processing
### VQA

```shell
cd data_pre_process/DOVER
sh run.sh
```

This produces a quality score table for all videos; sort by score and remove the low-scoring videos.
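As an illustration of this filtering step (not part of the repository's tooling), here is a small Python sketch; the column names `path` and `score` and the threshold value are assumptions about the score table's layout:

```python
# Illustrative sketch: keep only videos whose quality score clears a threshold.
# The CSV column names ("path", "score") and the default threshold are
# assumptions about the score table produced by the scoring step.
import csv


def filter_videos(score_csv: str, threshold: float = 0.5) -> list[str]:
    """Return the paths of videos whose score is at or above the threshold."""
    keep = []
    with open(score_csv, newline="") as f:
        for row in csv.DictReader(f):
            if float(row["score"]) >= threshold:
                keep.append(row["path"])
    return keep
```

The same pattern applies to any per-video score table, such as the motion-smoothness CSV used in the next step.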
### Remove Dolly Zoom Videos

```shell
cd data_pre_process/VBench
sh run.sh
```

Using the motion-smoothness score CSV file, remove the low-scoring videos.
## Training

### Model Distillation

```shell
sh configs/training_512_t2v_v1.0/run_distill.sh
```
### Acceleration Training

```shell
cd acceleration/t2v-turbo
# for the 0.7B model
sh train_07B.sh
# for the 0.9B model
sh train_09B.sh
```
## Inference

```shell
# for the 0.7B model
python inference_command_config_07B.py
# for the 0.9B model
python inference_command_config_09B.py
```
## License

Copyright (c) 2024 Advanced Micro Devices, Inc. All Rights Reserved.