---
pipeline_tag: text-to-video
license: cc-by-nc-4.0
---
# Info
- Base model: [zeroscope_v2_576w](https://huggingface.co/cerspense/zeroscope_v2_576w).
- Trained on the [potat1](https://huggingface.co/camenduru/potat1#dataset--config) [dataset](https://huggingface.co/camenduru/potat1_dataset) for 50,000 steps over 10 hours on a single GPU with 24 GB of VRAM.
- Note: the result of this training run is not good.
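As a rough sanity check on the numbers above (an illustration only, not part of the training code), 50,000 steps in 10 hours works out to about 1.39 optimizer steps per second:

```python
# Back-of-the-envelope throughput implied by the training run above.
steps = 50_000
hours = 10

steps_per_second = steps / (hours * 3600)
print(f"{steps_per_second:.2f} steps/s")  # ~1.39 steps/s
```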
# Install (Windows)
```batch
git clone https://github.com/ExponentialML/Text-To-Video-Finetuning.git
cd Text-To-Video-Finetuning
git lfs install
git clone https://huggingface.co/damo-vilab/text-to-video-ms-1.7b ./models/model_scope_diffusers/
py -m venv --clear venv && venv\Scripts\activate
pip install -r requirements.txt
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 --no-cache-dir --force-reinstall --isolated --ignore-installed

git clone https://github.com/ExponentialML/Video-BLIP2-Preprocessor.git
cd Video-BLIP2-Preprocessor
pip install -r requirements.txt

:: fix: pin a compatible accelerate version in the dev requirements
echo accelerate^>=0.20.3>>requirements-dev.txt
pip install -r requirements-dev.txt

:: Captioning
python preprocess.py --video_directory C:\Video-BLIP2-Preprocessor\videos --config_name "My Videos" --config_save_name "my_videos"

:: Training
cd .. && venv\Scripts\activate && python train.py --config ./configs/v2/train_config.yaml

:: Inference at 1024x576
python inference.py --model zeroscope_v2_576w_potat1\zeroscope_v2_576w-checkpoint-50000 --prompt "a fast moving fancy sports car" --fps 24 --num-frames 30 --window-size 12 --width 1024 --height 576 --sdp
```
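The inference flags above determine the clip length: 30 frames rendered at 24 fps give a 1.25-second video. A tiny helper (hypothetical, not part of the repo) makes the relation explicit:

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Duration of the rendered clip: total frames divided by frames per second."""
    return num_frames / fps

# Settings from the inference command above: --num-frames 30 --fps 24
print(clip_duration_seconds(30, 24))  # 1.25
```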