# 🔥 Training Guidelines for DVD
This document provides a comprehensive guide to training **DVD (Deterministic Video Depth)**.
## 1. 📂 Key Files Overview
Before starting, it is helpful to understand the core scripts involved in the training process:
* `train_script/train_video_new.sh`, `examples/wanvideo/model_training/train_with_accelerate_video.py`: The main entry points for the training loop.
* `examples/wanvideo/model_training/WanTrainingModule.py`: Handles training and validation logic. Note that we validate on only a single window during training to save time; if you need more thorough validation, consider using the inference script instead.
* `examples/dataset`: Dataset loaders for both training and validation.
* `train_config/normal_config/video_config_new.yaml`: Contains all hyperparameters, including learning rate, batch size, dataset config, and so on.
* `diffsynth/pipelines/wan_video_new_determine.py`: The core model architecture.
---
## 2. 🗄️ Dataset Preparation
As mentioned in our paper, DVD requires only **367K frames** to unlock generative priors. We mainly use [Hypersim](https://github.com/apple/ml-hypersim) (image), [TartanAir](https://theairlab.org/tartanair-dataset/) (video), and [Virtual KITTI](https://europe.naverlabs.com/proxy-virtual-worlds-vkitti-2/) (video, image) for training.
Please download the raw datasets from their official websites and organize them as follows:
```
vkitti/
├── Scene01
├── Scene02
├── ...
hypersim/
├── test
├── train
└── val
ttr/
├── abandonedfactory
├── abandonedfactory_night
├── amusement
├── ...
```
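After downloading, you can sanity-check the layout before launching a run. The helper below is a small sketch and not part of the DVD codebase; the function name and `EXPECTED_LAYOUT` contents are our own, taken from the tree above, so adjust them to match what you actually downloaded.

```python
from pathlib import Path

# Expected top-level folders per dataset root, following the tree above.
# Only a few representative sub-folders are listed; extend as needed.
EXPECTED_LAYOUT = {
    "vkitti": ("Scene01", "Scene02"),
    "hypersim": ("train", "val", "test"),
    "ttr": ("abandonedfactory", "abandonedfactory_night", "amusement"),
}

def missing_dataset_dirs(data_root):
    """Return expected sub-directories that are absent under data_root."""
    root = Path(data_root)
    return [
        str(root / dataset / sub)
        for dataset, subdirs in EXPECTED_LAYOUT.items()
        for sub in subdirs
        if not (root / dataset / sub).is_dir()
    ]
```

An empty return value means the checked folders are in place; otherwise the list tells you exactly which paths to fix.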
---
## 3. ⚙️ Configuration
All training hyperparameters are centralized in `train_config/normal_config/video_config_new.yaml`.
Key parameters you might want to adjust based on your hardware:
* `batch_size`: Reduce this if you encounter Out-Of-Memory (OOM) errors.
* `gradient_accumulation_steps`: Increase this to maintain the effective batch size if you reduce `batch_size`.
* `use_gradient_checkpointing`: Set this to `True` if you are facing OOM errors.
* `learning_rate`: Default is set to `1e-4`.
* `{test/train}_{min/max}_num_frame`: The minimum/maximum number of frames in one clip (default: 45 for both, i.e., fixed 45-frame clips).
* `denoise_step`: The $\tau$ condition in our paper.
* `grad_loss`, `grad_co`: The LMR and the $\lambda_{LMR}$ in our paper.
* `lora_rank`: Set to 512 following [Lotus-2](https://lotus-2.github.io/).
* `init_validate`: Whether to perform initial validation before training.
* `log_step`: Interval for logging training state.
* `prob`: The sampling ratio for Hypersim (image), Virtual KITTI (image), TartanAir (video), and Virtual KITTI (video).
* `batch_size`: The batch size for **image** data; the batch size for video is fixed at 1.
* Dataset settings: Please refer to `examples/dataset` for more details.
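To make the list above concrete, a minimal sketch of how these fields might appear in `video_config_new.yaml` follows. The field names come from the list above, but the values and flat nesting here are illustrative assumptions only; check the shipped config for the authoritative defaults.

```yaml
# Illustrative values only; verify names and nesting against the shipped config.
learning_rate: 1.0e-4
batch_size: 2                    # image batch; video batch size is fixed at 1
gradient_accumulation_steps: 4
use_gradient_checkpointing: true
train_min_num_frame: 45
train_max_num_frame: 45
lora_rank: 512
init_validate: false
log_step: 50
prob: [0.25, 0.25, 0.25, 0.25]   # Hypersim(img), VKITTI(img), TartanAir(vid), VKITTI(vid)
```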
---
## 4. 🚀 Launching the Training
Make sure you have downloaded the base weights (e.g., Wan2.1) before starting; the provided training script will download them automatically if they are missing.
### Multi-GPU / Distributed Training (Recommended)
We use `accelerate` for multi-GPU training. To train on 4 GPUs:
```bash
bash train_script/train_video_new.sh
```
You can also edit the files under `train_config/accelerate_config` to change the GPU configuration (e.g., for single-GPU or DeepSpeed training).
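For orientation, a typical single-node multi-GPU Accelerate config looks roughly like this. This is a sketch of the standard Accelerate config format, not a copy of the shipped files; adjust `num_processes` to your GPU count.

```yaml
# Sketch of a standard Accelerate config; compare with train_config/accelerate_config.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU      # or DEEPSPEED
num_machines: 1
num_processes: 4                 # number of GPUs
gpu_ids: all
mixed_precision: bf16
```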
---
## 5. 📊 Checkpoints
### Resuming from a Checkpoint
Checkpoints are saved automatically to the `output_path` directory every `validate_step` steps. If training is interrupted, you can resume it by specifying `training_state_dir` and setting both `resume` and `load_optimizer` to `True`; also set `global_step` if possible. Then simply rerun the training script:
```bash
bash train_script/train_video_new.sh
```
Please refer to [this line in the training script](../examples/wanvideo/model_training/train_with_accelerate_video.py#L474) for more details.
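Putting the resume-related fields together, the relevant part of the config might look like the sketch below. The field names are the ones described above; the path and step value are placeholders you must replace with your own.

```yaml
# Illustrative resume settings; replace the path and step with your own run's values.
resume: true
load_optimizer: true
training_state_dir: /path/to/output_path/your_saved_state   # saved training state
global_step: 5000                                           # step count to resume from
```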