# TTV-1B Setup Guide

Complete installation and setup instructions for the TTV-1B text-to-video model.

## Prerequisites

### Hardware Requirements

#### Minimum (Inference Only)
- GPU: 8GB VRAM (RTX 3070, RTX 4060 Ti)
- RAM: 16GB
- Storage: 50GB
- OS: Ubuntu 20.04+, Windows 10+, macOS 12+

#### Recommended (Training)
- GPU: 24GB+ VRAM (RTX 4090, A5000, A100)
- RAM: 64GB
- Storage: 500GB SSD
- OS: Ubuntu 22.04 LTS

#### Production (Full Training)
- GPU: 8× A100 80GB
- RAM: 512GB
- Storage: 2TB NVMe SSD
- Network: High-speed interconnect for multi-GPU

### Software Requirements
- Python 3.9, 3.10, or 3.11
- CUDA 11.8+ (for GPU acceleration)
- cuDNN 8.6+
- Git

## Installation

### Step 1: Clone Repository

```bash
git clone https://github.com/yourusername/ttv-1b.git
cd ttv-1b
```
### Step 2: Create Virtual Environment

```bash
# Using venv
python3 -m venv venv
source venv/bin/activate   # Linux/macOS
# or
venv\Scripts\activate      # Windows

# Using conda (alternative)
conda create -n ttv1b python=3.10
conda activate ttv1b
```

### Step 3: Install PyTorch

Choose the appropriate command for your system from https://pytorch.org/get-started/locally/

```bash
# CUDA 11.8 (most common)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# CPU only (not recommended)
pip install torch torchvision
```

### Step 4: Install Dependencies

```bash
pip install -r requirements.txt
```

### Step 5: Verify Installation

```bash
python -c "import torch; print(f'PyTorch {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
```
Expected output (your PyTorch version may differ):

```
PyTorch 2.1.0
CUDA available: True
```
## Quick Start

### Test the Model

```bash
# Run the evaluation script to verify everything works
python evaluate.py
```

This will:
- Create the model
- Count parameters (should be ~1.0B)
- Test forward/backward passes
- Measure inference speed
- Check memory usage

### Generate Your First Video (After Training)

```bash
python inference.py \
    --prompt "A beautiful sunset over mountains" \
    --checkpoint checkpoints/checkpoint_best.pt \
    --output my_first_video.mp4 \
    --steps 50
```

## Preparing Data

### Data Format

The model expects video-text pairs in the following layout:

```
data/
├── videos/
│   ├── video_0001.mp4
│   ├── video_0002.mp4
│   └── ...
└── annotations.json
```
annotations.json:

```json
{
  "video_0001": {
    "caption": "A cat playing with a ball of yarn",
    "duration": 2.0,
    "fps": 8
  },
  "video_0002": {
    "caption": "Sunset over the ocean with waves",
    "duration": 2.0,
    "fps": 8
  }
}
```
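Before training, it is worth checking the annotations file against this schema, since a missing caption or key will only surface mid-training. A minimal validation sketch using only the standard library (the helper name is our own, not part of the repository):

```python
import json

REQUIRED_KEYS = {"caption", "duration", "fps"}

def validate_annotations(path):
    """Return a list of problems found in an annotations.json file."""
    with open(path) as f:
        annotations = json.load(f)
    errors = []
    for video_id, entry in annotations.items():
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            errors.append(f"{video_id}: missing keys {sorted(missing)}")
        elif not isinstance(entry["caption"], str) or not entry["caption"].strip():
            errors.append(f"{video_id}: empty caption")
    return errors

# Usage:
# for problem in validate_annotations("data/annotations.json"):
#     print(problem)
```

An empty return value means every entry has a non-empty caption plus duration and fps fields.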
### Video Specifications
- Format: MP4, AVI, or MOV
- Resolution: 256×256 (inputs will be resized)
- Frame rate: 8 FPS recommended
- Duration: 2 seconds (16 frames at 8 FPS)
- Codec: H.264 recommended

### Converting Videos

```bash
# Using FFmpeg to rescale, resample, and trim a clip
ffmpeg -i input.mp4 -vf "scale=256:256,fps=8" -t 2 -c:v libx264 output.mp4
```
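For a whole directory, the same FFmpeg invocation can be scripted. A minimal sketch using only the standard library (the helper names and output naming scheme are our own; the call that actually runs FFmpeg is left commented out so the function prints commands for review first):

```python
import subprocess  # needed if you uncomment the run call below
from pathlib import Path

def ffmpeg_command(src, dst, size=256, fps=8, seconds=2):
    """Build the FFmpeg argument list matching the command above."""
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-vf", f"scale={size}:{size},fps={fps}",
        "-t", str(seconds),
        "-c:v", "libx264",
        str(dst),
    ]

def convert_directory(src_dir, dst_dir):
    """Print (or run) one conversion command per .mp4 in src_dir."""
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.mp4")):
        cmd = ffmpeg_command(src, dst_dir / src.name)
        # subprocess.run(cmd, check=True)  # uncomment to execute
        print(" ".join(cmd))
```

The `-y` flag overwrites existing outputs; drop it if you want FFmpeg to prompt instead.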
### Dataset Preparation Script

```python
import json
from pathlib import Path

def create_annotations(video_dir, output_file):
    """Create an annotations file from a directory of videos."""
    video_dir = Path(video_dir)
    annotations = {}
    for video_path in sorted(video_dir.glob("*.mp4")):
        video_id = video_path.stem
        annotations[video_id] = {
            "caption": f"Video {video_id}",  # Placeholder -- replace with real captions
            "duration": 2.0,
            "fps": 8,
        }
    with open(output_file, "w") as f:
        json.dump(annotations, f, indent=2)

# Usage
create_annotations("data/videos", "data/annotations.json")
```
## Training

### Single GPU Training

```bash
python train.py
```

Configuration in train.py:

```python
config = {
    'batch_size': 2,
    'gradient_accumulation_steps': 8,  # Effective batch size = 2 * 8 = 16
    'learning_rate': 1e-4,
    'num_epochs': 100,
    'mixed_precision': True,
}
```
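The effective batch size is what the optimizer actually sees per step, and it grows again under data-parallel training. A one-line sketch of the arithmetic (the helper name is our own):

```python
def effective_batch_size(batch_size, grad_accum_steps, num_gpus=1):
    """Samples contributing to each optimizer step under DDP-style data parallelism."""
    return batch_size * grad_accum_steps * num_gpus
```

With the config above on a single GPU this gives 16; keeping it constant is why `batch_size` and `gradient_accumulation_steps` are traded against each other in the memory tips below.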
### Multi-GPU Training (Recommended)

```bash
# Using PyTorch DDP
torchrun --nproc_per_node=8 train.py

# Or using Hugging Face accelerate, which handles launch configuration for you
accelerate config  # First-time setup
accelerate launch train.py
```
### Monitoring Training

```bash
# Install TensorBoard
pip install tensorboard

# Run TensorBoard
tensorboard --logdir=./checkpoints/logs
```

### Resume from Checkpoint

```python
# In train.py, add:
trainer.load_checkpoint('checkpoints/checkpoint_step_10000.pt')
trainer.train()
```
## Inference

### Basic Inference

```python
from inference import generate_video_from_prompt

video = generate_video_from_prompt(
    prompt="A serene lake with mountains",
    checkpoint_path="checkpoints/best.pt",
    output_path="output.mp4",
    num_steps=50,
    guidance_scale=7.5,
    seed=42,  # For reproducibility
)
```

### Batch Inference

```python
from inference import batch_generate

prompts = [
    "A cat playing",
    "Ocean waves",
    "City at night",
]

batch_generate(
    prompts=prompts,
    checkpoint_path="checkpoints/best.pt",
    output_dir="./outputs",
    num_steps=50,
)
```

### Advanced Options

```python
# Lower guidance for more creative results
video = generate_video_from_prompt(
    prompt="Abstract art in motion",
    guidance_scale=5.0,  # Lower = more creative
    num_steps=100,       # More steps = higher quality
)

# Fast generation (fewer steps)
video = generate_video_from_prompt(
    prompt="Quick test",
    num_steps=20,  # Faster but lower quality
)
```
## Optimization Tips

### Memory Optimization

1. **Reduce Batch Size**
   ```python
   config['batch_size'] = 1  # Minimum
   config['gradient_accumulation_steps'] = 16  # Maintain effective batch size
   ```
2. **Enable Gradient Checkpointing**
   ```python
   config['gradient_checkpointing'] = True
   ```
3. **Use Mixed Precision**
   ```python
   config['mixed_precision'] = True  # Always recommended
   ```

### Speed Optimization

1. **Use Torch Compile** (PyTorch 2.0+)
   ```python
   model = torch.compile(model)
   ```
2. **Enable cuDNN Benchmarking**
   ```python
   torch.backends.cudnn.benchmark = True
   ```
3. **Pin Memory**
   ```python
   DataLoader(..., pin_memory=True)
   ```
## Troubleshooting

### CUDA Out of Memory

```python
# Reduce batch size
config['batch_size'] = 1

# Enable gradient checkpointing
config['gradient_checkpointing'] = True

# Clear the CUDA cache between runs
torch.cuda.empty_cache()
```

### Slow Training

Check GPU utilization first:

```bash
nvidia-smi
```

If the GPU is underutilized, try:

```python
# Increase DataLoader workers
DataLoader(..., num_workers=8)

# Enable mixed precision
config['mixed_precision'] = True
```
### NaN Loss

```python
# Reduce learning rate
config['learning_rate'] = 5e-5

# Enable gradient clipping (already included in the training loop)
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

# Check for NaN in the data
assert not torch.isnan(videos).any()
```

### Model Not Learning

```python
# Increase learning rate
config['learning_rate'] = 2e-4

# Check data quality:
# - Verify annotations are correct
# - Ensure videos are properly normalized

# Reduce regularization
config['weight_decay'] = 0.001  # Lower weight decay
```
## Performance Benchmarks

### Training Speed (A100 80GB)

| Batch Size | Grad Accum | Eff. Batch | Sec/Batch | Hours/100K Steps |
|------------|------------|------------|-----------|------------------|
| 1          | 16         | 16         | 2.5       | 69               |
| 2          | 8          | 16         | 2.5       | 69               |
| 4          | 4          | 16         | 2.7       | 75               |

### Inference Speed

| GPU       | FP16 | Steps | Time/Video |
|-----------|------|-------|------------|
| A100 80GB | Yes  | 50    | 15s        |
| RTX 4090  | Yes  | 50    | 25s        |
| RTX 3090  | Yes  | 50    | 35s        |

### Memory Usage

| Operation | Batch Size | Memory (GB) |
|-----------|------------|-------------|
| Inference | 1          | 6           |
| Training  | 1          | 12          |
| Training  | 2          | 24          |
| Training  | 4          | 48          |
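The training rows above scale linearly at roughly 12 GB per sample, so a quick back-of-the-envelope check tells you whether a batch size will fit on your card. A sketch of that extrapolation (an estimate from the table, not a measurement; the helper name is our own):

```python
GB_PER_TRAINING_SAMPLE = 12  # from the Memory Usage table above

def fits_in_vram(batch_size, vram_gb):
    """Rough check: does a training batch fit, assuming linear scaling?"""
    return batch_size * GB_PER_TRAINING_SAMPLE <= vram_gb
```

By this estimate a 24GB card tops out at batch size 2, which is consistent with the recommended single-GPU config.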
## Next Steps

1. **Prepare your dataset** - Collect and annotate videos
2. **Start training** - Begin with a small dataset to verify the pipeline
3. **Monitor progress** - Check loss curves and sample generations
4. **Fine-tune** - Adjust hyperparameters based on results
5. **Evaluate** - Test on a held-out validation set
6. **Deploy** - Use for inference on new prompts

## Getting Help

- GitHub Issues: Report bugs and ask questions
- Documentation: Check README.md and ARCHITECTURE.md
- Examples: See the example scripts in the repository

## Additional Resources

- [PyTorch Documentation](https://pytorch.org/docs/)
- [Diffusion Models Explained](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/)
- [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
- [DiT Paper](https://arxiv.org/abs/2212.09748)