# TTV-1B Setup Guide

Complete installation and setup instructions for the TTV-1B text-to-video model.

## Prerequisites

### Hardware Requirements

#### Minimum (Inference Only)
- GPU: 8GB VRAM (RTX 3070, RTX 4060 Ti)
- RAM: 16GB
- Storage: 50GB
- OS: Ubuntu 20.04+, Windows 10+, macOS 12+

#### Recommended (Training)
- GPU: 24GB+ VRAM (RTX 4090, A5000, A100)
- RAM: 64GB
- Storage: 500GB SSD
- OS: Ubuntu 22.04 LTS

#### Production (Full Training)
- GPU: 8× A100 80GB
- RAM: 512GB
- Storage: 2TB NVMe SSD
- Network: High-speed interconnect for multi-GPU

### Software Requirements

- Python 3.9, 3.10, or 3.11
- CUDA 11.8+ (for GPU acceleration)
- cuDNN 8.6+
- Git
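
A quick way to confirm that the interpreter falls inside the supported range is a check like the sketch below. The helper name `is_supported_python` is illustrative and not part of the repository.

```python
import sys

# Minor versions listed under Software Requirements: 3.9, 3.10, 3.11
SUPPORTED_MINORS = (9, 10, 11)

def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True when the given version is Python 3.9, 3.10, or 3.11."""
    return version_info[0] == 3 and version_info[1] in SUPPORTED_MINORS

if not is_supported_python():
    print(f"Warning: Python {sys.version_info[0]}.{sys.version_info[1]} "
          "is outside the supported range (3.9 to 3.11)")
```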

## Installation

### Step 1: Clone Repository

```bash
git clone https://github.com/yourusername/ttv-1b.git
cd ttv-1b
```

### Step 2: Create Virtual Environment

```bash
# Using venv
python3 -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

# Using conda (alternative)
conda create -n ttv1b python=3.10
conda activate ttv1b
```
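
To confirm that an environment is actually active before installing packages, something like the following can help. It is a minimal sketch: the function name `active_environment` is made up for illustration, and the conda check relies on the `CONDA_DEFAULT_ENV` variable that conda sets on activation.

```python
import os
import sys

def active_environment() -> str:
    """Describe which isolated environment, if any, is active."""
    # venv/virtualenv: sys.prefix differs from the base installation.
    if sys.prefix != sys.base_prefix:
        return "venv"
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    if os.environ.get("CONDA_DEFAULT_ENV"):
        return "conda"
    return "none"

print(f"Active environment: {active_environment()}")
```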

### Step 3: Install PyTorch

Choose the command that matches your CUDA version. The selector at https://pytorch.org/get-started/locally/ lists the current options.

```bash
# CUDA 11.8 (most common)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# CPU only (not recommended)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
```

### Step 4: Install Dependencies

```bash
pip install -r requirements.txt
```

### Step 5: Verify Installation

```bash
python -c "import torch; print(f'PyTorch {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
```

Expected output (the exact version string depends on the build you installed):
```
PyTorch 2.1.0
CUDA available: True
```
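
For a slightly fuller report that also shows the GPU name and VRAM (useful for comparing against the hardware requirements above), a sketch along these lines may help. The function name `describe_torch_environment` is hypothetical; the `torch.cuda` calls are standard PyTorch.

```python
def describe_torch_environment() -> dict:
    """Collect basic facts about the PyTorch install and the first GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["gpu_name"] = props.name
        # total_memory is reported in bytes
        info["vram_gb"] = round(props.total_memory / 1024**3, 1)
    return info

for key, value in describe_torch_environment().items():
    print(f"{key}: {value}")
```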

## Quick Start

### Test the Model

```bash
# Run evaluation script to verify everything works
python evaluate.py
```

This will:
- Create the model
- Count parameters (should be ~1.0B)
- Test forward/backward passes
- Measure inference speed
- Check memory usage

### Generate Your First Video (After Training)

```bash
python inference.py \
    --prompt "A beautiful sunset over mountains" \
    --checkpoint checkpoints/checkpoint_best.pt \
    --output my_first_video.mp4 \
    --steps 50
```

## Preparing Data

### Data Format

The model expects video-text pairs in the following format:

```
data/
├── videos/
│   ├── video_0001.mp4
│   ├── video_0002.mp4
│   └── ...
└── annotations.json
```

annotations.json:
```json
{
  "video_0001": {
    "caption": "A cat playing with a ball of yarn",
    "duration": 2.0,
    "fps": 8
  },
  "video_0002": {
    "caption": "Sunset over the ocean with waves",
    "duration": 2.0,
    "fps": 8
  }
}
```
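
Before training, it can be worth checking that every entry follows this shape. The validator below is a minimal sketch, not part of the repository; it assumes each entry needs a non-empty `caption` and positive `duration` and `fps`, as in the example above.

```python
def validate_annotations(annotations: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for video_id, entry in annotations.items():
        if not isinstance(entry, dict):
            problems.append(f"{video_id}: entry is not an object")
            continue
        caption = entry.get("caption")
        if not isinstance(caption, str) or not caption.strip():
            problems.append(f"{video_id}: missing or empty caption")
        for key in ("duration", "fps"):
            value = entry.get(key)
            # bool is a subclass of int, so reject it explicitly
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or value <= 0:
                problems.append(f"{video_id}: {key} must be a positive number")
    return problems

sample = {
    "video_0001": {"caption": "A cat playing with a ball of yarn", "duration": 2.0, "fps": 8},
    "video_0002": {"caption": "", "duration": 2.0, "fps": 0},
}
for problem in validate_annotations(sample):
    print(problem)
```

In practice the dictionary would come from `json.load` on `data/annotations.json`.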

### Video Specifications

- Format: MP4, AVI, or MOV
- Resolution: 256×256 (videos at other resolutions will be resized)
- Frame rate: 8 FPS recommended
- Duration: 2 seconds (16 frames at 8 FPS)
- Codec: H.264 recommended

### Converting Videos

```bash
# Using FFmpeg to convert videos
ffmpeg -i input.mp4 -vf "scale=256:256,fps=8" -t 2 -c:v libx264 output.mp4
```
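
To convert a whole directory, the same FFmpeg command can be driven from Python. This is a sketch under a few assumptions: `ffmpeg` is on the `PATH`, the helper names are illustrative, the output directory differs from the input directory, and `-y` (overwrite existing outputs) is an addition to the command above.

```python
import subprocess
from pathlib import Path

VIDEO_SUFFIXES = {".mp4", ".avi", ".mov"}

def build_ffmpeg_command(src, dst, size=256, fps=8, duration=2):
    """Build the argument list for the conversion command shown above."""
    return [
        "ffmpeg", "-y",            # -y: overwrite the output if it exists
        "-i", str(src),
        "-vf", f"scale={size}:{size},fps={fps}",
        "-t", str(duration),
        "-c:v", "libx264",
        str(dst),
    ]

def convert_directory(input_dir, output_dir):
    """Convert every supported video in input_dir to MP4 in output_dir."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(input_dir).iterdir()):
        if src.suffix.lower() in VIDEO_SUFFIXES:
            dst = output_dir / f"{src.stem}.mp4"
            # check=True raises CalledProcessError if FFmpeg fails
            subprocess.run(build_ffmpeg_command(src, dst), check=True)

# Example: convert_directory("raw_videos", "data/videos")
```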

### Dataset Preparation Script

```python
import json
from pathlib import Path

def create_annotations(video_dir, output_file):
    """Write a placeholder annotations file for every .mp4 in video_dir."""
    video_dir = Path(video_dir)
    annotations = {}

    # sorted() keeps the output order stable across runs
    for video_path in sorted(video_dir.glob("*.mp4")):
        video_id = video_path.stem
        annotations[video_id] = {
            "caption": f"Video {video_id}",  # Placeholder: replace with a real caption
            "duration": 2.0,
            "fps": 8
        }

    with open(output_file, 'w') as f:
        json.dump(annotations, f, indent=2)

# Usage
create_annotations("data/videos", "data/annotations.json")
```
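
Since evaluation later relies on a held-out validation set, the annotations can be split once, deterministically, before training. The sketch below is one way to do that; the function name, the 10% default, and the seed are assumptions rather than repository behavior.

```python
import random

def split_annotations(annotations, val_fraction=0.1, seed=42):
    """Split an annotations dict into (train, val) dicts, reproducibly."""
    video_ids = sorted(annotations)              # stable starting order
    random.Random(seed).shuffle(video_ids)       # seeded, so repeatable
    if len(video_ids) > 1:
        num_val = max(1, int(len(video_ids) * val_fraction))
    else:
        num_val = 0                              # nothing to hold out
    val_ids = set(video_ids[:num_val])
    train = {k: v for k, v in annotations.items() if k not in val_ids}
    val = {k: v for k, v in annotations.items() if k in val_ids}
    return train, val

example = {f"video_{i:04d}": {"caption": f"clip {i}", "duration": 2.0, "fps": 8}
           for i in range(1, 11)}
train, val = split_annotations(example)
print(len(train), "training clips,", len(val), "validation clips")
```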

## Training

### Single GPU Training

```bash
python train.py
```

Configuration in train.py:
```python
config = {
    'batch_size': 2,
    'gradient_accumulation_steps': 8,  # Effective batch size = 16
    'learning_rate': 1e-4,
    'num_epochs': 100,
    'mixed_precision': True,
}
```
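
The effective batch size in the comment above is the per-step batch size multiplied by the number of accumulation steps. The sketch below spells out that arithmetic; the `num_gpus` factor assumes standard data-parallel training where each GPU processes its own batch, which may or may not match how `train.py` is written.

```python
def effective_batch_size(batch_size, gradient_accumulation_steps, num_gpus=1):
    """Samples contributing to each optimizer update."""
    return batch_size * gradient_accumulation_steps * num_gpus

print(effective_batch_size(2, 8))               # single GPU, as configured above
print(effective_batch_size(2, 8, num_gpus=8))   # same config on 8 GPUs (assumed DDP)
```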

### Multi-GPU Training (Recommended)

```bash
# Using PyTorch DDP (set --nproc_per_node to the number of GPUs on this machine)
torchrun --nproc_per_node=8 train.py

# Or using accelerate (better)
accelerate config  # First time setup
accelerate launch train.py
```

### Monitoring Training

```bash
# Install tensorboard
pip install tensorboard

# Run tensorboard
tensorboard --logdir=./checkpoints/logs
```

### Resume from Checkpoint

```python
# In train.py, add:
trainer.load_checkpoint('checkpoints/checkpoint_step_10000.pt')
trainer.train()
```
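
To pick up the most recent checkpoint automatically, the step number can be parsed from the file name. This sketch assumes checkpoints follow the `checkpoint_step_<N>.pt` pattern seen above; the helper `latest_checkpoint` is illustrative and not part of the repository.

```python
import re
from pathlib import Path

STEP_PATTERN = re.compile(r"checkpoint_step_(\d+)\.pt$")

def latest_checkpoint(paths):
    """Return the path with the highest step number, or None if none match."""
    best_path, best_step = None, -1
    for path in paths:
        match = STEP_PATTERN.search(Path(path).name)
        # Compare numerically: a plain string sort would rank 5000 above 10000
        if match and int(match.group(1)) > best_step:
            best_path, best_step = path, int(match.group(1))
    return best_path

candidates = [str(p) for p in Path("checkpoints").glob("*.pt")]
print(latest_checkpoint(candidates))
```

The result can then be passed to `trainer.load_checkpoint(...)` as shown above.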

## Inference

### Basic Inference

```python
from inference import generate_video_from_prompt

video = generate_video_from_prompt(
    prompt="A serene lake with mountains",
    checkpoint_path="checkpoints/best.pt",
    output_path="output.mp4",
    num_steps=50,
    guidance_scale=7.5,
    seed=42  # For reproducibility
)
```

### Batch Inference

```python
from inference import batch_generate

prompts = [
    "A cat playing",
    "Ocean waves",
    "City at night"
]

batch_generate(
    prompts=prompts,
    checkpoint_path="checkpoints/best.pt",
    output_dir="./outputs",
    num_steps=50
)
```

### Advanced Options

```python
# Lower guidance for more creative results
video = generate_video_from_prompt(
    prompt="Abstract art in motion",
    guidance_scale=5.0,  # Lower = more creative
    num_steps=100,        # More steps = higher quality
)

# Fast generation (fewer steps)
video = generate_video_from_prompt(
    prompt="Quick test",
    num_steps=20,  # Faster but lower quality
)
```

## Optimization Tips

### Memory Optimization

1. **Reduce Batch Size**
```python
config['batch_size'] = 1  # Minimum
config['gradient_accumulation_steps'] = 16  # Maintain effective batch size
```

2. **Enable Gradient Checkpointing**
```python
config['gradient_checkpointing'] = True
```

3. **Use Mixed Precision**
```python
config['mixed_precision'] = True  # Always recommended
```

### Speed Optimization

1. **Use Torch Compile** (PyTorch 2.0+)
```python
model = torch.compile(model)
```

2. **Enable cuDNN Benchmarking**
```python
torch.backends.cudnn.benchmark = True
```

3. **Pin Memory**
```python
DataLoader(..., pin_memory=True)
```

## Troubleshooting

### CUDA Out of Memory

```python
# Reduce batch size
config['batch_size'] = 1

# Enable gradient checkpointing
config['gradient_checkpointing'] = True

# Clear cache
torch.cuda.empty_cache()
```

### Slow Training

```python
# Check GPU utilization (run in a shell, not in Python):
#   nvidia-smi

# Increase the number of data-loading workers
DataLoader(..., num_workers=8)

# Enable mixed precision
config['mixed_precision'] = True
```

### NaN Loss

```python
# Reduce learning rate
config['learning_rate'] = 5e-5

# Enable gradient clipping (already included)
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

# Check for NaN in data
assert not torch.isnan(videos).any()
```
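
It can also help to detect a non-finite loss early and skip that update instead of letting it corrupt the weights. The guard below is a minimal sketch with a hypothetical name; in a PyTorch loop it would typically be called with `loss.item()`.

```python
import math

def should_skip_step(loss_value: float) -> bool:
    """Return True when the loss is NaN or infinite."""
    return not math.isfinite(loss_value)

for value in (0.42, float("nan"), float("inf")):
    print(value, "-> skip" if should_skip_step(value) else "-> ok")
```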

### Model Not Learning

```python
# Increase learning rate
config['learning_rate'] = 2e-4

# Check data quality
# Verify annotations are correct
# Ensure videos are properly normalized

# Reduce regularization
config['weight_decay'] = 0.001  # Lower weight decay
```

## Performance Benchmarks

### Training Speed (A100 80GB)

| Batch Size | Grad Accum | Eff. Batch | Sec/Batch | Hours/100K steps |
|------------|------------|------------|-----------|------------------|
| 1 | 16 | 16 | 2.5 | 69 |
| 2 | 8 | 16 | 2.5 | 69 |
| 4 | 4 | 16 | 2.7 | 75 |

### Inference Speed

| GPU | FP16 | Steps | Time/Video |
|-----|------|-------|------------|
| A100 80GB | Yes | 50 | 15s |
| RTX 4090 | Yes | 50 | 25s |
| RTX 3090 | Yes | 50 | 35s |
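
To get a rough figure for other step counts, the measurements above can be scaled linearly. This is only an estimate: it assumes time grows in proportion to the number of steps and ignores any fixed overhead such as model loading or video encoding.

```python
def estimate_seconds(reference_seconds, reference_steps, target_steps):
    """Scale a measured generation time to a different step count."""
    return reference_seconds / reference_steps * target_steps

# A100 80GB row: 15 s at 50 steps
print(estimate_seconds(15, 50, 20))   # fast preview
print(estimate_seconds(15, 50, 100))  # higher-quality run
```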

### Memory Usage

| Operation | Batch Size | Memory (GB) |
|-----------|------------|-------------|
| Inference | 1 | 6 |
| Training | 1 | 12 |
| Training | 2 | 24 |
| Training | 4 | 48 |
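
The training rows above scale at roughly 12 GB per sample, which gives a quick way to guess the largest batch size that fits on a given GPU. The sketch below is a linear extrapolation from this table only; real usage also depends on settings such as gradient checkpointing, so treat the result as a starting point.

```python
GB_PER_TRAINING_SAMPLE = 12  # taken from the table: 12, 24, 48 GB for batch 1, 2, 4

def max_training_batch_size(vram_gb):
    """Largest batch size whose estimated memory fits in vram_gb."""
    return int(vram_gb // GB_PER_TRAINING_SAMPLE)

for vram in (8, 24, 80):
    print(f"{vram} GB -> batch size {max_training_batch_size(vram)}")
```

A result of 0 means training is not expected to fit without further memory optimizations.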

## Next Steps

1. **Prepare your dataset** - Collect and annotate videos
2. **Start training** - Begin with small dataset to verify
3. **Monitor progress** - Check loss, sample generations
4. **Fine-tune** - Adjust hyperparameters based on results
5. **Evaluate** - Test on held-out validation set
6. **Deploy** - Use for inference on new prompts

## Getting Help

- GitHub Issues: Report bugs and ask questions
- Documentation: Check README.md and ARCHITECTURE.md
- Examples: See example scripts in the repository

## Additional Resources

- [PyTorch Documentation](https://pytorch.org/docs/)
- [Diffusion Models Explained](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/)
- [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
- [DiT Paper](https://arxiv.org/abs/2212.09748)