# TTV-1B Setup Guide
Complete installation and setup instructions for the TTV-1B text-to-video model.
## Prerequisites
### Hardware Requirements
#### Minimum (Inference Only)
- GPU: 8GB VRAM (RTX 3070, RTX 4060 Ti)
- RAM: 16GB
- Storage: 50GB
- OS: Ubuntu 20.04+, Windows 10+, macOS 12+
#### Recommended (Training)
- GPU: 24GB+ VRAM (RTX 4090, A5000, A100)
- RAM: 64GB
- Storage: 500GB SSD
- OS: Ubuntu 22.04 LTS
#### Production (Full Training)
- GPU: 8× A100 80GB
- RAM: 512GB
- Storage: 2TB NVMe SSD
- Network: High-speed interconnect (e.g. NVLink, InfiniBand) for multi-GPU training
### Software Requirements
- Python 3.9, 3.10, or 3.11
- CUDA 11.8+ (for GPU acceleration)
- cuDNN 8.6+
- Git
## Installation
### Step 1: Clone Repository
```bash
git clone https://github.com/yourusername/ttv-1b.git
cd ttv-1b
```
### Step 2: Create Virtual Environment
```bash
# Using venv
python3 -m venv venv
source venv/bin/activate   # Linux/macOS
# or
venv\Scripts\activate      # Windows

# Using conda (alternative)
conda create -n ttv1b python=3.10
conda activate ttv1b
```
### Step 3: Install PyTorch
Choose the appropriate command for your system from https://pytorch.org/get-started/locally/
```bash
# CUDA 11.8 (most common)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# CPU only (not recommended)
pip install torch torchvision
```
### Step 4: Install Dependencies
```bash
pip install -r requirements.txt
```
### Step 5: Verify Installation
```bash
python -c "import torch; print(f'PyTorch {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
```
Expected output (versions may vary):
```
PyTorch 2.1.0
CUDA available: True
```
## Quick Start
### Test the Model
```bash
# Run evaluation script to verify everything works
python evaluate.py
```
This will:
- Create the model
- Count parameters (should be ~1.0B)
- Test forward/backward passes
- Measure inference speed
- Check memory usage
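The parameter count check boils down to summing `p.numel()` over trainable parameters. A minimal sketch (the `nn.Linear` stand-in is illustrative; `evaluate.py` runs the equivalent check against the full model, where the total should be roughly 1.0e9):

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Small stand-in module: 10*5 weights + 5 biases = 55 parameters
layer = nn.Linear(10, 5)
print(count_parameters(layer))  # → 55
```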
### Generate Your First Video (After Training)
```bash
python inference.py \
    --prompt "A beautiful sunset over mountains" \
    --checkpoint checkpoints/checkpoint_best.pt \
    --output my_first_video.mp4 \
    --steps 50
```
## Preparing Data
### Data Format
The model expects video-text pairs in the following format:
```
data/
├── videos/
│   ├── video_0001.mp4
│   ├── video_0002.mp4
│   └── ...
└── annotations.json
```
annotations.json:
```json
{
  "video_0001": {
    "caption": "A cat playing with a ball of yarn",
    "duration": 2.0,
    "fps": 8
  },
  "video_0002": {
    "caption": "Sunset over the ocean with waves",
    "duration": 2.0,
    "fps": 8
  }
}
```
### Video Specifications
- Format: MP4, AVI, or MOV
- Resolution: 256×256 (will be resized)
- Frame rate: 8 FPS recommended
- Duration: 2 seconds (16 frames at 8 FPS)
- Codec: H.264 recommended
### Converting Videos
```bash
# Using FFmpeg to convert videos
ffmpeg -i input.mp4 -vf "scale=256:256,fps=8" -t 2 -c:v libx264 output.mp4
```
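To convert an entire directory, the same FFmpeg invocation can be driven from Python's standard library. This is a sketch; `raw_videos` and both helper names are illustrative, not part of the repository:

```python
import subprocess
from pathlib import Path

def ffmpeg_cmd(src: Path, dst: Path) -> list[str]:
    """FFmpeg arguments matching the single-file command above."""
    return [
        "ffmpeg", "-y", "-i", str(src),
        "-vf", "scale=256:256,fps=8",  # resize and resample to 8 FPS
        "-t", "2",                     # keep the first 2 seconds
        "-c:v", "libx264",
        str(dst),
    ]

def convert_all(src_dir: str, dst_dir: str) -> None:
    """Convert every .mp4 in src_dir to the expected format."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.mp4")):
        subprocess.run(ffmpeg_cmd(src, out / src.name), check=True)

# Usage:
# convert_all("raw_videos", "data/videos")
```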
### Dataset Preparation Script
```python
import json
from pathlib import Path

def create_annotations(video_dir, output_file):
    """Create an annotations file from a directory of videos."""
    video_dir = Path(video_dir)
    annotations = {}
    for video_path in video_dir.glob("*.mp4"):
        video_id = video_path.stem
        annotations[video_id] = {
            "caption": f"Video {video_id}",  # Add actual captions
            "duration": 2.0,
            "fps": 8,
        }
    with open(output_file, 'w') as f:
        json.dump(annotations, f, indent=2)

# Usage
create_annotations("data/videos", "data/annotations.json")
```
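Before training, it helps to confirm that every video has a caption and every caption a video. A small stdlib sketch (`validate_dataset` is a hypothetical helper, not part of the repository):

```python
import json
from pathlib import Path

def validate_dataset(video_dir, annotation_file):
    """Return (videos_without_captions, captions_without_videos) as sets of IDs."""
    video_ids = {p.stem for p in Path(video_dir).glob("*.mp4")}
    with open(annotation_file) as f:
        annotated_ids = set(json.load(f).keys())
    return video_ids - annotated_ids, annotated_ids - video_ids

# Usage:
# missing_ann, missing_vid = validate_dataset("data/videos", "data/annotations.json")
# assert not missing_ann, f"Videos without captions: {missing_ann}"
# assert not missing_vid, f"Captions without videos: {missing_vid}"
```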
## Training
### Single GPU Training
```bash
python train.py
```
Configuration in train.py:
```python
config = {
    'batch_size': 2,
    'gradient_accumulation_steps': 8,  # Effective batch size = 16
    'learning_rate': 1e-4,
    'num_epochs': 100,
    'mixed_precision': True,
}
```
### Multi-GPU Training (Recommended)
```bash
# Using PyTorch DDP
torchrun --nproc_per_node=8 train.py

# Or using Hugging Face Accelerate
accelerate config   # One-time setup
accelerate launch train.py
```
### Monitoring Training
```bash
# Install tensorboard
pip install tensorboard
# Run tensorboard
tensorboard --logdir=./checkpoints/logs
```
### Resume from Checkpoint
```python
# In train.py, add:
trainer.load_checkpoint('checkpoints/checkpoint_step_10000.pt')
trainer.train()
```
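The checkpoint format is defined by the trainer in train.py; the sketch below shows the conventional PyTorch pattern it most likely follows (the dictionary keys are assumptions, not taken from the repository):

```python
import torch

def save_checkpoint(model, optimizer, step, path):
    """Persist model/optimizer state and the training step."""
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "step": step,
    }, path)

def load_checkpoint(model, optimizer, path):
    """Restore state in place and return the saved step."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]
```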
## Inference
### Basic Inference
```python
from inference import generate_video_from_prompt
video = generate_video_from_prompt(
    prompt="A serene lake with mountains",
    checkpoint_path="checkpoints/best.pt",
    output_path="output.mp4",
    num_steps=50,
    guidance_scale=7.5,
    seed=42,  # For reproducibility
)
```
### Batch Inference
```python
from inference import batch_generate
prompts = [
    "A cat playing",
    "Ocean waves",
    "City at night",
]

batch_generate(
    prompts=prompts,
    checkpoint_path="checkpoints/best.pt",
    output_dir="./outputs",
    num_steps=50,
)
```
### Advanced Options
```python
# Lower guidance for more creative results
video = generate_video_from_prompt(
    prompt="Abstract art in motion",
    guidance_scale=5.0,  # Lower = more creative
    num_steps=100,       # More steps = higher quality
)

# Fast generation (fewer steps)
video = generate_video_from_prompt(
    prompt="Quick test",
    num_steps=20,  # Faster but lower quality
)
```
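`guidance_scale` behaves as in standard classifier-free guidance: each denoising step blends an unconditional and a prompt-conditioned prediction. The sketch below uses scalar floats for clarity (in practice these are noise tensors) and assumes the common formulation, not this repository's exact sampler code:

```python
def apply_guidance(eps_uncond: float, eps_cond: float, scale: float) -> float:
    """Classifier-free guidance: push the prediction toward the conditional one.

    scale = 1.0 reproduces the plain conditional prediction; larger values
    follow the prompt more strictly at the cost of diversity.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

print(apply_guidance(0.0, 1.0, 7.5))  # → 7.5
```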
## Optimization Tips
### Memory Optimization
1. **Reduce Batch Size**
```python
config['batch_size'] = 1 # Minimum
config['gradient_accumulation_steps'] = 16 # Maintain effective batch size
```
2. **Enable Gradient Checkpointing**
```python
config['gradient_checkpointing'] = True
```
3. **Use Mixed Precision**
```python
config['mixed_precision'] = True # Always recommended
```
### Speed Optimization
1. **Use Torch Compile** (PyTorch 2.0+)
```python
model = torch.compile(model)
```
2. **Enable cuDNN Benchmarking**
```python
torch.backends.cudnn.benchmark = True
```
3. **Pin Memory**
```python
DataLoader(..., pin_memory=True)
```
## Troubleshooting
### CUDA Out of Memory
```python
# Reduce batch size
config['batch_size'] = 1

# Enable gradient checkpointing
config['gradient_checkpointing'] = True

# Clear the CUDA cache between runs
torch.cuda.empty_cache()
```
### Slow Training
```bash
# Check GPU utilization
nvidia-smi
```
```python
# Increase num_workers
DataLoader(..., num_workers=8)

# Enable mixed precision
config['mixed_precision'] = True
```
### NaN Loss
```python
# Reduce learning rate
config['learning_rate'] = 5e-5

# Enable gradient clipping (already included)
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

# Check for NaN in the data
assert not torch.isnan(videos).any()
```
### Model Not Learning
```python
# Increase learning rate
config['learning_rate'] = 2e-4

# Check data quality:
# - Verify annotations are correct
# - Ensure videos are properly normalized

# Reduce regularization
config['weight_decay'] = 0.001  # Lower weight decay
```
## Performance Benchmarks
### Training Speed (A100 80GB)
| Batch Size | Grad Accum | Eff. Batch | Sec/Batch | Hours/100K steps |
|------------|------------|------------|-----------|------------------|
| 1 | 16 | 16 | 2.5 | 69 |
| 2 | 8 | 16 | 2.5 | 69 |
| 4 | 4 | 16 | 2.7 | 75 |
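The last column is just arithmetic on the seconds-per-batch figure: hours ≈ sec/batch × 100,000 ÷ 3600. A quick check of the table's numbers:

```python
def hours_for_steps(sec_per_batch: float, steps: int = 100_000) -> float:
    """Wall-clock hours to run the given number of optimizer steps."""
    return sec_per_batch * steps / 3600

print(round(hours_for_steps(2.5)))  # → 69
print(round(hours_for_steps(2.7)))  # → 75
```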
### Inference Speed
| GPU | FP16 | Steps | Time/Video |
|-----|------|-------|------------|
| A100 80GB | Yes | 50 | 15s |
| RTX 4090 | Yes | 50 | 25s |
| RTX 3090 | Yes | 50 | 35s |
### Memory Usage
| Operation | Batch Size | Memory (GB) |
|-----------|------------|-------------|
| Inference | 1 | 6 |
| Training | 1 | 12 |
| Training | 2 | 24 |
| Training | 4 | 48 |
## Next Steps
1. **Prepare your dataset** - Collect and annotate videos
2. **Start training** - Begin with small dataset to verify
3. **Monitor progress** - Check loss, sample generations
4. **Fine-tune** - Adjust hyperparameters based on results
5. **Evaluate** - Test on held-out validation set
6. **Deploy** - Use for inference on new prompts
## Getting Help
- GitHub Issues: Report bugs and ask questions
- Documentation: Check README.md and ARCHITECTURE.md
- Examples: See example scripts in the repository
## Additional Resources
- [PyTorch Documentation](https://pytorch.org/docs/)
- [Diffusion Models Explained](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/)
- [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
- [DiT Paper](https://arxiv.org/abs/2212.09748)