TTV-1B Setup Guide
Complete installation and setup instructions for the TTV-1B text-to-video model.
Prerequisites
Hardware Requirements
Minimum (Inference Only)
- GPU: 8GB VRAM (RTX 3070, RTX 4060 Ti)
- RAM: 16GB
- Storage: 50GB
- OS: Ubuntu 20.04+, Windows 10+, macOS 12+
Recommended (Training)
- GPU: 24GB+ VRAM (RTX 4090, A5000, A100)
- RAM: 64GB
- Storage: 500GB SSD
- OS: Ubuntu 22.04 LTS
Production (Full Training)
- GPU: 8× A100 80GB
- RAM: 512GB
- Storage: 2TB NVMe SSD
- Network: High-speed interconnect for multi-GPU
Software Requirements
- Python 3.9, 3.10, or 3.11
- CUDA 11.8+ (for GPU acceleration)
- cuDNN 8.6+
- Git
Installation
Step 1: Clone Repository
git clone https://github.com/yourusername/ttv-1b.git
cd ttv-1b
Step 2: Create Virtual Environment
# Using venv
python3 -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Using conda (alternative)
conda create -n ttv1b python=3.10
conda activate ttv1b
Step 3: Install PyTorch
Choose the appropriate command for your system from https://pytorch.org/get-started/locally/
# CUDA 11.8 (most common)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
# CUDA 12.1
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# CPU only (not recommended)
pip install torch torchvision
Step 4: Install Dependencies
pip install -r requirements.txt
Step 5: Verify Installation
python -c "import torch; print(f'PyTorch {torch.__version__}'); print(f'CUDA available: {torch.cuda.is_available()}')"
Expected output:
PyTorch 2.1.0
CUDA available: True
Quick Start
Test the Model
# Run evaluation script to verify everything works
python evaluate.py
This will:
- Create the model
- Count parameters (should be ~1.0B)
- Test forward/backward passes
- Measure inference speed
- Check memory usage
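The ~1.0B figure is just the sum of all parameter tensor sizes. A stdlib sketch of that arithmetic (the shapes below are made up; the real check in evaluate.py would amount to `sum(p.numel() for p in model.parameters())` on the actual model):

```python
from math import prod

# Hypothetical parameter shapes for three layers; the real model exposes
# its shapes via model.parameters().
layer_shapes = [(4096, 4096), (4096,), (16384, 4096)]

# Total parameter count = sum over layers of the product of each shape
total = sum(prod(shape) for shape in layer_shapes)
print(total)  # 83890176
```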
Generate Your First Video (After Training)
python inference.py \
--prompt "A beautiful sunset over mountains" \
--checkpoint checkpoints/checkpoint_best.pt \
--output my_first_video.mp4 \
--steps 50
Preparing Data
Data Format
The model expects video-text pairs in the following format:
data/
├── videos/
│   ├── video_0001.mp4
│   ├── video_0002.mp4
│   └── ...
└── annotations.json
annotations.json:
{
  "video_0001": {
    "caption": "A cat playing with a ball of yarn",
    "duration": 2.0,
    "fps": 8
  },
  "video_0002": {
    "caption": "Sunset over the ocean with waves",
    "duration": 2.0,
    "fps": 8
  }
}
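Malformed annotations are a common source of silent training failures, so it is worth validating the file before training. A minimal checker for the schema above (hypothetical helper, not part of the repository):

```python
import json

REQUIRED_KEYS = {"caption", "duration", "fps"}

def validate_annotations(annotations):
    """Return a list of problems found in an annotations dict."""
    problems = []
    for video_id, meta in annotations.items():
        missing = REQUIRED_KEYS - meta.keys()
        if missing:
            problems.append(f"{video_id}: missing {sorted(missing)}")
        elif meta["duration"] <= 0 or meta["fps"] <= 0:
            problems.append(f"{video_id}: non-positive duration or fps")
    return problems

# Usage against the file above:
# with open("data/annotations.json") as f:
#     assert validate_annotations(json.load(f)) == []
```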
Video Specifications
- Format: MP4, AVI, or MOV
- Resolution: 256×256 (will be resized)
- Frame rate: 8 FPS recommended
- Duration: 2 seconds (16 frames at 8 FPS)
- Codec: H.264 recommended
Converting Videos
# Using FFmpeg to convert videos
ffmpeg -i input.mp4 -vf "scale=256:256,fps=8" -t 2 -c:v libx264 output.mp4
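For whole directories, the same command can be driven from Python. A small sketch that rebuilds the argument list above (the `raw_videos/` directory name is an assumption; batch conversion requires ffmpeg on PATH):

```python
def ffmpeg_command(src, dst, size=256, fps=8, seconds=2):
    """Build the ffmpeg argument list for the conversion shown above."""
    return [
        "ffmpeg", "-i", str(src),
        "-vf", f"scale={size}:{size},fps={fps}",
        "-t", str(seconds),
        "-c:v", "libx264",
        str(dst),
    ]

print(" ".join(ffmpeg_command("input.mp4", "output.mp4")))

# Batch conversion over a directory:
# import subprocess
# from pathlib import Path
# for src in Path("raw_videos").glob("*.mp4"):
#     subprocess.run(ffmpeg_command(src, Path("data/videos") / src.name), check=True)
```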
Dataset Preparation Script
import json
from pathlib import Path

def create_annotations(video_dir, output_file):
    """Create annotations file from videos"""
    video_dir = Path(video_dir)
    annotations = {}
    for video_path in video_dir.glob("*.mp4"):
        video_id = video_path.stem
        annotations[video_id] = {
            "caption": f"Video {video_id}",  # Add actual captions
            "duration": 2.0,
            "fps": 8
        }
    with open(output_file, 'w') as f:
        json.dump(annotations, f, indent=2)

# Usage
create_annotations("data/videos", "data/annotations.json")
Training
Single GPU Training
python train.py
Configuration in train.py:
config = {
    'batch_size': 2,
    'gradient_accumulation_steps': 8,  # Effective batch size = 16
    'learning_rate': 1e-4,
    'num_epochs': 100,
    'mixed_precision': True,
}
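The two settings multiply: gradients are accumulated over 8 micro-batches of 2 before each optimizer step, so the optimizer sees an effective batch of 16. A one-line sanity check of that arithmetic:

```python
config = {
    'batch_size': 2,
    'gradient_accumulation_steps': 8,
}

# Effective batch = micro-batch size x number of accumulation steps
effective_batch = config['batch_size'] * config['gradient_accumulation_steps']
print(effective_batch)  # 16
```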
Multi-GPU Training (Recommended)
# Using PyTorch DDP
torchrun --nproc_per_node=8 train.py
# Or using Hugging Face accelerate (recommended)
accelerate config # First time setup
accelerate launch train.py
Monitoring Training
# Install tensorboard
pip install tensorboard
# Run tensorboard
tensorboard --logdir=./checkpoints/logs
Resume from Checkpoint
# In train.py, add:
trainer.load_checkpoint('checkpoints/checkpoint_step_10000.pt')
trainer.train()
Inference
Basic Inference
from inference import generate_video_from_prompt
video = generate_video_from_prompt(
    prompt="A serene lake with mountains",
    checkpoint_path="checkpoints/best.pt",
    output_path="output.mp4",
    num_steps=50,
    guidance_scale=7.5,
    seed=42  # For reproducibility
)
Batch Inference
from inference import batch_generate
prompts = [
    "A cat playing",
    "Ocean waves",
    "City at night"
]

batch_generate(
    prompts=prompts,
    checkpoint_path="checkpoints/best.pt",
    output_dir="./outputs",
    num_steps=50
)
Advanced Options
# Lower guidance for more creative results
video = generate_video_from_prompt(
    prompt="Abstract art in motion",
    guidance_scale=5.0,  # Lower = more creative
    num_steps=100,       # More steps = higher quality
)

# Fast generation (fewer steps)
video = generate_video_from_prompt(
    prompt="Quick test",
    num_steps=20,  # Faster but lower quality
)
Optimization Tips
Memory Optimization
- Reduce Batch Size
config['batch_size'] = 1 # Minimum
config['gradient_accumulation_steps'] = 16 # Maintain effective batch size
- Enable Gradient Checkpointing
config['gradient_checkpointing'] = True
- Use Mixed Precision
config['mixed_precision'] = True # Always recommended
Speed Optimization
- Use Torch Compile (PyTorch 2.0+)
model = torch.compile(model)
- Enable cuDNN Benchmarking
torch.backends.cudnn.benchmark = True
- Pin Memory
DataLoader(..., pin_memory=True)
Troubleshooting
CUDA Out of Memory
# Reduce batch size
config['batch_size'] = 1
# Enable gradient checkpointing
config['gradient_checkpointing'] = True
# Clear cache
torch.cuda.empty_cache()
Slow Training
# Check GPU utilization
nvidia-smi
# Increase num_workers
DataLoader(..., num_workers=8)
# Enable mixed precision
config['mixed_precision'] = True
NaN Loss
# Reduce learning rate
config['learning_rate'] = 5e-5
# Enable gradient clipping (already included)
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
# Check for NaN in data
assert not torch.isnan(videos).any()
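When the assert above fires, a quick stdlib way to locate the offending values in a nested batch (hypothetical helper; with tensors you would stay with `torch.isnan`):

```python
import math

def find_nans(data, path="batch"):
    """Recursively report the index paths of NaN values in nested lists."""
    if isinstance(data, float):
        return [path] if math.isnan(data) else []
    return [hit for i, item in enumerate(data)
            for hit in find_nans(item, f"{path}[{i}]")]

print(find_nans([[0.1, 0.2], [0.3, float("nan")]]))  # ['batch[1][1]']
```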
Model Not Learning
# Increase learning rate
config['learning_rate'] = 2e-4
# Check data quality
# Verify annotations are correct
# Ensure videos are properly normalized
# Reduce regularization
config['weight_decay'] = 0.001 # Lower weight decay
Performance Benchmarks
Training Speed (A100 80GB)
| Batch Size | Grad Accum | Eff. Batch | Sec/Batch | Hours/100K steps |
|---|---|---|---|---|
| 1 | 16 | 16 | 2.5 | 69 |
| 2 | 8 | 16 | 2.5 | 69 |
| 4 | 4 | 16 | 2.7 | 75 |
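The last column follows directly from the per-batch latency. A sanity check of the arithmetic behind the table (wall-clock only, ignoring evaluation and checkpointing overhead):

```python
def hours_per_run(sec_per_batch, steps=100_000):
    """Convert per-batch latency into total training hours."""
    return sec_per_batch * steps / 3600

print(round(hours_per_run(2.5)))  # 69
print(round(hours_per_run(2.7)))  # 75
```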
Inference Speed
| GPU | FP16 | Steps | Time/Video |
|---|---|---|---|
| A100 80GB | Yes | 50 | 15s |
| RTX 4090 | Yes | 50 | 25s |
| RTX 3090 | Yes | 50 | 35s |
Memory Usage
| Operation | Batch Size | Memory (GB) |
|---|---|---|
| Inference | 1 | 6 |
| Training | 1 | 12 |
| Training | 2 | 24 |
| Training | 4 | 48 |
Next Steps
- Prepare your dataset - Collect and annotate videos
- Start training - Begin with a small dataset to verify the pipeline
- Monitor progress - Check loss, sample generations
- Fine-tune - Adjust hyperparameters based on results
- Evaluate - Test on held-out validation set
- Deploy - Use for inference on new prompts
Getting Help
- GitHub Issues: Report bugs and ask questions
- Documentation: Check README.md and ARCHITECTURE.md
- Examples: See example scripts in the repository