
CUDA Configuration Guide

This guide explains how to configure the Speech Transcription App to use GPU acceleration with CUDA.

Overview

The app supports both CPU and GPU processing for all AI models:

  • Whisper (speech-to-text)
  • RoBERTa (question classification)
  • Sentence Boundary Detection

GPU acceleration can provide 2-10x faster processing for real-time transcription.

Quick Setup

1. Check CUDA Availability

python test_cuda.py

2. Configure Device

Create a .env file:

cp .env.example .env

Edit .env:

# For GPU acceleration
USE_CUDA=true

# For CPU processing (default)
USE_CUDA=false

3. Run the App

python app.py

Detailed Configuration

Environment Variables

| Variable | Values | Description |
|----------|--------|-------------|
| USE_CUDA | true/false | Enable or disable GPU acceleration |

Device Selection Logic

1. If USE_CUDA=true AND CUDA available β†’ Use GPU
2. If USE_CUDA=true AND CUDA not available β†’ Fallback to CPU (with warning)
3. If USE_CUDA=false β†’ Use CPU
4. If no .env file β†’ Default to CPU
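The four rules above can be sketched as a small helper (the function name is hypothetical; the app's actual implementation may differ):

```python
def select_device(use_cuda_env, cuda_available):
    """Apply the selection rules above: GPU only when requested AND available."""
    if use_cuda_env is None:  # no .env file or variable unset -> default to CPU
        return "cpu"
    if use_cuda_env.strip().lower() == "true":
        if cuda_available:
            return "cuda"
        print("Warning: CUDA requested but not available, falling back to CPU")
        return "cpu"
    return "cpu"
```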

Model Configurations

| Device | Whisper | RoBERTa | Compute Type |
|--------|---------|---------|--------------|
| CPU | device="cpu" | device=-1 | int8 |
| GPU | device="cuda" | device=0 | float16 |
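The table above can be wired up as a minimal sketch; cuda_available is passed in rather than probed so the sketch stays framework-free, and the key names are illustrative, not the app's real config schema:

```python
def model_settings(use_cuda, cuda_available):
    """Return per-model device arguments from the table above (sketch)."""
    if use_cuda and cuda_available:
        return {
            "whisper_device": "cuda",  # Whisper takes a device string
            "roberta_device": 0,       # transformers pipelines take a GPU index
            "compute_type": "float16",
        }
    return {
        "whisper_device": "cpu",
        "roberta_device": -1,          # -1 selects CPU for transformers pipelines
        "compute_type": "int8",
    }
```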

CUDA Requirements

System Requirements

  • NVIDIA GPU with CUDA Compute Capability 3.5+
  • CUDA Toolkit 11.8+ or 12.x
  • cuDNN 8.x
  • 4GB+ GPU memory recommended

Python Dependencies

# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then install other requirements
pip install -r requirements.txt

Performance Comparison

Typical Speedups with GPU

| Model | CPU Time | GPU Time | Speedup |
|-------|----------|----------|---------|
| Whisper (base) | ~2-5s | ~0.5-1s | 3-5x |
| RoBERTa | ~100ms | ~20ms | 5x |
| Overall | Real-time lag | Near instant | 3-8x |

Memory Usage

| Configuration | RAM | GPU Memory |
|---------------|-----|------------|
| CPU Only | 2-4GB | 0GB |
| GPU Accelerated | 1-2GB | 2-6GB |

Troubleshooting

Common Issues

1. "CUDA requested but not available"

⚠️ Warning: CUDA requested but not available, falling back to CPU

Solution: Install CUDA toolkit and PyTorch with CUDA support

2. "Out of memory" errors

Solutions:

  • Reduce model size (e.g., base.en β†’ tiny.en)
  • Set USE_CUDA=false to use CPU
  • Close other GPU applications

3. Models not loading on GPU

Check:

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")

Testing Your Setup

Run the comprehensive test:

python test_cuda.py

This will test:

  • βœ… PyTorch CUDA detection
  • βœ… Transformers device support
  • βœ… Whisper model loading
  • βœ… GPU memory availability
  • βœ… Performance benchmark

Debug Mode

For detailed device information, check the app startup:

πŸ”§ Configuration:
   Device: CUDA
   Compute type: float16
   CUDA available: True
   GPU: NVIDIA GeForce RTX 3080
   GPU Memory: 10.0 GB

Installation Examples

Ubuntu/Linux with CUDA

# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo "USE_CUDA=true" > .env

# Test setup
python test_cuda.py

# Run app
python app.py

Windows with CUDA

# Install CUDA toolkit from NVIDIA website
# https://developer.nvidia.com/cuda-downloads

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo USE_CUDA=true > .env

# Test setup
python test_cuda.py

# Run app
python app.py

CPU-Only Installation

# Install PyTorch CPU version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install app dependencies
pip install -r requirements.txt

# Configure for CPU
echo "USE_CUDA=false" > .env

# Run app
python app.py

Advanced Configuration

Custom Device Settings

You can override device settings in code:

# Force specific device
from components.transcriber import AudioProcessor
processor = AudioProcessor(model_size="base.en", device="cuda", compute_type="float16")

Mixed Precision

GPU configurations automatically use optimal precision:

  • CPU: int8 quantization for speed
  • GPU: float16 for memory efficiency

Multiple GPUs

For systems with multiple GPUs:

# Use a specific GPU (must be set before torch is imported, or it has no effect)
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # Use the second GPU

Performance Tuning

For Maximum Speed (GPU)

USE_CUDA=true
  • Use base.en or small.en Whisper model
  • Ensure 4GB+ GPU memory available
  • Close other GPU applications

For Maximum Compatibility (CPU)

USE_CUDA=false
  • Use tiny.en Whisper model
  • Works on any system
  • Lower memory requirements

Balanced Performance

USE_CUDA=true  # with fallback to CPU
  • Use base.en Whisper model
  • Automatic device detection
  • Best of both worlds
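The three tuning profiles above can be summarized in a small lookup (the profile names and helper are hypothetical; the model names come from this guide):

```python
# Map each tuning profile to the settings suggested above (sketch)
PROFILES = {
    "max_speed":  {"USE_CUDA": "true",  "whisper_model": "base.en"},
    "max_compat": {"USE_CUDA": "false", "whisper_model": "tiny.en"},
    "balanced":   {"USE_CUDA": "true",  "whisper_model": "base.en"},  # CPU fallback is automatic
}

def profile_env(name):
    """Return the .env line and Whisper model choice for a profile."""
    p = PROFILES[name]
    return f"USE_CUDA={p['USE_CUDA']}", p["whisper_model"]
```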

Support

Getting Help

  1. Run diagnostic test: python test_cuda.py
  2. Check device info in app startup logs
  3. Verify .env configuration
  4. Test with minimal example
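As a minimal example for step 4, the .env check can be reproduced with a few lines of stdlib Python (a sketch; the real app may use a library such as python-dotenv instead):

```python
from pathlib import Path

def read_use_cuda(env_path=".env"):
    """Return True only if the .env file exists and sets USE_CUDA=true."""
    p = Path(env_path)
    if not p.exists():
        return False  # no .env file -> default to CPU
    for line in p.read_text().splitlines():
        line = line.strip()
        if line.startswith("USE_CUDA="):
            return line.split("=", 1)[1].strip().lower() == "true"
    return False
```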

Reporting Issues

Include this information:

  • Output of python test_cuda.py
  • Your .env file contents
  • GPU model and memory
  • Error messages from app startup

Note: CPU processing works well for most use cases. GPU acceleration is optional and mainly benefits real-time transcription.