# CUDA Configuration Guide
This guide explains how to configure the Speech Transcription App to use GPU acceleration with CUDA.
## Overview
The app supports both CPU and GPU processing for all AI models:
- Whisper (speech-to-text)
- RoBERTa (question classification)
- Sentence Boundary Detection
GPU acceleration can provide 2-10x faster processing for real-time transcription.
## Quick Setup

### 1. Check CUDA Availability

```bash
python test_cuda.py
```

### 2. Configure Device

Create a `.env` file:

```bash
cp .env.example .env
```

Edit `.env`:

```env
# For GPU acceleration
USE_CUDA=true

# For CPU processing (default)
USE_CUDA=false
```

### 3. Run the App

```bash
python app.py
```
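Internally, the app only needs a boolean from this variable. A minimal sketch of how such a flag might be parsed (the function name and parsing details here are an illustration, not necessarily the app's actual code):

```python
import os

def use_cuda_from_env() -> bool:
    # Treat anything other than the literal "true" (case-insensitive) as false,
    # so a missing variable or a typo safely defaults to CPU.
    return os.environ.get("USE_CUDA", "false").strip().lower() == "true"
```

With `USE_CUDA=true` in the environment this returns `True`; when the variable is unset or set to anything else, it returns `False`.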
## Detailed Configuration

### Environment Variables

| Variable | Values | Description |
|---|---|---|
| `USE_CUDA` | `true` / `false` | Enable/disable GPU acceleration |
### Device Selection Logic

1. If `USE_CUDA=true` and CUDA is available → use GPU
2. If `USE_CUDA=true` and CUDA is not available → fall back to CPU (with a warning)
3. If `USE_CUDA=false` → use CPU
4. If no `.env` file exists → default to CPU
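The four rules above can be sketched as a small helper (a hedged illustration; the function name is an assumption, though the warning text mirrors the one shown in Troubleshooting below):

```python
import warnings

def select_device(use_cuda: bool, cuda_available: bool) -> str:
    """Apply the fallback rules above; returns 'cuda' or 'cpu'."""
    if use_cuda and cuda_available:
        return "cuda"
    if use_cuda and not cuda_available:
        # Rule 2: requested GPU but none available -> warn and fall back.
        warnings.warn("CUDA requested but not available, falling back to CPU")
    # Rules 2-4 all end up on CPU.
    return "cpu"
```

In practice, `cuda_available` would come from `torch.cuda.is_available()`.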
### Model Configurations

| Device | Whisper | RoBERTa | Compute Type |
|---|---|---|---|
| CPU | `device="cpu"` | `device=-1` | `int8` |
| GPU | `device="cuda"` | `device=0` | `float16` |
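The table maps naturally onto a per-device settings dictionary. A sketch of how these values could be grouped (the key names are illustrative, assuming faster-whisper-style device strings for Whisper and Hugging Face `pipeline`-style device indices for RoBERTa):

```python
# Per-device settings from the table above (key names are illustrative).
MODEL_SETTINGS = {
    "cpu":  {"whisper_device": "cpu",  "roberta_device": -1, "compute_type": "int8"},
    "cuda": {"whisper_device": "cuda", "roberta_device": 0,  "compute_type": "float16"},
}

def settings_for(device: str) -> dict:
    # Unknown device names fall back to the safe CPU profile.
    return MODEL_SETTINGS.get(device, MODEL_SETTINGS["cpu"])
```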
## CUDA Requirements

### System Requirements
- NVIDIA GPU with CUDA Compute Capability 3.5+
- CUDA Toolkit 11.8+ or 12.x
- cuDNN 8.x
- 4GB+ GPU memory recommended
### Python Dependencies

```bash
# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then install other requirements
pip install -r requirements.txt
```
## Performance Comparison

### Typical Speedups with GPU
| Model | CPU Time | GPU Time | Speedup |
|---|---|---|---|
| Whisper (base) | ~2-5s | ~0.5-1s | 3-5x |
| RoBERTa | ~100ms | ~20ms | 5x |
| Overall | Real-time lag | Near instant | 3-8x |
### Memory Usage
| Configuration | RAM | GPU Memory |
|---|---|---|
| CPU Only | 2-4GB | 0GB |
| GPU Accelerated | 1-2GB | 2-6GB |
## Troubleshooting

### Common Issues

#### 1. "CUDA requested but not available"

```
⚠️ Warning: CUDA requested but not available, falling back to CPU
```

**Solution:** Install the CUDA toolkit and PyTorch with CUDA support.
#### 2. "Out of memory" errors

**Solutions:**
- Reduce the model size (e.g., `tiny.en` instead of `base.en`)
- Set `USE_CUDA=false` to use the CPU
- Close other GPU applications
#### 3. Models not loading on GPU

**Check:**

```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
```
## Testing Your Setup

Run the comprehensive test:

```bash
python test_cuda.py
```

This will test:
- ✅ PyTorch CUDA detection
- ✅ Transformers device support
- ✅ Whisper model loading
- ✅ GPU memory availability
- ✅ Performance benchmark
### Debug Mode

For detailed device information, check the app startup output:

```
🔧 Configuration:
   Device: CUDA
   Compute type: float16
   CUDA available: True
   GPU: NVIDIA GeForce RTX 3080
   GPU Memory: 10.0 GB
```
## Installation Examples

### Ubuntu/Linux with CUDA

```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo "USE_CUDA=true" > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```
### Windows with CUDA

```powershell
# Install the CUDA toolkit from the NVIDIA website:
# https://developer.nvidia.com/cuda-downloads

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo USE_CUDA=true > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```
### CPU-Only Installation

```bash
# Install PyTorch CPU version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install app dependencies
pip install -r requirements.txt

# Configure for CPU
echo "USE_CUDA=false" > .env

# Run app
python app.py
```
## Advanced Configuration

### Custom Device Settings

You can override device settings in code:

```python
# Force a specific device
from components.transcriber import AudioProcessor

processor = AudioProcessor(model_size="base.en", device="cuda", compute_type="float16")
```
### Mixed Precision

Device configurations automatically use the optimal precision:
- CPU: `int8` quantization for speed
- GPU: `float16` for memory efficiency
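This rule fits in a one-line helper (illustrative; the function name is an assumption):

```python
def compute_type_for(device: str) -> str:
    # int8 quantization on CPU for speed; float16 on GPU for memory efficiency.
    return "float16" if device == "cuda" else "int8"
```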
### Multiple GPUs

For systems with multiple GPUs:

```python
# Use a specific GPU (set before CUDA is initialized)
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # Use the second GPU
```
## Performance Tuning

### For Maximum Speed (GPU)

```env
USE_CUDA=true
```

- Use the `base.en` or `small.en` Whisper model
- Ensure 4GB+ GPU memory is available
- Close other GPU applications

### For Maximum Compatibility (CPU)

```env
USE_CUDA=false
```

- Use the `tiny.en` Whisper model
- Works on any system
- Lower memory requirements

### Balanced Performance

```env
USE_CUDA=true  # with automatic fallback to CPU
```

- Use the `base.en` Whisper model
- Automatic device detection
- Best of both worlds
## Support

### Getting Help

- Run the diagnostic test: `python test_cuda.py`
- Check device info in the app startup logs
- Verify your `.env` configuration
- Test with a minimal example

### Reporting Issues

Include this information:
- Output of `python test_cuda.py`
- Your `.env` file contents
- GPU model and memory
- Error messages from app startup
**Note:** CPU processing works well for most use cases; GPU acceleration is an optional enhancement for faster processing.