# CUDA Configuration Guide
This guide explains how to configure the Speech Transcription App to use GPU acceleration with CUDA.
## Overview
The app supports both CPU and GPU processing for all AI models:
- Whisper (speech-to-text)
- RoBERTa (question classification)
- Sentence Boundary Detection
GPU acceleration can provide 2-10x faster processing for real-time transcription.
## Quick Setup

### 1. Check CUDA Availability

```bash
python test_cuda.py
```

### 2. Configure Device

Create a `.env` file:

```bash
cp .env.example .env
```

Edit `.env`:

```env
# For GPU acceleration
USE_CUDA=true

# For CPU processing (default)
USE_CUDA=false
```

### 3. Run the App

```bash
python app.py
```
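Internally, the app only needs a boolean from this variable. A minimal sketch of how such a flag might be parsed (the function name and parsing details here are an illustration, not necessarily the app's actual code):

```python
import os

def use_cuda_from_env() -> bool:
    # Treat anything other than the literal "true" (case-insensitive) as false,
    # so a missing variable or a typo safely defaults to CPU.
    return os.environ.get("USE_CUDA", "false").strip().lower() == "true"
```

With `USE_CUDA=true` in the environment this returns `True`; when the variable is unset or set to anything else, it returns `False`.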
## Detailed Configuration

### Environment Variables

| Variable | Values | Description |
|---|---|---|
| `USE_CUDA` | `true` / `false` | Enable/disable GPU acceleration |
### Device Selection Logic

1. If `USE_CUDA=true` and CUDA is available → use GPU
2. If `USE_CUDA=true` and CUDA is not available → fall back to CPU (with a warning)
3. If `USE_CUDA=false` → use CPU
4. If no `.env` file exists → default to CPU
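The four rules above can be sketched as a small helper (a hedged illustration; the function name is an assumption, though the warning text mirrors the one shown in Troubleshooting below):

```python
import warnings

def select_device(use_cuda: bool, cuda_available: bool) -> str:
    """Apply the fallback rules above; returns 'cuda' or 'cpu'."""
    if use_cuda and cuda_available:
        return "cuda"
    if use_cuda and not cuda_available:
        # Rule 2: requested GPU but none available -> warn and fall back.
        warnings.warn("CUDA requested but not available, falling back to CPU")
    # Rules 2-4 all end up on CPU.
    return "cpu"
```

In practice, `cuda_available` would come from `torch.cuda.is_available()`.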
### Model Configurations

| Device | Whisper | RoBERTa | Compute Type |
|---|---|---|---|
| CPU | `device="cpu"` | `device=-1` | `int8` |
| GPU | `device="cuda"` | `device=0` | `float16` |
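The table maps naturally onto a per-device settings dictionary. A sketch of how these values could be grouped (the key names are illustrative, assuming faster-whisper-style device strings for Whisper and Hugging Face `pipeline`-style device indices for RoBERTa):

```python
# Per-device settings from the table above (key names are illustrative).
MODEL_SETTINGS = {
    "cpu":  {"whisper_device": "cpu",  "roberta_device": -1, "compute_type": "int8"},
    "cuda": {"whisper_device": "cuda", "roberta_device": 0,  "compute_type": "float16"},
}

def settings_for(device: str) -> dict:
    # Unknown device names fall back to the safe CPU profile.
    return MODEL_SETTINGS.get(device, MODEL_SETTINGS["cpu"])
```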
## CUDA Requirements

### System Requirements
- NVIDIA GPU with CUDA Compute Capability 3.5+
- CUDA Toolkit 11.8+ or 12.x
- cuDNN 8.x
- 4GB+ GPU memory recommended
### Python Dependencies

```bash
# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then install other requirements
pip install -r requirements.txt
```
## Performance Comparison

### Typical Speedups with GPU
| Model | CPU Time | GPU Time | Speedup |
|---|---|---|---|
| Whisper (base) | ~2-5s | ~0.5-1s | 3-5x |
| RoBERTa | ~100ms | ~20ms | 5x |
| Overall | Real-time lag | Near instant | 3-8x |
### Memory Usage
| Configuration | RAM | GPU Memory |
|---|---|---|
| CPU Only | 2-4GB | 0GB |
| GPU Accelerated | 1-2GB | 2-6GB |
## Troubleshooting

### Common Issues

#### 1. "CUDA requested but not available"

```
⚠️ Warning: CUDA requested but not available, falling back to CPU
```

**Solution:** Install the CUDA toolkit and PyTorch with CUDA support.
#### 2. "Out of memory" errors

**Solutions:**
- Reduce the model size (e.g., `tiny.en` instead of `base.en`)
- Set `USE_CUDA=false` to use the CPU
- Close other GPU applications
#### 3. Models not loading on GPU

**Check:**

```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
```
## Testing Your Setup

Run the comprehensive test:

```bash
python test_cuda.py
```

This will test:
- ✅ PyTorch CUDA detection
- ✅ Transformers device support
- ✅ Whisper model loading
- ✅ GPU memory availability
- ✅ Performance benchmark
### Debug Mode

For detailed device information, check the app startup output:

```
🔧 Configuration:
   Device: CUDA
   Compute type: float16
   CUDA available: True
   GPU: NVIDIA GeForce RTX 3080
   GPU Memory: 10.0 GB
```
## Installation Examples

### Ubuntu/Linux with CUDA

```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo "USE_CUDA=true" > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```
### Windows with CUDA

```powershell
# Install the CUDA toolkit from the NVIDIA website:
# https://developer.nvidia.com/cuda-downloads

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo USE_CUDA=true > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```
### CPU-Only Installation

```bash
# Install PyTorch CPU version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install app dependencies
pip install -r requirements.txt

# Configure for CPU
echo "USE_CUDA=false" > .env

# Run app
python app.py
```
## Advanced Configuration

### Custom Device Settings

You can override device settings in code:

```python
# Force a specific device
from components.transcriber import AudioProcessor

processor = AudioProcessor(model_size="base.en", device="cuda", compute_type="float16")
```
### Mixed Precision

Device configurations automatically use the optimal precision:
- CPU: `int8` quantization for speed
- GPU: `float16` for memory efficiency
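This rule fits in a one-line helper (illustrative; the function name is an assumption):

```python
def compute_type_for(device: str) -> str:
    # int8 quantization on CPU for speed; float16 on GPU for memory efficiency.
    return "float16" if device == "cuda" else "int8"
```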
### Multiple GPUs

For systems with multiple GPUs:

```python
# Use a specific GPU (set before CUDA is initialized)
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # Use the second GPU
```
## Performance Tuning

### For Maximum Speed (GPU)

```env
USE_CUDA=true
```

- Use the `base.en` or `small.en` Whisper model
- Ensure 4GB+ GPU memory is available
- Close other GPU applications

### For Maximum Compatibility (CPU)

```env
USE_CUDA=false
```

- Use the `tiny.en` Whisper model
- Works on any system
- Lower memory requirements

### Balanced Performance

```env
USE_CUDA=true  # with automatic fallback to CPU
```

- Use the `base.en` Whisper model
- Automatic device detection
- Best of both worlds
## Support

### Getting Help

- Run the diagnostic test: `python test_cuda.py`
- Check device info in the app startup logs
- Verify your `.env` configuration
- Test with a minimal example

### Reporting Issues

Include this information:
- Output of `python test_cuda.py`
- Your `.env` file contents
- GPU model and memory
- Error messages from app startup
**Note:** CPU processing works well for most use cases; GPU acceleration is an optional enhancement for faster processing.