# CUDA Configuration Guide
This guide explains how to configure the Speech Transcription App to use GPU acceleration with CUDA.
## Overview
The app supports both CPU and GPU processing for all AI models:
- **Whisper** (speech-to-text)
- **RoBERTa** (question classification)
- **Sentence Boundary Detection**
GPU acceleration can provide **2-10x faster processing** for real-time transcription.
## Quick Setup
### 1. Check CUDA Availability
```bash
python test_cuda.py
```
### 2. Configure Device
Create a `.env` file:
```bash
cp .env.example .env
```
Edit `.env`:
```bash
# For GPU acceleration
USE_CUDA=true
# For CPU processing (default)
USE_CUDA=false
```
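For illustration, the flag could be parsed with a small helper like the one below (the function name and parsing rules are assumptions; the app may use a library such as `python-dotenv` instead):

```python
import os

def read_use_cuda(env_path=".env"):
    """Return True only when the .env file sets USE_CUDA=true; otherwise default to CPU."""
    value = "false"
    if os.path.exists(env_path):
        with open(env_path) as f:
            for line in f:
                line = line.strip()
                # Comment lines and unrelated keys are skipped
                if line.startswith("USE_CUDA="):
                    value = line.split("=", 1)[1].strip().lower()
    return value == "true"
```

Any value other than `true` (including a missing `.env` file) falls back to CPU, matching the CPU default.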
### 3. Run the App
```bash
python app.py
```
## Detailed Configuration
### Environment Variables
| Variable | Values | Description |
|----------|--------|-------------|
| `USE_CUDA` | `true`/`false` | Enable/disable GPU acceleration |
### Device Selection Logic
```
1. If USE_CUDA=true AND CUDA available → Use GPU
2. If USE_CUDA=true AND CUDA not available → Fallback to CPU (with warning)
3. If USE_CUDA=false → Use CPU
4. If no .env file → Default to CPU
```
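As a sketch, the four rules above translate to something like the following (the `cuda_available` parameter stands in for `torch.cuda.is_available()`; the function name is illustrative):

```python
def select_device(use_cuda: bool, cuda_available: bool) -> str:
    """Apply the fallback rules: GPU only when requested AND available."""
    if use_cuda and cuda_available:
        return "cuda"
    if use_cuda:
        # Rule 2: requested but unavailable -> warn and fall back to CPU
        print("⚠️ Warning: CUDA requested but not available, falling back to CPU")
    return "cpu"

# In the app this would be called as, e.g.:
# select_device(use_cuda=True, cuda_available=torch.cuda.is_available())
```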
### Model Configurations
| Device | Whisper | RoBERTa | Compute Type |
|--------|---------|---------|--------------|
| **CPU** | `device="cpu"` | `device=-1` | `int8` |
| **GPU** | `device="cuda"` | `device=0` | `float16` |
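The table rows can be expressed as a small mapping (the key names here are illustrative; the app's actual config structure may differ):

```python
def model_config(device: str) -> dict:
    """Return per-model device settings matching the table above."""
    if device == "cuda":
        return {"whisper_device": "cuda", "roberta_device": 0, "compute_type": "float16"}
    # CPU defaults: Hugging Face pipelines use device=-1 for CPU; int8 keeps inference fast
    return {"whisper_device": "cpu", "roberta_device": -1, "compute_type": "int8"}
```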
## CUDA Requirements
### System Requirements
- NVIDIA GPU with CUDA Compute Capability 3.5+
- CUDA Toolkit 11.8+ or 12.x
- cuDNN 8.x
- 4GB+ GPU memory recommended
### Python Dependencies
```bash
# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Then install other requirements
pip install -r requirements.txt
```
## Performance Comparison
### Typical Speedups with GPU
| Model | CPU Time | GPU Time | Speedup |
|-------|----------|----------|---------|
| Whisper (base) | ~2-5s | ~0.5-1s | 3-5x |
| RoBERTa | ~100ms | ~20ms | 5x |
| Overall | Real-time lag | Near instant | 3-8x |
### Memory Usage
| Configuration | RAM | GPU Memory |
|---------------|-----|------------|
| CPU Only | 2-4GB | 0GB |
| GPU Accelerated | 1-2GB | 2-6GB |
## Troubleshooting
### Common Issues
#### 1. "CUDA requested but not available"
```
⚠️ Warning: CUDA requested but not available, falling back to CPU
```
**Solution:** Install CUDA toolkit and PyTorch with CUDA support
#### 2. "Out of memory" errors
**Solutions:**
- Reduce model size (e.g., `base.en` → `tiny.en`)

- Set `USE_CUDA=false` to use CPU
- Close other GPU applications
#### 3. Models not loading on GPU
**Check:**
```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
```
### Testing Your Setup
Run the comprehensive test:
```bash
python test_cuda.py
```
This will test:
- ✅ PyTorch CUDA detection
- ✅ Transformers device support
- ✅ Whisper model loading
- ✅ GPU memory availability
- ✅ Performance benchmark
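For a quick inline check without the full test script, a minimal sketch might look like this (`test_cuda.py` itself covers more; the helper name is an assumption):

```python
import importlib.util

def quick_cuda_check() -> dict:
    """Report whether PyTorch is installed and, if so, whether CUDA is usable."""
    results = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if results["torch_installed"]:
        import torch
        results["cuda_available"] = torch.cuda.is_available()
    return results
```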
### Debug Mode
For detailed device information, check the app startup:
```
🔧 Configuration:
Device: CUDA
Compute type: float16
CUDA available: True
GPU: NVIDIA GeForce RTX 3080
GPU Memory: 10.0 GB
```
## Installation Examples
### Ubuntu/Linux with CUDA
```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit
# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install app dependencies
pip install -r requirements.txt
# Configure for GPU
echo "USE_CUDA=true" > .env
# Test setup
python test_cuda.py
# Run app
python app.py
```
### Windows with CUDA
```bash
# Install CUDA toolkit from NVIDIA website
# https://developer.nvidia.com/cuda-downloads
# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install app dependencies
pip install -r requirements.txt
# Configure for GPU
echo USE_CUDA=true > .env
# Test setup
python test_cuda.py
# Run app
python app.py
```
### CPU-Only Installation
```bash
# Install PyTorch CPU version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install app dependencies
pip install -r requirements.txt
# Configure for CPU
echo "USE_CUDA=false" > .env
# Run app
python app.py
```
## Advanced Configuration
### Custom Device Settings
You can override device settings in code:
```python
# Force specific device
from components.transcriber import AudioProcessor
processor = AudioProcessor(model_size="base.en", device="cuda", compute_type="float16")
```
### Mixed Precision
GPU configurations automatically use optimal precision:
- **CPU:** `int8` quantization for speed
- **GPU:** `float16` for memory efficiency
### Multiple GPUs
For systems with multiple GPUs:
```python
# Use a specific GPU — set this BEFORE importing torch or loading any models,
# otherwise CUDA may already be initialized and the setting is ignored
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only the second GPU
```
## Performance Tuning
### For Maximum Speed (GPU)
```bash
USE_CUDA=true
```
- Use `base.en` or `small.en` Whisper model
- Ensure 4GB+ GPU memory available
- Close other GPU applications
### For Maximum Compatibility (CPU)
```bash
USE_CUDA=false
```
- Use `tiny.en` Whisper model
- Works on any system
- Lower memory requirements
### Balanced Performance
```bash
USE_CUDA=true # with fallback to CPU
```
- Use `base.en` Whisper model
- Automatic device detection
- Best of both worlds
## Support
### Getting Help
1. Run diagnostic test: `python test_cuda.py`
2. Check device info in app startup logs
3. Verify .env configuration
4. Test with minimal example
### Reporting Issues
Include this information:
- Output of `python test_cuda.py`
- Your `.env` file contents
- GPU model and memory
- Error messages from app startup
---
**Note:** CPU processing works perfectly for most use cases. GPU acceleration is optional for enhanced performance.