# CUDA Configuration Guide

This guide explains how to configure the Speech Transcription App to use GPU acceleration with CUDA.

## Overview

The app supports both CPU and GPU processing for all AI models:

- **Whisper** (speech-to-text)
- **RoBERTa** (question classification)
- **Sentence Boundary Detection**

GPU acceleration can provide **2-10x faster processing** for real-time transcription.

## Quick Setup

### 1. Check CUDA Availability

```bash
python test_cuda.py
```

### 2. Configure Device

Create a `.env` file:

```bash
cp .env.example .env
```

Edit `.env`:

```bash
# For GPU acceleration
USE_CUDA=true

# For CPU processing (default)
USE_CUDA=false
```

### 3. Run the App

```bash
python app.py
```

## Detailed Configuration

### Environment Variables

| Variable | Values | Description |
|----------|--------|-------------|
| `USE_CUDA` | `true`/`false` | Enable/disable GPU acceleration |

### Device Selection Logic

```
1. If USE_CUDA=true AND CUDA available     → Use GPU
2. If USE_CUDA=true AND CUDA not available → Fall back to CPU (with warning)
3. If USE_CUDA=false                       → Use CPU
4. If no .env file                         → Default to CPU
```

### Model Configurations

| Device | Whisper | RoBERTa | Compute Type |
|--------|---------|---------|--------------|
| **CPU** | `device="cpu"` | `device=-1` | `int8` |
| **GPU** | `device="cuda"` | `device=0` | `float16` |

## CUDA Requirements

### System Requirements

- NVIDIA GPU with CUDA Compute Capability 3.5+
- CUDA Toolkit 11.8+ or 12.x
- cuDNN 8.x
- 4GB+ GPU memory recommended

### Python Dependencies

```bash
# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then install other requirements
pip install -r requirements.txt
```

## Performance Comparison

### Typical Speedups with GPU

| Model | CPU Time | GPU Time | Speedup |
|-------|----------|----------|---------|
| Whisper (base) | ~2-5s | ~0.5-1s | 3-5x |
| RoBERTa | ~100ms | ~20ms | 5x |
| Overall | Real-time lag | Near instant | 3-8x |

### Memory Usage

| Configuration | RAM | GPU Memory |
|---------------|-----|------------|
| CPU Only | 2-4GB | 0GB |
| GPU Accelerated | 1-2GB | 2-6GB |

## Troubleshooting

### Common Issues

#### 1. "CUDA requested but not available"

```
⚠️ Warning: CUDA requested but not available, falling back to CPU
```

**Solution:** Install the CUDA toolkit and PyTorch with CUDA support.

#### 2. "Out of memory" errors

**Solutions:**

- Reduce the model size (e.g., `base.en` → `tiny.en`)
- Set `USE_CUDA=false` to use the CPU
- Close other GPU applications
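For reference, the fallback behavior behind the warning in issue 1 (and the device-selection rules earlier) can be sketched in a few lines. This is an illustrative helper under assumed names, not the app's actual code:

```python
import os

def select_device(cuda_available: bool) -> str:
    """Mirror the device-selection rules: GPU only when requested AND available.

    Illustrative sketch only -- not the app's actual implementation.
    """
    # Missing variable (or missing .env file) defaults to CPU
    use_cuda = os.environ.get("USE_CUDA", "false").lower() == "true"
    if use_cuda and not cuda_available:
        print("⚠️ Warning: CUDA requested but not available, falling back to CPU")
    return "cuda" if use_cuda and cuda_available else "cpu"

os.environ["USE_CUDA"] = "true"
print(select_device(cuda_available=False))  # warns, then prints "cpu"
```

The key point is that `USE_CUDA=true` is a request, not a guarantee: the app still degrades gracefully to CPU when no GPU is usable.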
#### 3. Models not loading on GPU

**Check:**

```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
```

### Testing Your Setup

Run the comprehensive test:

```bash
python test_cuda.py
```

This will test:

- ✅ PyTorch CUDA detection
- ✅ Transformers device support
- ✅ Whisper model loading
- ✅ GPU memory availability
- ✅ Performance benchmark

### Debug Mode

For detailed device information, check the app startup output:

```
🔧 Configuration:
   Device: CUDA
   Compute type: float16
   CUDA available: True
   GPU: NVIDIA GeForce RTX 3080
   GPU Memory: 10.0 GB
```

## Installation Examples

### Ubuntu/Linux with CUDA

```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo "USE_CUDA=true" > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```

### Windows with CUDA

```bash
# Install the CUDA toolkit from the NVIDIA website:
# https://developer.nvidia.com/cuda-downloads

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo USE_CUDA=true > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```

### CPU-Only Installation

```bash
# Install PyTorch CPU version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install app dependencies
pip install -r requirements.txt

# Configure for CPU
echo "USE_CUDA=false" > .env

# Run app
python app.py
```

## Advanced Configuration

### Custom Device Settings

You can override device settings in code:

```python
# Force a specific device
from components.transcriber import AudioProcessor

processor = AudioProcessor(model_size="base.en", device="cuda", compute_type="float16")
```
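When overriding the device like this, the compute type should follow the Model Configurations table above. A minimal helper that derives one from the other might look like this (hypothetical function, not part of the app's API):

```python
def compute_type_for(device: str) -> str:
    """Pick the compute type matching the device, per the Model Configurations table."""
    if device not in ("cpu", "cuda"):
        raise ValueError(f"Unsupported device: {device}")
    # CPU uses int8 quantization; GPU uses float16 half precision
    return "float16" if device == "cuda" else "int8"

print(compute_type_for("cuda"))  # float16
print(compute_type_for("cpu"))   # int8
```

Keeping the pairing in one place avoids accidentally requesting `float16` on CPU, which many backends do not support efficiently.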
### Mixed Precision

Device configurations automatically use the optimal precision:

- **CPU:** `int8` quantization for speed
- **GPU:** `float16` for memory efficiency

### Multiple GPUs

For systems with multiple GPUs:

```python
# Use a specific GPU
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # Use the second GPU
```

## Performance Tuning

### For Maximum Speed (GPU)

```bash
USE_CUDA=true
```

- Use the `base.en` or `small.en` Whisper model
- Ensure 4GB+ of GPU memory is available
- Close other GPU applications

### For Maximum Compatibility (CPU)

```bash
USE_CUDA=false
```

- Use the `tiny.en` Whisper model
- Works on any system
- Lower memory requirements

### Balanced Performance

```bash
USE_CUDA=true  # with automatic fallback to CPU
```

- Use the `base.en` Whisper model
- Automatic device detection
- Best of both worlds

## Support

### Getting Help

1. Run the diagnostic test: `python test_cuda.py`
2. Check device info in the app startup logs
3. Verify your `.env` configuration
4. Test with a minimal example

### Reporting Issues

Include this information:

- Output of `python test_cuda.py`
- Your `.env` file contents
- GPU model and memory
- Error messages from app startup

---

**Note:** CPU processing works well for most use cases. GPU acceleration is optional, for enhanced performance.
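When verifying your `.env` configuration (step 3 under Getting Help), the parsing rules described earlier (a missing file or missing variable defaults to CPU) can be sketched as follows. This is an illustrative reimplementation for debugging, not the app's actual loader:

```python
from pathlib import Path

def use_cuda_from_env(path: str = ".env") -> bool:
    """Read USE_CUDA from a .env file; a missing file or variable means CPU."""
    env_file = Path(path)
    if not env_file.exists():
        return False
    for line in env_file.read_text().splitlines():
        line = line.strip()
        # Skip comments and malformed lines
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "USE_CUDA":
            return value.strip().lower() == "true"
    return False
```

Running `use_cuda_from_env()` in the app directory and comparing its result to the device reported at startup is a quick way to confirm the configuration file is actually being picked up.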