# CUDA Configuration Guide

This guide explains how to configure the Speech Transcription App to use GPU acceleration with CUDA.

## Overview

The app supports both CPU and GPU processing for all AI models:

- **Whisper** (speech-to-text)
- **RoBERTa** (question classification)
- **Sentence Boundary Detection**

GPU acceleration can provide **2-10x faster processing** for real-time transcription.

## Quick Setup

### 1. Check CUDA Availability

```bash
python test_cuda.py
```

### 2. Configure Device

Create a `.env` file:

```bash
cp .env.example .env
```

Edit `.env`:

```bash
# For GPU acceleration
USE_CUDA=true

# For CPU processing (default)
USE_CUDA=false
```

### 3. Run the App

```bash
python app.py
```

## Detailed Configuration

### Environment Variables

| Variable | Values | Description |
|----------|--------|-------------|
| `USE_CUDA` | `true`/`false` | Enable/disable GPU acceleration |

### Device Selection Logic

```
1. If USE_CUDA=true AND CUDA available     → Use GPU
2. If USE_CUDA=true AND CUDA not available → Fall back to CPU (with warning)
3. If USE_CUDA=false                       → Use CPU
4. If no .env file                         → Default to CPU
```

### Model Configurations

| Device | Whisper | RoBERTa | Compute Type |
|--------|---------|---------|--------------|
| **CPU** | `device="cpu"` | `device=-1` | `int8` |
| **GPU** | `device="cuda"` | `device=0` | `float16` |

## CUDA Requirements

### System Requirements

- NVIDIA GPU with CUDA Compute Capability 3.5+
- CUDA Toolkit 11.8+ or 12.x
- cuDNN 8.x
- 4GB+ GPU memory recommended

### Python Dependencies

```bash
# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then install other requirements
pip install -r requirements.txt
```

## Performance Comparison

### Typical Speedups with GPU

| Model | CPU Time | GPU Time | Speedup |
|-------|----------|----------|---------|
| Whisper (base) | ~2-5s | ~0.5-1s | 3-5x |
| RoBERTa | ~100ms | ~20ms | 5x |
| Overall | Real-time lag | Near instant | 3-8x |

### Memory Usage

| Configuration | RAM | GPU Memory |
|---------------|-----|------------|
| CPU Only | 2-4GB | 0GB |
| GPU Accelerated | 1-2GB | 2-6GB |

## Troubleshooting

### Common Issues

#### 1. "CUDA requested but not available"

```
⚠️ Warning: CUDA requested but not available, falling back to CPU
```

**Solution:** Install the CUDA toolkit and PyTorch with CUDA support.

#### 2. "Out of memory" errors

**Solutions:**

- Reduce the model size (e.g., `base.en` → `tiny.en`)
- Set `USE_CUDA=false` to use the CPU
- Close other GPU applications
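For reference, the fallback behavior behind the warning in issue 1 (and the device-selection rules earlier) can be sketched in a few lines. This is an illustrative helper under assumed names, not the app's actual code:

```python
import os

def select_device(cuda_available: bool) -> str:
    """Mirror the device-selection rules: GPU only when requested AND available.

    Illustrative sketch only -- not the app's actual implementation.
    """
    # Missing variable (or missing .env file) defaults to CPU
    use_cuda = os.environ.get("USE_CUDA", "false").lower() == "true"
    if use_cuda and not cuda_available:
        print("⚠️ Warning: CUDA requested but not available, falling back to CPU")
    return "cuda" if use_cuda and cuda_available else "cpu"

os.environ["USE_CUDA"] = "true"
print(select_device(cuda_available=False))  # warns, then prints "cpu"
```

The key point is that `USE_CUDA=true` is a request, not a guarantee: the app still degrades gracefully to CPU when no GPU is usable.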
#### 3. Models not loading on GPU

**Check:**

```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
```

### Testing Your Setup

Run the comprehensive test:

```bash
python test_cuda.py
```

This will test:

- ✅ PyTorch CUDA detection
- ✅ Transformers device support
- ✅ Whisper model loading
- ✅ GPU memory availability
- ✅ Performance benchmark

### Debug Mode

For detailed device information, check the app startup output:

```
🔧 Configuration:
   Device: CUDA
   Compute type: float16
   CUDA available: True
   GPU: NVIDIA GeForce RTX 3080
   GPU Memory: 10.0 GB
```

## Installation Examples

### Ubuntu/Linux with CUDA

```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo "USE_CUDA=true" > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```

### Windows with CUDA

```bash
# Install the CUDA toolkit from the NVIDIA website:
# https://developer.nvidia.com/cuda-downloads

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo USE_CUDA=true > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```

### CPU-Only Installation

```bash
# Install PyTorch CPU version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install app dependencies
pip install -r requirements.txt

# Configure for CPU
echo "USE_CUDA=false" > .env

# Run app
python app.py
```

## Advanced Configuration

### Custom Device Settings

You can override device settings in code:

```python
# Force a specific device
from components.transcriber import AudioProcessor

processor = AudioProcessor(model_size="base.en", device="cuda", compute_type="float16")
```
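When overriding the device like this, the compute type should follow the Model Configurations table above. A minimal helper that derives one from the other might look like this (hypothetical function, not part of the app's API):

```python
def compute_type_for(device: str) -> str:
    """Pick the compute type matching the device, per the Model Configurations table."""
    if device not in ("cpu", "cuda"):
        raise ValueError(f"Unsupported device: {device}")
    # CPU uses int8 quantization; GPU uses float16 half precision
    return "float16" if device == "cuda" else "int8"

print(compute_type_for("cuda"))  # float16
print(compute_type_for("cpu"))   # int8
```

Keeping the pairing in one place avoids accidentally requesting `float16` on CPU, which many backends do not support efficiently.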
### Mixed Precision

Device configurations automatically use the optimal precision:

- **CPU:** `int8` quantization for speed
- **GPU:** `float16` for memory efficiency

### Multiple GPUs

For systems with multiple GPUs:

```python
# Use a specific GPU
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # Use the second GPU
```

## Performance Tuning

### For Maximum Speed (GPU)

```bash
USE_CUDA=true
```

- Use the `base.en` or `small.en` Whisper model
- Ensure 4GB+ of GPU memory is available
- Close other GPU applications

### For Maximum Compatibility (CPU)

```bash
USE_CUDA=false
```

- Use the `tiny.en` Whisper model
- Works on any system
- Lower memory requirements

### Balanced Performance

```bash
USE_CUDA=true  # with automatic fallback to CPU
```

- Use the `base.en` Whisper model
- Automatic device detection
- Best of both worlds

## Support

### Getting Help

1. Run the diagnostic test: `python test_cuda.py`
2. Check device info in the app startup logs
3. Verify your `.env` configuration
4. Test with a minimal example

### Reporting Issues

Include this information:

- Output of `python test_cuda.py`
- Your `.env` file contents
- GPU model and memory
- Error messages from app startup

---

**Note:** CPU processing works well for most use cases. GPU acceleration is optional, for enhanced performance.
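When verifying your `.env` configuration (step 3 under Getting Help), the parsing rules described earlier (a missing file or missing variable defaults to CPU) can be sketched as follows. This is an illustrative reimplementation for debugging, not the app's actual loader:

```python
from pathlib import Path

def use_cuda_from_env(path: str = ".env") -> bool:
    """Read USE_CUDA from a .env file; a missing file or variable means CPU."""
    env_file = Path(path)
    if not env_file.exists():
        return False
    for line in env_file.read_text().splitlines():
        line = line.strip()
        # Skip comments and malformed lines
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "USE_CUDA":
            return value.strip().lower() == "true"
    return False
```

Running `use_cuda_from_env()` in the app directory and comparing its result to the device reported at startup is a quick way to confirm the configuration file is actually being picked up.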