# CUDA Configuration Guide

This guide explains how to configure the Speech Transcription App to use GPU acceleration with CUDA.

## Overview

The app supports both CPU and GPU processing for all AI models:

- **Whisper** (speech-to-text)
- **RoBERTa** (question classification)
- **Sentence Boundary Detection**

GPU acceleration can provide **2-10x faster processing** for real-time transcription.

## Quick Setup

### 1. Check CUDA Availability

```bash
python test_cuda.py
```

### 2. Configure Device

Create a `.env` file:

```bash
cp .env.example .env
```

Edit `.env`:

```bash
# For GPU acceleration
USE_CUDA=true

# For CPU processing (default)
USE_CUDA=false
```
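A minimal sketch of how the app might parse this flag (the actual loader, e.g. `python-dotenv`, may differ; `read_use_cuda` is a hypothetical helper, not the app's real code):

```python
import os

def read_use_cuda(env=None) -> bool:
    """Parse USE_CUDA from the environment; anything other than "true" means CPU."""
    env = os.environ if env is None else env
    return env.get("USE_CUDA", "false").strip().lower() == "true"
```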
### 3. Run the App

```bash
python app.py
```

## Detailed Configuration

### Environment Variables

| Variable | Values | Description |
|----------|--------|-------------|
| `USE_CUDA` | `true`/`false` | Enable/disable GPU acceleration |

### Device Selection Logic

```
1. If USE_CUDA=true AND CUDA available → Use GPU
2. If USE_CUDA=true AND CUDA not available → Fall back to CPU (with warning)
3. If USE_CUDA=false → Use CPU
4. If no .env file → Default to CPU
```
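The four rules above can be sketched as a small helper (a hypothetical stand-in for the app's internal logic, not its actual code):

```python
def resolve_device(use_cuda_env, cuda_available: bool) -> str:
    """Apply the device selection rules; use_cuda_env is None when .env is absent."""
    if use_cuda_env is None:
        return "cpu"      # Rule 4: no .env file → default to CPU
    if use_cuda_env.strip().lower() == "true":
        if cuda_available:
            return "cuda"  # Rule 1: requested and available
        print("⚠️ Warning: CUDA requested but not available, falling back to CPU")
        return "cpu"       # Rule 2: requested but unavailable
    return "cpu"           # Rule 3: USE_CUDA=false
```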
### Model Configurations

| Device | Whisper | RoBERTa | Compute Type |
|--------|---------|---------|--------------|
| **CPU** | `device="cpu"` | `device=-1` | `int8` |
| **GPU** | `device="cuda"` | `device=0` | `float16` |
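A sketch of how this table could map onto constructor arguments (the values mirror faster-whisper's `WhisperModel` and transformers' `pipeline` device conventions; the dict itself is illustrative, not the app's actual structure):

```python
# Per-device settings from the table above (illustrative).
MODEL_CONFIGS = {
    "cpu":  {"whisper_device": "cpu",  "roberta_device": -1, "compute_type": "int8"},
    "cuda": {"whisper_device": "cuda", "roberta_device": 0,  "compute_type": "float16"},
}

def model_kwargs(device: str) -> dict:
    """Look up the settings for a resolved device ("cpu" or "cuda")."""
    return MODEL_CONFIGS[device]
```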
## CUDA Requirements

### System Requirements

- NVIDIA GPU with CUDA Compute Capability 3.5+
- CUDA Toolkit 11.8+ or 12.x
- cuDNN 8.x
- 4GB+ GPU memory recommended

### Python Dependencies

```bash
# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then install other requirements
pip install -r requirements.txt
```

## Performance Comparison

### Typical Speedups with GPU

| Model | CPU Time | GPU Time | Speedup |
|-------|----------|----------|---------|
| Whisper (base) | ~2-5s | ~0.5-1s | 3-5x |
| RoBERTa | ~100ms | ~20ms | 5x |
| Overall | Real-time lag | Near instant | 3-8x |

### Memory Usage

| Configuration | RAM | GPU Memory |
|---------------|-----|------------|
| CPU Only | 2-4GB | 0GB |
| GPU Accelerated | 1-2GB | 2-6GB |

## Troubleshooting

### Common Issues

#### 1. "CUDA requested but not available"

```
⚠️ Warning: CUDA requested but not available, falling back to CPU
```

**Solution:** Install the CUDA toolkit and PyTorch with CUDA support.

#### 2. "Out of memory" errors

**Solutions:**
- Reduce the model size (e.g., `base.en` → `tiny.en`)
- Set `USE_CUDA=false` to use CPU
- Close other GPU applications
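One way to decide between shrinking the model and falling back to CPU is a rough fit check against free GPU memory. The footprint numbers below are illustrative assumptions, not measurements:

```python
# Rough GPU memory footprints in GB (illustrative assumptions, not measured).
WHISPER_FOOTPRINT_GB = {"tiny.en": 1.0, "base.en": 1.5, "small.en": 2.5}

def fits_on_gpu(model_size: str, free_gb: float, headroom_gb: float = 0.5) -> bool:
    """True when the model plus some headroom fits in free GPU memory."""
    return WHISPER_FOOTPRINT_GB[model_size] + headroom_gb <= free_gb
```

On a real system, `free_gb` could come from `torch.cuda.mem_get_info()`, which returns free and total bytes for the current device.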
#### 3. Models not loading on GPU

**Check:**

```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
```

### Testing Your Setup

Run the comprehensive test:

```bash
python test_cuda.py
```

This will test:

- ✅ PyTorch CUDA detection
- ✅ Transformers device support
- ✅ Whisper model loading
- ✅ GPU memory availability
- ✅ Performance benchmark
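If `test_cuda.py` is unavailable, a minimal stand-in for its first checks might look like this (a sketch; it degrades gracefully when PyTorch is not installed):

```python
def cuda_summary() -> dict:
    """Report PyTorch/CUDA status without failing on machines missing either."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False}
    info = {"torch_installed": True, "cuda_available": torch.cuda.is_available()}
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["gpu_name"] = props.name
        info["gpu_memory_gb"] = round(props.total_memory / 1e9, 1)
    return info
```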
### Debug Mode

For detailed device information, check the app's startup output:

```
🔧 Configuration:
   Device: CUDA
   Compute type: float16
   CUDA available: True
   GPU: NVIDIA GeForce RTX 3080
   GPU Memory: 10.0 GB
```

## Installation Examples

### Ubuntu/Linux with CUDA

```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo "USE_CUDA=true" > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```

### Windows with CUDA

```bash
# Install the CUDA toolkit from the NVIDIA website:
# https://developer.nvidia.com/cuda-downloads

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo USE_CUDA=true > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```

### CPU-Only Installation

```bash
# Install PyTorch CPU version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install app dependencies
pip install -r requirements.txt

# Configure for CPU
echo "USE_CUDA=false" > .env

# Run app
python app.py
```
## Advanced Configuration

### Custom Device Settings

You can override device settings in code:

```python
# Force a specific device
from components.transcriber import AudioProcessor

processor = AudioProcessor(model_size="base.en", device="cuda", compute_type="float16")
```

### Mixed Precision

GPU configurations automatically use the optimal precision:

- **CPU:** `int8` quantization for speed
- **GPU:** `float16` for memory efficiency

### Multiple GPUs

For systems with multiple GPUs:

```python
# Use a specific GPU (set before importing torch, or CUDA will ignore it)
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # Use the second GPU
```

## Performance Tuning

### For Maximum Speed (GPU)

```bash
USE_CUDA=true
```

- Use the `base.en` or `small.en` Whisper model
- Ensure 4GB+ GPU memory is available
- Close other GPU applications

### For Maximum Compatibility (CPU)

```bash
USE_CUDA=false
```

- Use the `tiny.en` Whisper model
- Works on any system
- Lower memory requirements

### Balanced Performance

```bash
USE_CUDA=true  # with automatic fallback to CPU
```

- Use the `base.en` Whisper model
- Automatic device detection
- Best of both worlds

## Support

### Getting Help

1. Run the diagnostic test: `python test_cuda.py`
2. Check the device info in the app startup logs
3. Verify your `.env` configuration
4. Test with a minimal example

### Reporting Issues

Include this information:

- Output of `python test_cuda.py`
- Your `.env` file contents
- GPU model and memory
- Error messages from app startup

---
**Note:** CPU processing works well for most use cases; GPU acceleration is optional, for enhanced performance.