# CUDA Configuration Guide
This guide explains how to configure the Speech Transcription App to use GPU acceleration with CUDA.
## Overview
The app supports both CPU and GPU processing for all AI models:
- **Whisper** (speech-to-text)
- **RoBERTa** (question classification)
- **Sentence Boundary Detection**
GPU acceleration can provide **2-10x faster processing** for real-time transcription.
## Quick Setup
### 1. Check CUDA Availability
```bash
python test_cuda.py
```
### 2. Configure Device
Create a `.env` file:
```bash
cp .env.example .env
```
Edit `.env`:
```bash
# For GPU acceleration
USE_CUDA=true
# For CPU processing (default)
USE_CUDA=false
```
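The app reads this flag at startup. A minimal sketch of how such a check might look (the helper name `use_cuda_from_env` is illustrative, not part of the app's API; note that an unset variable defaults to CPU, matching the behavior when no `.env` file exists):

```python
import os


def use_cuda_from_env() -> bool:
    """Return True only when USE_CUDA is explicitly set to 'true'."""
    # Unset variable (no .env file) falls back to the default, i.e. CPU.
    return os.getenv("USE_CUDA", "false").strip().lower() == "true"
```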
### 3. Run the App
```bash
python app.py
```
## Detailed Configuration
### Environment Variables
| Variable | Values | Description |
|----------|--------|-------------|
| `USE_CUDA` | `true`/`false` | Enable/disable GPU acceleration |
### Device Selection Logic
```
1. If USE_CUDA=true AND CUDA available → Use GPU
2. If USE_CUDA=true AND CUDA not available → Fall back to CPU (with warning)
3. If USE_CUDA=false → Use CPU
4. If no .env file β Default to CPU
```
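The fallback rules above can be sketched as a small pure function (the name `select_device` is illustrative):

```python
def select_device(use_cuda: bool, cuda_available: bool) -> str:
    """Apply the four fallback rules and return 'cuda' or 'cpu'."""
    if use_cuda and cuda_available:
        return "cuda"
    if use_cuda and not cuda_available:
        # Rule 2: requested GPU but none usable -> warn and fall back
        print("Warning: CUDA requested but not available, falling back to CPU")
        return "cpu"
    # Rules 3 and 4: USE_CUDA=false, or no .env at all
    return "cpu"
```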
### Model Configurations
| Device | Whisper | RoBERTa | Compute Type |
|--------|---------|---------|--------------|
| **CPU** | `device="cpu"` | `device=-1` | `int8` |
| **GPU** | `device="cuda"` | `device=0` | `float16` |
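The table above maps one device choice to three constructor arguments. A hypothetical helper bundling them together (the dict keys are illustrative, not the app's actual parameter names) could look like:

```python
def model_config(device: str) -> dict:
    """Return the per-model settings from the configuration table for a device."""
    if device == "cuda":
        return {
            "whisper_device": "cuda",   # faster-whisper device string
            "roberta_device": 0,        # transformers pipeline: GPU index 0
            "compute_type": "float16",  # half precision on GPU
        }
    return {
        "whisper_device": "cpu",
        "roberta_device": -1,           # transformers pipeline: -1 means CPU
        "compute_type": "int8",         # quantized for CPU speed
    }
```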
## CUDA Requirements
### System Requirements
- NVIDIA GPU with CUDA Compute Capability 3.5+
- CUDA Toolkit 11.8+ or 12.x
- cuDNN 8.x
- 4GB+ GPU memory recommended
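A sketch of a pre-flight check against these requirements, assuming PyTorch is the CUDA backend (the function name is illustrative; it returns `False` whenever CUDA is unusable rather than raising):

```python
def meets_cuda_requirements(min_capability=(3, 5), min_mem_gb=4.0) -> bool:
    """Check GPU 0 against the system requirements listed above."""
    try:
        import torch
    except ImportError:
        return False  # PyTorch not installed
    if not torch.cuda.is_available():
        return False  # no CUDA driver/toolkit or no GPU
    capability = torch.cuda.get_device_capability(0)  # e.g. (8, 6)
    mem_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return capability >= min_capability and mem_gb >= min_mem_gb
```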
### Python Dependencies
```bash
# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Then install other requirements
pip install -r requirements.txt
```
## Performance Comparison
### Typical Speedups with GPU
| Model | CPU Time | GPU Time | Speedup |
|-------|----------|----------|---------|
| Whisper (base) | ~2-5s | ~0.5-1s | 3-5x |
| RoBERTa | ~100ms | ~20ms | 5x |
| Overall | Real-time lag | Near instant | 3-8x |
### Memory Usage
| Configuration | RAM | GPU Memory |
|---------------|-----|------------|
| CPU Only | 2-4GB | 0GB |
| GPU Accelerated | 1-2GB | 2-6GB |
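To see where your system falls in the GPU memory column, you can query the total memory of GPU 0 via PyTorch (a sketch; returns `0.0` when PyTorch is missing or no CUDA GPU is present):

```python
def gpu_memory_gb() -> float:
    """Total memory of GPU 0 in GB, or 0.0 when no CUDA GPU is usable."""
    try:
        import torch
        if torch.cuda.is_available():
            return torch.cuda.get_device_properties(0).total_memory / 1024**3
    except ImportError:
        pass
    return 0.0
```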
## Troubleshooting
### Common Issues
#### 1. "CUDA requested but not available"
```
⚠️ Warning: CUDA requested but not available, falling back to CPU
```
**Solution:** Install CUDA toolkit and PyTorch with CUDA support
#### 2. "Out of memory" errors
**Solutions:**
- Reduce model size (e.g., `base.en` → `tiny.en`)
- Set `USE_CUDA=false` to use CPU
- Close other GPU applications
#### 3. Models not loading on GPU
**Check:**
```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```
### Testing Your Setup
Run the comprehensive test:
```bash
python test_cuda.py
```
This will test:
- ✅ PyTorch CUDA detection
- ✅ Transformers device support
- ✅ Whisper model loading
- ✅ GPU memory availability
- ✅ Performance benchmark
### Debug Mode
For detailed device information, check the app startup:
```
🔧 Configuration:
Device: CUDA
Compute type: float16
CUDA available: True
GPU: NVIDIA GeForce RTX 3080
GPU Memory: 10.0 GB
```
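A sketch of how such a startup banner might be assembled from the values discussed above (the function name and exact layout are illustrative, not the app's actual logging code):

```python
def format_startup_banner(device: str, compute_type: str, cuda_available: bool,
                          gpu_name: str = "", gpu_mem_gb: float = 0.0) -> str:
    """Build the startup configuration summary as a single string."""
    lines = [
        "Configuration:",
        f"   Device: {device.upper()}",
        f"   Compute type: {compute_type}",
        f"   CUDA available: {cuda_available}",
    ]
    if cuda_available:  # GPU details only make sense when CUDA is usable
        lines.append(f"   GPU: {gpu_name}")
        lines.append(f"   GPU Memory: {gpu_mem_gb:.1f} GB")
    return "\n".join(lines)
```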
## Installation Examples
### Ubuntu/Linux with CUDA
```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit
# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install app dependencies
pip install -r requirements.txt
# Configure for GPU
echo "USE_CUDA=true" > .env
# Test setup
python test_cuda.py
# Run app
python app.py
```
### Windows with CUDA
```bash
# Install CUDA toolkit from NVIDIA website
# https://developer.nvidia.com/cuda-downloads
# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install app dependencies
pip install -r requirements.txt
# Configure for GPU
echo USE_CUDA=true > .env
# Test setup
python test_cuda.py
# Run app
python app.py
```
### CPU-Only Installation
```bash
# Install PyTorch CPU version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install app dependencies
pip install -r requirements.txt
# Configure for CPU
echo "USE_CUDA=false" > .env
# Run app
python app.py
```
## Advanced Configuration
### Custom Device Settings
You can override device settings in code:
```python
# Force specific device
from components.transcriber import AudioProcessor
processor = AudioProcessor(model_size="base.en", device="cuda", compute_type="float16")
```
### Mixed Precision
GPU configurations automatically use optimal precision:
- **CPU:** `int8` quantization for speed
- **GPU:** `float16` for memory efficiency
### Multiple GPUs
For systems with multiple GPUs:
```python
# Use specific GPU
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1" # Use second GPU
```
## Performance Tuning
### For Maximum Speed (GPU)
```bash
USE_CUDA=true
```
- Use `base.en` or `small.en` Whisper model
- Ensure 4GB+ GPU memory available
- Close other GPU applications
### For Maximum Compatibility (CPU)
```bash
USE_CUDA=false
```
- Use `tiny.en` Whisper model
- Works on any system
- Lower memory requirements
### Balanced Performance
```bash
USE_CUDA=true # with fallback to CPU
```
- Use `base.en` Whisper model
- Automatic device detection
- Best of both worlds
## Support
### Getting Help
1. Run diagnostic test: `python test_cuda.py`
2. Check device info in app startup logs
3. Verify .env configuration
4. Test with minimal example
### Reporting Issues
Include this information:
- Output of `python test_cuda.py`
- Your `.env` file contents
- GPU model and memory
- Error messages from app startup
---
**Note:** CPU processing works well for most use cases; GPU acceleration is an optional enhancement for faster transcription.