# CUDA Configuration Guide

This guide explains how to configure the Speech Transcription App to use GPU acceleration with CUDA.

## Overview

The app supports both CPU and GPU processing for all AI models:
- **Whisper** (speech-to-text)
- **RoBERTa** (question classification) 
- **Sentence Boundary Detection**

GPU acceleration can provide **2-10x faster processing** for real-time transcription.

## Quick Setup

### 1. Check CUDA Availability
```bash
python test_cuda.py
```

### 2. Configure Device
Create a `.env` file:
```bash
cp .env.example .env
```

Edit `.env`:
```bash
# For GPU acceleration
USE_CUDA=true

# For CPU processing (default)
USE_CUDA=false
```

### 3. Run the App
```bash
python app.py
```

## Detailed Configuration

### Environment Variables

| Variable | Values | Description |
|----------|--------|-------------|
| `USE_CUDA` | `true`/`false` | Enable/disable GPU acceleration |
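
Parsing such a flag is a one-liner; the sketch below is a hypothetical helper (the app's actual parsing may differ), shown only to make the default behavior explicit:

```python
import os

def use_cuda_from_env() -> bool:
    # Treat only the literal string "true" (case-insensitive) as enabling
    # CUDA; a missing or unset variable defaults to CPU processing.
    return os.getenv("USE_CUDA", "false").strip().lower() == "true"
```

Note that any value other than `true` (including typos) silently falls back to CPU.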

### Device Selection Logic

```
1. If USE_CUDA=true AND CUDA available → Use GPU
2. If USE_CUDA=true AND CUDA not available → Fall back to CPU (with warning)
3. If USE_CUDA=false → Use CPU
4. If no .env file → Default to CPU
```
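
The same rules can be expressed as a small pure function. This is an illustrative sketch of the fallback order, not the app's actual implementation:

```python
def select_device(use_cuda: bool, cuda_available: bool) -> str:
    """Return "cuda" or "cpu" per the four rules above."""
    if use_cuda:
        if cuda_available:
            return "cuda"  # rule 1
        # Rule 2: requested but unavailable -> warn and fall back.
        print("Warning: CUDA requested but not available, falling back to CPU")
        return "cpu"
    # Rules 3-4: disabled, or no .env file (use_cuda defaults to False).
    return "cpu"
```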

### Model Configurations

| Device | Whisper | RoBERTa | Compute Type |
|--------|---------|---------|--------------|
| **CPU** | `device="cpu"` | `device=-1` | `int8` |
| **GPU** | `device="cuda"` | `device=0` | `float16` |
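
In code, the table above amounts to a device-keyed lookup. The key names in this sketch are illustrative, not necessarily the app's actual config structure; the `-1`/`0` convention is how Hugging Face `transformers` pipelines address CPU vs. the first GPU:

```python
def model_settings(device: str) -> dict:
    # Whisper takes a device string; the transformers pipeline API
    # takes a device index (-1 = CPU, 0 = first GPU).
    if device == "cuda":
        return {"whisper_device": "cuda", "roberta_device": 0,
                "compute_type": "float16"}
    return {"whisper_device": "cpu", "roberta_device": -1,
            "compute_type": "int8"}
```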

## CUDA Requirements

### System Requirements
- NVIDIA GPU with CUDA Compute Capability 3.5+
- CUDA Toolkit 11.8+ or 12.x
- cuDNN 8.x
- 4GB+ GPU memory recommended

### Python Dependencies
```bash
# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Then install other requirements
pip install -r requirements.txt
```

## Performance Comparison

### Typical Speedups with GPU

| Model | CPU Time | GPU Time | Speedup |
|-------|----------|----------|---------|
| Whisper (base) | ~2-5s | ~0.5-1s | 3-5x |
| RoBERTa | ~100ms | ~20ms | 5x |
| Overall | Real-time lag | Near instant | 3-8x |

### Memory Usage

| Configuration | RAM | GPU Memory |
|---------------|-----|------------|
| CPU Only | 2-4GB | 0GB |
| GPU Accelerated | 1-2GB | 2-6GB |

## Troubleshooting

### Common Issues

#### 1. "CUDA requested but not available"
```
⚠️ Warning: CUDA requested but not available, falling back to CPU
```
**Solution:** Install CUDA toolkit and PyTorch with CUDA support

#### 2. "Out of memory" errors
**Solutions:**
- Use a smaller model (e.g., switch from `base.en` to `tiny.en`)
- Set `USE_CUDA=false` to use CPU
- Close other GPU applications

#### 3. Models not loading on GPU
**Check:**
```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
```

### Testing Your Setup

Run the comprehensive test:
```bash
python test_cuda.py
```

This will test:
- ✅ PyTorch CUDA detection
- ✅ Transformers device support
- ✅ Whisper model loading
- ✅ GPU memory availability
- ✅ Performance benchmark

### Debug Mode

For detailed device information, check the app startup:
```
🔧 Configuration:
   Device: CUDA
   Compute type: float16
   CUDA available: True
   GPU: NVIDIA GeForce RTX 3080
   GPU Memory: 10.0 GB
```

## Installation Examples

### Ubuntu/Linux with CUDA
```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo "USE_CUDA=true" > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```

### Windows with CUDA
```bash
# Install CUDA toolkit from NVIDIA website
# https://developer.nvidia.com/cuda-downloads

# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install app dependencies
pip install -r requirements.txt

# Configure for GPU
echo USE_CUDA=true > .env

# Test setup
python test_cuda.py

# Run app
python app.py
```

### CPU-Only Installation
```bash
# Install PyTorch CPU version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

# Install app dependencies
pip install -r requirements.txt

# Configure for CPU
echo "USE_CUDA=false" > .env

# Run app
python app.py
```

## Advanced Configuration

### Custom Device Settings

You can override device settings in code:
```python
# Force specific device
from components.transcriber import AudioProcessor
processor = AudioProcessor(model_size="base.en", device="cuda", compute_type="float16")
```

### Mixed Precision

GPU configurations automatically use optimal precision:
- **CPU:** `int8` quantization for speed
- **GPU:** `float16` for memory efficiency

### Multiple GPUs

For systems with multiple GPUs:
```python
# Use specific GPU
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # Use second GPU
```

## Performance Tuning

### For Maximum Speed (GPU)
```bash
USE_CUDA=true
```
- Use `base.en` or `small.en` Whisper model
- Ensure 4GB+ GPU memory available
- Close other GPU applications

### For Maximum Compatibility (CPU)
```bash
USE_CUDA=false
```
- Use `tiny.en` Whisper model
- Works on any system
- Lower memory requirements

### Balanced Performance
```bash
USE_CUDA=true  # with fallback to CPU
```
- Use `base.en` Whisper model
- Automatic device detection
- Best of both worlds

## Support

### Getting Help

1. Run diagnostic test: `python test_cuda.py`
2. Check device info in app startup logs
3. Verify .env configuration
4. Test with minimal example

### Reporting Issues

Include this information:
- Output of `python test_cuda.py`
- Your `.env` file contents
- GPU model and memory
- Error messages from app startup

---

**Note:** CPU processing is sufficient for most use cases; GPU acceleration is an optional performance enhancement.