# CUDA Configuration Guide
This guide explains how to configure the Speech Transcription App to use GPU acceleration with CUDA.
## Overview
The app supports both CPU and GPU processing for all AI models:
- **Whisper** (speech-to-text)
- **RoBERTa** (question classification)
- **Sentence Boundary Detection**
GPU acceleration can provide **2-10x faster processing** for real-time transcription.
## Quick Setup
### 1. Check CUDA Availability
```bash
python test_cuda.py
```
### 2. Configure Device
Create a `.env` file:
```bash
cp .env.example .env
```
Edit `.env`:
```bash
# For GPU acceleration
USE_CUDA=true
# For CPU processing (default)
USE_CUDA=false
```
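For illustration, the flag could be parsed with a small helper like the one below (the function name and parsing rules are assumptions; the app may use a library such as `python-dotenv` instead):

```python
import os

def read_use_cuda(env_path=".env"):
    """Return True only when the .env file sets USE_CUDA=true; otherwise default to CPU."""
    value = "false"
    if os.path.exists(env_path):
        with open(env_path) as f:
            for line in f:
                line = line.strip()
                # Comment lines and unrelated keys are skipped
                if line.startswith("USE_CUDA="):
                    value = line.split("=", 1)[1].strip().lower()
    return value == "true"
```

Any value other than `true` (including a missing `.env` file) falls back to CPU, matching the CPU default.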
### 3. Run the App
```bash
python app.py
```
## Detailed Configuration
### Environment Variables
| Variable | Values | Description |
|----------|--------|-------------|
| `USE_CUDA` | `true`/`false` | Enable/disable GPU acceleration |
### Device Selection Logic
```
1. If USE_CUDA=true AND CUDA available → Use GPU
2. If USE_CUDA=true AND CUDA not available → Fallback to CPU (with warning)
3. If USE_CUDA=false → Use CPU
4. If no .env file → Default to CPU
```
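As a sketch, the four rules above translate to something like the following (the `cuda_available` parameter stands in for `torch.cuda.is_available()`; the function name is illustrative):

```python
def select_device(use_cuda: bool, cuda_available: bool) -> str:
    """Apply the fallback rules: GPU only when requested AND available."""
    if use_cuda and cuda_available:
        return "cuda"
    if use_cuda:
        # Rule 2: requested but unavailable -> warn and fall back to CPU
        print("⚠️ Warning: CUDA requested but not available, falling back to CPU")
    return "cpu"

# In the app this would be called as, e.g.:
# select_device(use_cuda=True, cuda_available=torch.cuda.is_available())
```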
### Model Configurations
| Device | Whisper | RoBERTa | Compute Type |
|--------|---------|---------|--------------|
| **CPU** | `device="cpu"` | `device=-1` | `int8` |
| **GPU** | `device="cuda"` | `device=0` | `float16` |
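The table rows can be expressed as a small mapping (the key names here are illustrative; the app's actual config structure may differ):

```python
def model_config(device: str) -> dict:
    """Return per-model device settings matching the table above."""
    if device == "cuda":
        return {"whisper_device": "cuda", "roberta_device": 0, "compute_type": "float16"}
    # CPU defaults: Hugging Face pipelines use device=-1 for CPU; int8 keeps inference fast
    return {"whisper_device": "cpu", "roberta_device": -1, "compute_type": "int8"}
```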
## CUDA Requirements
### System Requirements
- NVIDIA GPU with CUDA Compute Capability 3.5+
- CUDA Toolkit 11.8+ or 12.x
- cuDNN 8.x
- 4GB+ GPU memory recommended
### Python Dependencies
```bash
# Install PyTorch with CUDA support first
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Then install other requirements
pip install -r requirements.txt
```
## Performance Comparison
### Typical Speedups with GPU
| Model | CPU Time | GPU Time | Speedup |
|-------|----------|----------|---------|
| Whisper (base) | ~2-5s | ~0.5-1s | 3-5x |
| RoBERTa | ~100ms | ~20ms | 5x |
| Overall | Real-time lag | Near instant | 3-8x |
### Memory Usage
| Configuration | RAM | GPU Memory |
|---------------|-----|------------|
| CPU Only | 2-4GB | 0GB |
| GPU Accelerated | 1-2GB | 2-6GB |
## Troubleshooting
### Common Issues
#### 1. "CUDA requested but not available"
```
⚠️ Warning: CUDA requested but not available, falling back to CPU
```
**Solution:** Install CUDA toolkit and PyTorch with CUDA support
#### 2. "Out of memory" errors
**Solutions:**
- Reduce model size (e.g., `base.en` → `tiny.en`)

- Set `USE_CUDA=false` to use CPU
- Close other GPU applications
#### 3. Models not loading on GPU
**Check:**
```python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
```
### Testing Your Setup
Run the comprehensive test:
```bash
python test_cuda.py
```
This will test:
- ✅ PyTorch CUDA detection
- ✅ Transformers device support
- ✅ Whisper model loading
- ✅ GPU memory availability
- ✅ Performance benchmark
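For a quick inline check without the full test script, a minimal sketch might look like this (`test_cuda.py` itself covers more; the helper name is an assumption):

```python
import importlib.util

def quick_cuda_check() -> dict:
    """Report whether PyTorch is installed and, if so, whether CUDA is usable."""
    results = {
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if results["torch_installed"]:
        import torch
        results["cuda_available"] = torch.cuda.is_available()
    return results
```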
### Debug Mode
For detailed device information, check the app startup:
```
🔧 Configuration:
Device: CUDA
Compute type: float16
CUDA available: True
GPU: NVIDIA GeForce RTX 3080
GPU Memory: 10.0 GB
```
## Installation Examples
### Ubuntu/Linux with CUDA
```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit
# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install app dependencies
pip install -r requirements.txt
# Configure for GPU
echo "USE_CUDA=true" > .env
# Test setup
python test_cuda.py
# Run app
python app.py
```
### Windows with CUDA
```bash
# Install CUDA toolkit from NVIDIA website
# https://developer.nvidia.com/cuda-downloads
# Install PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Install app dependencies
pip install -r requirements.txt
# Configure for GPU
echo USE_CUDA=true > .env
# Test setup
python test_cuda.py
# Run app
python app.py
```
### CPU-Only Installation
```bash
# Install PyTorch CPU version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# Install app dependencies
pip install -r requirements.txt
# Configure for CPU
echo "USE_CUDA=false" > .env
# Run app
python app.py
```
## Advanced Configuration
### Custom Device Settings
You can override device settings in code:
```python
# Force specific device
from components.transcriber import AudioProcessor
processor = AudioProcessor(model_size="base.en", device="cuda", compute_type="float16")
```
### Mixed Precision
GPU configurations automatically use optimal precision:
- **CPU:** `int8` quantization for speed
- **GPU:** `float16` for memory efficiency
### Multiple GPUs
For systems with multiple GPUs:
```python
# Use a specific GPU — set this BEFORE importing torch or loading any models,
# otherwise CUDA may already be initialized and the setting is ignored
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # expose only the second GPU
```
## Performance Tuning
### For Maximum Speed (GPU)
```bash
USE_CUDA=true
```
- Use `base.en` or `small.en` Whisper model
- Ensure 4GB+ GPU memory available
- Close other GPU applications
### For Maximum Compatibility (CPU)
```bash
USE_CUDA=false
```
- Use `tiny.en` Whisper model
- Works on any system
- Lower memory requirements
### Balanced Performance
```bash
USE_CUDA=true # with fallback to CPU
```
- Use `base.en` Whisper model
- Automatic device detection
- Best of both worlds
## Support
### Getting Help
1. Run diagnostic test: `python test_cuda.py`
2. Check device info in app startup logs
3. Verify .env configuration
4. Test with minimal example
### Reporting Issues
Include this information:
- Output of `python test_cuda.py`
- Your `.env` file contents
- GPU model and memory
- Error messages from app startup
---
**Note:** CPU processing works perfectly for most use cases. GPU acceleration is optional for enhanced performance.