# Local Testing Guide
## πŸ§ͺ Testing Your Whisper Transcriber Locally
Before deploying to Hugging Face Spaces, test everything locally.
## Prerequisites
### 1. Install FFmpeg
FFmpeg is required for audio/video processing.
**Windows:**
```bash
# Using Chocolatey
choco install ffmpeg
# Or download from: https://ffmpeg.org/download.html
# Add to PATH manually
```
**Mac:**
```bash
brew install ffmpeg
```
**Linux:**
```bash
sudo apt update
sudo apt install ffmpeg
```
Verify installation:
```bash
ffmpeg -version
```
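If you'd rather check from Python, the same verification can be done with only the standard library (a minimal sketch):

```python
import shutil

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable is found on PATH."""
    return shutil.which("ffmpeg") is not None

print("ffmpeg found" if ffmpeg_available() else "ffmpeg NOT found - install it and re-check PATH")
```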
### 2. Python Environment
Requires Python 3.8+
```bash
python --version
```
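The version requirement can also be enforced at the top of a script, so an unsupported interpreter fails fast (a small sketch):

```python
import sys

MIN_PYTHON = (3, 8)

def python_version_ok() -> bool:
    """Check that the running interpreter meets the 3.8+ requirement."""
    return sys.version_info[:2] >= MIN_PYTHON

if not python_version_ok():
    sys.exit(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
             f"found {sys.version_info.major}.{sys.version_info.minor}")
```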
## πŸš€ Setup
### 1. Create Virtual Environment
```bash
# Create venv
python -m venv venv
# Activate
# Windows:
venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate
```
### 2. Install Dependencies
```bash
pip install -r requirements.txt
```
**Note:** First installation may take 10-15 minutes (PyTorch is large).
### 3. Set Environment Variable (Optional)
For speaker diarization:
```bash
# Windows (PowerShell):
$env:HF_TOKEN = "your_token_here"
# Windows (CMD):
set HF_TOKEN=your_token_here
# Mac/Linux:
export HF_TOKEN=your_token_here
```
Get your token from: [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
Accept terms at: [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)
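Before launching, you can confirm the token is actually visible to Python (a sketch; `get_hf_token` is a hypothetical helper, not part of the app):

```python
import os
from typing import Optional

def get_hf_token() -> Optional[str]:
    """Read HF_TOKEN from the environment; diarization is skipped without it."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        print("HF_TOKEN not set - speaker diarization will be skipped")
    return token
```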
## πŸƒ Running the App
```bash
python app.py
```
The app will:
1. Start Gradio server
2. Open in browser automatically
3. Display local URL: `http://127.0.0.1:7860`
4. Display a public share URL (only if sharing is enabled in the launch call)
## πŸ§ͺ Test Cases
### Test 1: Basic Audio File
1. **Prepare**: Find a short MP3/WAV file (1-2 minutes)
2. **Upload**: Use the file upload widget
3. **Settings**:
- Model: Small
- Language: Auto
- Diarization: Off
4. **Expected**: Transcription in all formats within 1-2 minutes
### Test 2: YouTube URL
1. **Input**: Paste a short YouTube video URL
2. **Settings**: Same as Test 1
3. **Expected**: Download + transcription complete
### Test 3: Video File
1. **Prepare**: Short MP4 video file
2. **Upload**: Video file
3. **Expected**: Audio extracted automatically, then transcribed
### Test 4: Language Selection
1. **Prepare**: Non-English audio file
2. **Settings**:
- Model: Small
- Language: Select specific language
3. **Expected**: Accurate transcription in selected language
### Test 5: Speaker Diarization
1. **Prepare**: Audio with 2+ speakers
2. **Settings**:
- Model: Small
- Diarization: Enabled
- HF_TOKEN must be set
3. **Expected**: Speakers labeled in output
### Test 6: Large File (Chunking)
1. **Prepare**: Audio file >30 minutes
2. **Upload**: Large file
3. **Expected**:
- Progress shows chunking
- Multiple chunks processed
- Merged output with correct timestamps
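The timestamp merge in Test 6 can be sketched as follows. Assumed (hypothetical) segment shape: dicts with `start`/`end` in seconds relative to their own chunk, plus `text`; the app's real internals may differ.

```python
def merge_chunks(chunks, chunk_length_s=30 * 60):
    """Merge per-chunk segment lists, shifting each segment by its
    chunk's offset so merged timestamps stay monotonic."""
    merged = []
    for i, segments in enumerate(chunks):
        offset = i * chunk_length_s  # seconds before this chunk starts
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged
```

A correct merge is easy to spot-check: the first segment of the second chunk should start at or after the chunk boundary, not back at zero.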
## πŸ› Common Issues & Solutions
### Issue: ModuleNotFoundError
```
ModuleNotFoundError: No module named 'transformers'
```
**Solution:**
```bash
pip install -r requirements.txt
```
### Issue: FFmpeg Not Found
```
FileNotFoundError: ffmpeg not found
```
**Solution:**
- Install FFmpeg (see Prerequisites)
- Verify: `ffmpeg -version`
- Make sure it's in PATH
### Issue: CUDA/GPU Errors
```
RuntimeError: CUDA out of memory
```
**Solution:**
The app automatically falls back to CPU. If you still see this error:
- Use a smaller model (tiny/small)
- Restart the Python process to free GPU memory
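The fallback pattern described above looks roughly like this (a sketch; `transcribe_fn` stands in for the app's real transcription call):

```python
def transcribe_with_fallback(transcribe_fn, audio_path):
    """Try the GPU first; on a CUDA out-of-memory error, retry on CPU."""
    try:
        return transcribe_fn(audio_path, device="cuda")
    except RuntimeError as exc:
        if "out of memory" not in str(exc).lower():
            raise  # unrelated error: re-raise it
        return transcribe_fn(audio_path, device="cpu")
```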
### Issue: Download Fails (YouTube)
```
Failed to download from YouTube
```
**Solution:**
- Video might be region-restricted
- Try different video
- Use direct file upload instead
### Issue: Slow Processing
**Expected Times (CPU):**
- Tiny model: processing takes ~0.3× the audio duration (10 min audio ≈ 3 min processing)
- Small model: ~0.5-1× the audio duration
- Medium model: ~1-2× the audio duration
**Solution:**
- Use smaller model
- Use GPU if available
- Try on HF Space with GPU
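The factors above translate into a quick back-of-the-envelope calculator (midpoints assumed for the ranged estimates; real times vary with hardware):

```python
# Rough CPU processing-time factors (fraction of the audio duration);
# midpoints assumed for the ranged estimates above.
FACTORS = {"tiny": 0.3, "small": 0.75, "medium": 1.5}

def estimate_processing_seconds(audio_seconds: float, model: str = "small") -> float:
    """Ballpark CPU processing time for a given model size."""
    return audio_seconds * FACTORS[model]
```

For example, a 10-minute file on the tiny model comes out to 600 × 0.3 = 180 seconds.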
### Issue: Diarization Not Working
```
Skipping diarization (HF_TOKEN not set)
```
**Solution:**
- Set HF_TOKEN environment variable
- Accept pyannote model terms
- Restart app
## πŸ“Š Performance Benchmarks
Tested on different hardware:
| Hardware | Model | 10min Audio | GPU Used |
|----------|-------|-------------|----------|
| CPU (8-core) | Tiny | ~2 min | No |
| CPU (8-core) | Small | ~4 min | No |
| CPU (8-core) | Medium | ~8 min | No |
| GPU (RTX 3060) | Small | ~1 min | Yes |
| GPU (RTX 3060) | Medium | ~2 min | Yes |
*Your results may vary.*
## πŸ” Debugging
### Enable Verbose Logging
Modify `app.py`:
```python
logging.basicConfig(level=logging.DEBUG) # Change from INFO to DEBUG
```
### Check Logs
- Console output shows all processing steps
- Look for ERROR or WARNING messages
- Progress callbacks show current operation
### Test Individual Components
Test each module separately:
```python
# Test audio processor
from utils.audio_processor import AudioProcessor
duration = AudioProcessor.get_audio_duration("test.mp3")
print(f"Duration: {duration}s")
# Test transcription
from utils.transcription import WhisperTranscriber
transcriber = WhisperTranscriber(model_size='tiny')
transcriber.load_model()
result = transcriber.transcribe("test.mp3")
print(result['text'])
```
## πŸ“ Development Tips
### Fast Iteration
For faster testing during development:
1. **Use tiny model**: Fastest processing
2. **Use short files**: 30-60 seconds
3. **Disable diarization**: Saves time
4. **Use local files**: Faster than URLs
### Code Changes
Gradio only hot-reloads code changes when launched in reload mode (`gradio app.py`); if you started it with `python app.py`, restart the process after saving, then refresh the browser.
### Memory Usage
Monitor memory:
- Small model: ~2GB RAM
- Medium model: ~4GB RAM
- With GPU: +2GB VRAM
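On Unix-like systems the process's peak memory can be checked from Python without extra dependencies (a sketch; note that `ru_maxrss` units differ between Linux and macOS):

```python
import resource
import sys

def peak_rss_mb() -> float:
    """Peak resident set size of this process in MB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        peak /= 1024  # macOS reports bytes; convert to KB
    return peak / 1024  # KB -> MB

print(f"Peak memory: {peak_rss_mb():.1f} MB")
```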
## βœ… Ready for Deployment
Once all tests pass:
1. βœ… Basic transcription works
2. βœ… YouTube download works
3. βœ… All output formats generated
4. βœ… Progress bars show correctly
5. βœ… Large files process (chunking works)
6. βœ… Diarization works (if enabled)
You're ready to deploy to Hugging Face Spaces! πŸš€
See `DEPLOYMENT.md` for deployment instructions.