# Local Testing Guide

## 🧪 Testing Your Whisper Transcriber Locally
Before deploying to Hugging Face Spaces, test everything locally.
## Prerequisites

### 1. Install FFmpeg

FFmpeg is required for audio/video processing.

**Windows:**

```shell
# Using Chocolatey
choco install ffmpeg

# Or download from: https://ffmpeg.org/download.html
# and add it to PATH manually
```

**Mac:**

```shell
brew install ffmpeg
```

**Linux:**

```shell
sudo apt update
sudo apt install ffmpeg
```

Verify the installation:

```shell
ffmpeg -version
```
### 2. Python Environment

Python 3.8+ is required:

```shell
python --version
```
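Both prerequisite checks can be scripted in one place. A minimal sketch using only the standard library (the `check_prerequisites` helper is hypothetical, not part of the app):

```python
import shutil
import sys

def check_prerequisites(min_version=(3, 8)):
    """Return a list of problems with the local environment (empty = all good)."""
    problems = []
    if sys.version_info < min_version:
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, "
            f"found {sys.version.split()[0]}"
        )
    # shutil.which checks PATH the same way the shell would
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems

if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```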
## 🛠️ Setup

### 1. Create a Virtual Environment

```shell
# Create the venv
python -m venv venv

# Activate it
# Windows:
venv\Scripts\activate

# Mac/Linux:
source venv/bin/activate
```

### 2. Install Dependencies

```shell
pip install -r requirements.txt
```
Note: First installation may take 10-15 minutes (PyTorch is large).
### 3. Set Environment Variable (Optional)

For speaker diarization:

```shell
# Windows (PowerShell):
$env:HF_TOKEN = "your_token_here"

# Windows (CMD):
set HF_TOKEN=your_token_here

# Mac/Linux:
export HF_TOKEN=your_token_here
```

- Get your token from: huggingface.co/settings/tokens
- Accept terms at: pyannote/speaker-diarization-3.1
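In code, the token is simply read from the environment. A sketch of how the app might pick it up (the function name is an assumption, not the app's actual API):

```python
import os

def get_hf_token():
    """Return the Hugging Face token, or None if diarization should be skipped."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        # Matches the log line you'll see when diarization is unavailable
        print("Skipping diarization (HF_TOKEN not set)")
    return token
```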
## 🚀 Running the App

```shell
python app.py
```

The app will:

- Start the Gradio server
- Open in your browser automatically
- Display the local URL: http://127.0.0.1:7860
- Display a public share URL (optional)
## 🧪 Test Cases
### Test 1: Basic Audio File

- Prepare: Find a short MP3/WAV file (1-2 minutes)
- Upload: Use the file upload widget
- Settings:
  - Model: Small
  - Language: Auto
  - Diarization: Off
- Expected: Transcription in all formats within 1-2 minutes
### Test 2: YouTube URL

- Input: Paste a short YouTube video URL
- Settings: Same as Test 1
- Expected: Download and transcription complete
### Test 3: Video File

- Prepare: Short MP4 video file
- Upload: Video file
- Expected: Audio extracted automatically, then transcribed
### Test 4: Language Selection

- Prepare: Non-English audio file
- Settings:
  - Model: Small
  - Language: Select the specific language
- Expected: Accurate transcription in the selected language
### Test 5: Speaker Diarization

- Prepare: Audio with 2+ speakers
- Settings:
  - Model: Small
  - Diarization: Enabled
  - HF_TOKEN must be set
- Expected: Speakers labeled in output
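Under the hood, diarization output (speaker turns) has to be combined with the transcript segments. A simplified sketch of one common approach, matching by segment midpoint; the data shapes here are assumptions, not the app's actual structures:

```python
def label_segments(transcript_segments, speaker_turns):
    """Assign each transcript segment the speaker whose turn contains
    the segment's midpoint; fall back to 'UNKNOWN' if none matches."""
    labeled = []
    for seg in transcript_segments:
        midpoint = (seg["start"] + seg["end"]) / 2
        speaker = "UNKNOWN"
        for turn in speaker_turns:
            if turn["start"] <= midpoint < turn["end"]:
                speaker = turn["speaker"]
                break
        labeled.append({**seg, "speaker": speaker})
    return labeled
```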
### Test 6: Large File (Chunking)

- Prepare: Audio file >30 minutes
- Upload: Large file
- Expected:
  - Progress shows chunking
  - Multiple chunks processed
  - Merged output with correct timestamps
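The timestamp-merging step can be illustrated with a small sketch. This assumes fixed-length, non-overlapping chunks; the real app may use overlap or variable chunk sizes, so treat the helper as illustrative only:

```python
def merge_chunk_segments(chunk_results, chunk_length_s):
    """Merge per-chunk segment lists into one timeline by adding
    each chunk's start offset to its segments' timestamps."""
    merged = []
    for index, segments in enumerate(chunk_results):
        offset = index * chunk_length_s  # where this chunk begins in the file
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged
```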
## 🐛 Common Issues & Solutions

### Issue: ModuleNotFoundError

```
ModuleNotFoundError: No module named 'transformers'
```

**Solution:**

```shell
pip install -r requirements.txt
```
### Issue: FFmpeg Not Found

```
FileNotFoundError: ffmpeg not found
```

**Solution:**

- Install FFmpeg (see Prerequisites)
- Verify with `ffmpeg -version`
- Make sure it's on your PATH
### Issue: CUDA/GPU Errors

```
RuntimeError: CUDA out of memory
```

**Solution:** The app automatically falls back to CPU. If you see this:

- Use a smaller model (tiny/small)
- Restart Python
- The app will use the CPU instead
### Issue: Download Fails (YouTube)

```
Failed to download from YouTube
```

**Solution:**

- The video might be region-restricted
- Try a different video
- Use direct file upload instead
### Issue: Slow Processing

Expected times (CPU):

- Tiny model: ~0.3x realtime (10 min of audio ≈ 3 min of processing)
- Small model: ~0.5-1x realtime
- Medium model: ~1-2x realtime

**Solution:**

- Use a smaller model
- Use a GPU if available
- Try an HF Space with a GPU
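The realtime factors above convert directly into estimates: processing time ≈ audio duration × factor. A trivial helper for planning test runs (hypothetical, not part of the app):

```python
def estimated_processing_minutes(audio_minutes, realtime_factor):
    """E.g. 10 minutes of audio on a ~0.3x-realtime model takes about 3 minutes."""
    return audio_minutes * realtime_factor
```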
### Issue: Diarization Not Working

```
Skipping diarization (HF_TOKEN not set)
```

**Solution:**

- Set the HF_TOKEN environment variable
- Accept the pyannote model terms
- Restart the app
## 📊 Performance Benchmarks

Tested on different hardware:

| Hardware | Model | 10 min Audio | GPU Used |
|---|---|---|---|
| CPU (8-core) | Tiny | ~2 min | No |
| CPU (8-core) | Small | ~4 min | No |
| CPU (8-core) | Medium | ~8 min | No |
| GPU (RTX 3060) | Small | ~1 min | Yes |
| GPU (RTX 3060) | Medium | ~2 min | Yes |

Your results may vary.
## 🔍 Debugging

### Enable Verbose Logging

Modify app.py:

```python
logging.basicConfig(level=logging.DEBUG)  # Change from INFO to DEBUG
```

### Check Logs

- Console output shows all processing steps
- Look for ERROR or WARNING messages
- Progress callbacks show the current operation
### Test Individual Components

Test each module separately:

```python
# Test the audio processor
from utils.audio_processor import AudioProcessor

duration = AudioProcessor.get_audio_duration("test.mp3")
print(f"Duration: {duration}s")

# Test transcription
from utils.transcription import WhisperTranscriber

transcriber = WhisperTranscriber(model_size='tiny')
transcriber.load_model()
result = transcriber.transcribe("test.mp3")
print(result['text'])
```
## 📝 Development Tips

### Fast Iteration

For faster testing during development:

- Use the tiny model: fastest processing
- Use short files: 30-60 seconds
- Disable diarization: saves time
- Use local files: faster than URLs
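If you don't have a short clip handy, you can synthesize one with the standard library. A sine tone won't produce meaningful text, but it exercises upload, chunking, and format output end to end (the helper name is made up for this sketch):

```python
import math
import struct
import wave

def make_test_wav(path="test_tone.wav", seconds=2, freq=440, rate=16000):
    """Write a short mono 16-bit sine-wave WAV file for pipeline tests."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        frames = bytearray()
        for n in range(int(seconds * rate)):
            # 0.3 amplitude keeps the tone well below clipping
            sample = int(32767 * 0.3 * math.sin(2 * math.pi * freq * n / rate))
            frames += struct.pack("<h", sample)
        w.writeframes(bytes(frames))
    return path
```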
### Code Changes

When launched in reload mode (`gradio app.py` instead of `python app.py`), the app auto-reloads on code changes: just save and refresh the browser.
### Memory Usage

Monitor memory:

- Small model: ~2GB RAM
- Medium model: ~4GB RAM
- With GPU: +2GB VRAM
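To spot-check actual usage while the app runs, a Unix-only sketch using the standard library (note the platform quirk: `ru_maxrss` is reported in KB on Linux but in bytes on macOS):

```python
import resource
import sys

def peak_memory_mb():
    """Peak resident set size of the current process, in MB (Unix only)."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return rss / (1024 * 1024)  # macOS reports bytes
    return rss / 1024               # Linux reports kilobytes
```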
## ✅ Ready for Deployment

Once all tests pass:

- ✅ Basic transcription works
- ✅ YouTube download works
- ✅ All output formats are generated
- ✅ Progress bars display correctly
- ✅ Large files process (chunking works)
- ✅ Diarization works (if enabled)

You're ready to deploy to Hugging Face Spaces! 🚀

See DEPLOYMENT.md for deployment instructions.