# Local Testing Guide

## 🧪 Testing Your Whisper Transcriber Locally

Before deploying to Hugging Face Spaces, test everything locally.

## Prerequisites

### 1. Install FFmpeg

FFmpeg is required for audio/video processing.

**Windows:**
```bash
# Using Chocolatey
choco install ffmpeg

# Or download from: https://ffmpeg.org/download.html
# Add to PATH manually
```

**Mac:**
```bash
brew install ffmpeg
```

**Linux:**
```bash
sudo apt update
sudo apt install ffmpeg
```

Verify installation:
```bash
ffmpeg -version
```

### 2. Python Environment

Requires Python 3.8+:
```bash
python --version
```

## 🚀 Setup

### 1. Create Virtual Environment

```bash
# Create venv
python -m venv venv

# Activate
# Windows:
venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate
```

### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

**Note:** The first installation may take 10-15 minutes (PyTorch is large).

### 3. Set Environment Variable (Optional)

For speaker diarization:
```bash
# Windows (PowerShell):
$env:HF_TOKEN = "your_token_here"

# Windows (CMD):
set HF_TOKEN=your_token_here

# Mac/Linux:
export HF_TOKEN=your_token_here
```

Get your token from: [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)

Accept terms at: [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1)

## 🏃 Running the App

```bash
python app.py
```

The app will:
1. Start the Gradio server
2. Open in the browser automatically
3. Display the local URL: `http://127.0.0.1:7860`
4. Display a public share URL (optional)

## 🧪 Test Cases

### Test 1: Basic Audio File

1. **Prepare**: Find a short MP3/WAV file (1-2 minutes)
2. **Upload**: Use the file upload widget
3. **Settings**:
   - Model: Small
   - Language: Auto
   - Diarization: Off
4. **Expected**: Transcription in all formats within 1-2 minutes

### Test 2: YouTube URL

1. **Input**: Paste a short YouTube video URL
2. **Settings**: Same as Test 1
3. **Expected**: Download + transcription complete

### Test 3: Video File

1. **Prepare**: Short MP4 video file
2. **Upload**: Video file
3. **Expected**: Audio extracted automatically, then transcribed

### Test 4: Language Selection

1. **Prepare**: Non-English audio file
2. **Settings**:
   - Model: Small
   - Language: Select specific language
3. **Expected**: Accurate transcription in the selected language

### Test 5: Speaker Diarization

1. **Prepare**: Audio with 2+ speakers
2. **Settings**:
   - Model: Small
   - Diarization: Enabled
   - HF_TOKEN must be set
3. **Expected**: Speakers labeled in output

### Test 6: Large File (Chunking)

1. **Prepare**: Audio file >30 minutes
2. **Upload**: Large file
3. **Expected**:
   - Progress shows chunking
   - Multiple chunks processed
   - Merged output with correct timestamps

## 🐛 Common Issues & Solutions

### Issue: ModuleNotFoundError

```
ModuleNotFoundError: No module named 'transformers'
```

**Solution:**
```bash
pip install -r requirements.txt
```

### Issue: FFmpeg Not Found

```
FileNotFoundError: ffmpeg not found
```

**Solution:**
- Install FFmpeg (see Prerequisites)
- Verify: `ffmpeg -version`
- Make sure it's on PATH

### Issue: CUDA/GPU Errors

```
RuntimeError: CUDA out of memory
```

**Solution:** The app automatically falls back to CPU.
If you see this:
- Use a smaller model (tiny/small)
- Restart Python
- The app will use the CPU instead

### Issue: Download Fails (YouTube)

```
Failed to download from YouTube
```

**Solution:**
- The video might be region-restricted
- Try a different video
- Use direct file upload instead

### Issue: Slow Processing

**Expected times (CPU):**
- Tiny model: ~0.3x realtime (10 min audio = 3 min processing)
- Small model: ~0.5-1x realtime
- Medium model: ~1-2x realtime

**Solution:**
- Use a smaller model
- Use a GPU if available
- Try an HF Space with a GPU

### Issue: Diarization Not Working

```
Skipping diarization (HF_TOKEN not set)
```

**Solution:**
- Set the HF_TOKEN environment variable
- Accept the pyannote model terms
- Restart the app

## 📊 Performance Benchmarks

Tested on different hardware:

| Hardware | Model | 10 min Audio | GPU Used |
|----------|-------|--------------|----------|
| CPU (8-core) | Tiny | ~2 min | No |
| CPU (8-core) | Small | ~4 min | No |
| CPU (8-core) | Medium | ~8 min | No |
| GPU (RTX 3060) | Small | ~1 min | Yes |
| GPU (RTX 3060) | Medium | ~2 min | Yes |

*Your results may vary.*

## 🔍 Debugging

### Enable Verbose Logging

Modify `app.py`:
```python
logging.basicConfig(level=logging.DEBUG)  # Change from INFO to DEBUG
```

### Check Logs

- Console output shows all processing steps
- Look for ERROR or WARNING messages
- Progress callbacks show the current operation

### Test Individual Components

Test each module separately:
```python
# Test audio processor
from utils.audio_processor import AudioProcessor

duration = AudioProcessor.get_audio_duration("test.mp3")
print(f"Duration: {duration}s")

# Test transcription
from utils.transcription import WhisperTranscriber

transcriber = WhisperTranscriber(model_size='tiny')
transcriber.load_model()
result = transcriber.transcribe("test.mp3")
print(result['text'])
```

## 📝 Development Tips

### Fast Iteration

For faster testing during development:
1. **Use the tiny model**: Fastest processing
2. **Use short files**: 30-60 seconds
3. **Disable diarization**: Saves time
4. **Use local files**: Faster than URLs

### Code Changes

The Gradio app auto-reloads on code changes. Just save and refresh the browser.

### Memory Usage

Monitor memory:
- Small model: ~2GB RAM
- Medium model: ~4GB RAM
- With GPU: +2GB VRAM

## ✅ Ready for Deployment

Once all tests pass:
1. ✅ Basic transcription works
2. ✅ YouTube download works
3. ✅ All output formats generated
4. ✅ Progress bars show correctly
5. ✅ Large files process (chunking works)
6. ✅ Diarization works (if enabled)

You're ready to deploy to Hugging Face Spaces! 🚀

See `DEPLOYMENT.md` for deployment instructions.
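As a final convenience, the prerequisites from this guide (Python 3.8+, FFmpeg on PATH, HF_TOKEN for diarization) can be verified in one shot with a stdlib-only preflight script. This is a sketch, not part of the app:

```python
import os
import shutil
import sys


def preflight(min_python=(3, 8)):
    """Check local prerequisites; returns a dict of check name -> bool."""
    return {
        # Python 3.8+ required
        "python": sys.version_info[:2] >= min_python,
        # FFmpeg must be discoverable on PATH for audio/video processing
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Only needed when speaker diarization is enabled
        "hf_token": bool(os.environ.get("HF_TOKEN")),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

A missing `hf_token` is fine for Tests 1-4 and 6; only Test 5 (diarization) needs it.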