# Local Testing Guide

## 🧪 Testing Your Whisper Transcriber Locally

Before deploying to Hugging Face Spaces, test everything locally.

## Prerequisites

### 1. Install FFmpeg

FFmpeg is required for audio/video processing.

**Windows:**

```bash
# Using Chocolatey
choco install ffmpeg

# Or download from: https://ffmpeg.org/download.html
# and add its bin folder to PATH manually
```

**Mac:**

```bash
brew install ffmpeg
```

**Linux (Debian/Ubuntu):**

```bash
sudo apt update
sudo apt install ffmpeg
```

Verify the installation:

```bash
ffmpeg -version
```

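Sometimes `ffmpeg -version` works in your terminal but the app still reports FFmpeg as missing. In that case, the Python process may be seeing a different PATH. This minimal standard-library sketch runs the check from inside Python. It uses `shutil.which` to search PATH for `ffmpeg` and then reads the first line of `ffmpeg -version`. The helper name `ffmpeg_version` is illustrative and not part of the app.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on PATH."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = ffmpeg_version()
    print(version or "ffmpeg not found on PATH")
```

Run it with the same interpreter (and virtual environment) you use for `app.py`.
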
### 2. Python Environment

Python 3.8 or newer is required. Check your version (on many Mac/Linux systems the command is `python3`):

```bash
python --version
```

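`python --version` reports whichever interpreter comes first on PATH, and that is not always the one that will run `app.py`. This small sketch checks the running interpreter itself against the 3.8 minimum stated above. `python_ok` is a hypothetical helper name.

```python
import sys

# Minimum version stated in this guide.
REQUIRED = (3, 8)


def python_ok(version_info=sys.version_info) -> bool:
    """True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= REQUIRED


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    status = "OK" if python_ok() else f"too old, need {REQUIRED[0]}.{REQUIRED[1]}+"
    print(f"Python {current}: {status}")
```
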
## Setup

### 1. Create a Virtual Environment

```bash
# Create the venv
python -m venv venv

# Activate it
# Windows:
venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate
```

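You can ask Python directly whether activation took effect, for example before running `pip install`. This sketch relies on standard `venv` behavior: inside a virtual environment, `sys.prefix` points at the venv directory while `sys.base_prefix` still points at the base installation. The function name is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv created with `python -m venv`."""
    # base_prefix is always present on Python 3.3+; getattr keeps this defensive.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("venv active:", in_virtualenv(), "| prefix:", sys.prefix)
```
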
### 2. Install Dependencies

```bash
pip install -r requirements.txt
```

**Note:** The first installation may take 10-15 minutes because PyTorch is a large download.

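A quick import check after installation catches a partial install before you start the app. This sketch uses `importlib.util.find_spec`, which locates a module without importing it, so it stays fast even for PyTorch. The module list is an assumption based on the packages this guide mentions (PyTorch, `transformers`, Gradio). Adjust it to match `requirements.txt`.

```python
import importlib.util
from typing import Iterable, List

# Assumed top-level import names; edit to match requirements.txt.
EXPECTED_MODULES = ["torch", "transformers", "gradio"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing), "(run: pip install -r requirements.txt)")
    else:
        print("All expected modules are installed.")
```

Only top-level names belong in the list. For a dotted name such as `a.b`, `find_spec` first imports the parent package and raises if it is absent.
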
### 3. Set the HF_TOKEN Environment Variable (Optional)

Only needed for speaker diarization:

```bash
# Windows (PowerShell):
$env:HF_TOKEN = "your_token_here"
# Windows (CMD):
set HF_TOKEN=your_token_here
# Mac/Linux:
export HF_TOKEN=your_token_here
```

These commands set the variable for the current terminal session only, so run `python app.py` from the same terminal.

Get your token from: [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)

Accept the model terms at: [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1). This pipeline also requires accepting the terms of [pyannote/segmentation-3.0](https://huggingface.co/pyannote/segmentation-3.0).

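The app skips diarization when no token is available (see the "Skipping diarization" message under Common Issues). Below is a hypothetical helper, not the app's actual code, showing one way to make that check. It treats a missing or blank `HF_TOKEN` as unset and never prints the token itself.

```python
import os
from typing import Mapping, Optional


def diarization_token(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return HF_TOKEN if set to a non-blank value, else None."""
    token = env.get("HF_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    if diarization_token() is None:
        print("HF_TOKEN not set: speaker diarization will be skipped")
    else:
        print("HF_TOKEN found")
```
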
## Running the App

```bash
python app.py
```

The app will:

1. Start the Gradio server
2. Open in your browser automatically
3. Display the local URL: `http://127.0.0.1:7860`
4. Display a public share URL (only if sharing is enabled in `app.py`)

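The browser may not open, for example on a remote or headless machine. You can still check whether the server is listening with a minimal sketch using the standard `socket` module (`port_open` is an illustrative name). Gradio may choose a different port when 7860 is already taken, so use the port from the URL printed in the console.

```python
import socket


def port_open(host: str = "127.0.0.1", port: int = 7860, timeout: float = 1.0) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("Server reachable on 7860:", port_open())
```
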
## 🧪 Test Cases

### Test 1: Basic Audio File

1. **Prepare**: Find a short MP3/WAV file (1-2 minutes)
2. **Upload**: Use the file upload widget
3. **Settings**:
   - Model: Small
   - Language: Auto
   - Diarization: Off
4. **Expected**: Transcription in all output formats within 1-2 minutes

### Test 2: YouTube URL

1. **Input**: Paste a short YouTube video URL
2. **Settings**: Same as Test 1
3. **Expected**: Download and transcription both complete

### Test 3: Video File

1. **Prepare**: A short MP4 video file
2. **Upload**: The video file
3. **Expected**: Audio is extracted automatically, then transcribed

### Test 4: Language Selection

1. **Prepare**: A non-English audio file
2. **Settings**:
   - Model: Small
   - Language: Select the audio's language
3. **Expected**: Accurate transcription in the selected language

### Test 5: Speaker Diarization

1. **Prepare**: Audio with 2+ speakers
2. **Settings**:
   - Model: Small
   - Diarization: Enabled
   - `HF_TOKEN` must be set
3. **Expected**: Speakers labeled in the output

### Test 6: Large File (Chunking)

1. **Prepare**: An audio file longer than 30 minutes
2. **Upload**: The large file
3. **Expected**:
   - Progress shows chunking
   - Multiple chunks are processed
   - Merged output has correct timestamps (continuous across chunk boundaries)

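Correct timestamps after chunking come down to one step. Each chunk's segment times are relative to the start of that chunk, so they must be shifted by the chunk's offset in the original file before merging. The sketch below assumes fixed-length, non-overlapping chunks and segments shaped like `{"start", "end", "text"}`. The app's actual chunk length, overlap handling, and data structures may differ, and `merge_chunks` is a hypothetical name.

```python
from typing import Any, Dict, List

Segment = Dict[str, Any]


def merge_chunks(chunks: List[List[Segment]], chunk_seconds: float) -> List[Segment]:
    """Merge per-chunk segments into one timeline.

    chunks[i] holds segments timed relative to the start of chunk i,
    and chunk i starts at i * chunk_seconds in the original audio.
    """
    merged: List[Segment] = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_seconds
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged


if __name__ == "__main__":
    chunks = [
        [{"start": 0.0, "end": 4.5, "text": "Hello"}],
        [{"start": 1.0, "end": 3.0, "text": "again"}],
    ]
    print(merge_chunks(chunks, chunk_seconds=600.0))
    # [{'start': 0.0, 'end': 4.5, 'text': 'Hello'}, {'start': 601.0, 'end': 603.0, 'text': 'again'}]
```

When testing with a long file, a segment near the start of the second chunk should show a time just past the chunk length, not a time near zero.
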
## Common Issues & Solutions

### Issue: ModuleNotFoundError

```
ModuleNotFoundError: No module named 'transformers'
```

**Solution:** Activate the virtual environment, then reinstall the dependencies:

```bash
pip install -r requirements.txt
```

### Issue: FFmpeg Not Found

```
FileNotFoundError: ffmpeg not found
```

**Solution:**

- Install FFmpeg (see Prerequisites)
- Verify: `ffmpeg -version`
- Make sure it's on your PATH (open a new terminal after installing)

### Issue: CUDA/GPU Errors

```
RuntimeError: CUDA out of memory
```

**Solution:** The app falls back to CPU automatically when it can. If you still see this error:

- Use a smaller model (tiny/small)
- Restart Python to release GPU memory
- If the GPU still cannot fit the model, the app will use the CPU instead

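The sketch below shows one common way to structure such a fallback. It is not necessarily how `app.py` does it. The idea is to try the GPU, catch the out-of-memory `RuntimeError`, and retry on CPU. `load_model` here is a hypothetical callable that takes a device string. With real PyTorch code you would typically also call `torch.cuda.empty_cache()` before retrying.

```python
from typing import Callable, Tuple, TypeVar

Model = TypeVar("Model")


def load_with_cpu_fallback(load_model: Callable[[str], Model]) -> Tuple[Model, str]:
    """Try 'cuda' first; on a CUDA out-of-memory error, retry on 'cpu'.

    Returns the loaded model and the device actually used.
    """
    try:
        return load_model("cuda"), "cuda"
    except RuntimeError as err:
        # PyTorch reports OOM as a RuntimeError (or a subclass of it)
        # whose message contains "out of memory"; re-raise anything else.
        if "out of memory" not in str(err):
            raise
        return load_model("cpu"), "cpu"


def fake_loader(device: str) -> str:
    """Stand-in loader that simulates a GPU without enough memory."""
    if device == "cuda":
        raise RuntimeError("CUDA out of memory. Tried to allocate 2.00 GiB")
    return f"model-on-{device}"


if __name__ == "__main__":
    print(load_with_cpu_fallback(fake_loader))  # ('model-on-cpu', 'cpu')
```
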
### Issue: Download Fails (YouTube)

```
Failed to download from YouTube
```

**Solution:**

- The video might be region-restricted
- Try a different video
- Use direct file upload instead

### Issue: Slow Processing

**Expected times (CPU)**, as a multiple of the audio duration:

- Tiny model: ~0.3x realtime (10 min of audio ≈ 3 min of processing)
- Small model: ~0.5-1x realtime
- Medium model: ~1-2x realtime

These are rough figures. See Performance Benchmarks below for measured times on specific hardware.

**Solution:**

- Use a smaller model
- Use a GPU if available
- Run on a Hugging Face Space with GPU hardware

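The realtime factors above are plain multiplication: processing time equals audio duration times the factor, so a factor below 1 means faster than real time. The small sketch below turns the listed factors into estimates. It also computes the factor from one of your own runs, so you can compare it with the figures above. The function names are illustrative.

```python
def estimated_processing_minutes(audio_minutes: float, realtime_factor: float) -> float:
    """Processing time is audio duration multiplied by the real-time factor (RTF)."""
    return audio_minutes * realtime_factor


def measured_rtf(processing_seconds: float, audio_seconds: float) -> float:
    """RTF observed in a real run: below 1.0 means faster than real time."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Rough CPU factors from the list above; small and medium use range midpoints.
    for model, rtf in [("tiny", 0.3), ("small", 0.75), ("medium", 1.5)]:
        minutes = estimated_processing_minutes(10, rtf)
        print(f"{model}: ~{minutes:.1f} min for 10 min of audio")
    # A run that took 4 min for 10 min of audio has an RTF of 0.4.
    print(measured_rtf(240, 600))
```
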
### Issue: Diarization Not Working

```
Skipping diarization (HF_TOKEN not set)
```

**Solution:**

- Set the `HF_TOKEN` environment variable
- Accept the pyannote model terms
- Restart the app

## Performance Benchmarks

Tested on different hardware:

| Hardware | Model | Time for 10 min of audio | GPU Used |
|----------|-------|--------------------------|----------|
| CPU (8-core) | Tiny | ~2 min | No |
| CPU (8-core) | Small | ~4 min | No |
| CPU (8-core) | Medium | ~8 min | No |
| GPU (RTX 3060) | Small | ~1 min | Yes |
| GPU (RTX 3060) | Medium | ~2 min | Yes |

*Your results may vary.*

## Debugging

### Enable Verbose Logging

In `app.py`, change the logging level from INFO to DEBUG:

```python
import logging

logging.basicConfig(level=logging.DEBUG)  # was: level=logging.INFO
```

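If you switch levels often, reading the level from an environment variable avoids editing `app.py` each time. Here is a sketch of that approach. `LOG_LEVEL` is a hypothetical variable name that the app does not currently read, and `configure_logging` is illustrative. Note that `logging.basicConfig` does nothing if the root logger already has handlers, so it must run before anything else configures logging.

```python
import logging
import os


def configure_logging(default: str = "INFO") -> int:
    """Configure root logging from the LOG_LEVEL env var (hypothetical name).

    Unknown level names fall back to `default`. Returns the numeric level used.
    """
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        level = getattr(logging, default.upper())
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return level


if __name__ == "__main__":
    configure_logging()
    logging.getLogger(__name__).debug("visible only when LOG_LEVEL=DEBUG")
```

With this in place, `LOG_LEVEL=DEBUG python app.py` (Mac/Linux) or `$env:LOG_LEVEL = "DEBUG"` (PowerShell) turns on verbose output for a single run.
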
### Check Logs

- Console output shows all processing steps
- Look for ERROR or WARNING messages
- Progress callbacks show the current operation

### Test Individual Components

Test each module separately. Run this from the project root so the `utils` package is importable, and replace `test.mp3` with a real file:

```python
from utils.audio_processor import AudioProcessor
from utils.transcription import WhisperTranscriber

# Test the audio processor: duration in seconds
duration = AudioProcessor.get_audio_duration("test.mp3")
print(f"Duration: {duration}s")

# Test transcription with the fastest model
transcriber = WhisperTranscriber(model_size="tiny")
transcriber.load_model()
result = transcriber.transcribe("test.mp3")
print(result["text"])
```

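`ffprobe` (installed alongside FFmpeg) gives an independent cross-check for `AudioProcessor.get_audio_duration`. If the two values disagree, the problem is more likely in the app's processing than in the file. This minimal sketch uses a standard `ffprobe` invocation that prints only the container duration in seconds. The helper names are illustrative.

```python
import subprocess
import sys


def parse_ffprobe_duration(stdout: str) -> float:
    """Parse the bare number printed by the ffprobe command below."""
    return float(stdout.strip())


def ffprobe_duration(path: str) -> float:
    """Return the media duration in seconds as reported by ffprobe."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True, text=True, check=True,
    )
    return parse_ffprobe_duration(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(f"Duration: {ffprobe_duration(sys.argv[1]):.2f}s")
```
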
## Development Tips

### Fast Iteration

For faster testing during development:

1. **Use the tiny model**: fastest processing
2. **Use short files**: 30-60 seconds
3. **Disable diarization**: saves time
4. **Use local files**: faster than URLs

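To make short test files from a longer recording, FFmpeg can cut a clip without re-encoding. The sketch below only builds the command, which keeps it easy to inspect. The file names are placeholders, and `trim_command` is a hypothetical helper. With `-c copy`, cut points snap to the nearest packet or keyframe, which is fine for test clips.

```python
from typing import List


def trim_command(src: str, dst: str, seconds: int = 45, start: int = 0) -> List[str]:
    """Build an ffmpeg command that copies `seconds` of media from `src` into `dst`."""
    return [
        "ffmpeg", "-y",          # overwrite dst if it exists
        "-ss", str(start),       # seek in the input before decoding
        "-i", src,
        "-t", str(seconds),      # clip length in seconds
        "-c", "copy",            # copy streams, no re-encoding
        dst,
    ]


if __name__ == "__main__":
    cmd = trim_command("long_recording.mp3", "test_45s.mp3")
    print(" ".join(cmd))
    # To create the clip, pass the list to subprocess.run(cmd, check=True).
```
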
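### Code Changes

Running `python app.py` does not pick up code changes, so stop and restart it after editing. For automatic reloading, start the app with Gradio's reload mode, `gradio app.py`, then save your changes and refresh the browser. By default, reload mode looks for a Gradio app object named `demo`.
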
### Memory Usage

Approximate memory usage:

- Small model: ~2GB RAM
- Medium model: ~4GB RAM
- With GPU: +2GB VRAM

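To compare your own runs with these figures, the standard `resource` module reports a process's peak resident memory on Linux and macOS. A sketch of that measurement follows. `peak_rss_mb` is an illustrative name. On Windows, use Task Manager instead. For GPU memory, `nvidia-smi` shows per-process VRAM.

```python
import sys


def peak_rss_mb() -> float:
    """Peak resident memory of the current process in MB (Linux/macOS only)."""
    import resource  # standard library, but not available on Windows

    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return peak * bytes_per_unit / (1024 * 1024)


if __name__ == "__main__" and sys.platform != "win32":
    print(f"Peak RSS so far: {peak_rss_mb():.0f} MB")
```

One way to use it: call `peak_rss_mb()` after `load_model()` and again after `transcribe(...)` in the component test under Debugging. This shows how much memory the model itself uses compared with the transcription run.
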
## ✅ Ready for Deployment

Once all tests pass:

1. ✅ Basic transcription works
2. ✅ YouTube download works
3. ✅ All output formats are generated
4. ✅ Progress bars display correctly
5. ✅ Large files process (chunking works)
6. ✅ Diarization works (if enabled)

You're ready to deploy to Hugging Face Spaces!

See `DEPLOYMENT.md` for deployment instructions.