Local Testing Guide

πŸ§ͺ Testing Your Whisper Transcriber Locally

Before deploying to Hugging Face Spaces, test everything locally.

Prerequisites

1. Install FFmpeg

FFmpeg is required for audio/video processing.

Windows:

# Using Chocolatey
choco install ffmpeg

# Or download from: https://ffmpeg.org/download.html
# Add to PATH manually

Mac:

brew install ffmpeg

Linux:

sudo apt update
sudo apt install ffmpeg

Verify installation:

ffmpeg -version

2. Python Environment

Requires Python 3.8+

python --version

πŸš€ Setup

1. Create Virtual Environment

# Create venv
python -m venv venv

# Activate
# Windows:
venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate

2. Install Dependencies

pip install -r requirements.txt

Note: First installation may take 10-15 minutes (PyTorch is large).

3. Set Environment Variable (Optional)

For speaker diarization:

# Windows (PowerShell):
$env:HF_TOKEN = "your_token_here"

# Windows (CMD):
set HF_TOKEN=your_token_here

# Mac/Linux:
export HF_TOKEN=your_token_here

Get your token from: huggingface.co/settings/tokens

Accept terms at: pyannote/speaker-diarization-3.1
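The token setup above can be sanity-checked from Python before launching the app. This is a hedged sketch, not the repo's actual code: `diarization_available` and `load_diarization_pipeline` are hypothetical helper names, though `Pipeline.from_pretrained` is the standard pyannote.audio entry point for this model.

```python
import os

def diarization_available() -> bool:
    """Return True only when an HF token is present, mirroring the
    app's "Skipping diarization (HF_TOKEN not set)" behaviour."""
    return bool(os.environ.get("HF_TOKEN"))

def load_diarization_pipeline():
    """Hypothetical loader: defers the heavyweight pyannote import
    until a token actually exists."""
    if not diarization_available():
        print("Skipping diarization (HF_TOKEN not set)")
        return None
    from pyannote.audio import Pipeline
    return Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=os.environ["HF_TOKEN"],
    )
```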

πŸƒ Running the App

python app.py

The app will:

  1. Start Gradio server
  2. Open in browser automatically
  3. Display local URL: http://127.0.0.1:7860
  4. Display a public share URL (only if the app launches with share=True)

πŸ§ͺ Test Cases

Test 1: Basic Audio File

  1. Prepare: Find a short MP3/WAV file (1-2 minutes)
  2. Upload: Use the file upload widget
  3. Settings:
    • Model: Small
    • Language: Auto
    • Diarization: Off
  4. Expected: Transcription in all formats within 1-2 minutes

Test 2: YouTube URL

  1. Input: Paste a short YouTube video URL
  2. Settings: Same as Test 1
  3. Expected: Download + transcription complete

Test 3: Video File

  1. Prepare: Short MP4 video file
  2. Upload: Video file
  3. Expected: Audio extracted automatically, then transcribed
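Audio extraction of this kind is typically a thin wrapper around FFmpeg. A minimal sketch (the helper names are hypothetical, not the repo's actual API; the FFmpeg flags are standard options):

```python
import subprocess

def build_extract_cmd(video_path, wav_path):
    """FFmpeg command that drops the video stream and writes
    mono 16 kHz WAV - the input format Whisper models expect."""
    return [
        "ffmpeg", "-y", "-i", video_path,
        "-vn",           # no video stream in the output
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        wav_path,
    ]

def extract_audio(video_path, wav_path):
    """Run the extraction; requires ffmpeg on PATH."""
    subprocess.run(build_extract_cmd(video_path, wav_path), check=True)
```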

Test 4: Language Selection

  1. Prepare: Non-English audio file
  2. Settings:
    • Model: Small
    • Language: Select specific language
  3. Expected: Accurate transcription in selected language

Test 5: Speaker Diarization

  1. Prepare: Audio with 2+ speakers
  2. Settings:
    • Model: Small
    • Diarization: Enabled
    • HF_TOKEN must be set
  3. Expected: Speakers labeled in output

Test 6: Large File (Chunking)

  1. Prepare: Audio file >30 minutes
  2. Upload: Large file
  3. Expected:
    • Progress shows chunking
    • Multiple chunks processed
    • Merged output with correct timestamps
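The merge step in the last expectation can be sketched as follows. This assumes Whisper-style segment dicts with `start`/`end`/`text` keys - a common shape for transcriber output, though not necessarily the repo's exact one:

```python
def merge_chunks(chunks):
    """Merge per-chunk transcription segments into one transcript,
    shifting each segment's timestamps by its chunk's start offset.

    `chunks` is a list of (offset_seconds, segments) pairs, where each
    segment is a dict with 'start', 'end', and 'text' keys.
    """
    merged = []
    for offset, segments in chunks:
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    merged.sort(key=lambda s: s["start"])  # keep global chronological order
    return merged
```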

πŸ› Common Issues & Solutions

Issue: ModuleNotFoundError

ModuleNotFoundError: No module named 'transformers'

Solution:

pip install -r requirements.txt

Issue: FFmpeg Not Found

FileNotFoundError: ffmpeg not found

Solution:

  • Install FFmpeg (see Prerequisites)
  • Verify: ffmpeg -version
  • Make sure it's in PATH
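A quick way to check the PATH issue from Python itself (stdlib only):

```python
import shutil

def ffmpeg_on_path():
    """Return True when the ffmpeg binary is discoverable via PATH -
    the same lookup subprocess-based callers rely on."""
    return shutil.which("ffmpeg") is not None

print("ffmpeg found" if ffmpeg_on_path() else "ffmpeg NOT found - check PATH")
```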

Issue: CUDA/GPU Errors

RuntimeError: CUDA out of memory

Solution: The app automatically falls back to CPU. If you still see this error:

  • Use a smaller model (tiny/small)
  • Restart the Python process to free GPU memory
  • Let the app fall back to CPU
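The fallback logic is roughly this (a sketch, not the app's actual code; the deferred import means the check also works on machines without PyTorch installed):

```python
def pick_device():
    """Prefer CUDA when PyTorch reports a usable GPU, else CPU."""
    try:
        import torch  # deferred so the check works even without torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

print("Using device:", pick_device())
```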

Issue: Download Fails (YouTube)

Failed to download from YouTube

Solution:

  • Video might be region-restricted
  • Try different video
  • Use direct file upload instead

Issue: Slow Processing

Expected Times (CPU):

  • Tiny model: ~0.3x the audio duration (10 min of audio ≈ 3 min of processing)
  • Small model: ~0.5-1x the audio duration
  • Medium model: ~1-2x the audio duration

Solution:

  • Use smaller model
  • Use GPU if available
  • Try on HF Space with GPU

Issue: Diarization Not Working

Skipping diarization (HF_TOKEN not set)

Solution:

  • Set HF_TOKEN environment variable
  • Accept pyannote model terms
  • Restart app

πŸ“Š Performance Benchmarks

Tested on different hardware:

| Hardware       | Model  | 10 min Audio | GPU Used |
|----------------|--------|--------------|----------|
| CPU (8-core)   | Tiny   | ~2 min       | No       |
| CPU (8-core)   | Small  | ~4 min       | No       |
| CPU (8-core)   | Medium | ~8 min       | No       |
| GPU (RTX 3060) | Small  | ~1 min       | Yes      |
| GPU (RTX 3060) | Medium | ~2 min       | Yes      |

Your results may vary.
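These figures are realtime-factor (RTF) estimates, where RTF = processing time / audio duration. A tiny helper makes the arithmetic explicit:

```python
def estimated_processing_minutes(audio_minutes, rtf):
    """Estimate processing time from a realtime factor (RTF), defined as
    processing_time / audio_duration. E.g. the tiny model's ~0.3x figure
    means 10 minutes of audio takes about 3 minutes on CPU."""
    return audio_minutes * rtf
```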

πŸ” Debugging

Enable Verbose Logging

Modify app.py:

logging.basicConfig(level=logging.DEBUG)  # Change from INFO to DEBUG

Check Logs

  • Console output shows all processing steps
  • Look for ERROR or WARNING messages
  • Progress callbacks show current operation

Test Individual Components

Test each module separately:

# Test audio processor
from utils.audio_processor import AudioProcessor
duration = AudioProcessor.get_audio_duration("test.mp3")
print(f"Duration: {duration}s")

# Test transcription
from utils.transcription import WhisperTranscriber
transcriber = WhisperTranscriber(model_size='tiny')
transcriber.load_model()
result = transcriber.transcribe("test.mp3")
print(result['text'])

πŸ“ Development Tips

Fast Iteration

For faster testing during development:

  1. Use tiny model: Fastest processing
  2. Use short files: 30-60 seconds
  3. Disable diarization: Saves time
  4. Use local files: Faster than URLs

Code Changes

Gradio's auto-reload only applies when the app is launched with `gradio app.py`; if you start it with `python app.py`, restart the server after saving changes, then refresh the browser.

Memory Usage

Monitor memory:

  • Small model: ~2GB RAM
  • Medium model: ~4GB RAM
  • With GPU: +2GB VRAM

βœ… Ready for Deployment

Once all tests pass:

  1. βœ… Basic transcription works
  2. βœ… YouTube download works
  3. βœ… All output formats generated
  4. βœ… Progress bars show correctly
  5. βœ… Large files process (chunking works)
  6. βœ… Diarization works (if enabled)

You're ready to deploy to Hugging Face Spaces! πŸš€

See DEPLOYMENT.md for deployment instructions.