# 🎤 START HERE - Whisper Transcriber Project

## 👋 Welcome!

You now have a complete, production-ready SRT generator built on OpenAI Whisper!
## 🎯 What You Have

A fully functional transcription system that can:

- ✅ Upload audio/video files
- ✅ Download from YouTube
- ✅ Auto-detect 99+ languages
- ✅ Generate SRT, VTT, TXT, and JSON
- ✅ Identify speakers (optional)
- ✅ Handle large files automatically
- ✅ Show real-time progress
- ✅ Provide a public API
## 📁 Project Files

```
hf/
├── 📜 app.py                 # Main Gradio app (RUN THIS!)
├── 📦 requirements.txt       # Dependencies
├── 🚫 .gitignore             # Git ignore rules
│
├── 🛠️ utils/                 # Core modules (1,391 lines)
│   ├── audio_processor.py    # Audio extraction & chunking
│   ├── downloader.py         # YouTube & URL downloads
│   ├── transcription.py      # Whisper transcription
│   ├── formatters.py         # SRT/VTT/TXT/JSON output
│   └── diarization.py        # Speaker identification
│
└── 📚 Documentation/
    ├── ⚡ QUICK_START.md      # READ THIS FIRST!
    ├── 🧪 LOCAL_TESTING.md   # Test locally
    ├── 🚀 DEPLOYMENT.md      # Deploy to HF Spaces
    ├── 📋 PROJECT_SUMMARY.md # Technical details
    └── 📖 README.md          # Full documentation
```
## 🚀 Quick Start (Choose One)

### Option A: Deploy to Hugging Face (5 minutes)

**Easiest option - no local setup needed!**

1. Go to huggingface.co/spaces
2. Create a new Space (Gradio SDK)
3. Upload all files from this folder
4. Wait 5-10 minutes for the build
5. Done! Your app is live 🎉

📖 See QUICK_START.md for detailed steps.
### Option B: Run Locally (10 minutes)

**Full control - run on your own computer**

```shell
# 1. Install FFmpeg
choco install ffmpeg      # Windows
brew install ffmpeg       # Mac
sudo apt install ffmpeg   # Linux

# 2. Set up Python
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt

# 3. Run!
python app.py
```

Then open: http://127.0.0.1:7860

📖 See LOCAL_TESTING.md for detailed steps.
## 📖 Documentation Guide

**New to the project?**

1. Start with QUICK_START.md (5-minute read)
2. Then read README.md for the full feature list

**Want to test locally?** → LOCAL_TESTING.md

**Ready to deploy?** → DEPLOYMENT.md

**Need technical details?** → PROJECT_SUMMARY.md
## 🎯 First Steps After Setup

1. **Find a sample** - a short audio file (1-2 minutes), or a YouTube URL
2. **Run the app** - upload the file, select the "Small" model, and click "Generate Transcription"
3. **Download the results** - try the SRT file first and open it in a text editor

Example YouTube URL to test:
https://www.youtube.com/watch?v=dQw4w9WgXcQ
## ⚙️ Basic Settings

**Model Selection**

- Tiny: fastest (use for testing)
- Small: recommended (good balance)
- Medium: best quality (slower)

**Language**

- Auto-detect: works great! (recommended)
- Manual: select it if you know the language

**Speaker Diarization**

- Off: faster (default)
- On: identifies different speakers (requires an HF token)
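To try diarization locally, one common pattern is to export the token as the `HF_TOKEN` environment variable before launching the app (the token value below is a hypothetical placeholder - see DEPLOYMENT.md for how this app actually reads it):

```shell
# Hypothetical placeholder - create a real token at huggingface.co/settings/tokens
# and accept the pyannote model terms before enabling diarization.
export HF_TOKEN="hf_xxxxxxxx"
```

Launch `python app.py` from the same shell so the process inherits the variable.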
## 📊 What to Expect

**Processing Time (10-minute audio)**

| Setup | Model  | Time      |
|-------|--------|-----------|
| CPU   | Tiny   | ~1 min    |
| CPU   | Small  | ~3-5 min  |
| CPU   | Medium | ~8-10 min |
| GPU   | Small  | ~1 min    |
**Output Files**

After processing, you get four files:

- 📄 filename.srt - most common; for video players
- 📄 filename.vtt - for web players
- 📄 filename.txt - plain-text transcript
- 📄 filename.json - full data with word-level timestamps
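To show what the SRT layout looks like, here is a minimal sketch that renders `(start, end, text)` segments as numbered SRT blocks. The function names here are hypothetical - the app's own `formatters.py` does this for you:

```python
def to_srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp SRT uses."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render (start, end, text) tuples as SRT: index line,
    timestamp range line, text, then a blank separator line."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)

print(segments_to_srt([(0.0, 2.5, "Hello, world."), (2.5, 5.0, "This is a test.")]))
```

Opening the generated .srt in a text editor (as suggested above) shows exactly this structure.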
## 🔌 API Usage (Advanced)

Yes, this has an API! Use it from your own code via gradio_client:

```python
from gradio_client import Client

# Point the client at your deployed Space
client = Client("YOUR_SPACE_URL")

result = client.predict(
    url_input="https://youtube.com/watch?v=...",
    model_size="small",
    language="auto",
    enable_diarization=False,
)
```
## 💡 Pro Tips

**For Best Results**

- Use high-quality audio (clear speech)
- Choose a specific language if you know it
- Use the Medium model for final production

**For Speed**

- Use the Tiny model for quick tests
- Keep files under 10 minutes
- Disable speaker diarization

**For YouTube**

- Some videos may be restricted
- Use direct file upload as a fallback
- Works with unlisted videos
## 🐛 Common Issues

- **"ModuleNotFoundError"** → run `pip install -r requirements.txt`
- **"FFmpeg not found"** → install FFmpeg (see QUICK_START.md)
- **"YouTube download failed"** → the video may be restricted; try file upload instead
- **Slow processing** → normal on CPU; use a smaller model or a GPU
- **Speaker diarization not working** → you need an HF_TOKEN (see DEPLOYMENT.md)
## 🎨 Features Included

**Input Methods**

- ✅ File upload (drag & drop)
- ✅ YouTube URLs
- ✅ Direct media URLs
- ✅ Multiple formats (MP3, MP4, WAV, etc.)

**Processing**

- ✅ Automatic audio extraction from video
- ✅ Large-file chunking (>30 min)
- ✅ Multi-language support (99+)
- ✅ Word-level timestamps
- ✅ Speaker identification (optional)
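The large-file chunking idea can be sketched roughly like this - a simplified illustration with hypothetical chunk-length and overlap values; `audio_processor.py`'s actual parameters may differ:

```python
def chunk_spans(total_seconds: float, chunk: float = 1800.0, overlap: float = 5.0):
    """Split an audio duration into (start, end) spans of at most `chunk`
    seconds, with a small overlap so words at boundaries are not cut off."""
    spans, start = [], 0.0
    while start < total_seconds:
        end = min(start + chunk, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break
        start = end - overlap  # back up slightly before the next chunk
    return spans

print(chunk_spans(4000.0))  # a ~67-minute file becomes three overlapping spans
```

Each span is transcribed independently and the timestamps are then shifted back to the original timeline.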
**Output**

- ✅ SRT subtitles
- ✅ VTT web format
- ✅ Plain text
- ✅ JSON with metadata
- ✅ Preview in browser

**UI/UX**

- ✅ Real-time progress bars
- ✅ Clear error messages
- ✅ Download buttons for all formats
- ✅ Model selection
- ✅ Language selection
- ✅ Clean, modern interface

**Technical**

- ✅ Public API endpoint
- ✅ Automatic cleanup
- ✅ GPU support (auto-detected)
- ✅ Error handling
- ✅ Memory efficient
## 🚀 Next Steps

1. Choose your deployment option (HF Spaces or local)
2. Read the relevant guide (QUICK_START.md or LOCAL_TESTING.md)
3. Test with a sample file
4. Share your app! (if deployed to HF Spaces)
## 🆘 Need Help?

**Documentation:**

- QUICK_START.md - basic setup
- LOCAL_TESTING.md - local development
- DEPLOYMENT.md - HF Spaces deployment
- README.md - full documentation

**Support:**

1. Check the documentation first
2. Review the error messages
3. Open an issue on GitHub
## ✅ Project Checklist

**Before Deploying**

- Read QUICK_START.md
- Choose a deployment method
- Test locally (optional but recommended)
- Prepare sample files for testing

**After Deploying**

- Test basic transcription
- Try a YouTube download
- Test different models
- Share with users!
## 🎉 You're All Set!

Your Whisper Transcriber is ready to go!

**Next step:** Open QUICK_START.md and choose your deployment method.

**Questions?** Check the documentation files above.

**Ready to transcribe?** Let's go! 🎤
**Built with:**

- OpenAI Whisper (speech recognition)
- Gradio (web interface)
- PyTorch (deep learning)
- Pyannote.audio (speaker diarization)
- FFmpeg (audio/video processing)
- yt-dlp (YouTube downloads)

**License:** MIT (free for personal and commercial use)

Happy transcribing! 🎧