# START HERE - Whisper Transcriber Project

## Welcome!

You now have a **complete, production-ready SRT generator** built on OpenAI Whisper!

---

## What You Have

A fully functional transcription system that can:

✅ Upload audio/video files
✅ Download from YouTube
✅ Auto-detect 99+ languages
✅ Generate SRT, VTT, TXT, JSON
✅ Identify speakers (optional)
✅ Handle large files automatically
✅ Show real-time progress
✅ Provide a public API

---
## Project Files

```
hf/
├── app.py                    # Main Gradio app (RUN THIS!)
├── requirements.txt          # Dependencies
├── .gitignore                # Git ignore rules
│
├── utils/                    # Core modules (1,391 lines)
│   ├── audio_processor.py    # Audio extraction & chunking
│   ├── downloader.py         # YouTube & URL downloads
│   ├── transcription.py      # Whisper transcription
│   ├── formatters.py         # SRT/VTT/TXT/JSON output
│   └── diarization.py        # Speaker identification
│
└── Documentation/
    ├── QUICK_START.md        # READ THIS FIRST!
    ├── LOCAL_TESTING.md      # Test locally
    ├── DEPLOYMENT.md         # Deploy to HF Spaces
    ├── PROJECT_SUMMARY.md    # Technical details
    └── README.md             # Full documentation
```
---

## Quick Start (Choose One)

### Option A: Deploy to Hugging Face (5 minutes)

**Easiest option - no local setup needed!**

1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Create a new Space (Gradio SDK)
3. Upload all files from this folder
4. Wait 5-10 minutes for the build
5. Done! Your app is live

**See `QUICK_START.md` for detailed steps**
---

### Option B: Run Locally (10 minutes)

**Full control - run on your own computer**

```bash
# 1. Install FFmpeg
choco install ffmpeg   # Windows
brew install ffmpeg    # Mac
apt install ffmpeg     # Linux

# 2. Set up Python
python -m venv venv
source venv/bin/activate   # or venv\Scripts\activate on Windows
pip install -r requirements.txt

# 3. Run!
python app.py
```
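Before launching, you can confirm FFmpeg is actually on your PATH (the most common setup failure) with a quick stdlib-only check:

```python
import shutil

# app.py needs FFmpeg on PATH to extract audio from video files
ffmpeg_path = shutil.which("ffmpeg")
if ffmpeg_path is None:
    print("FFmpeg not found - install it before running app.py")
else:
    print(f"FFmpeg found at {ffmpeg_path}")
```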
Then open: http://127.0.0.1:7860

**See `LOCAL_TESTING.md` for detailed steps**
---

## Documentation Guide

**New to the project?**
1. Start with `QUICK_START.md` (5-minute read)
2. Then `README.md` for the full feature list

**Want to test locally?**
→ `LOCAL_TESTING.md`

**Ready to deploy?**
→ `DEPLOYMENT.md`

**Need technical details?**
→ `PROJECT_SUMMARY.md`

---
## First Steps After Setup

### Test with a Sample

1. **Find a short audio file** (1-2 minutes)
   - Or use a YouTube URL
2. **Run the app**
   - Upload the file
   - Select the "Small" model
   - Click "Generate Transcription"
3. **Download the results**
   - Try the SRT file first
   - Open it in a text editor

**Example YouTube URL to test:**
```
https://www.youtube.com/watch?v=dQw4w9WgXcQ
```
---

## Basic Settings

### Model Selection
- **Tiny**: Fastest (use for testing)
- **Small**: Recommended (good balance)
- **Medium**: Best quality (slower)

### Language
- **Auto-detect**: Works well (recommended)
- **Manual**: Select if you know the language

### Speaker Diarization
- **Off**: Faster (default)
- **On**: Identifies different speakers (requires an HF token)
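If you turn diarization on, the token goes in as an environment variable locally or as a Space secret on Hugging Face. The variable name `HF_TOKEN` matches the troubleshooting section below; the token value shown here is a placeholder:

```shell
# Locally: export the token before starting the app.
# On HF Spaces: add HF_TOKEN as a secret in the Space settings instead.
export HF_TOKEN="hf_xxxxxxxxxxxx"
```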
---
## What to Expect

### Processing Time (10-minute audio)

| Setup | Model  | Time      |
|-------|--------|-----------|
| CPU   | Tiny   | ~1 min    |
| CPU   | Small  | ~3-5 min  |
| CPU   | Medium | ~8-10 min |
| GPU   | Small  | ~1 min    |

### Output Files

After processing, you get **4 files**:

1. **filename.srt** - Most common; for video players
2. **filename.vtt** - For web players
3. **filename.txt** - Plain text transcript
4. **filename.json** - Full data with word timestamps
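For reference, each SRT cue is a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the subtitle text. A minimal sketch of the timestamp math (the app's real formatting lives in `utils/formatters.py`; this helper name is illustrative):

```python
def srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to SRT's HH:MM:SS,mmm form."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

# One complete SRT cue: index, time range, text, then a blank line
cue = f"1\n{srt_timestamp(0.0)} --> {srt_timestamp(3.5)}\nHello, world.\n"
print(cue)  # time range renders as 00:00:00,000 --> 00:00:03,500
```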
---

## API Usage (Advanced)

Yes, this has an API! Use it from your own code:

```python
from gradio_client import Client

client = Client("YOUR_SPACE_URL")  # e.g. "username/whisper-transcriber"
result = client.predict(
    url_input="https://youtube.com/watch?v=...",
    model_size="small",
    language="auto",
    enable_diarization=False
)
```

If the call fails, run `client.view_api()` to inspect the exact endpoint and parameter names the Space exposes.
---

## Pro Tips

### For Best Results
- Use high-quality audio (clear speech)
- Choose a specific language if known
- Use the Medium model for final production work

### For Speed
- Use the Tiny model for quick tests
- Keep files under 10 minutes
- Disable speaker diarization

### For YouTube
- Some videos may be restricted
- Use direct file upload as a fallback
- Works with unlisted videos

---
## Common Issues

### "ModuleNotFoundError"
→ Run: `pip install -r requirements.txt`

### "FFmpeg not found"
→ Install FFmpeg (see QUICK_START.md)

### "YouTube download failed"
→ The video may be restricted; try a file upload instead

### "Slow processing"
→ Normal on CPU; use a smaller model or a GPU

### "Speaker diarization not working"
→ You need an HF_TOKEN (see DEPLOYMENT.md)

---
## Features Included

### Input Methods
✅ File upload (drag & drop)
✅ YouTube URLs
✅ Direct media URLs
✅ Multiple formats (MP3, MP4, WAV, etc.)

### Processing
✅ Automatic audio extraction from video
✅ Large-file chunking (>30 min)
✅ Multi-language support (99+)
✅ Word-level timestamps
✅ Speaker identification (optional)
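The chunking step can be pictured as splitting a long recording into fixed windows with a small overlap, so no word is cut exactly at a boundary. A sketch under assumed values (the real logic and thresholds live in `utils/audio_processor.py`):

```python
def plan_chunks(duration_s, chunk_s=30 * 60, overlap_s=5.0):
    """Return (start, end) second offsets covering the whole file."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        start = end - overlap_s  # back up so consecutive chunks overlap
    return chunks

print(plan_chunks(3600))  # a 60-minute file -> three overlapping chunks
```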
### Output
✅ SRT subtitles
✅ VTT web format
✅ Plain text
✅ JSON with metadata
✅ Preview in browser
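The JSON export is based on Whisper's standard result shape: a top-level `text` plus a `segments` list with per-segment `start`, `end`, and `text`. The sample data below is made up for illustration; the file written by this app may carry extra metadata:

```python
import json

# Illustrative sample of the structure, not actual app output
raw = json.dumps({
    "text": "Hello world. This is a test.",
    "segments": [
        {"start": 0.0, "end": 1.8, "text": "Hello world."},
        {"start": 1.8, "end": 3.5, "text": "This is a test."},
    ],
})

result = json.loads(raw)
for seg in result["segments"]:
    print(f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text']}")
```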
### UI/UX
✅ Real-time progress bars
✅ Clear error messages
✅ Download buttons for all formats
✅ Model selection
✅ Language selection
✅ Clean, modern interface

### Technical
✅ Public API endpoint
✅ Automatic cleanup
✅ GPU support (auto-detected)
✅ Error handling
✅ Memory-efficient
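GPU auto-detection typically follows the standard PyTorch pattern sketched below (the app's own selection logic may differ in detail):

```python
# Use CUDA when PyTorch sees a GPU; otherwise fall back to CPU
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"

print(f"Transcribing on: {device}")
```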
---

## Next Steps

1. **Choose your deployment option** (HF Spaces or local)
2. **Read the relevant guide** (QUICK_START.md or LOCAL_TESTING.md)
3. **Test with a sample file**
4. **Share your app!** (if deployed to HF Spaces)

---

## Need Help?

**Documentation:**
- QUICK_START.md - Basic setup
- LOCAL_TESTING.md - Local development
- DEPLOYMENT.md - HF Spaces deployment
- README.md - Full documentation

**Support:**
- Check the documentation first
- Review the error messages
- Open an issue on GitHub
| ## β Project Checklist | |
| ### Before Deploying | |
| - [ ] Read QUICK_START.md | |
| - [ ] Choose deployment method | |
| - [ ] Test locally (optional but recommended) | |
| - [ ] Prepare sample files for testing | |
| ### After Deploying | |
| - [ ] Test basic transcription | |
| - [ ] Try YouTube download | |
| - [ ] Test different models | |
| - [ ] Share with users! | |
| --- | |
| ## π You're All Set! | |
| Your Whisper Transcriber is **ready to go**! | |
| **Next step:** Open `QUICK_START.md` and choose your deployment method. | |
| **Questions?** Check the documentation files above. | |
| **Ready to transcribe?** Let's go! π€ | |
| --- | |
| **Built with:** | |
| - OpenAI Whisper (speech recognition) | |
| - Gradio (web interface) | |
| - PyTorch (deep learning) | |
| - Pyannote.audio (speaker diarization) | |
| - FFmpeg (audio/video processing) | |
| - yt-dlp (YouTube downloads) | |
| **License:** MIT (free for personal and commercial use) | |
| --- | |
| Happy transcribing! π | |