# 🎤 START HERE - Whisper Transcriber Project

## 👋 Welcome!

You now have a **complete, production-ready SRT generator** using OpenAI Whisper!

---

## 🎯 What You Have

A fully-functional transcription system that can:

- ✅ Upload audio/video files
- ✅ Download from YouTube
- ✅ Auto-detect 99+ languages
- ✅ Generate SRT, VTT, TXT, JSON
- ✅ Identify speakers (optional)
- ✅ Handle large files automatically
- ✅ Show real-time progress
- ✅ Provide a public API

---

## 📁 Project Files

```
hf/
├── 🚀 app.py                    # Main Gradio app (RUN THIS!)
├── 📦 requirements.txt          # Dependencies
├── 🚫 .gitignore                # Git ignore rules
│
├── 🛠️  utils/                    # Core modules (1,391 lines)
│   ├── audio_processor.py      # Audio extraction & chunking
│   ├── downloader.py           # YouTube & URL downloads
│   ├── transcription.py        # Whisper transcription
│   ├── formatters.py           # SRT/VTT/TXT/JSON output
│   └── diarization.py          # Speaker identification
│
└── 📚 Documentation/
    ├── ⚡ QUICK_START.md        # READ THIS FIRST!
    ├── 🧪 LOCAL_TESTING.md     # Test locally
    ├── 🚀 DEPLOYMENT.md        # Deploy to HF Spaces
    ├── 📋 PROJECT_SUMMARY.md   # Technical details
    └── 📖 README.md            # Full documentation
```

---

## 🚀 Quick Start (Choose One)

### Option A: Deploy to Hugging Face (5 minutes of setup, plus build time)

**Easiest option - No local setup needed!**

1. Go to [huggingface.co/spaces](https://huggingface.co/spaces)
2. Create a new Space and choose the Gradio SDK
3. Upload all files from this folder
4. Wait 5-10 minutes for the build to finish
5. Done! Your app is live 🎉

**👉 See `QUICK_START.md` for detailed steps**

---

### Option B: Run Locally (10 minutes)

**Full control - Run on your computer**

```bash
# 1. Install FFmpeg (run only the line for your OS)
choco install ffmpeg       # Windows (Chocolatey)
brew install ffmpeg        # macOS (Homebrew)
sudo apt install ffmpeg    # Debian/Ubuntu Linux

# 2. Setup Python
python -m venv venv
source venv/bin/activate  # or venv\Scripts\activate on Windows
pip install -r requirements.txt

# 3. Run!
python app.py
```

Then open: http://127.0.0.1:7860

**👉 See `LOCAL_TESTING.md` for detailed steps**

---

## 📖 Documentation Guide

**New to the project?**
1. Start with `QUICK_START.md` (5-min read)
2. Then `README.md` for full features

**Want to test locally?**
→ `LOCAL_TESTING.md`

**Ready to deploy?**
→ `DEPLOYMENT.md`

**Need technical details?**
→ `PROJECT_SUMMARY.md`

---

## 🎯 First Steps After Setup

### Test with a Sample

1. **Find a short audio file** (1-2 minutes)
   - Or use a YouTube URL

2. **Run the app**
   - Upload the file
   - Select "Small" model
   - Click "Generate Transcription"

3. **Download results**
   - Try the SRT file first
   - Open in a text editor

**Example YouTube URL to test:**
```
https://www.youtube.com/watch?v=dQw4w9WgXcQ
```

---

## ⚙️ Basic Settings

### Model Selection
- **Tiny**: Fastest (use for testing)
- **Small**: Recommended (good balance)
- **Medium**: Best quality (slower)

### Language
- **Auto-detect**: Works great! (recommended)
- **Manual**: Select if you know the language

### Speaker Diarization
- **Off**: Faster (default)
- **On**: Identifies different speakers (requires HF token)
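
For orientation, here is a minimal sketch of how these settings could map onto a direct Whisper call. It assumes the `openai-whisper` package; `MODEL_NAMES`, `transcribe_options`, and `transcribe` are hypothetical names for illustration and are not taken from `utils/transcription.py`, which may work differently.

```python
# Hypothetical mapping from the UI labels above to Whisper checkpoint names.
MODEL_NAMES = {"Tiny": "tiny", "Small": "small", "Medium": "medium"}


def transcribe_options(language: str = "auto") -> dict:
    """Translate the UI language choice into keyword arguments for transcribe()."""
    # language=None lets Whisper detect the language from the audio itself.
    return {
        "language": None if language == "auto" else language,
        "word_timestamps": True,
    }


def transcribe(path: str, model_label: str = "Small", language: str = "auto") -> dict:
    """Run Whisper on one file and return its result dict (text, segments, language)."""
    import whisper  # imported lazily: requires `pip install openai-whisper`

    model = whisper.load_model(MODEL_NAMES[model_label])
    return model.transcribe(path, **transcribe_options(language))
```

Calling `transcribe("sample.mp3", "Tiny")` is a quick way to confirm that FFmpeg and the model download work before moving to a larger model.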

---

## 📊 What to Expect

### Processing Time (10-minute audio)

| Hardware | Model | Approx. time |
|-------|-------|------|
| CPU | Tiny | ~1 min |
| CPU | Small | ~3-5 min |
| CPU | Medium | ~8-10 min |
| GPU | Small | ~1 min |
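
To get a ballpark for other file lengths, the table can be scaled by audio duration. The sketch below assumes processing time grows roughly linearly with length, which is an assumption rather than a measured property of this app; `estimate_minutes` is a hypothetical helper, and the figures are simply copied from the table.

```python
# (low, high) minutes of processing per 10 minutes of audio, from the table above.
MINUTES_PER_10_MIN_AUDIO = {
    ("cpu", "tiny"): (1.0, 1.0),
    ("cpu", "small"): (3.0, 5.0),
    ("cpu", "medium"): (8.0, 10.0),
    ("gpu", "small"): (1.0, 1.0),
}


def estimate_minutes(audio_minutes: float, device: str, model: str) -> tuple[float, float]:
    """Scale the table linearly to another audio length (a rough assumption)."""
    low, high = MINUTES_PER_10_MIN_AUDIO[(device.lower(), model.lower())]
    scale = audio_minutes / 10.0
    return (low * scale, high * scale)


low, high = estimate_minutes(45, "cpu", "small")
print(f"45 min of audio on CPU with Small: roughly {low:.1f} to {high:.1f} min")
```

Treat the result as an order-of-magnitude guide only; actual times depend on your hardware and the audio itself.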

### Output Files

After processing, you get **4 files**:

1. **📄 filename.srt** - Most common, for video players
2. **📄 filename.vtt** - For web players
3. **📄 filename.txt** - Plain text transcript
4. **📄 filename.json** - Full data with word timestamps
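
If you are curious what the SRT file contains, the sketch below shows how timed segments turn into SRT cues. It is an illustration only, not the code in `utils/formatters.py`; the function names and the `start`/`end`/`text` segment keys are assumptions (they match the shape of Whisper's segment output).

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments (dicts with start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


print(segments_to_srt([
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.0, "text": " Let's get started."},
]))
```

VTT uses a very similar layout, with a `WEBVTT` header line and a period instead of a comma before the milliseconds.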

---

## 🔌 API Usage (Advanced)

Yes, this has an API! Use it in your code:

```python
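from gradio_client import Client

# Replace with your Space ID ("username/space-name") or its full URL.
client = Client("YOUR_SPACE_URL")

# client.view_api() prints the endpoints and parameter names the app exposes;
# check that the keyword arguments below match what it reports.
result = client.predict(
    url_input="https://youtube.com/watch?v=...",
    model_size="small",
    language="auto",
    enable_diarization=False,
)
print(result)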
```

---

## 💡 Pro Tips

### For Best Results
- Use high-quality audio (clear speech)
- Choose specific language if known
- Use Medium model for final production

### For Speed
- Use Tiny model for quick tests
- Keep files under 10 minutes
- Disable speaker diarization

### For YouTube
- Some videos may be restricted
- Use direct file upload as fallback
- Works with unlisted videos

---

## 🆘 Common Issues

### "ModuleNotFoundError"
→ Run: `pip install -r requirements.txt`

### "FFmpeg not found"
→ Install FFmpeg (see QUICK_START.md)

### "YouTube download failed"
→ The video may be restricted; try uploading the file instead

### "Slow processing"
→ Normal on CPU; use a smaller model or a GPU

### "Speaker diarization not working"
→ Set `HF_TOKEN` (see DEPLOYMENT.md)
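
Two of the issues above (missing FFmpeg, missing `HF_TOKEN`) can be caught before the app starts. The following is a minimal sketch of such a check; `preflight` is a hypothetical helper that is not part of this project, and it assumes the token is read from an environment variable named `HF_TOKEN`.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def preflight(
    enable_diarization: bool = False,
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return human-readable problems found before starting the app."""
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if which("ffmpeg") is None:
        problems.append("FFmpeg not found on PATH: install it first")
    # The token only matters when speaker diarization is switched on.
    if enable_diarization and not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set: speaker diarization will not work")
    return problems


for problem in preflight(enable_diarization=True):
    print("WARNING:", problem)
```

The `env` and `which` parameters exist only so the check can be exercised without changing your real environment.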

---

## 🎨 Features Included

### Input Methods
- ✅ File upload (drag & drop)
- ✅ YouTube URLs
- ✅ Direct media URLs
- ✅ Multiple formats (MP3, MP4, WAV, etc.)
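
As a rough illustration of how one input box can serve all of these methods, the sketch below sorts an input string into a YouTube link, a direct media URL, or a local file. `classify_source` is a hypothetical helper, and both the host list and the extension list are assumptions; the app's real rules live in `utils/downloader.py` and may differ.

```python
from urllib.parse import urlparse

# Assumed lists for illustration; extend them to match what the app accepts.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}
MEDIA_EXTENSIONS = (".mp3", ".mp4", ".wav", ".m4a", ".flac", ".ogg", ".webm", ".mkv")


def classify_source(source: str) -> str:
    """Return 'youtube', 'media_url', 'file', or 'unknown' for an input string."""
    text = source.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        if host in YOUTUBE_HOSTS:
            return "youtube"
        # Look at the URL path only, so query strings do not hide the extension.
        if parsed.path.lower().endswith(MEDIA_EXTENSIONS):
            return "media_url"
        return "unknown"
    if text.lower().endswith(MEDIA_EXTENSIONS):
        return "file"
    return "unknown"


print(classify_source("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))
```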

### Processing
- ✅ Auto audio extraction from video
- ✅ Large file chunking (>30 min)
- ✅ Multi-language support (99+)
- ✅ Word-level timestamps
- ✅ Speaker identification (optional)
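
To make the chunking idea concrete, here is a minimal sketch that splits a long recording into time spans. It is not the logic in `utils/audio_processor.py`: `chunk_spans` is a hypothetical helper, and the 30-minute chunk length and optional overlap are assumptions chosen to match the ">30 min" threshold above.

```python
def chunk_spans(
    duration_s: float,
    chunk_s: float = 30 * 60,
    overlap_s: float = 0.0,
) -> list[tuple[float, float]]:
    """Split [0, duration_s] into (start, end) spans of at most chunk_s seconds."""
    if chunk_s <= overlap_s:
        raise ValueError("chunk_s must be larger than overlap_s")
    spans = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so words at a boundary are not cut off.
        start = end - overlap_s
    return spans


# A 75-minute file becomes two full chunks and one shorter final chunk.
for start, end in chunk_spans(75 * 60):
    print(f"{start / 60:.0f} min -> {end / 60:.0f} min")
```

When chunks are transcribed separately, each chunk's timestamps start at zero, so the chunk's `start` offset has to be added back before the subtitles are written.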

### Output
- ✅ SRT subtitles
- ✅ VTT web format
- ✅ Plain text
- ✅ JSON with metadata
- ✅ Preview in browser

### UI/UX
- ✅ Real-time progress bars
- ✅ Clear error messages
- ✅ Download buttons for all formats
- ✅ Model selection
- ✅ Language selection
- ✅ Clean, modern interface

### Technical
- ✅ Public API endpoint
- ✅ Automatic cleanup
- ✅ GPU support (auto-detected)
- ✅ Error handling
- ✅ Memory efficient

---

## 🚀 Next Steps

1. **Choose your deployment option** (HF Spaces or Local)
2. **Read the relevant guide** (QUICK_START.md or LOCAL_TESTING.md)
3. **Test with a sample file**
4. **Share your app!** (if deployed to HF Spaces)

---

## 📞 Need Help?

**Documentation:**
- QUICK_START.md - Basic setup
- LOCAL_TESTING.md - Local development
- DEPLOYMENT.md - HF Spaces deployment
- README.md - Full documentation

**Support:**
- Check the documentation first
- Review error messages
- Open an issue on GitHub

---

## ✅ Project Checklist

### Before Deploying
- [ ] Read QUICK_START.md
- [ ] Choose deployment method
- [ ] Test locally (optional but recommended)
- [ ] Prepare sample files for testing

### After Deploying
- [ ] Test basic transcription
- [ ] Try YouTube download
- [ ] Test different models
- [ ] Share with users!

---

## 🎉 You're All Set!

Your Whisper Transcriber is **ready to go**!

**Next step:** Open `QUICK_START.md` and choose your deployment method.

**Questions?** Check the documentation files above.

**Ready to transcribe?** Let's go! 🎤

---

**Built with:**
- OpenAI Whisper (speech recognition)
- Gradio (web interface)
- PyTorch (deep learning)
- Pyannote.audio (speaker diarization)
- FFmpeg (audio/video processing)
- yt-dlp (YouTube downloads)

**License:** MIT (free for personal and commercial use)

---

Happy transcribing! 🎊