mr-don88/Auto-generate-subtitles


mr-don88 committed on Mar 21

Commit 86fe08c · verified · 1 Parent(s): 5622d7b

Update README.md

Files changed (1)

README.md +67 -3

README.md CHANGED

@@ -1,3 +1,67 @@
----
-license: mit
----

+---
+title: Subtitle Generator
+emoji: 🎤
+colorFrom: blue
+colorTo: green
+sdk: gradio
+sdk_version: 4.44.1
+app_file: app.py
+pinned: false
+---
+# 🎤 Subtitle Generator
+Automatically generate subtitles from audio/video files using OpenAI's Whisper model.
+## Features
+- 🎙️ **Speech-to-Text**: Transcribe audio using Whisper
+- 🌍 **Auto Language Detection**: Automatically detects source language
+- 📝 **Word-Level Timing**: Shows timing for each individual word
+- 📄 **SRT Export**: Download subtitles in standard SRT format
+- 🎬 **Video Support**: Extract audio from video files (MP4, MOV, MKV, AVI)
+- 📖 **Story Mode**: Merge short sentences into longer segments
+## How to Use
+1. **Upload** an audio or video file
+   - Audio: MP3, WAV, OGG, M4A, FLAC, AAC
+   - Video: MP4, MOV, MKV, AVI
+2. **Select** Whisper model size
+   - `tiny`: Fastest, least accurate
+   - `base`: Balanced (recommended)
+   - `small`: More accurate, slower
+   - `medium`: Most accurate, slowest
+3. **Choose** processing mode
+   - **Podcast**: Keep original sentence structure
+   - **Story**: Merge short sentences (max 300 chars, 12 sec)
+4. **Click** "Generate Subtitles" and wait for processing
+5. **Download** the SRT file
+## Model Information
+| Model | Speed | Accuracy | Use Case |
+|-------|-------|----------|----------|
+| tiny | Fastest | Low | Quick previews |
+| base | Fast | Good | General use (recommended) |
+| small | Medium | Better | Podcasts, lectures |
+| medium | Slow | Best | Professional transcription |
+## Technical Details
+- Uses **faster-whisper** for efficient CPU inference
+- **VAD filter** (voice activity detection) skips silent, non-speech portions of the audio
+- **Word timestamps** for precise alignment
+- **Auto language detection** for multilingual support
+- **Smart number formatting** (e.g., "1030" → "10:30" in time context)
+## Notes
+- The first transcription downloads the model (~150 MB for the `base` model)
+- Processing time depends on file length and model size
+- Videos are processed by extracting audio first
+- All processing happens on CPU (no GPU required)
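
The README describes Story mode (merging short sentences up to 300 characters and 12 seconds) and SRT export, but this commit does not include `app.py`, so the actual implementation is not visible here. The following is only a minimal sketch of how those two steps might fit together. The names `Segment`, `merge_segments`, `format_timestamp`, and `to_srt` are hypothetical and are not taken from the repository; the two limits are the ones stated in the README.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical container for one transcribed segment (times in seconds)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def merge_segments(
    segments: List[Segment],
    max_chars: int = 300,      # limit quoted in the README for Story mode
    max_seconds: float = 12.0,  # limit quoted in the README for Story mode
) -> List[Segment]:
    """Greedily append each segment to the previous one while both limits hold.

    Assumption: the real app may use a different merge rule; this is one
    straightforward reading of "max 300 chars, 12 sec".
    """
    merged: List[Segment] = []
    for seg in segments:
        text = seg.text.strip()
        if merged:
            last = merged[-1]
            combined = f"{last.text} {text}"
            fits_chars = len(combined) <= max_chars
            fits_time = (seg.end - last.start) <= max_seconds
            if fits_chars and fits_time:
                merged[-1] = Segment(last.start, seg.end, combined)
                continue
        merged.append(Segment(seg.start, seg.end, text))
    return merged


def to_srt(segments: List[Segment]) -> str:
    """Serialize segments as SRT: index, time range, text, blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 2.0, " Hello."),
        Segment(2.0, 5.0, " World."),
        Segment(5.0, 14.0, " A much longer sentence."),
    ]
    print(to_srt(merge_segments(demo)))
```

In this sketch, Podcast mode would correspond to calling `to_srt` on the segments directly, without `merge_segments`.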