mr-don88 committed on
Commit 86fe08c · verified · 1 Parent(s): 5622d7b

Update README.md

  1. README.md +67 -3
README.md CHANGED
@@ -1,3 +1,67 @@
- ---
- license: mit
- ---
+ ---
+ title: Subtitle Generator
+ emoji: 🎀
+ colorFrom: blue
+ colorTo: green
+ sdk: gradio
+ sdk_version: 4.44.1
+ app_file: app.py
+ pinned: false
+ ---
+
+ # 🎀 Subtitle Generator
+
+ Automatically generate subtitles from audio/video files using OpenAI's Whisper model.
+
+ ## Features
+
+ - 🎙️ **Speech-to-Text**: Transcribe audio using Whisper
+ - 🌍 **Auto Language Detection**: Automatically detects source language
+ - 📝 **Word-Level Timing**: Shows timing for each individual word
+ - 📄 **SRT Export**: Download subtitles in standard SRT format
+ - 🎬 **Video Support**: Extract audio from video files (MP4, MOV, MKV, AVI)
+ - 📖 **Story Mode**: Merge short sentences into longer segments
+
+ ## How to Use
+
+ 1. **Upload** an audio or video file
+    - Audio: MP3, WAV, OGG, M4A, FLAC, AAC
+    - Video: MP4, MOV, MKV, AVI
+
+ 2. **Select** a Whisper model size
+    - `tiny`: Fastest, least accurate
+    - `base`: Balanced (recommended)
+    - `small`: More accurate, slower
+    - `medium`: Most accurate, slowest
+
+ 3. **Choose** a processing mode
+    - **Podcast**: Keep original sentence structure
+    - **Story**: Merge short sentences into longer segments (max 300 characters, 12 seconds)
+
+ 4. **Click** "Generate Subtitles" and wait for processing
+
+ 5. **Download** the SRT file
+
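As an illustration of the Story mode limits quoted in the README text above (max 300 chars, 12 sec), here is a minimal sketch of one way such merging could work. The `Segment` class, the `merge_segments` function, and the greedy strategy are assumptions made for this example; the commit does not show how `app.py` actually merges sentences.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_segments(segments, max_chars=300, max_seconds=12.0):
    """Greedily merge consecutive segments while both limits still hold.

    Hypothetical helper: a merged segment may not exceed max_chars
    characters or span more than max_seconds from its start to its end.
    """
    merged = []
    for seg in segments:
        if merged:
            last = merged[-1]
            combined = f"{last.text} {seg.text}"
            fits_length = len(combined) <= max_chars
            fits_duration = seg.end - last.start <= max_seconds
            if fits_length and fits_duration:
                # Extend the previous segment instead of starting a new one.
                merged[-1] = Segment(last.start, seg.end, combined)
                continue
        merged.append(Segment(seg.start, seg.end, seg.text))
    return merged
```

With these assumptions, two short sentences covering 0 to 5 seconds would be merged into one segment, while a following sentence that ends at 14 seconds would start a new segment because the combined span would exceed 12 seconds.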
+ ## Model Information
+
+ | Model | Speed | Accuracy | Use Case |
+ |-------|-------|----------|----------|
+ | tiny | Fastest | Low | Quick previews |
+ | base | Fast | Good | General use (recommended) |
+ | small | Medium | Better | Podcasts, lectures |
+ | medium | Slow | Best | Professional transcription |
+
+ ## Technical Details
+
+ - Uses **faster-whisper** for efficient CPU inference
+ - **VAD filter** (voice activity detection) skips silent sections
+ - **Word timestamps** for precise alignment
+ - **Auto language detection** for multilingual support
+ - **Smart number formatting** (e.g., "1030" → "10:30" in time context)
+
+ ## Notes
+
+ - The first transcription downloads the model (~150 MB for the base model)
+ - Processing time depends on file length and model size
+ - Videos are processed by extracting audio first
+ - All processing happens on the CPU (no GPU required)
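
The README added in this commit describes CPU inference with faster-whisper, a VAD filter, word timestamps, auto language detection, and SRT export. The sketch below shows how those pieces could fit together. The function names (`srt_timestamp`, `to_srt`, `transcribe_to_srt`) and the `compute_type="int8"` setting are assumptions for illustration, not code taken from the Space's `app.py`.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues) -> str:
    """Render an iterable of (start, end, text) tuples as SRT text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


def transcribe_to_srt(path: str, model_size: str = "base") -> str:
    """Hypothetical end-to-end helper; requires the faster-whisper package."""
    # Imported lazily so the formatting helpers work without faster-whisper.
    from faster_whisper import WhisperModel

    # int8 on CPU is an assumed setting; the README only says "CPU inference".
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # No language is passed, so faster-whisper detects it from the audio.
    segments, info = model.transcribe(path, vad_filter=True, word_timestamps=True)
    print(f"Detected language: {info.language}")
    return to_srt((seg.start, seg.end, seg.text) for seg in segments)
```

A call such as `transcribe_to_srt("talk.mp3")` (the file name is a placeholder) would download the chosen model on first use and return the SRT text, which can then be written to a `.srt` file.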