emmajeed committed on
Commit 7ee2bc7 · verified
1 Parent(s): 71a0fd4

Upload 5 files

Files changed (5)
  1. README.md +119 -6
  2. ai_providers.py +271 -0
  3. app.py +189 -0
  4. requirements.txt +6 -0
  5. transcribe_core.py +365 -0
README.md CHANGED
@@ -1,12 +1,125 @@
  ---
- title: Transcriptinator V2
- emoji: 😻
- colorFrom: red
- colorTo: gray
  sdk: gradio
- sdk_version: 6.2.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

  ---
+ title: Transcriptinator
+ emoji: 🎙️
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
+ sdk_version: 4.16.0
  app_file: app.py
  pinned: false
  ---

+ # 🎙️ Transcriptinator
+
+ Simple, fast audio transcription powered by Google's Gemini AI.
+
+ ## Features
+
+ - 🎯 **Simple & Fast** - Upload audio, get transcript in ~20-50 seconds
+ - 📝 **Smart Summaries** - Automatic summary and key ideas extraction
+ - 🔒 **Private** - Your API key, your data - nothing stored
+ - 💰 **Free** - Uses your own Gemini API key (free tier: 15 requests/min)
+ - 📄 **Markdown Output** - Clean, formatted transcripts ready to download
+
+ ## How to Use
+
+ ### 1. Get a Gemini API Key (Free)
+
+ 1. Go to [Google AI Studio](https://aistudio.google.com/app/apikey)
+ 2. Click "Create API key"
+ 3. Copy the key
+
+ ### 2. Transcribe Audio
+
+ 1. Upload your audio file (max 10 minutes)
+    - Supported formats: MP3, WAV, M4A, OGG, FLAC, WEBM
+ 2. Paste your API key
+ 3. Click "🚀 Transcribe Audio"
+ 4. Wait ~20-50 seconds
+ 5. Download your transcript!
+
+ ## What You Get
+
+ Your transcript includes:
+
+ ```yaml
+ ---
+ title: "Your Audio File"
+ date_processed: "2025-12-24"
+ summary: "Quick 2-3 sentence overview..."
+ key_ideas:
+   - idea: "Main Point 1"
+     description: "Explanation..."
+   - idea: "Main Point 2"
+     description: "Explanation..."
+ note_id: "unique-id"
+ ---
+
+ ## Key Ideas
+ - **Main Point 1:** Explanation...
+ - **Main Point 2:** Explanation...
+
+ ## Full Transcription
+ [00:00] Speaker 1: Hello...
+ [00:15] Speaker 2: Welcome...
+ ```
+
+ ## Limitations
+
+ - **Maximum audio length:** 10 minutes (free HuggingFace tier timeout limit)
+ - **Processing time:** ~20-50 seconds depending on audio length
+ - **API rate limits:** 15 requests/minute (Gemini free tier)
+
+ ## Privacy & Security
+
+ ✅ **Your API key is never stored** - Used only for the current request
+ ✅ **Audio files are temporary** - Deleted immediately after processing
+ ✅ **No data collection** - Everything runs through your own API key
+
+ ## Technical Details
+
+ **AI Calls per transcription:** 3
+ 1. Transcription (with timestamps and speakers)
+ 2. Summary generation
+ 3. Key ideas extraction
+
+ **Processing time estimate:**
+ - 2-minute audio: ~22 seconds
+ - 5-minute audio: ~35 seconds
+ - 10-minute audio: ~50 seconds
+
+ ## Troubleshooting
+
+ **"Invalid API key"**
+ - Make sure you copied the entire key
+ - Generate a new key at [Google AI Studio](https://aistudio.google.com/app/apikey)
+
+ **"Audio file too long"**
+ - Maximum is 10 minutes for free tier
+ - Split longer files or use the [CLI version](https://github.com/YOUR_USERNAME/transcriptinator)
+
+ **"Processing timeout"**
+ - Audio might be too long or corrupted
+ - Try with a shorter, clearer audio file
+
+ ## Local Installation
+
+ Want to transcribe audio of unlimited length? Clone the full version:
+
+ ```bash
+ git clone https://github.com/YOUR_USERNAME/transcriptinator
+ cd transcriptinator
+ pip install -r requirements.txt
+ python audio_process_and_transcribe.py your_audio_folder -o output_folder
+ ```
+
+ ## Credits
+
+ Built with:
+ - [Gradio](https://gradio.app/) - Web interface
+ - [Google Gemini](https://ai.google.dev/) - AI transcription
+ - [HuggingFace Spaces](https://huggingface.co/spaces) - Hosting
+
+ ## License
+
+ MIT License - Feel free to use and modify!
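
Note on the transcript format: the README example shows transcript lines shaped like `[00:15] Speaker 2: Welcome...`. Assuming the model really returns lines in that `[MM:SS] Speaker: text` shape (the prompt asks for timestamps and speaker tags but does not enforce a format), a downstream consumer could parse them roughly as follows. `Utterance` and `parse_transcript` are hypothetical names and are not part of this commit.

```python
import re
from typing import List, NamedTuple


class Utterance(NamedTuple):
    seconds: int   # offset from the start of the audio
    speaker: str
    text: str


# Assumed line shape: "[MM:SS] Speaker name: spoken text"
LINE_RE = re.compile(r"^\[(\d{1,2}):(\d{2})\]\s*([^:]+):\s*(.*)$")


def parse_transcript(body: str) -> List[Utterance]:
    """Collect every line that matches the assumed shape; skip the rest."""
    utterances = []
    for line in body.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            minutes, secs, speaker, text = match.groups()
            offset = int(minutes) * 60 + int(secs)
            utterances.append(Utterance(offset, speaker.strip(), text))
    return utterances


if __name__ == "__main__":
    sample = "[00:00] Speaker 1: Hello...\n[00:15] Speaker 2: Welcome..."
    for utterance in parse_transcript(sample):
        print(utterance)
```

Lines that do not match (headings, blank lines, free-form model output) are skipped, so the parser degrades quietly when the model deviates from the expected shape.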
ai_providers.py ADDED
@@ -0,0 +1,271 @@
+ """
+ AI Provider Abstraction Layer for Transcriptinator
+ Supports multiple AI providers: Gemini and OpenRouter
+ """
+
+ from abc import ABC, abstractmethod
+ from typing import Dict, List
+ import google.generativeai as genai
+ import requests
+
+
+ class TranscriptionProvider(ABC):
+     """Base class for AI transcription providers"""
+
+     @abstractmethod
+     def transcribe(self, audio_file_path: str) -> str:
+         """Generate transcription from audio file"""
+         pass
+
+     @abstractmethod
+     def generate_summary(self, text: str) -> str:
+         """Generate summary from transcription text"""
+         pass
+
+     @abstractmethod
+     def generate_key_ideas(self, text: str) -> List[Dict[str, str]]:
+         """Extract key ideas from transcription text"""
+         pass
+
+
+ class GeminiProvider(TranscriptionProvider):
+     """Google Gemini provider with configurable models"""
+
+     AVAILABLE_MODELS = {
+         "Gemini 2.5 Flash": "models/gemini-2.5-flash",
+         "Gemini 2.0 Flash": "models/gemini-2.0-flash-exp",
+         "Gemini 1.5 Flash": "models/gemini-1.5-flash"
+     }
+
+     def __init__(self, api_key: str, model_name: str):
+         self.api_key = api_key
+         self.model_name = model_name
+         genai.configure(api_key=api_key)
+         self.model = genai.GenerativeModel(self.AVAILABLE_MODELS[model_name])
+
+     def transcribe(self, audio_file_path: str) -> str:
+         """Generate transcription using Gemini API with timestamps and speakers"""
+         try:
+             with open(audio_file_path, "rb") as audio_file:
+                 audio_data = audio_file.read()
+
+             contents = [
+                 {
+                     "role": "user",
+                     "parts": [
+                         {
+                             "mime_type": "audio/mp3",  # NOTE: hardcoded; non-MP3 uploads are also sent as audio/mp3
+                             "data": audio_data
+                         },
+                         "Create a clean transcription of the audio file in English. Tag timestamps and speakers separately within the transcription. If speakers can be identified, use their names; otherwise, use 'Speaker 1', 'Speaker 2', etc. **Return ONLY the raw transcription text, starting directly with the first line of the transcription.** Do not include any introductory phrases, speaker identification plans, completion messages, or any text other than the transcription itself."
+                     ]
+                 },
+                 {
+                     "role": "model",
+                     "parts": [
+                         "Understood. I will provide a clean, timestamped, and speaker-tagged transcription of the audio file, returning only the transcription text as requested."
+                     ]
+                 }
+             ]
+
+             response = self.model.generate_content(contents)
+             return response.text
+
+         except Exception as e:
+             raise RuntimeError(f"Error during Gemini transcription: {e}") from e
+
+     def generate_summary(self, text: str) -> str:
+         """Generate a concise 2-3 sentence summary using Gemini"""
+         try:
+             prompt_text = f"""
+             Please read the following transcription text and write a concise summary of the main points in 2-3 sentences.
+
+             Transcription Text:
+             {text}
+
+             Summary:
+             """
+
+             response = self.model.generate_content(prompt_text)
+             return response.text.strip()
+
+         except Exception as e:
+             return f"Error generating summary: {e}"
+
+     def generate_key_ideas(self, text: str) -> List[Dict[str, str]]:
+         """Identify 3-5 key ideas from the transcription using Gemini"""
+         try:
+             prompt_text = f"""
+             Please read the following transcription text and identify 3-5 key ideas or concepts discussed.
+             Return these key ideas as a bulleted list, with each item in the list being an idea followed by a short (1-sentence) description of the idea.
+
+             Transcription Text:
+             {text}
+
+             Key Ideas:
+             """
+
+             response = self.model.generate_content(prompt_text)
+             key_ideas_text = response.text.strip()
+
+             key_ideas_list = []
+             for item in key_ideas_text.split('\n'):
+                 item = item.lstrip('-* ')
+                 if item:
+                     parts = item.split(':', 1)
+                     if len(parts) == 2:
+                         idea = parts[0].strip().strip('*').strip()  # drop leftover bold markers, e.g. "Idea**"
+                         description = parts[1].strip().strip('*').strip()  # e.g. "** text" from "**Idea:** text"
+                         key_ideas_list.append({'idea': idea, 'description': description})
+                     else:
+                         key_ideas_list.append({'idea': item.strip(), 'description': ''})
+
+             return key_ideas_list
+
+         except Exception as e:
+             return [{'idea': 'Error generating key ideas', 'description': str(e)}]
+
+
+ class OpenRouterProvider(TranscriptionProvider):
+     """OpenRouter API provider for text generation (summary/key ideas)"""
+
+     # Using DeepSeek R1 - excellent free model for reasoning and text generation
+     MODEL_ID = "deepseek/deepseek-r1-0528:free"
+     API_URL = "https://openrouter.ai/api/v1/chat/completions"
+
+     def __init__(self, api_key: str, model_name: str = None):
+         # model_name is ignored for OpenRouter since we use fixed DeepSeek R1
+         self.api_key = api_key
+
+     def transcribe(self, audio_file_path: str) -> str:
+         """Not supported - OpenRouter doesn't handle audio"""
+         raise NotImplementedError("OpenRouter doesn't support audio transcription. Use Gemini provider.")
+
+     def generate_summary(self, text: str) -> str:
+         """Generate summary using OpenRouter DeepSeek R1"""
+         try:
+             # Truncate text if too long
+             max_chars = 8000
+             text_to_summarize = text[:max_chars] if len(text) > max_chars else text
+
+             headers = {
+                 "Authorization": f"Bearer {self.api_key}",
+                 "Content-Type": "application/json"
+             }
+
+             payload = {
+                 "model": self.MODEL_ID,
+                 "messages": [
+                     {
+                         "role": "user",
+                         "content": f"Please provide a concise 2-3 sentence summary of the following transcription:\n\n{text_to_summarize}"
+                     }
+                 ]
+             }
+
+             response = requests.post(self.API_URL, headers=headers, json=payload, timeout=120)  # avoid hanging forever
+
+             # Handle errors
+             if response.status_code != 200:
+                 return f"Summary unavailable: OpenRouter API error (status {response.status_code})"
+
+             result = response.json()
+
+             # Extract the response
+             if "choices" in result and len(result["choices"]) > 0:
+                 return result["choices"][0]["message"]["content"].strip()
+
+             return "Summary generation completed but format unexpected."
+
+         except Exception as e:
+             return f"Error generating summary: {e}"
+
+     def generate_key_ideas(self, text: str) -> List[Dict[str, str]]:
+         """Generate key ideas using OpenRouter DeepSeek R1"""
+         try:
+             # Truncate text if too long
+             max_chars = 6000
+             text_to_analyze = text[:max_chars] if len(text) > max_chars else text
+
+             headers = {
+                 "Authorization": f"Bearer {self.api_key}",
+                 "Content-Type": "application/json"
+             }
+
+             payload = {
+                 "model": self.MODEL_ID,
+                 "messages": [
+                     {
+                         "role": "user",
+                         "content": f"""Extract 3-5 key ideas from this transcription. Format each as:
+ Idea: Brief title
+ Description: One sentence explanation
+
+ {text_to_analyze}"""
+                     }
+                 ]
+             }
+
+             response = requests.post(self.API_URL, headers=headers, json=payload, timeout=120)  # avoid hanging forever
+
+             if response.status_code != 200:
+                 return [{'idea': 'Key ideas unavailable', 'description': f'OpenRouter API error (status {response.status_code})'}]
+
+             result = response.json()
+
+             # Extract and parse the response
+             if "choices" in result and len(result["choices"]) > 0:
+                 content = result["choices"][0]["message"]["content"]
+
+                 # Parse the response into structured key ideas
+                 key_ideas_list = []
+                 lines = content.split('\n')
+
+                 current_idea = None
+                 for line in lines:
+                     line = line.strip()
+                     if line.startswith(("Idea:", "**Idea:")):
+                         if current_idea:
+                             key_ideas_list.append(current_idea)
+                         idea_text = line.replace("Idea:", "").replace("**", "").strip()
+                         current_idea = {'idea': idea_text, 'description': ''}
+                     elif line.startswith(("Description:", "**Description:")) and current_idea:
+                         desc_text = line.replace("Description:", "").replace("**", "").strip()
+                         current_idea['description'] = desc_text
+                     elif ':' in line and not current_idea:
+                         # Fallback parsing
+                         parts = line.split(':', 1)
+                         if len(parts) == 2:
+                             key_ideas_list.append({
+                                 'idea': parts[0].strip('- •*123456789.').strip(),
+                                 'description': parts[1].strip()
+                             })
+
+                 # Add last idea if exists
+                 if current_idea and current_idea['idea']:
+                     key_ideas_list.append(current_idea)
+
+                 # Fallback if parsing fails
+                 if not key_ideas_list:
+                     # Just use first few sentences
+                     sentences = [s.strip() for s in content.split('.') if s.strip()][:5]
+                     for i, sent in enumerate(sentences, 1):
+                         if sent:
+                             key_ideas_list.append({'idea': f'Key Point {i}', 'description': sent})
+
+                 return key_ideas_list[:5]
+
+             return [{'idea': 'Key ideas extraction', 'description': 'Unable to parse response'}]
+
+         except Exception as e:
+             return [{'idea': 'Error generating key ideas', 'description': str(e)}]
+
+
+ def get_provider(provider_type: str, api_key: str, model_name: str) -> TranscriptionProvider:
+     """Factory function to create appropriate provider"""
+     if provider_type == "Gemini":
+         return GeminiProvider(api_key, model_name)
+     elif provider_type == "OpenRouter":
+         return OpenRouterProvider(api_key, model_name)
+     else:
+         raise ValueError(f"Unknown provider: {provider_type}")
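
Note on the bullet parsing in `GeminiProvider.generate_key_ideas`: models often answer with Markdown such as `- **Idea:** description`, so splitting on the first colon alone can leave stray `**` markers in the result. The standalone sketch below follows the same split-on-first-colon approach but also strips the bold markers. `parse_key_ideas` is a hypothetical helper written for illustration, not a function from this commit, and the sample input is an assumption about typical model output.

```python
from typing import Dict, List


def parse_key_ideas(raw: str) -> List[Dict[str, str]]:
    """Turn a bulleted 'Idea: description' list into a list of dicts."""
    ideas = []
    for line in raw.splitlines():
        # Drop bullet characters, leading bold markers and padding.
        item = line.strip().lstrip("-* ").strip()
        if not item:
            continue
        idea, sep, description = item.partition(":")
        ideas.append({
            "idea": idea.strip("* ").strip(),
            # partition() returns an empty separator when there is no colon.
            "description": description.strip("* ").strip() if sep else "",
        })
    return ideas


if __name__ == "__main__":
    sample = "- **Pricing:** Costs rose.\n* Hiring: Two roles open.\n\nNo colon here"
    for entry in parse_key_ideas(sample):
        print(entry)
```

Keeping the parser separate from the API call makes it easy to exercise against saved model responses without an API key.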
app.py ADDED
@@ -0,0 +1,189 @@
+ """
+ Transcriptinator - HuggingFace Spaces Gradio Interface
+ Audio transcription with Gemini + OpenRouter
+ """
+
+ import gradio as gr
+ import os
+ from transcribe_core import process_audio_file, get_audio_duration
+ from ai_providers import GeminiProvider, OpenRouterProvider
+
+
+ def transcribe_audio(audio_file, gemini_key, openrouter_key, model_name):
+     """
+     Main transcription function for Gradio interface.
+
+     Args:
+         audio_file: Uploaded audio file
+         gemini_key: Gemini API key for transcription
+         openrouter_key: OpenRouter API key for summary/ideas
+         model_name: Gemini model to use
+
+     Returns:
+         Tuple of (status_message, download_file_path)
+     """
+     if not audio_file:
+         return "❌ Please upload an audio file.", None
+
+     if not gemini_key or len(gemini_key.strip()) < 10:
+         return "❌ Please provide a valid Gemini API key.", None
+
+     try:
+         # Create Gemini provider for transcription
+         gemini_provider = GeminiProvider(gemini_key.strip(), model_name)
+
+         # Create OpenRouter provider for summary/ideas (optional)
+         openrouter_provider = None
+         if openrouter_key and len(openrouter_key.strip()) > 10:
+             openrouter_provider = OpenRouterProvider(openrouter_key.strip())
+
+         # Get audio duration and file size for estimate
+         duration = get_audio_duration(audio_file)
+         duration_min = duration / 60
+         file_size_mb = os.path.getsize(audio_file) / (1024 * 1024)
+
+         # Process the audio file
+         output_path, is_zip = process_audio_file(
+             audio_file,
+             gemini_provider,
+             openrouter_provider,
+             progress_callback=lambda msg, progress: None
+         )
+
+         # Determine file type for success message (is_zip is the string "True" or "False")
+         if is_zip == "True":
+             file_type = "ZIP archive"
+             file_desc = "Multiple transcript files (chunked audio)"
+         else:
+             file_type = "Markdown file"
+             file_desc = "Single transcript file"
+
+         text_provider = "OpenRouter (DeepSeek R1)" if openrouter_provider else "Gemini"
+
+         success_msg = f"""✅ **Transcription Complete!**
+
+ 📁 Original file: {os.path.basename(audio_file)}
+ ⏱️ Duration: {duration_min:.1f} minutes
+ 💾 Size: {file_size_mb:.1f} MB
+ 🎙️ Transcription: Gemini ({model_name})
+ 💡 Summary/Ideas: {text_provider}
+ 📄 Output: {file_type}
+
+ {file_desc}
+
+ Click below to download your transcript(s)."""
+
+         # Return the file path directly - Gradio handles the download
+         return success_msg, output_path
+
+     except Exception as e:
+         error_msg = f"""❌ **Error during transcription:**
+
+ {str(e)}
+
+ **Common issues:**
+ - Invalid API key
+ - Audio file too large or corrupted
+ - Network connection issues"""
+         return error_msg, None
+
+
+ # Create Gradio interface
+ with gr.Blocks(title="Transcriptinator", theme=gr.themes.Soft()) as app:
+     gr.Markdown("""
+     # 🎙️ Transcriptinator
+     ### AI-Powered Audio Transcription
+
+     **Powered by:** Gemini (transcription) + OpenRouter DeepSeek R1 (summarization)
+     """)
+
+     with gr.Row():
+         with gr.Column(scale=2):
+             # Audio upload
+             audio_input = gr.Audio(
+                 label="Upload Audio File",
+                 type="filepath",
+                 sources=["upload"],
+             )
+
+             gr.Markdown("""
+             **Supported formats:** MP3, WAV, M4A, OGG, FLAC, WEBM
+             **Large files (>30MB):** Automatically chunked and processed
+             """)
+
+             # Model selection
+             model_dropdown = gr.Dropdown(
+                 choices=list(GeminiProvider.AVAILABLE_MODELS.keys()),
+                 value="Gemini 2.5 Flash",
+                 label="Gemini Model",
+                 info="Select which Gemini model to use for transcription"
+             )
+
+             # API keys
+             gemini_key_input = gr.Textbox(
+                 label="Gemini API Key (Required)",
+                 placeholder="Enter your Gemini API key...",
+                 type="password",
+                 info="Get one free at: https://aistudio.google.com/app/apikey"
+             )
+
+             openrouter_key_input = gr.Textbox(
+                 label="OpenRouter API Key (Optional)",
+                 placeholder="Enter your OpenRouter key for better summaries...",
+                 type="password",
+                 info="Leave empty to use Gemini for all tasks | Get free at: https://openrouter.ai"
+             )
+
+             # Submit button
+             submit_btn = gr.Button("🚀 Transcribe Audio", variant="primary", size="lg")
+
+         with gr.Column(scale=1):
+             # Status output
+             status_output = gr.Markdown(label="Status")
+
+             # Download button
+             download_output = gr.File(label="📥 Download Transcript", interactive=False)
+
+     # Information section
+     gr.Markdown("""
+     ---
+     ### 🎯 What you'll get:
+     - 📝 **Full transcription** with timestamps and speaker detection
+     - 📊 **Summary** in 2-3 sentences
+     - 💡 **Key ideas** with descriptions
+     - 📄 **Markdown file** ready to download
+
+     ### 🤖 AI Models:
+
+     **Gemini** (Google) - Transcription:
+     - Gemini 2.5 Flash (recommended - fastest, best quality)
+     - Gemini 2.0 Flash (experimental)
+     - Gemini 1.5 Flash (stable)
+     - Native audio support with timestamps and speakers
+
+     **OpenRouter** (Optional) - Summarization:
+     - Uses DeepSeek R1 (free, excellent reasoning)
+     - Better summaries and key ideas extraction
+     - Leave API key empty to use Gemini for everything
+
+     ### 🔒 Privacy:
+     - Your API keys are never stored
+     - Audio files are processed temporarily and deleted
+     - All processing happens through your own credentials
+
+     ### 💡 Tips:
+     - **New users:** Start with just Gemini API key
+     - **Better summaries:** Add OpenRouter key (optional, free)
+     - **Large files:** App automatically chunks files >30MB
+     """)
+
+     # Connect the transcription function
+     submit_btn.click(
+         fn=transcribe_audio,
+         inputs=[audio_input, gemini_key_input, openrouter_key_input, model_dropdown],
+         outputs=[status_output, download_output]
+     )
+
+ # Launch the app with queuing enabled
+ if __name__ == "__main__":
+     app.queue().launch()
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ gradio
+ google-generativeai==0.8.3
+ pyyaml==6.0.1
+ ffmpeg-python==0.2.0
+ psutil==5.9.0
+ requests==2.31.0
transcribe_core.py ADDED
@@ -0,0 +1,365 @@
+ """
+ Simplified transcription core for HuggingFace Spaces deployment.
+ Version with chunking support for large files (>30MB).
+ Now supports multiple AI providers via provider abstraction.
+ """
+
+ import os
+ from datetime import date, timedelta
+ import yaml
+ import uuid
+ from typing import List, Dict, Tuple
+ import ffmpeg
+ import gc
+ import psutil
+ import zipfile
+ import time
+ from ai_providers import TranscriptionProvider
+
+
+ def format_timestamp(seconds: float) -> str:
+     """Convert seconds to ffmpeg time format (HH:MM:SS.xxx)."""
+     td = timedelta(seconds=float(seconds))
+     hours = int(seconds // 3600)
+     minutes = int((seconds % 3600) // 60)
+     secs = seconds % 60
+     return f"{hours:02d}:{minutes:02d}:{secs:06.3f}"
+
+
+ def check_memory_usage() -> bool:
+     """Check current memory usage and print warning if too high."""
+     process = psutil.Process()
+     memory_percent = process.memory_percent()
+     if memory_percent > 80:
+         print(f"Warning: High memory usage ({memory_percent:.1f}%)")
+         return False
+     return True
+
+
+ def clean_partial_chunks(base_file_path: str) -> None:
+     """Clean up any existing partial chunks before starting."""
+     try:
+         base_name = os.path.splitext(os.path.basename(base_file_path))[0]
+         output_folder = os.path.dirname(base_file_path)
+         pattern = f"{base_name}_part*"
+
+         print(f"Cleaning up any existing chunks matching: {pattern}")
+         for file in os.listdir(output_folder):
+             if file.startswith(f"{base_name}_part") and file.endswith(".mp3"):
+                 file_path = os.path.join(output_folder, file)
+                 try:
+                     os.remove(file_path)
+                     print(f"Removed existing chunk: {file}")
+                 except Exception as e:
+                     print(f"Warning: Could not remove {file}: {e}")
+     except Exception as e:
+         print(f"Warning: Error during cleanup: {e}")
+
+
+ def chunk_audio_file(audio_file_path: str, chunk_duration_minutes: int = 25, overlap_seconds: int = 5) -> List[str]:
+     """Chunks an audio file into smaller parts using ffmpeg streaming."""
+     chunked_files = []
+     try:
+         # Clean up any existing chunks first
+         clean_partial_chunks(audio_file_path)
+
+         # Get audio duration
+         print("\nAnalyzing audio file duration...")
+         duration = get_audio_duration(audio_file_path)
+         if duration is None:
+             print("Error: Could not determine audio file duration.")
+             return chunked_files
+
+         chunk_length = chunk_duration_minutes * 60
+         overlap = overlap_seconds
+         start_time = 0
+         chunk_index = 1
+
+         base_name = os.path.splitext(os.path.basename(audio_file_path))[0]
+         output_folder = os.path.dirname(audio_file_path)
+
+         total_chunks = int((duration - overlap) / (chunk_length - overlap)) + 1
+         print(f"\nChunking audio file: {audio_file_path}")
+         print(f"Total duration: {format_timestamp(duration)}")
+         print(f"Chunk duration: {chunk_duration_minutes} minutes, Overlap: {overlap_seconds} seconds")
+         print(f"Estimated number of chunks: {total_chunks}\n")
+
+         while start_time < duration:
+             if not check_memory_usage():
+                 print("Memory usage too high, waiting before continuing...")
+                 time.sleep(5)
+                 continue
+
+             # Calculate end time for current chunk
+             end_time = min(start_time + chunk_length, duration)
+
+             # Make sure we don't create a tiny final chunk
+             if end_time - start_time < 30:  # If chunk would be less than 30 seconds
+                 if chunk_index > 1:  # If not the first chunk
+                     break  # Skip creating this small final chunk
+                 end_time = duration  # If it's the first chunk, include all audio
+
+             chunk_file_name = f"{base_name}_part{chunk_index}.mp3"
+             chunk_file_path = os.path.join(output_folder, chunk_file_name)
+
+             print(f"Creating chunk {chunk_index}/{total_chunks}: {chunk_file_name}")
+             print(f" Time range: {format_timestamp(start_time)} to {format_timestamp(end_time)}")
+
+             try:
+                 # Use ffmpeg to extract chunk
+                 if os.path.exists(chunk_file_path):
+                     os.remove(chunk_file_path)
+
+                 stream = ffmpeg.input(audio_file_path, ss=start_time, t=end_time-start_time)
+                 stream = ffmpeg.output(stream, chunk_file_path, acodec='libmp3lame', loglevel='error')
+                 ffmpeg.run(stream, capture_stdout=True, capture_stderr=True, overwrite_output=True)
+
+                 if os.path.exists(chunk_file_path):
+                     chunk_size = os.path.getsize(chunk_file_path) / (1024 * 1024)
+                     print(f" ✓ Saved chunk: {chunk_file_path} ({chunk_size:.2f}MB)")
+                     chunked_files.append(chunk_file_path)
+                     chunk_index += 1
+                 else:
+                     print(" ✗ Error: Chunk file was not created")
+                     break
+
+             except ffmpeg.Error as e:
+                 print(f" ✗ Error processing chunk: {e.stderr.decode() if e.stderr else str(e)}")
+                 break
+
+             # Update start time for next chunk, considering overlap
+             if end_time == duration:  # If this was the last chunk
+                 break
+             start_time = end_time - overlap
+
+             # Force garbage collection after each chunk
+             gc.collect()
+
+         created_chunks = chunk_index - 1
+         print("\nAudio file chunking completed:")
+         print(f"- Created {created_chunks} out of {total_chunks} expected chunks")
+         print(f"- Final chunk duration: {format_timestamp(end_time - start_time)}")
+
+     except Exception as e:
+         print(f"Error during audio chunking: {e}")
+
+     return chunked_files
+
+
+ def get_audio_duration(file_path: str) -> float:
+     """Get the duration of an audio file using ffmpeg."""
+     try:
+         probe = ffmpeg.probe(file_path)
+         duration = float(probe['format']['duration'])
+         return duration
+     except Exception as e:
+         raise RuntimeError(f"Error getting audio duration: {e}") from e
+
+
+ def generate_transcription(audio_file_path: str, provider: TranscriptionProvider) -> str:
+     """
+     Generate transcription using the configured AI provider.
+
+     Args:
+         audio_file_path: Path to audio file
+         provider: TranscriptionProvider instance that supports audio (currently Gemini)
+
+     Returns:
+         Transcription text (with timestamps and speaker tags when the provider supplies them)
+     """
+     try:
+         return provider.transcribe(audio_file_path)
+     except Exception as e:
+         raise RuntimeError(f"Error during transcription: {e}") from e
+
+
+ def generate_summary(transcription_text: str, provider: TranscriptionProvider) -> str:
+     """
+     Generate a concise 2-3 sentence summary using the configured provider.
+
+     Args:
+         transcription_text: Full transcription
+         provider: TranscriptionProvider instance
+
+     Returns:
+         Summary text
+     """
+     try:
+         return provider.generate_summary(transcription_text)
+     except Exception as e:
+         return f"Error generating summary: {e}"
+
+
+ def generate_key_ideas(transcription_text: str, provider: TranscriptionProvider) -> List[Dict[str, str]]:
+     """
+     Identify 3-5 key ideas from the transcription using the configured provider.
+
+     Args:
+         transcription_text: Full transcription
+         provider: TranscriptionProvider instance
+
+     Returns:
+         List of {idea, description} dictionaries
+     """
+     try:
+         return provider.generate_key_ideas(transcription_text)
+     except Exception as e:
+         return [{'idea': 'Error generating key ideas', 'description': str(e)}]
+
+
+ def create_transcript_markdown(audio_filename: str, transcription: str, summary: str, key_ideas: List[Dict[str, str]]) -> str:
+     """
+     Create a formatted markdown file with YAML frontmatter.
+
+     Args:
+         audio_filename: Name of the audio file
+         transcription: Full transcription text
+         summary: Summary text
+         key_ideas: List of key ideas
+
+     Returns:
+         Formatted markdown content
+     """
+     base_name = os.path.splitext(audio_filename)[0]
+
+     # Build YAML frontmatter
+     yaml_metadata = {
+         'title': base_name,
+         'audio_file': audio_filename,
+         'date_processed': str(date.today()),
+         'summary': summary,
+         'key_ideas': key_ideas,
+         'note_id': str(uuid.uuid4())
+     }
+
+     yaml_frontmatter = "---\n" + yaml.dump(yaml_metadata, sort_keys=False, indent=2, allow_unicode=True) + "---\n\n"
+
+     # Build content sections
+     content = yaml_frontmatter
+
+     # Key ideas section
+     content += "## Key Ideas\n\n"
+     if key_ideas:
+         for idea_item in key_ideas:
+             if idea_item['description']:
+                 content += f"- **{idea_item['idea']}:** {idea_item['description']}\n"
+             else:
+                 content += f"- **{idea_item['idea']}**\n"
+     else:
+         content += "*(No key ideas generated)*\n"
+
+     content += "\n## Full Transcription\n\n"
+     content += transcription
+
+     return content
+
+
+ def process_audio_file(audio_file_path: str, gemini_provider: TranscriptionProvider, openrouter_provider: TranscriptionProvider = None, progress_callback=None) -> Tuple[str, str]:
+     """
+     Process an audio file and write the transcript as a Markdown file, or as a ZIP of several files when the audio was chunked.
+
+     Args:
+         audio_file_path: Path to audio file
+         gemini_provider: GeminiProvider for transcription
+         openrouter_provider: Optional OpenRouterProvider for summary/ideas (if None, uses gemini_provider)
+         progress_callback: Optional callback function for progress updates
+
+     Returns:
+         Tuple of (output_file_path, is_zip_boolean_as_string)
+         - If single file: ("path/to/file.md", "False")
+         - If chunked: ("path/to/file.zip", "True")
+     """
+     audio_filename = os.path.basename(audio_file_path)
+     base_name = os.path.splitext(audio_filename)[0]
+
+     # Check file size
+     file_size_mb = os.path.getsize(audio_file_path) / (1024 * 1024)
+     print(f"\nProcessing: {audio_filename} ({file_size_mb:.2f}MB)")
+
+     # Determine if chunking is needed
+     files_to_transcribe = []
+     if file_size_mb > 30:
+         print("File is larger than 30MB. Chunking into smaller parts...")
+         if progress_callback:
+             progress_callback("📦 Chunking large audio file...", 0.1)
+
+         chunked_files = chunk_audio_file(audio_file_path)
+         files_to_transcribe.extend(chunked_files)
+     else:
+         print("File is small enough to process directly")
+         files_to_transcribe.append(audio_file_path)
+     if not files_to_transcribe:  # chunking failed; do not return an empty ZIP
+         raise RuntimeError("Audio chunking produced no files to transcribe")
+     markdown_files = []
+     total_files = len(files_to_transcribe)
+
+     for idx, file_path in enumerate(files_to_transcribe, 1):
+         file_name = os.path.basename(file_path)
+         print(f"\nTranscribing {idx}/{total_files}: {file_name}")
+
+         if progress_callback:
+             progress = 0.2 + (0.6 * (idx - 1) / total_files)
+             progress_callback(f"🎙️ Transcribing part {idx}/{total_files}...", progress)
+
+         # Transcribe using Gemini
+         transcription = generate_transcription(file_path, gemini_provider)
+
+         if progress_callback:
+             progress_callback(f"📝 Generating metadata for part {idx}/{total_files}...", progress + 0.1)
+
+         # Generate metadata using OpenRouter if available, otherwise Gemini
+         text_provider = openrouter_provider if openrouter_provider else gemini_provider
+         summary = generate_summary(transcription, text_provider)
+         key_ideas = generate_key_ideas(transcription, text_provider)
+
+         # Create markdown
+         markdown_content = create_transcript_markdown(file_name, transcription, summary, key_ideas)
+
+         # Save markdown file to outputs directory
+         output_dir = "outputs"
+         os.makedirs(output_dir, exist_ok=True)
+
+         output_filename = os.path.splitext(file_name)[0] + ".md"
+         markdown_path = os.path.join(output_dir, output_filename)
+
+         with open(markdown_path, 'w', encoding='utf-8') as f:
+             f.write(markdown_content)
+
+         markdown_files.append(markdown_path)
+
+         # Clean up chunk audio file (never the original upload, even if its name contains "_part")
+         if file_path != audio_file_path:
+             try:
+                 os.remove(file_path)
+                 print(f"Deleted chunk: {file_name}")
+             except Exception as e:
+                 print(f"Warning: Could not delete chunk {file_name}: {e}")
+
+     # Return result
+     if len(markdown_files) == 1:
+         # Single file - return as-is
+         return markdown_files[0], "False"
+     else:
+         # Multiple files - create ZIP
+         if progress_callback:
+             progress_callback("📦 Creating ZIP file...", 0.9)
+
+         output_dir = "outputs"
+         os.makedirs(output_dir, exist_ok=True)
+
+         zip_filename = f"{base_name}_transcripts.zip"
+         zip_path = os.path.join(output_dir, zip_filename)
+
+         with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zipf:
+             for md_file in markdown_files:
+                 # Add with proper filename
+                 basename = os.path.basename(md_file)
+                 zipf.write(md_file, basename)
+                 # Delete individual md files after adding to ZIP
+                 try:
+                     os.remove(md_file)
+                 except Exception as e:
+                     print(f"Warning: Could not delete {md_file}: {e}")
+
+         print(f"\n✅ Created ZIP with {len(markdown_files)} transcripts: {zip_filename}")
+         return zip_path, "True"
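
Note on the chunking arithmetic in `chunk_audio_file`: the window logic (25-minute chunks, 5-second overlap, no final chunk shorter than 30 seconds) is easier to check when separated from the ffmpeg calls. The sketch below reproduces only that arithmetic as a pure function. `plan_chunks` and its parameter names are hypothetical and are not part of this commit. As in the original loop, a trailing window shorter than 30 seconds is skipped, so the last few seconds of audio may not be transcribed.

```python
from typing import List, Tuple


def plan_chunks(
    duration: float,
    chunk_length: float = 25 * 60,
    overlap: float = 5,
    min_final: float = 30,
) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds covering the audio."""
    windows: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration:
        end = min(start + chunk_length, duration)
        # Skip a tiny trailing window, but always keep the first one.
        if end - start < min_final and windows:
            break
        windows.append((start, end))
        if end == duration:
            break
        # Step back by the overlap so speech at a boundary is not cut.
        start = end - overlap
    return windows


if __name__ == "__main__":
    for window in plan_chunks(3600):
        print(window)
```

For a one-hour file this yields three windows, with each later window starting 5 seconds before the previous one ends. Planning the windows up front would also let the progress callback report an exact chunk count instead of an estimate.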