dalinstone committed
Commit 5bdcfc9 · verified · 1 Parent(s): fe7ca23

Update app.py

Files changed (1)
  1. app.py +26 -10
app.py CHANGED
@@ -147,25 +147,30 @@ async def process_audio(api_key, audio_file, num_speakers):
147
 
148
 
149
  # --- Enhancement ---
150
- ENHANCE_PROMPT = '''You are an expert transcript editor. Your task is to enhance this transcript for maximum readability while maintaining the core message.
151
  IMPORTANT: Respond ONLY with the enhanced transcript. Do not include any explanations, headers, or phrases like "Here is the transcript."
152
 
153
  Note: Below you'll find an auto-generated transcript that may help with speaker identification, but focus on cleaning the transcript into a high-quality transcript from the audio.
154
 
155
- Think about your job as if you were transcribing an interview for a print book where the priority is the reading audience. It should just be a total pleasure to read this as a written artifact where all the flubs and repetitions and conversational artifacts and filler words and false starts are removed, where a bunch of helpful punctuation is added. It should basically read like somebody wrote it specifically for reading rather than just something somebody said extemporaneously.
 
 
156
 
157
  Please:
158
  1. Fix speaker attribution errors, especially at segment boundaries. Watch for incomplete thoughts that were likely from the previous speaker.
159
 
160
- 2. Optimize AGGRESSIVELY for accuracy:
161
- - Accuracy is the most important thing!!
162
- - Remove ALL conversational artifacts (yeah, so, I mean, etc.)
163
- - Remove ALL filler words (um, uh, like, you know)
164
  - Remove false starts and self-corrections completely
165
- - Remove redundant phrases and hesitations and stutters.
166
  - DO NOT REPHRASE, PARAPHRASE, OR MODIFY WHAT THE SPEAKER SAYS. You should only be cleaning up the transcript, not generating what you think is a better way of saying something.
167
- - Break up run-on sentences into clear, concise statements
 
 
 
168
  - Maintain natural conversation flow
 
169
 
170
  3. Format the output consistently:
171
  - Keep the "Speaker X: Words I am saying" format (no brackets, no other formatting)
@@ -184,8 +189,19 @@ ENHANCE_PROMPT = '''You are an expert transcript editor. Your task is to enhance
184
  - Focus on the most important discussion points
185
  - Skip minor tangents or small talk
186
  - When grouping topics or discussion points you should place 3 equal signs (example: ===) at the beginning of the topic point with no other text or numbers. These will be used in some other program to parse the 'topic sections'. Do not place "===" at the end of the topic section, only at the beginning of a new topic section.
 
187
  - In a 2 hour long podcast episode we would expect between 5 - 8 topic sections, not 9 - 15 topic sections (unless clearly warranted).
188
 
189
  Example input:
190
 
191
  00:00:00,379 --> 00:01:15,530 [speaker_0]
@@ -203,9 +219,9 @@ ENHANCE_PROMPT = '''You are an expert transcript editor. Your task is to enhance
203
 
204
  ===
205
 
206
- Speaker A: I've been working on this new project at work, and we're seeing amazing results with our new approach. It's really transforming how we do things. I wonder, how did the client react to the work you did?.
207
 
208
- Speaker B: When we showed it to the client last week, they were completely blown away by what we achieved. They couldn't believe it was the same system they had before.
209
 
210
  Enhance the following transcript, starting directly with the speaker format: '''
211
 
 
147
 
148
 
149
  # --- Enhancement ---
150
+ ENHANCE_PROMPT = '''You are an expert transcript editor. Your task is to enhance this transcript for maximum readability while preserving the FULL substance and detail of everything each speaker says.
151
  IMPORTANT: Respond ONLY with the enhanced transcript. Do not include any explanations, headers, or phrases like "Here is the transcript."
152
 
153
  Note: Below you'll find an auto-generated transcript that may help with speaker identification, but focus on cleaning the transcript into a high-quality transcript from the audio.
154
 
155
+ Think about your job as if you were transcribing an interview for a print book where the priority is the reading audience. It should just be a total pleasure to read this as a written artifact where all the flubs and repetitions and conversational artifacts and filler words and false starts are removed, where a bunch of helpful punctuation is added. It should basically read like somebody wrote it specifically for reading rather than just something said extemporaneously.
156
+
157
+ CRITICAL PRINCIPLE: Your job is to CLEAN the transcript, not to SHORTEN it. The enhanced transcript should retain virtually all of the substantive content from the original. If a speaker makes a point, gives an example, walks through reasoning, or provides a detail — it MUST remain in the output. You are removing verbal noise, not content. A good rule of thumb: if you find yourself producing something significantly shorter, you are cutting too much.
158
 
159
  Please:
160
  1. Fix speaker attribution errors, especially at segment boundaries. Watch for incomplete thoughts that were likely from the previous speaker.
161
 
162
+ 2. Optimize for accuracy and completeness:
163
+ - Accuracy and completeness are the most important things.
164
+ - Remove filler words (um, uh, like, you know) and conversational artifacts (yeah, so, I mean, right)
 
165
  - Remove false starts and self-corrections completely
166
+ - Remove stutters, redundant restarts of the same phrase, and hesitations
167
  - DO NOT REPHRASE, PARAPHRASE, OR MODIFY WHAT THE SPEAKER SAYS. You should only be cleaning up the transcript, not generating what you think is a better way of saying something.
168
+ - DO NOT SUMMARIZE OR CONDENSE. If a speaker takes three sentences to explain something, the output should still be approximately three sentences — just without the filler words. Never collapse a multi-sentence explanation into a single sentence.
169
+ - DO NOT DROP substantive content: examples, analogies, qualifications, reasoning steps, asides that carry meaning, or any detail that contributes to the speaker's argument or narrative. When in doubt about whether something is filler or substance, KEEP IT.
170
+ - Preserve the speaker's natural voice and characteristic phrases. If a speaker says "I think" or "my guess is" as a genuine hedge or qualifier (not as a verbal tic), keep it.
171
+ - Break up run-on sentences into clear, readable statements while preserving their full content
172
  - Maintain natural conversation flow
173
+ - IMPORTANT: Apply the same level of detail preservation throughout the ENTIRE transcript. Do not become more aggressive with cutting as the transcript gets longer. The last 30 minutes should be treated with the same care as the first 30 minutes.
174
 
175
  3. Format the output consistently:
176
  - Keep the "Speaker X: Words I am saying" format (no brackets, no other formatting)
 
189
  - Focus on the most important discussion points
190
  - Skip minor tangents or small talk
191
  - When grouping topics or discussion points you should place 3 equal signs (example: ===) at the beginning of the topic point with no other text or numbers. These will be used in some other program to parse the 'topic sections'. Do not place "===" at the end of the topic section, only at the beginning of a new topic section.
192
+ - Ensure that the speaker label is followed by TWO colons in direct succession '::'
193
  - In a 2 hour long podcast episode we would expect between 5 - 8 topic sections, not 9 - 15 topic sections (unless clearly warranted).
194
 
195
+ Example of what TO remove vs. what to KEEP:
196
+
197
+ Original: "Yeah, so, I mean, like, the thing is, uh, when we, when we showed it to the client last week, they were, you know, completely blown away by what we achieved. Like, they, they couldn't even, you know, they couldn't believe it was the same system they had before. And I remember the CFO turned to me and said, you know, 'This is going to save us millions.' Which, you know, that was a big moment for the team."
198
+
199
+ CORRECT (clean but complete): "When we showed it to the client last week, they were completely blown away by what we achieved. They couldn't believe it was the same system they had before. I remember the CFO turned to me and said, 'This is going to save us millions.' That was a big moment for the team."
200
+
201
+ WRONG (summarized): "When we showed it to the client last week, they were completely blown away. The CFO said it would save them millions."
202
+
203
+ The WRONG version drops the detail about not believing it was the same system and the emotional significance to the team. These are substance, not filler.
204
+
205
  Example input:
206
 
207
  00:00:00,379 --> 00:01:15,530 [speaker_0]
 
219
 
220
  ===
221
 
222
+ Speaker A:: I've been working on this new project at work, and what's really interesting is that we're seeing amazing results with our new approach. It's really transforming how we do things. I wonder, how did the client react to the work you did?
223
 
224
+ Speaker B:: The thing is, when we showed it to the client last week, they were completely blown away by what we achieved. They couldn't believe it was the same system they had before.
225
 
226
  Enhance the following transcript, starting directly with the speaker format: '''
227
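The updated prompt says the `===` markers "will be used in some other program to parse the 'topic sections'", and it adds a `::` separator after each speaker label. That downstream program is not part of this commit, so the following is only a minimal sketch of what such a parser might look like. Every name here (`Turn`, `TopicSection`, `parse_enhanced_transcript`, `retention_ratio`) is hypothetical and does not come from app.py. The `retention_ratio` helper is an assumed way to check the prompt's rule of thumb that a significantly shorter output means too much was cut.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# All names below are illustrative assumptions, not taken from app.py.
TOPIC_MARKER = "==="
# Matches lines such as "Speaker A:: text", the format the new prompt requests.
SPEAKER_LINE = re.compile(r"^(Speaker [^:]+)::\s*(.*)$")


@dataclass
class Turn:
    speaker: str
    text: str


@dataclass
class TopicSection:
    turns: List[Turn] = field(default_factory=list)


def parse_enhanced_transcript(output: str) -> List[TopicSection]:
    """Split model output into topic sections, each holding speaker turns."""
    sections: List[TopicSection] = []
    current: Optional[TopicSection] = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == TOPIC_MARKER:
            # A bare "===" line opens a new topic section.
            current = TopicSection()
            sections.append(current)
            continue
        if current is None:
            # Text before the first marker goes into an implicit section.
            current = TopicSection()
            sections.append(current)
        match = SPEAKER_LINE.match(line)
        if match:
            current.turns.append(Turn(match.group(1), match.group(2)))
        elif current.turns:
            # Unlabelled line: treat it as a continuation of the previous turn.
            current.turns[-1].text += " " + line
        # Unlabelled lines with no previous turn in the section are skipped.
    return sections


def retention_ratio(original: str, enhanced: str) -> float:
    """Word count of the enhanced text divided by that of the original."""
    original_words = len(original.split())
    if original_words == 0:
        return 1.0
    return len(enhanced.split()) / original_words
```

A caller could flag any output whose `retention_ratio` falls well below 1.0 for manual review. The right threshold would need to be tuned on real transcripts, since filler-heavy speech legitimately shrinks more than clean speech.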