transcriber-elevenlabs-gemini

dalinstone committed on Feb 14

Commit

5bdcfc9

verified ·

1 Parent(s): fe7ca23

Update app.py

Files changed (1)

app.py +26 -10

@@ -147,25 +147,30 @@ async def process_audio(api_key, audio_file, num_speakers):
 # --- Enhancement ---
-ENHANCE_PROMPT = '''You are an expert transcript editor. Your task is to enhance this transcript for maximum readability while maintaining the core message.
                 IMPORTANT: Respond ONLY with the enhanced transcript. Do not include any explanations, headers, or phrases like "Here is the transcript."
                 Note: Below you'll find an auto-generated transcript that may help with speaker identification, but focus on cleaning the transcript into a high-quality transcript from the audio.
-                Think about your job as if you were transcribing an interview for a print book where the priority is the reading audience. It should just be a total pleasure to read this as a written artifact where all the flubs and repetitions and conversational artifacts and filler words and false starts are removed, where a bunch of helpful punctuation is added. It should basically read like somebody wrote it specifically for reading rather than just something somebody said extemporaneously.
                 Please:
                 1. Fix speaker attribution errors, especially at segment boundaries. Watch for incomplete thoughts that were likely from the previous speaker.
-                2. Optimize AGGRESSIVELY for accuracy:
-                - Accuracy is the most important thing!!
-                - Remove ALL conversational artifacts (yeah, so, I mean, etc.)
-                - Remove ALL filler words (um, uh, like, you know)
                 - Remove false starts and self-corrections completely
-                - Remove redundant phrases and hesitations and stutters.
                 - DO NOT REPHRASE, PARAPHRASE, OR MODIFY WHAT THE SPEAKER SAYS. You should only be cleaning up the transcript, not generating what you think is a better way of saying something.
-                - Break up run-on sentences into clear, concise statements
                 - Maintain natural conversation flow
                 3. Format the output consistently:
                 - Keep the "Speaker X: Words I am saying" format (no brackets, no other formatting)
@@ -184,8 +189,19 @@ ENHANCE_PROMPT = '''You are an expert transcript editor. Your task is to enhance
                 - Focus on the most important discussion points
                 - Skip minor tangents or small talk
                 - When grouping topics or discussion points you should place 3 equal signs (example: ===) at the beginning of the topic point with no other text or numbers. These will be used in some other program to parse the 'topic sections'. Do not place "===" at the end of the topic section, only at the beginning of a new topic section.
                 - In a 2 hour long podcast episode we would expect between 5 - 8 topic sections, not 9 - 15 topic sections (unless clearly warranted).
                 Example input:
                 00:00:00,379 --> 00:01:15,530 [speaker_0]
@@ -203,9 +219,9 @@ ENHANCE_PROMPT = '''You are an expert transcript editor. Your task is to enhance
                 ===
-                Speaker A: I've been working on this new project at work, and we're seeing amazing results with our new approach. It's really transforming how we do things. I wonder, how did the client react to the work you did?.
-                Speaker B: When we showed it to the client last week, they were completely blown away by what we achieved. They couldn't believe it was the same system they had before.
                 Enhance the following transcript, starting directly with the speaker format: '''

 # --- Enhancement ---
+ENHANCE_PROMPT = '''You are an expert transcript editor. Your task is to enhance this transcript for maximum readability while preserving the FULL substance and detail of everything each speaker says.
                 IMPORTANT: Respond ONLY with the enhanced transcript. Do not include any explanations, headers, or phrases like "Here is the transcript."
                 Note: Below you'll find an auto-generated transcript that may help with speaker identification, but focus on cleaning the transcript into a high-quality transcript from the audio.
+                Think about your job as if you were transcribing an interview for a print book where the priority is the reading audience. It should just be a total pleasure to read this as a written artifact where all the flubs and repetitions and conversational artifacts and filler words and false starts are removed, where a bunch of helpful punctuation is added. It should basically read like somebody wrote it specifically for reading rather than just something said extemporaneously.
+                CRITICAL PRINCIPLE: Your job is to CLEAN the transcript, not to SHORTEN it. The enhanced transcript should retain virtually all of the substantive content from the original. If a speaker makes a point, gives an example, walks through reasoning, or provides a detail — it MUST remain in the output. You are removing verbal noise, not content. A good rule of thumb: if you find yourself producing something significantly shorter, you are cutting too much.
                 Please:
                 1. Fix speaker attribution errors, especially at segment boundaries. Watch for incomplete thoughts that were likely from the previous speaker.
+                2. Optimize for accuracy and completeness:
+                - Accuracy and completeness are the most important things.
+                - Remove filler words (um, uh, like, you know) and conversational artifacts (yeah, so, I mean, right)
                 - Remove false starts and self-corrections completely
+                - Remove stutters, redundant restarts of the same phrase, and hesitations
                 - DO NOT REPHRASE, PARAPHRASE, OR MODIFY WHAT THE SPEAKER SAYS. You should only be cleaning up the transcript, not generating what you think is a better way of saying something.
+                - DO NOT SUMMARIZE OR CONDENSE. If a speaker takes three sentences to explain something, the output should still be approximately three sentences — just without the filler words. Never collapse a multi-sentence explanation into a single sentence.
+                - DO NOT DROP substantive content: examples, analogies, qualifications, reasoning steps, asides that carry meaning, or any detail that contributes to the speaker's argument or narrative. When in doubt about whether something is filler or substance, KEEP IT.
+                - Preserve the speaker's natural voice and characteristic phrases. If a speaker says "I think" or "my guess is" as a genuine hedge or qualifier (not as a verbal tic), keep it.
+                - Break up run-on sentences into clear, readable statements while preserving their full content
                 - Maintain natural conversation flow
+                - IMPORTANT: Apply the same level of detail preservation throughout the ENTIRE transcript. Do not become more aggressive with cutting as the transcript gets longer. The last 30 minutes should be treated with the same care as the first 30 minutes.
                 3. Format the output consistently:
                 - Keep the "Speaker X: Words I am saying" format (no brackets, no other formatting)
                 - Focus on the most important discussion points
                 - Skip minor tangents or small talk
                 - When grouping topics or discussion points you should place 3 equal signs (example: ===) at the beginning of the topic point with no other text or numbers. These will be used in some other program to parse the 'topic sections'. Do not place "===" at the end of the topic section, only at the beginning of a new topic section.
+                - Ensure that the speaker label is separated from the spoken text by TWO colons in direct succession '::'
                 - In a 2 hour long podcast episode we would expect between 5 - 8 topic sections, not 9 - 15 topic sections (unless clearly warranted).
+                Example of what TO remove vs. what to KEEP:
+                Original: "Yeah, so, I mean, like, the thing is, uh, when we, when we showed it to the client last week, they were, you know, completely blown away by what we achieved. Like, they, they couldn't even, you know, they couldn't believe it was the same system they had before. And I remember the CFO turned to me and said, you know, 'This is going to save us millions.' Which, you know, that was a big moment for the team."
+                CORRECT (clean but complete): "When we showed it to the client last week, they were completely blown away by what we achieved. They couldn't believe it was the same system they had before. I remember the CFO turned to me and said, 'This is going to save us millions.' That was a big moment for the team."
+                WRONG (summarized): "When we showed it to the client last week, they were completely blown away. The CFO said it would save them millions."
+                The WRONG version drops the detail about not believing it was the same system and the emotional significance to the team. These are substance, not filler.
                 Example input:
                 00:00:00,379 --> 00:01:15,530 [speaker_0]
                 ===
+                Speaker A:: I've been working on this new project at work, and what's really interesting is that we're seeing amazing results with our new approach. It's really transforming how we do things. I wonder, how did the client react to the work you did?
+                Speaker B:: The thing is, when we showed it to the client last week, they were completely blown away by what we achieved. They couldn't believe it was the same system they had before.
                 Enhance the following transcript, starting directly with the speaker format: '''
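
The prompt says the "===" markers "will be used in some other program to parse the 'topic sections'", and the updated example output uses "Speaker X::" labels. That downstream program is not part of this commit. The following is only a minimal sketch of how such output might be parsed, assuming one speaker turn per line. The names SPEAKER_RE and parse_enhanced_transcript are hypothetical and do not appear in app.py.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: a "Speaker X" label followed by the '::' separator
# that the updated prompt asks the model to emit.
SPEAKER_RE = re.compile(r"^(Speaker [^:]+)::\s*(.*)$")


def parse_enhanced_transcript(text: str) -> List[List[Tuple[str, str]]]:
    """Split model output into topic sections of (speaker, utterance) turns.

    Assumes a line containing only '===' opens a new topic section and that
    each turn starts on its own line with a 'Speaker X::' label.
    """
    sections: List[List[Tuple[str, str]]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "===":
            # A marker always starts a new section; none is expected at the end.
            current = []
            sections.append(current)
            continue
        if current is None:
            # Text before the first marker is treated as an implicit section.
            current = []
            sections.append(current)
        match = SPEAKER_RE.match(line)
        if match:
            current.append((match.group(1), match.group(2)))
        elif current:
            # Unlabelled line: treat it as a continuation of the previous turn.
            speaker, words = current[-1]
            current[-1] = (speaker, f"{words} {line}".strip())
        # Unlabelled lines with no preceding turn are ignored in this sketch.
    return sections


if __name__ == "__main__":
    sample = (
        "===\n"
        "Speaker A:: How did the client react?\n"
        "Speaker B:: They were completely blown away.\n"
        "===\n"
        "Speaker A:: Let's move on to the budget.\n"
    )
    for index, section in enumerate(parse_enhanced_transcript(sample), start=1):
        print(f"Topic {index}: {len(section)} turn(s)")
```

Lines with a single colon, such as "Speaker A: ...", do not match the label pattern in this sketch, so a model that falls back to the older single-colon format would be treated as continuation text rather than as new turns.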