Spaces: andrijdavid/diarization

andrijdavid Qwen-Coder committed on Oct 9, 2025

Commit 5bff499 (parent: ddae84a)

Fix transcription errors by improving audio segment handling

- Add checks for empty audio segments to avoid creating invalid files
- Pad very short audio segments to ensure Whisper compatibility
- Use explicit WAV format with PCM_16 subtype for better compatibility
- Add error handling around transcription to gracefully handle segment errors
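The first three points in the commit message amount to a small preprocessing step applied to each diarized segment before it is handed to Whisper. The sketch below isolates that step as a standalone function, assuming a mono NumPy signal and segment bounds given in seconds. The name `prepare_segment` and the `min_duration_s` parameter are hypothetical; the 0.1 s floor and the trailing zero-padding follow the diff. The caller would then write the result with `sf.write(temp_file, segment, sr, format='WAV', subtype='PCM_16')`, as the commit does.

```python
from typing import Optional

import numpy as np


def prepare_segment(
    audio: np.ndarray,
    sr: int,
    start_s: float,
    end_s: float,
    min_duration_s: float = 0.1,  # hypothetical parameter; the commit hard-codes 0.1 s
) -> Optional[np.ndarray]:
    """Slice [start_s, end_s) out of a mono signal and make it Whisper-safe.

    Returns None for an empty slice (the caller should skip the segment)
    and zero-pads anything shorter than min_duration_s at the end.
    """
    start_sample = int(start_s * sr)
    end_sample = int(end_s * sr)
    segment = audio[start_sample:end_sample]

    # Empty slice: nothing to write to disk or transcribe.
    if len(segment) == 0:
        return None

    # Too short: append silence until the segment reaches the minimum length.
    min_samples = int(sr * min_duration_s)
    if len(segment) < min_samples:
        segment = np.pad(segment, (0, min_samples - len(segment)), mode="constant")
    return segment
```

Because the pad width is `(0, n)`, silence is only appended after the speech, so the segment still starts at the timestamp reported by the diarization step.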

Files changed (1)

app.py +25 -9

app.py CHANGED

```diff
@@ -189,19 +189,35 @@ class DiarizationTranscriptionTranslation:
                 end_sample = int(segment["end"] * orig_sr)
                 segment_audio = audio[start_sample:end_sample]
+                # Ensure segment_audio is not empty
+                if len(segment_audio) == 0:
+                    continue  # Skip empty segments
+                # Add a small amount of silence if segment is too short for Whisper
+                if len(segment_audio) < orig_sr * 0.1:  # Less than 0.1 seconds
+                    min_samples = int(orig_sr * 0.1)
+                    zeros_to_add = min_samples - len(segment_audio)
+                    segment_audio = np.pad(segment_audio, (0, zeros_to_add), mode='constant')
                 # Save the segment as a temporary file for Whisper
                 temp_file = f"temp_segment_{segment['start']}_{segment['end']}.wav"
-                sf.write(temp_file, segment_audio, orig_sr)
+                # Use subtype parameter to ensure proper WAV format
+                sf.write(temp_file, segment_audio, orig_sr, format='WAV', subtype='PCM_16')
                 # Transcribe the segment
-                transcription_result = self.transcribe_audio(temp_file)
-                # Handle both possible return formats
-                if isinstance(transcription_result, dict) and "text" in transcription_result:
-                    transcribed_text = transcription_result["text"]
-                elif isinstance(transcription_result, str):
-                    transcribed_text = transcription_result
-                else:
-                    transcribed_text = str(transcription_result)
+                try:
+                    transcription_result = self.transcribe_audio(temp_file)
+                    # Handle both possible return formats
+                    if isinstance(transcription_result, dict) and "text" in transcription_result:
+                        transcribed_text = transcription_result["text"]
+                    elif isinstance(transcription_result, str):
+                        transcribed_text = transcription_result
+                    else:
+                        transcribed_text = str(transcription_result)
+                except Exception as e:
+                    print(f"Error transcribing segment {temp_file}: {str(e)}")
+                    transcribed_text = f"Transcription error: {str(e)}"
+                    # Continue with the error message as the transcription
                 # Translate if necessary
                 translated_text = self.translate_text(transcribed_text)
```
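The new `try`/`except` block does two things: it coerces whatever `self.transcribe_audio` returns into a string, and it turns any exception into placeholder text so the segment loop keeps running. Below is a standalone sketch of that logic. `normalize_transcription` and `safe_transcribe` are hypothetical names, and the transcriber is passed in as a plain callable instead of the class's `self.transcribe_audio` method, so the logic can be exercised without loading a model.

```python
from typing import Any, Callable


def normalize_transcription(result: Any) -> str:
    """Coerce a transcriber's return value to plain text.

    Mirrors the three cases in the diff: a dict carrying a "text" key
    (e.g. what a transformers automatic-speech-recognition pipeline
    returns), a bare string, or anything else, which falls back to str().
    """
    if isinstance(result, dict) and "text" in result:
        return result["text"]
    if isinstance(result, str):
        return result
    return str(result)


def safe_transcribe(transcribe: Callable[[str], Any], path: str) -> str:
    """Run transcribe(path); on failure, return the error as the text.

    As in the commit, one failing segment does not abort the whole run:
    the error is printed and its message is passed downstream in place
    of a transcription.
    """
    try:
        return normalize_transcription(transcribe(path))
    except Exception as e:  # deliberately broad: any per-segment failure is tolerated
        print(f"Error transcribing segment {path}: {e}")
        return f"Transcription error: {e}"
```

One consequence of this design, also visible in the diff: the placeholder string flows straight into `self.translate_text`, so it is translated like ordinary speech unless the caller checks for the `Transcription error:` prefix first.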