Commit 90f2b62 by Shatha2030 · verified · 1 Parent(s): f24bd97

Update README.md

Files changed (1): README.md +191 -0
README.md CHANGED
@@ -11,3 +11,194 @@ short_description: Text extraction and summarization
  ---
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ Speech and Video Transcription & Summarization App
+
+ This project is a Python application with an interactive Gradio interface. It extracts text from audio and video files using OpenAI's Whisper model, then summarizes the extracted text with an Arabic BART summarization model.
+
+ Table of Contents
+
+ Introduction
+
+ Features
+
+ Requirements
+
+ Installation & Setup
+
+ Usage
+
+ Code Explanation
+
+   1. Basic Setup
+
+   2. Model Loading
+
+   3. Helper Functions
+
+   4. Example Audio Processing
+
+   5. User Interface
+
+ Notes
+
+ License
+
+ Introduction
+
+ This application provides an end-to-end pipeline for converting speech in audio or video files to text using automatic speech recognition (ASR). The extracted text can then be summarized with a pre-trained BART summarization model for Arabic. Both audio and video files are supported, and long recordings are split into smaller segments before transcription.
+
+ Features
+
+ Extract text from audio and video: Supports MP4, AVI, MOV, and MKV video, and WAV and MP3 audio.
+
+ Handles large files: Automatically splits long audio into 30-second segments before transcription.
+
+ Summarizes extracted text: Uses a fine-tuned BART model to generate concise summaries.
+
+ Interactive UI: Built with Gradio, with a simple drag-and-drop interface.
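The supported formats listed above could be checked with a small helper like the one below. This is only a sketch: `classify_media` and the two extension sets are hypothetical names chosen for illustration, not taken from the application code.

```python
import os

# Extensions taken from the Features list; the names below are illustrative.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def classify_media(path):
    """Return "video", "audio", or None based on the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in VIDEO_EXTENSIONS:
        return "video"
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    return None
```

A caller could use the result to decide whether audio has to be extracted from a video first, and to reject unsupported uploads early.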
+
+ Requirements
+
+ Python: 3.9 or later recommended; current releases of gradio and transformers do not support Python 3.6.
+
+ Required Libraries:
+
+ gradio
+
+ torch
+
+ transformers
+
+ moviepy
+
+ librosa
+
+ soundfile
+
+ numpy
+
+ re (built-in)
+
+ os (built-in)
+
+ GPU Support: If a CUDA-capable GPU is available, the application uses it for faster processing; otherwise it runs on the CPU.
+
+ Installation & Setup
+
+ Install Python: Ensure a supported Python 3 version is installed (see Requirements).
+
+ Install the required dependencies:
+
+ pip install gradio torch transformers moviepy librosa soundfile numpy
+
+ Additional Requirements:
+
+ To process video files, install FFmpeg from the FFmpeg official site (https://ffmpeg.org).
+
+ An internet connection is needed on the first run to download the models.
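Before launching the app, the setup steps above can be sanity-checked from Python. The following is a minimal sketch using only the standard library; `missing_packages` and `ffmpeg_available` are hypothetical helper names, and the package list simply mirrors the pip command above.

```python
import importlib.util
import shutil

# Import names of the third-party packages from the pip command above.
REQUIRED_PACKAGES = ["gradio", "torch", "transformers", "moviepy",
                     "librosa", "soundfile", "numpy"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ffmpeg_available():
    """Return True if an `ffmpeg` executable is on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("Missing packages:", missing_packages() or "none")
    print("FFmpeg found:", ffmpeg_available())
```

The check only confirms that the packages can be located; it does not verify their versions.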
+
+ Usage
+
+ Run the application:
+
+ python filename.py
+
+ This launches the Gradio interface and prints a local URL; Gradio also provides a public URL when launched with share=True.
+
+ Using the UI:
+
+ Test Example: A sample audio file is provided; click the "Try Example ⚡" button to test it.
+
+ Upload File: Drag and drop an audio (WAV, MP3) or video (MP4, AVI, MOV, MKV) file.
+
+ Extract Text: Click "Extract Text" after uploading to convert speech to text.
+
+ Summarize Text: Once the text is extracted, click "Summarize" to generate a concise summary.
+
+ Code Explanation
+
+ 1. Basic Setup
+
+ Detect GPU availability: Uses CUDA if available, otherwise falls back to the CPU.
+
+ import torch
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+
+ 2. Model Loading
+
+ ASR Model: Uses OpenAI's Whisper-medium (openai/whisper-medium).
+
+ Summarization Model: Loads a fine-tuned BART model for Arabic (ahmedabdo/arabic-summarizer-bart).
+
+ from transformers import pipeline, AutoModelForSeq2SeqLM, AutoTokenizer
+ pipe = pipeline("automatic-speech-recognition", model="openai/whisper-medium", device=0 if device == "cuda" else -1)
+ bart_model = AutoModelForSeq2SeqLM.from_pretrained("ahmedabdo/arabic-summarizer-bart")
+ bart_tokenizer = AutoTokenizer.from_pretrained("ahmedabdo/arabic-summarizer-bart")
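Once loaded, a Transformers ASR pipeline can be called with an audio file path and returns a dictionary whose "text" entry holds the transcript. The sketch below shows one way several segment files could be transcribed and joined. It assumes `asr` is a callable such as the `pipe` object above; `transcribe_files` is a hypothetical helper, not part of the application.

```python
def transcribe_files(asr, paths):
    """Run `asr` on each path and join the non-empty transcripts."""
    texts = []
    for path in paths:
        result = asr(path)             # expected shape: {"text": "..."}
        text = result["text"].strip()  # drop padding around each transcript
        if text:                       # skip silent or empty segments
            texts.append(text)
    return " ".join(texts)
```

With the real pipeline this would be called as `transcribe_files(pipe, segment_paths)`, where `segment_paths` is whatever list of segment files the caller has produced.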
+
+ 3. Helper Functions
+
+ Text Cleaning: Collapses runs of whitespace into single spaces and trims both ends.
+
+ import re
+ def clean_text(text):
+     return re.sub(r'\s+', ' ', text).strip()
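To make the effect of the cleaning step concrete, here is the same function applied to a made-up string with the kind of stray whitespace that joined transcripts often contain. The input string is invented for illustration.

```python
import re


def clean_text(text):
    # Collapse any run of whitespace (spaces, tabs, newlines) into one space,
    # then remove leading and trailing whitespace.
    return re.sub(r'\s+', ' ', text).strip()


raw = "  Hello   world \n\n from\tWhisper  "
print(clean_text(raw))  # Hello world from Whisper
```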
+
+ Audio/Video Processing:
+
+ Extracts audio from video files.
+
+ Splits long audio into 30-second segments.
+
+ Uses Whisper ASR to transcribe speech into text.
+
+ def convert_audio_to_text(uploaded_file):
+     ...
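The body of `convert_audio_to_text` is not shown in the README, so the following is not the application's code. It is a minimal sketch of the 30-second segmentation step alone, assuming the audio has already been loaded as a one-dimensional sequence of samples (for example with `librosa.load`, which returns the samples and the sample rate). `split_into_segments` is a hypothetical name.

```python
def split_into_segments(samples, sample_rate, segment_seconds=30):
    """Split a 1-D sequence of audio samples into fixed-length segments.

    Works with lists and NumPy arrays, since it only relies on len() and
    slicing. The last segment may be shorter than `segment_seconds`.
    """
    segment_length = int(sample_rate * segment_seconds)
    if segment_length < 1:
        raise ValueError("segment must contain at least one sample")
    return [samples[start:start + segment_length]
            for start in range(0, len(samples), segment_length)]
```

For a 70-second recording this yields segments of 30, 30, and 10 seconds; each segment can then be passed to the ASR model in turn.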
+
+ Text Summarization: Uses BART to generate summaries.
+
+ def summarize_text(text):
+     ...
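The body of `summarize_text` is also elided in the README. As a rough illustration of how a seq2seq model and tokenizer from Transformers are typically combined for summarization, here is a sketch under stated assumptions: `summarize_with` is a hypothetical function, and the token limits and beam settings are illustrative defaults rather than the application's values.

```python
def summarize_with(model, tokenizer, text,
                   max_input_tokens=1024, max_summary_tokens=128):
    """Summarize `text` with a seq2seq model and its tokenizer."""
    if not text.strip():
        return ""  # nothing to summarize
    # Tokenize, truncating inputs longer than the assumed model limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       max_length=max_input_tokens)
    # Beam-search generation; these settings are assumptions, not tuned values.
    summary_ids = model.generate(**inputs,
                                 max_length=max_summary_tokens,
                                 num_beams=4,
                                 early_stopping=True)
    # Decode the first (and only) generated sequence back to text.
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True).strip()
```

With the objects loaded earlier, a call would look like `summarize_with(bart_model, bart_tokenizer, extracted_text)`. If the model is moved to a GPU, the tokenized inputs need to be moved to the same device before generation.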
+
+ 4. Example Audio Processing
+
+ A sample MP3 file is provided for testing.
+
+ The function process_example_audio checks that the file exists and then passes it to convert_audio_to_text.
+
+ import os
+
+ EXAMPLE_AUDIO_PATH = "AUDIO-2025-02-24-22-10-37.mp3"
+
+ def process_example_audio():
+     if not os.path.exists(EXAMPLE_AUDIO_PATH):
+         return "⛔ Example file not found!"
+     return convert_audio_to_text(EXAMPLE_AUDIO_PATH)
+
+ 5. User Interface
+
+ Gradio UI Components:
+
+ Audio preview & Example button
+
+ File upload section
+
+ Buttons for text extraction and summarization
+
+ Textboxes to display results
+
+ Button Callbacks: Each button is linked to the function that processes its input.
+
+ import gradio as gr
+
+ with gr.Blocks() as demo:
+     ...
+     extract_btn.click(convert_audio_to_text, inputs=file_input, outputs=extracted_text)
+     summarize_btn.click(summarize_text, inputs=extracted_text, outputs=summary_output)
+     example_btn.click(process_example_audio, outputs=extracted_text)
+
+ Launch the App: Starts the Gradio interface when the script is run directly.
+
+ if __name__ == "__main__":
+     demo.launch()
+
+ Notes
+
+ Processing Speed: Large files take longer because they are split into segments and each segment goes through ASR.
+
+ Video Files: FFmpeg must be installed for audio extraction from video to work.
+
+ Resources: Large models such as Whisper and BART run on a CPU but are considerably faster with GPU acceleration.