---
title: Text Extraction Summarization
emoji: 📈
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
short_description: Text extraction and summarization
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Speech and Video Transcription & Summarization App
This project is a Python-based application that provides an interactive Gradio interface for extracting text from audio and video files using OpenAI's Whisper model and summarizing the extracted text using an Arabic BART summarization model.
Table of Contents
Introduction
Features
Requirements
Installation & Setup
Usage
Code Explanation
1. Basic Setup
2. Model Loading
3. Helper Functions
4. Example Audio Processing
5. User Interface
Notes
License
Introduction
This application provides an end-to-end solution for speech-to-text conversion from audio or video files, leveraging ASR (Automatic Speech Recognition) technology. The extracted text can then be summarized using a pre-trained BART summarization model for Arabic. The system supports both audio and video file processing, breaking down large files into smaller segments for efficient processing.
Features
Extract text from audio and video: Supports MP4, AVI, MOV, MKV formats for video, and WAV, MP3 for audio.
Handles large files: Automatically splits long audio files into 30-second segments for smoother processing.
Summarizes extracted text: Uses a fine-tuned BART model to generate concise summaries.
Interactive UI: Built with Gradio, providing a simple drag-and-drop interface.
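As a rough illustration of the format support listed above, the sketch below classifies a file by its extension. The name `classify_media` and the two extension sets are assumptions made for this example, built from the formats named in this README; the app's actual check may differ.

```python
import os

# Extensions taken from the Features list; the set names are hypothetical.
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv"}
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def classify_media(path):
    """Return "video", "audio", or None based on the file extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension in VIDEO_EXTENSIONS:
        return "video"
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    return None
```

A video result would mean the audio track has to be extracted first, while an audio result can go straight to transcription.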
Requirements
Python: 3.10 or later (required by Gradio 5.x, the SDK version this Space pins)
Required Libraries:
gradio
torch
transformers
moviepy
librosa
soundfile
numpy
re (built-in)
os (built-in)
GPU Support: If available, the system will use CUDA for faster processing.
Installation & Setup
Install Python: Ensure you have Python 3.10 or later installed (Gradio 5.x does not support older versions).
Install required dependencies: Run:
pip install gradio torch transformers moviepy librosa soundfile numpy
Additional Requirements:
To process video files, install FFmpeg from the official FFmpeg site (https://ffmpeg.org) and make sure it is on your PATH.
Ensure an internet connection for downloading models on first run.
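One way to confirm the FFmpeg requirement before launching is a quick PATH lookup. This is a minimal sketch using only the standard library; the helper name `ffmpeg_available` is hypothetical and not part of the app.

```python
import shutil


def ffmpeg_available():
    """Return True if an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found.")
    else:
        print("FFmpeg not found; video files cannot be processed.")
```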
Usage
Run the application:
python app.py
This launches the Gradio interface and prints a local URL (http://127.0.0.1:7860 by default). A public URL is only created if launch() is called with share=True.
Using the UI:
Test Example: A sample audio file is provided; click the "Try Example ⚡" button to test it.
Upload File: Drag and drop an audio (WAV, MP3) or video (MP4, AVI, MOV, MKV) file.
Extract Text: Click "Extract Text" after uploading to convert speech to text.
Summarize Text: Once the text is extracted, click "Summarize" to generate a concise summary.
Code Explanation
1. Basic Setup
Detect GPU availability: Uses CUDA if available.
device = "cuda" if torch.cuda.is_available() else "cpu"
2. Model Loading
ASR Model: Uses Whisper-medium from OpenAI.
Summarization Model: Loads a fine-tuned BART model for Arabic.
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-medium", device=0 if device=="cuda" else -1)
bart_model = AutoModelForSeq2SeqLM.from_pretrained("ahmedabdo/arabic-summarizer-bart")
bart_tokenizer = AutoTokenizer.from_pretrained("ahmedabdo/arabic-summarizer-bart")
3. Helper Functions
Text Cleaning: Collapses every run of whitespace (spaces, tabs, newlines) into a single space and trims both ends.
def clean_text(text):
    return re.sub(r'\s+', ' ', text).strip()
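A short usage example shows the effect on typical transcript output. The function is repeated here with its import so the snippet runs on its own; the sample string is made up for illustration.

```python
import re


def clean_text(text):
    # Collapse every run of whitespace (spaces, tabs, newlines) into one space.
    return re.sub(r'\s+', ' ', text).strip()


raw = "  Hello   world,\n\tthis  is   a test.  "
print(clean_text(raw))  # prints: Hello world, this is a test.
```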
Audio/Video Processing:
Extracts audio from video files.
Splits long audio into 30-second segments.
Uses Whisper ASR to transcribe speech into text.
def convert_audio_to_text(uploaded_file):
    ...
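The body of convert_audio_to_text is not shown here, but the 30-second splitting step can be sketched independently. The helper below is an illustration under assumptions: the name `split_into_segments` and its parameters are hypothetical, and it operates on samples that have already been loaded (for example with librosa.load) rather than on a file.

```python
def split_into_segments(samples, sample_rate, segment_seconds=30):
    """Split a 1-D sequence of audio samples into consecutive fixed-length segments.

    Works with anything that supports len() and slicing, such as a list or
    a NumPy array. The last segment may be shorter than the others.
    """
    segment_length = int(sample_rate * segment_seconds)
    if segment_length < 1:
        raise ValueError("sample_rate * segment_seconds must be at least 1 sample")
    return [
        samples[start:start + segment_length]
        for start in range(0, len(samples), segment_length)
    ]


# 70 seconds of silence at 16 kHz splits into 30 s + 30 s + 10 s.
sample_rate = 16000
segments = split_into_segments([0.0] * (70 * sample_rate), sample_rate)
print([len(segment) / sample_rate for segment in segments])  # prints: [30.0, 30.0, 10.0]
```

Each segment could then be passed to the ASR pipeline in turn and the partial transcripts joined with a space.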
Text Summarization: Uses BART to generate summaries.
def summarize_text(text):
    ...
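The body of summarize_text is elided above. As a hedged sketch of how summarization with a Hugging Face seq2seq model commonly looks, the function below takes the model and tokenizer as arguments. The name `summarize_with`, the token limits, and the beam count are assumptions for illustration, not values taken from the app.

```python
def summarize_with(model, tokenizer, text, max_input_tokens=1024, max_summary_tokens=128):
    """Summarize `text` with a Hugging Face seq2seq model and its tokenizer.

    The token limits and beam count are illustrative assumptions.
    """
    if not text or not text.strip():
        return ""
    # Tokenize, truncating input that exceeds the assumed context window.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam search with a handful of beams is a common choice for summarization.
    summary_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_summary_tokens,
        num_beams=4,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)
```

With the objects loaded in the Model Loading step, this would be called as summarize_with(bart_model, bart_tokenizer, extracted_text). If the model were moved to a GPU, the tokenized inputs would need to be moved to the same device first.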
4. Example Audio Processing
A sample MP3 file is provided for testing.
The process_example_audio function checks that the file exists before transcribing it; if the file is missing, it returns an error message instead.
EXAMPLE_AUDIO_PATH = "AUDIO-2025-02-24-22-10-37.mp3"
def process_example_audio():
    if not os.path.exists(EXAMPLE_AUDIO_PATH):
        return "⛔ Example file not found!"
    return convert_audio_to_text(EXAMPLE_AUDIO_PATH)
5. User Interface
Gradio UI Components:
Audio preview & Example button
File upload section
Buttons for text extraction and summarization
Textboxes to display results
Button Callbacks: Link UI buttons to processing functions.
with gr.Blocks() as demo:
    ...
    extract_btn.click(convert_audio_to_text, inputs=file_input, outputs=extracted_text)
    summarize_btn.click(summarize_text, inputs=extracted_text, outputs=summary_output)
    example_btn.click(process_example_audio, outputs=extracted_text)
Launch the App: Runs the Gradio interface.
if __name__ == "__main__":
    demo.launch()
Notes
Processing Speed: Large files take longer due to segmentation and ASR processing.
Video Files: Ensure FFmpeg is installed for proper audio extraction.
Resources: Large models like Whisper and BART require GPU acceleration for optimal performance.