title: Text Extraction Summarization
emoji: 📈
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
short_description: Text extraction and summarization
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
Speech and Video Transcription & Summarization App
This project is a Python-based application that provides an interactive Gradio interface for extracting text from audio and video files using OpenAI's Whisper model and summarizing the extracted text using an Arabic BART summarization model.
Table of Contents
Introduction
Features
Requirements
Installation & Setup
Usage
Code Explanation
Basic Setup
Model Loading
Helper Functions
Example Audio Processing
User Interface
Notes
License
Introduction
This application provides an end-to-end solution for speech-to-text conversion from audio or video files, leveraging ASR (Automatic Speech Recognition) technology. The extracted text can then be summarized using a pre-trained BART summarization model for Arabic. The system supports both audio and video file processing, breaking down large files into smaller segments for efficient processing.
Features
Extract text from audio and video: Supports MP4, AVI, MOV, MKV formats for video, and WAV, MP3 for audio.
Handles large files: Automatically splits long audio into 30-second segments, which matches the 30-second input window Whisper processes at a time.
Summarizes extracted text: Uses a fine-tuned BART model to generate concise summaries.
Interactive UI: Built with Gradio, providing a simple drag-and-drop interface.
Requirements
Python: 3.10 or later (required by Gradio 5)
Required Libraries:
gradio
torch
transformers
moviepy
librosa
soundfile
numpy
re and os (built-in standard-library modules; no installation needed)
GPU Support: If available, the system will use CUDA for faster processing.
Installation & Setup
Install Python: Ensure you have Python 3.10 or later installed.
Install the required dependencies:
```bash
pip install gradio torch transformers moviepy librosa soundfile numpy
```
Additional Requirements:
To process video files, install FFmpeg (https://ffmpeg.org).
An internet connection is needed on first run to download the models from the Hugging Face Hub.
Usage
Run the application:
```bash
python app.py
```
This launches the Gradio interface at a local URL (by default http://127.0.0.1:7860); calling demo.launch(share=True) also creates a temporary public link.
Using the UI:
Test Example: A sample audio file is provided; click the "Try Example ⚡" button to test it.
Upload File: Drag and drop an audio (WAV, MP3) or video (MP4, AVI, MOV, MKV) file.
Extract Text: Click "Extract Text" after uploading to convert speech to text.
Summarize Text: Once the text is extracted, click "Summarize" to generate a concise summary.
Code Explanation
- Basic Setup
Detect GPU availability: Uses CUDA if available.
```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
```
- Model Loading
ASR Model: Uses Whisper-medium from OpenAI.
Summarization Model: Loads a fine-tuned BART model for Arabic.
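```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

# Whisper ASR pipeline: device=0 is the first GPU, -1 is the CPU
pipe = pipeline("automatic-speech-recognition", model="openai/whisper-medium",
                device=0 if device == "cuda" else -1)

# Arabic BART summarizer and its tokenizer
bart_model = AutoModelForSeq2SeqLM.from_pretrained("ahmedabdo/arabic-summarizer-bart")
bart_tokenizer = AutoTokenizer.from_pretrained("ahmedabdo/arabic-summarizer-bart")
```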
- Helper Functions
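Text Cleaning: Collapses every run of whitespace (spaces, tabs, newlines) into a single space and trims both ends.
```python
import re

def clean_text(text):
    # Replace each whitespace run with one space, then strip leading/trailing spaces
    return re.sub(r'\s+', ' ', text).strip()
```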
Audio/Video Processing:
Extracts audio from video files.
Splits long audio into 30-second segments.
Uses Whisper ASR to transcribe speech into text.
```python
def convert_audio_to_text(uploaded_file):
    ...
```
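The segmentation and transcription steps can be sketched as follows. This is a minimal illustration, not the app's actual convert_audio_to_text: split_audio and transcribe_segments are hypothetical helpers, video-to-audio extraction is not shown, and the transcription function is passed in as a parameter so the sketch runs without downloading Whisper. The commented lines show one way it might be wired to librosa and the pipe object above, assuming 16 kHz mono input.

```python
from typing import Callable, List

import numpy as np

SEGMENT_SECONDS = 30  # segment length described above


def split_audio(audio: np.ndarray, sr: int,
                segment_seconds: int = SEGMENT_SECONDS) -> List[np.ndarray]:
    """Hypothetical helper: cut a mono waveform into consecutive chunks of at most segment_seconds."""
    samples_per_segment = int(segment_seconds * sr)
    return [audio[start:start + samples_per_segment]
            for start in range(0, len(audio), samples_per_segment)]


def transcribe_segments(audio: np.ndarray, sr: int,
                        transcribe: Callable[[np.ndarray, int], str]) -> str:
    """Hypothetical helper: transcribe each chunk and join the non-empty results with spaces."""
    parts = []
    for segment in split_audio(audio, sr):
        text = transcribe(segment, sr).strip()
        if text:
            parts.append(text)
    return " ".join(parts)


# Possible wiring (assumption, requires librosa and the Whisper pipeline):
# audio, sr = librosa.load(path, sr=16000, mono=True)
# text = transcribe_segments(
#     audio, sr, lambda seg, rate: pipe({"raw": seg, "sampling_rate": rate})["text"])
```

Passing the transcriber as a callable keeps the chunking logic independent of the model, so it can be tested with a stub and reused with a different ASR backend.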
Text Summarization: Uses BART to generate summaries.
```python
def summarize_text(text):
    ...
```
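A summarization step of this kind might look like the sketch below, assuming the standard transformers tokenizer call and model.generate() API. It is not the app's actual summarize_text: summarize_with_bart is a hypothetical name, it takes the tokenizer and model as arguments (e.g. bart_tokenizer and bart_model from above), and the 1024-token input limit, 150-token summary length, and beam count of 4 are assumed values, not settings taken from the app.

```python
from typing import Any


def summarize_with_bart(text: str, tokenizer: Any, model: Any,
                        max_input_tokens: int = 1024,   # assumed encoder limit
                        max_summary_tokens: int = 150,  # assumed summary length
                        num_beams: int = 4) -> str:     # assumed beam width
    """Hypothetical helper: summarize text with a seq2seq model such as BART."""
    text = " ".join(text.split())  # collapse whitespace, as clean_text does
    if not text:
        return ""
    # Tokenize, truncating anything beyond the encoder limit, and move the
    # tensors to the same device as the model.
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       max_length=max_input_tokens).to(model.device)
    # Beam search generation of the summary token IDs.
    summary_ids = model.generate(**inputs, max_length=max_summary_tokens,
                                 num_beams=num_beams, early_stopping=True)
    # Decode the first (and only) sequence back to text.
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True).strip()
```

Usage (assumption): summarize_with_bart(extracted, bart_tokenizer, bart_model). Inputs longer than max_input_tokens are truncated, so only the beginning of a very long transcript is summarized.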
- Example Audio Processing
A sample MP3 file is provided for testing.
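The process_example_audio function checks that the sample file exists before transcribing it with convert_audio_to_text.
```python
import os

EXAMPLE_AUDIO_PATH = "AUDIO-2025-02-24-22-10-37.mp3"

def process_example_audio():
    # Return an error message instead of crashing if the sample is missing
    if not os.path.exists(EXAMPLE_AUDIO_PATH):
        return "⛔ Example file not found!"
    return convert_audio_to_text(EXAMPLE_AUDIO_PATH)
```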
- User Interface
Gradio UI Components:
Audio preview & Example button
File upload section
Buttons for text extraction and summarization
Textboxes to display results
Button Callbacks: Each button's .click() handler links it to a processing function.
```python
import gradio as gr

with gr.Blocks() as demo:
    ...  # component definitions elided
    extract_btn.click(convert_audio_to_text, inputs=file_input, outputs=extracted_text)
    summarize_btn.click(summarize_text, inputs=extracted_text, outputs=summary_output)
    example_btn.click(process_example_audio, outputs=extracted_text)
```
Launch the App: Runs the Gradio interface.
```python
if __name__ == "__main__":
    demo.launch()
```
Notes
Processing Speed: Processing time grows with audio length, since every 30-second segment is transcribed by Whisper.
Video Files: Ensure FFmpeg is installed for proper audio extraction.
Resources: Whisper-medium and BART also run on CPU, but GPU (CUDA) acceleration makes them considerably faster.