---
title: Text Extraction Summarization
emoji: 📈
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
short_description: Text extraction and summarization
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Speech and Video Transcription & Summarization App

This project is a Python-based application that provides an interactive Gradio interface for extracting text from audio and video files using OpenAI's Whisper model, and for summarizing the extracted text with an Arabic BART summarization model.

## Table of Contents

- [Introduction](#introduction)
- [Features](#features)
- [Requirements](#requirements)
- [Installation & Setup](#installation--setup)
- [Usage](#usage)
- [Code Explanation](#code-explanation)
  1. Basic Setup
  2. Model Loading
  3. Helper Functions
  4. Example Audio Processing
  5. User Interface
- [Notes](#notes)
- [License](#license)

## Introduction

This application provides an end-to-end solution for speech-to-text conversion from audio or video files, leveraging Automatic Speech Recognition (ASR) technology. The extracted text can then be summarized with a pre-trained Arabic BART summarization model. The system supports both audio and video files, breaking large files into smaller segments for efficient processing.

## Features

- **Extract text from audio and video**: Supports MP4, AVI, MOV, and MKV formats for video, and WAV and MP3 for audio.
- **Handles large files**: Automatically splits long audio into 30-second segments for smoother processing.
- **Summarizes extracted text**: Uses a fine-tuned BART model to generate concise summaries.
- **Interactive UI**: Built with Gradio, providing a simple drag-and-drop interface.

## Requirements

- **Python**: 3.6 or later
- **Required libraries**: `gradio`, `torch`, `transformers`, `moviepy`, `librosa`, `soundfile`, `numpy`, plus the built-in `re` and `os` modules
- **GPU support**: If available, the system uses CUDA for faster processing.

## Installation & Setup

1. **Install Python**: Ensure you have Python 3.6 or later installed.
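A quick way to confirm the installed interpreter meets this minimum (on some systems the command is `python` rather than `python3`):

```shell
# Print the interpreter version; it should report 3.6 or later.
python3 --version
```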
2. **Install required dependencies**:

   ```bash
   pip install gradio torch transformers moviepy librosa soundfile numpy
   ```

3. **Additional requirements**:
   - To process video files, install [FFmpeg](https://ffmpeg.org/).
   - Ensure an internet connection for downloading the models on first run.

## Usage

1. **Run the application**:

   ```bash
   python app.py
   ```

   This launches a Gradio interface with a local or public URL.

2. **Using the UI**:
   - **Try example**: A sample audio file is provided; click the "Try Example ⚡" button to test it.
   - **Upload file**: Drag and drop an audio (WAV, MP3) or video (MP4, AVI, MOV) file.
   - **Extract text**: Click "Extract Text" after uploading to convert speech to text.
   - **Summarize text**: Once the text is extracted, click "Summarize" to generate a concise summary.

## Code Explanation

### 1. Basic Setup

Detect GPU availability and use CUDA if available:

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
```

### 2. Model Loading

- **ASR model**: Whisper-medium from OpenAI.
- **Summarization model**: A fine-tuned BART model for Arabic.

```python
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium",
    device=0 if device == "cuda" else -1,
)
bart_model = AutoModelForSeq2SeqLM.from_pretrained("ahmedabdo/arabic-summarizer-bart")
bart_tokenizer = AutoTokenizer.from_pretrained("ahmedabdo/arabic-summarizer-bart")
```

### 3. Helper Functions

- **Text cleaning**: Collapses runs of whitespace into single spaces.

  ```python
  def clean_text(text):
      return re.sub(r'\s+', ' ', text).strip()
  ```

- **Audio/video processing**: Extracts audio from video files, splits long audio into 30-second segments, and transcribes speech into text with Whisper ASR.

  ```python
  def convert_audio_to_text(uploaded_file):
      ...
  ```

- **Text summarization**: Uses BART to generate summaries.

  ```python
  def summarize_text(text):
      ...
  ```

### 4. Example Audio Processing

A sample MP3 file is provided for testing. The `process_example_audio` function checks that the file exists before processing it.

```python
EXAMPLE_AUDIO_PATH = "AUDIO-2025-02-24-22-10-37.mp3"

def process_example_audio():
    if not os.path.exists(EXAMPLE_AUDIO_PATH):
        return "⛔ Example file not found!"
    return convert_audio_to_text(EXAMPLE_AUDIO_PATH)
```
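The 30-second segmentation described in the helper functions can be sketched as follows. This is a minimal illustration, not the app's actual implementation; the helper name `split_audio` is hypothetical:

```python
import numpy as np

def split_audio(samples: np.ndarray, sr: int, segment_seconds: int = 30) -> list:
    """Split a 1-D audio signal into fixed-length segments (hypothetical helper).

    The last segment may be shorter than segment_seconds.
    """
    segment_len = segment_seconds * sr  # number of samples per segment
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]

# 70 seconds of silence at 16 kHz splits into 30 s + 30 s + 10 s segments.
segments = split_audio(np.zeros(70 * 16000), sr=16000)
print([len(s) / 16000 for s in segments])  # → [30.0, 30.0, 10.0]
```

Each segment can then be passed to the ASR pipeline individually, and the resulting transcripts concatenated.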
### 5. User Interface

- **Gradio UI components**:
  - Audio preview & example button
  - File upload section
  - Buttons for text extraction and summarization
  - Textboxes to display results
- **Button callbacks**: Link UI buttons to the processing functions.

```python
with gr.Blocks() as demo:
    ...
    extract_btn.click(convert_audio_to_text, inputs=file_input, outputs=extracted_text)
    summarize_btn.click(summarize_text, inputs=extracted_text, outputs=summary_output)
    example_btn.click(process_example_audio, outputs=extracted_text)
```

- **Launch the app**: Runs the Gradio interface.

```python
if __name__ == "__main__":
    demo.launch()
```

## Notes

- **Processing speed**: Large files take longer due to segmentation and ASR processing.
- **Video files**: Ensure FFmpeg is installed for proper audio extraction.
- **Resources**: Large models like Whisper and BART require GPU acceleration for optimal performance.
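Long transcripts can exceed a summarization model's input limit. One simple word-based chunking strategy (a hypothetical helper, not part of the app's source) looks like this:

```python
def chunk_text(text: str, max_words: int = 400) -> list:
    """Split text into word-based chunks (hypothetical helper) so each chunk
    stays within a summarizer's input limit."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# A 10-word string with max_words=4 yields chunks of 4, 4, and 2 words.
chunks = chunk_text("one two three four five six seven eight nine ten", max_words=4)
print([len(c.split()) for c in chunks])  # → [4, 4, 2]
```

Each chunk's summary can then be concatenated into one result. In practice, counting tokens with the model's tokenizer is more accurate than counting words, since BART's limit is expressed in tokens.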