---
title: Text Extraction Summarization
emoji: 📈
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.18.0
app_file: app.py
pinned: false
short_description: Text extraction and summarization
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Speech and Video Transcription & Summarization App

This project is a Python-based application that provides an interactive Gradio interface for extracting text from audio and video files using OpenAI's Whisper model, and for summarizing the extracted text with an Arabic BART summarization model.

## Table of Contents

- [Introduction](#introduction)
- [Features](#features)
- [Requirements](#requirements)
- [Installation & Setup](#installation--setup)
- [Usage](#usage)
- [Code Explanation](#code-explanation)
  1. Basic Setup
  2. Model Loading
  3. Helper Functions
  4. Example Audio Processing
  5. User Interface
- [Notes](#notes)
- [License](#license)

## Introduction

This application provides an end-to-end solution for speech-to-text conversion from audio or video files, leveraging Automatic Speech Recognition (ASR) technology. The extracted text can then be summarized with a pre-trained Arabic BART summarization model. The system supports both audio and video files, breaking large files into smaller segments for efficient processing.

## Features

- **Extract text from audio and video**: Supports MP4, AVI, MOV, and MKV formats for video, and WAV and MP3 for audio.
- **Handles large files**: Automatically splits long audio into 30-second segments for smoother processing.
- **Summarizes extracted text**: Uses a fine-tuned BART model to generate concise summaries.
- **Interactive UI**: Built with Gradio, providing a simple drag-and-drop interface.

## Requirements

- **Python**: 3.6 or later
- **Required libraries**: `gradio`, `torch`, `transformers`, `moviepy`, `librosa`, `soundfile`, `numpy`, plus the built-in `re` and `os` modules
- **GPU support**: If available, the system uses CUDA for faster processing.

## Installation & Setup

1. **Install Python**: Ensure you have Python 3.6 or later installed.
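A quick way to confirm the installed interpreter meets this minimum (on some systems the command is `python` rather than `python3`):

```shell
# Print the interpreter version; it should report 3.6 or later.
python3 --version
```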
2. **Install required dependencies**:

   ```bash
   pip install gradio torch transformers moviepy librosa soundfile numpy
   ```

3. **Additional requirements**:
   - To process video files, install [FFmpeg](https://ffmpeg.org/).
   - Ensure an internet connection for downloading the models on first run.

## Usage

1. **Run the application**:

   ```bash
   python app.py
   ```

   This launches a Gradio interface with a local or public URL.

2. **Using the UI**:
   - **Try example**: A sample audio file is provided; click the "Try Example ⚡" button to test it.
   - **Upload file**: Drag and drop an audio (WAV, MP3) or video (MP4, AVI, MOV) file.
   - **Extract text**: Click "Extract Text" after uploading to convert speech to text.
   - **Summarize text**: Once the text is extracted, click "Summarize" to generate a concise summary.

## Code Explanation

### 1. Basic Setup

Detect GPU availability and use CUDA if available:

```python
device = "cuda" if torch.cuda.is_available() else "cpu"
```

### 2. Model Loading

- **ASR model**: Whisper-medium from OpenAI.
- **Summarization model**: A fine-tuned BART model for Arabic.

```python
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium",
    device=0 if device == "cuda" else -1,
)
bart_model = AutoModelForSeq2SeqLM.from_pretrained("ahmedabdo/arabic-summarizer-bart")
bart_tokenizer = AutoTokenizer.from_pretrained("ahmedabdo/arabic-summarizer-bart")
```

### 3. Helper Functions

- **Text cleaning**: Collapses runs of whitespace into single spaces.

  ```python
  def clean_text(text):
      return re.sub(r'\s+', ' ', text).strip()
  ```

- **Audio/video processing**: Extracts audio from video files, splits long audio into 30-second segments, and transcribes speech into text with Whisper ASR.

  ```python
  def convert_audio_to_text(uploaded_file):
      ...
  ```

- **Text summarization**: Uses BART to generate summaries.

  ```python
  def summarize_text(text):
      ...
  ```

### 4. Example Audio Processing

A sample MP3 file is provided for testing. The `process_example_audio` function checks that the file exists before processing it.

```python
EXAMPLE_AUDIO_PATH = "AUDIO-2025-02-24-22-10-37.mp3"

def process_example_audio():
    if not os.path.exists(EXAMPLE_AUDIO_PATH):
        return "⛔ Example file not found!"
    return convert_audio_to_text(EXAMPLE_AUDIO_PATH)
```
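The 30-second segmentation described in the helper functions can be sketched as follows. This is a minimal illustration, not the app's actual implementation; the helper name `split_audio` is hypothetical:

```python
import numpy as np

def split_audio(samples: np.ndarray, sr: int, segment_seconds: int = 30) -> list:
    """Split a 1-D audio signal into fixed-length segments (hypothetical helper).

    The last segment may be shorter than segment_seconds.
    """
    segment_len = segment_seconds * sr  # number of samples per segment
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]

# 70 seconds of silence at 16 kHz splits into 30 s + 30 s + 10 s segments.
segments = split_audio(np.zeros(70 * 16000), sr=16000)
print([len(s) / 16000 for s in segments])  # → [30.0, 30.0, 10.0]
```

Each segment can then be passed to the ASR pipeline individually, and the resulting transcripts concatenated.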
### 5. User Interface

- **Gradio UI components**:
  - Audio preview & example button
  - File upload section
  - Buttons for text extraction and summarization
  - Textboxes to display results
- **Button callbacks**: Link UI buttons to the processing functions.

```python
with gr.Blocks() as demo:
    ...
    extract_btn.click(convert_audio_to_text, inputs=file_input, outputs=extracted_text)
    summarize_btn.click(summarize_text, inputs=extracted_text, outputs=summary_output)
    example_btn.click(process_example_audio, outputs=extracted_text)
```

- **Launch the app**: Runs the Gradio interface.

```python
if __name__ == "__main__":
    demo.launch()
```

## Notes

- **Processing speed**: Large files take longer due to segmentation and ASR processing.
- **Video files**: Ensure FFmpeg is installed for proper audio extraction.
- **Resources**: Large models like Whisper and BART require GPU acceleration for optimal performance.
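Long transcripts can exceed a summarization model's input limit. One simple word-based chunking strategy (a hypothetical helper, not part of the app's source) looks like this:

```python
def chunk_text(text: str, max_words: int = 400) -> list:
    """Split text into word-based chunks (hypothetical helper) so each chunk
    stays within a summarizer's input limit."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

# A 10-word string with max_words=4 yields chunks of 4, 4, and 2 words.
chunks = chunk_text("one two three four five six seven eight nine ten", max_words=4)
print([len(c.split()) for c in chunks])  # → [4, 4, 2]
```

Each chunk's summary can then be concatenated into one result. In practice, counting tokens with the model's tokenizer is more accurate than counting words, since BART's limit is expressed in tokens.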