video-to-notes / README.md
dhananjay1006

metadata
title: YouTube Slide Extractor
emoji: 🎓
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.20.0
app_file: app.py
pinned: false
license: mit

🎓 YouTube Slide Extractor

An AI-powered tool that automatically extracts slides from educational YouTube videos using computer vision and OCR technology.

🚀 Features

  • Automatic Slide Detection: Uses computer vision to identify unique slides in videos
  • OCR Text Extraction: Extracts text content from slides using Tesseract OCR
  • PDF Generation: Creates downloadable PDF documents from extracted slides
  • Smart Filtering: Removes duplicate frames and focuses on content changes
  • Batch Download: Download all results in a convenient ZIP file

🎯 How It Works

  1. Video Download: Downloads the YouTube video using yt-dlp
  2. Frame Extraction: Extracts frames at regular intervals (every 5 seconds)
  3. Slide Detection: Uses structural similarity to identify unique slides
  4. OCR Processing: Extracts text content from each slide
  5. PDF Creation: Generates a PDF with slides and extracted text
  6. Results Packaging: Creates a ZIP file with all slides, PDF, and metadata
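
Steps 2 and 3 can be sketched in plain Python. This is a minimal illustration, not the app's actual code: `sample_timestamps`, `select_unique_slides`, and the 0.9 threshold are hypothetical names and values, and the similarity function is passed in so that any metric (for example SSIM) can be plugged in.

```python
import math
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def sample_timestamps(duration_s: float, interval_s: float = 5.0) -> List[float]:
    """Return the timestamps (in seconds) at which frames would be grabbed."""
    if duration_s < 0 or interval_s <= 0:
        raise ValueError("duration must be >= 0 and interval must be > 0")
    return [i * interval_s for i in range(math.ceil(duration_s / interval_s))]


def select_unique_slides(
    frames: Sequence[Frame],
    similarity: Callable[[Frame, Frame], float],
    threshold: float = 0.9,  # assumed cut-off, not taken from the app
) -> List[Frame]:
    """Keep a frame only if it differs enough from the last kept slide."""
    kept: List[Frame] = []
    for frame in frames:
        # A low similarity score against the last kept slide means new content.
        if not kept or similarity(kept[-1], frame) < threshold:
            kept.append(frame)
    return kept
```

With the 5-second interval described above, a 10-minute (600 s) video yields 120 candidate frames, which the similarity filter then reduces to the distinct slides.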

📋 Usage Instructions

  1. Enter Video URL: Paste a YouTube video URL in the input field
  2. Configure Options: Choose whether to generate a PDF (recommended)
  3. Start Extraction: Click the "Start Extraction" button
  4. Monitor Progress: Check the status updates as processing occurs
  5. Download Results: Download the ZIP file and/or PDF when complete
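
Before starting an extraction it can help to check that the pasted URL looks like a YouTube link. The helper below is a hypothetical sketch using only the standard library; `extract_video_id` is not part of the app, and it covers only the common `watch?v=` and `youtu.be` URL forms.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a common YouTube URL form, or None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None
    if host in {"youtube.com", "www.youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
    return None
```

A `None` result means the input is not one of the recognised forms, so the interface could show an error instead of starting a download.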

🔧 Technical Details

Computer Vision Pipeline

  • Frame Extraction: OpenCV for video processing
  • Similarity Detection: Structural Similarity Index (SSIM) for duplicate detection
  • Image Processing: Preprocessing for improved OCR accuracy
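
The similarity check can be illustrated with the SSIM formula itself. The sketch below computes a single-window (global) SSIM over two equal-length grayscale pixel sequences using only the standard library. It is a simplification for illustration: scikit-image's `skimage.metrics.structural_similarity` averages SSIM over local windows instead, so its values will differ. The function name `global_ssim` is hypothetical, and the constants 0.01 and 0.03 are the commonly used defaults.

```python
from typing import Sequence


def global_ssim(
    x: Sequence[float], y: Sequence[float], dynamic_range: float = 255.0
) -> float:
    """Single-window SSIM between two equal-length grayscale pixel sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    # Stabilising constants from the SSIM definition (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Identical frames score 1.0, and the score drops as luminance, contrast, or structure diverge, which is what makes it usable as a duplicate-frame test.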

OCR Technology

  • Engine: Tesseract OCR with optimized configurations
  • Preprocessing: Noise reduction and contrast enhancement
  • Text Extraction: Configurable OCR parameters for best results
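
A rough sketch of the preprocessing and OCR step, assuming Pillow and pytesseract are installed along with the Tesseract binary. The function names, the median-filter size, and the `--psm 6` / `--oem 3` defaults are illustrative assumptions, not the app's actual settings.

```python
def build_tesseract_config(psm: int = 6, oem: int = 3) -> str:
    """Build a Tesseract config string (page segmentation and engine mode)."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    if not 0 <= oem <= 3:
        raise ValueError("oem must be between 0 and 3")
    return f"--oem {oem} --psm {psm}"


def ocr_slide(image_path: str, lang: str = "eng") -> str:
    """Preprocess a slide image and return the text Tesseract finds in it."""
    # Imported lazily so the config helper works without the OCR stack installed.
    from PIL import Image, ImageFilter, ImageOps
    import pytesseract

    with Image.open(image_path) as image:
        gray = ImageOps.grayscale(image)  # drop colour information
        denoised = gray.filter(ImageFilter.MedianFilter(size=3))  # noise reduction
        enhanced = ImageOps.autocontrast(denoised)  # contrast enhancement
        text = pytesseract.image_to_string(
            enhanced, lang=lang, config=build_tesseract_config()
        )
    return text.strip()
```

Page segmentation mode 6 treats the image as a single uniform block of text; slides with scattered labels may do better with a sparse-text mode such as 11.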

Supported Video Types

  • Educational lectures and presentations
  • Tutorial videos with slide content
  • Webinars and online courses
  • Conference presentations

⚠️ Limitations

  • Video Length: Longer videos may take more time to process
  • Quality Dependency: OCR accuracy depends on slide image quality
  • Text-Heavy Content: Works best with slides containing clear text
  • Processing Time: Typically 1-3 minutes for a 10-minute video

🛠️ Technology Stack

  • Frontend: Gradio for the web interface
  • Computer Vision: OpenCV and scikit-image
  • OCR: Tesseract via pytesseract
  • Video Processing: yt-dlp for YouTube downloads
  • PDF Generation: ReportLab for document creation
  • Image Processing: PIL/Pillow for image manipulation
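
When running the app outside the hosted Space, a quick check like the one below can report which parts of the stack are not installed. It is a hypothetical helper, not part of the app; the mapping pairs each Python import name with its usual pip package name.

```python
from importlib.util import find_spec
from typing import Dict, Iterable, List, Optional

# Import name -> pip package name for the stack listed above.
STACK: Dict[str, str] = {
    "gradio": "gradio",
    "cv2": "opencv-python",  # opencv-python-headless also provides cv2
    "skimage": "scikit-image",
    "pytesseract": "pytesseract",  # also needs the Tesseract binary itself
    "yt_dlp": "yt-dlp",
    "reportlab": "reportlab",
    "PIL": "Pillow",
}


def missing_packages(modules: Optional[Iterable[str]] = None) -> List[str]:
    """Return pip names for modules that cannot be found in this environment."""
    names = list(STACK) if modules is None else list(modules)
    return [STACK.get(name, name) for name in names if find_spec(name) is None]
```

Calling `missing_packages()` with no arguments checks the whole stack and returns an empty list when everything is importable.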

📊 Performance Tips

  • Shorter Videos: Process videos under 30 minutes for faster results
  • High Quality: Use videos with clear, readable slides
  • Stable Content: Works best with static presentation slides
  • Good Lighting: Videos with good contrast produce better OCR results

🔒 Privacy & Security

  • No Data Storage: Videos and slides are processed temporarily
  • Server-Side Processing: All processing happens on the server instance that hosts the app
  • Automatic Cleanup: Temporary files are cleaned up after processing
  • No User Tracking: No personal data is collected or stored
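
One way to get the temporary-processing and automatic-cleanup behaviour described above is to do all work inside a `tempfile.TemporaryDirectory` and copy out only the final ZIP. This is a sketch under that assumption; `process_and_package` and the `produce_files` callback are hypothetical names, and the app may organise its cleanup differently.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Callable


def process_and_package(produce_files: Callable[[Path], None], zip_path: Path) -> Path:
    """Run a job in a temporary directory, zip its output, then clean up."""
    with tempfile.TemporaryDirectory(prefix="slides_") as tmp:
        work_dir = Path(tmp)
        produce_files(work_dir)  # e.g. write slide images, the PDF, metadata
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
            for path in sorted(work_dir.rglob("*")):
                if path.is_file():
                    archive.write(path, arcname=path.relative_to(work_dir).as_posix())
    # Leaving the with-block deletes the temporary directory and its contents.
    return zip_path
```

Only the ZIP at `zip_path` survives the call; the downloaded video and intermediate frames never outlive the request.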

📄 License

MIT License - Feel free to use and modify for your projects.


Note: This tool is designed for educational purposes and respects YouTube's terms of service. Please ensure you have permission to download and process videos.