dhananjay1006
metadata
title: YouTube Slide Extractor
emoji: π
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.20.0
app_file: app.py
pinned: false
license: mit
YouTube Slide Extractor
An AI-powered tool that automatically extracts slides from educational YouTube videos using computer vision and OCR technology.
Features
- Automatic Slide Detection: Uses computer vision to identify unique slides in videos
- OCR Text Extraction: Extracts text content from slides using Tesseract OCR
- PDF Generation: Creates downloadable PDF documents from extracted slides
- Smart Filtering: Removes duplicate frames and focuses on content changes
- Batch Download: Download all results in a convenient ZIP file
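As an illustration of the batch download feature, the following is a minimal sketch of bundling result files into one ZIP archive with Python's standard `zipfile` module. The function name `package_results` and the file names in the usage note are hypothetical; the app's actual packaging code is not shown in this README.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def package_results(files: Iterable[Path], zip_path: Path) -> Path:
    """Write each file into a compressed ZIP archive, stored under its base name."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            # arcname drops the directory part so the archive has a flat layout
            archive.write(file, arcname=Path(file).name)
    return zip_path
```

For example, `package_results([Path("slide_001.png"), Path("slides.pdf")], Path("results.zip"))` would produce a flat archive containing both files.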
🎯 How It Works
- Video Download: Downloads the YouTube video using yt-dlp
- Frame Extraction: Extracts frames at regular intervals (every 5 seconds)
- Slide Detection: Uses structural similarity to identify unique slides
- OCR Processing: Extracts text content from each slide
- PDF Creation: Generates a PDF with slides and extracted text
- Results Packaging: Creates a ZIP file with all slides, PDF, and metadata
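The frame extraction step above can be sketched as follows. This is one possible way to sample a frame every 5 seconds with OpenCV, not necessarily how the app does it; the helper names `sample_frame_indices` and `iter_sampled_frames` are hypothetical. OpenCV is imported lazily so the index arithmetic can be used on its own.

```python
from typing import Iterator, List


def sample_frame_indices(fps: float, total_frames: int, interval_s: float = 5.0) -> List[int]:
    """Return the indices of the frames to grab, one every `interval_s` seconds."""
    if fps <= 0 or total_frames <= 0:
        return []
    step = max(1, round(fps * interval_s))  # frames between two samples
    return list(range(0, total_frames, step))


def iter_sampled_frames(video_path: str, interval_s: float = 5.0) -> Iterator:
    """Yield (frame_index, BGR frame) pairs read from a video file."""
    import cv2  # lazy import: only needed when frames are actually read

    cap = cv2.VideoCapture(video_path)
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        for idx in sample_frame_indices(fps, total, interval_s):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
            ok, frame = cap.read()
            if not ok:
                break
            yield idx, frame
    finally:
        cap.release()
```

Under these assumptions, a 10-minute video at 30 fps has 18,000 frames, and sampling every 150th frame yields 120 candidate frames for the slide detection step.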
Usage Instructions
- Enter Video URL: Paste a YouTube video URL in the input field
- Configure Options: Choose whether to generate a PDF (recommended)
- Start Extraction: Click the "Start Extraction" button
- Monitor Progress: Check the status updates as processing occurs
- Download Results: Download the ZIP file and/or PDF when complete
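The inputs described above (a URL field and a PDF option) map naturally onto a small Gradio interface. The sketch below is an assumption about how such a form could be wired up, not the contents of `app.py`; `is_youtube_url`, `run_extraction`, and `build_demo` are hypothetical names, and the handler only validates the URL instead of running the real pipeline.

```python
from urllib.parse import urlparse

# Hostnames accepted as YouTube links in this sketch (an assumption, not exhaustive)
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def is_youtube_url(url: str) -> bool:
    """Accept only http(s) URLs whose host is a known YouTube domain."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and parsed.hostname in YOUTUBE_HOSTS


def run_extraction(url: str, make_pdf: bool) -> str:
    """Placeholder handler: validates input and returns a status message."""
    if not is_youtube_url(url):
        return "Please enter a valid YouTube URL."
    # The real download, detection, and OCR steps would run here.
    return f"Accepted {url.strip()} (PDF: {'yes' if make_pdf else 'no'})"


def build_demo():
    """Build a single-step interface with one text input and one checkbox."""
    import gradio as gr  # lazy import: only needed to build the UI

    return gr.Interface(
        fn=run_extraction,
        inputs=[
            gr.Textbox(label="YouTube URL"),
            gr.Checkbox(label="Generate PDF", value=True),
        ],
        outputs=gr.Textbox(label="Status"),
        title="YouTube Slide Extractor",
    )
```

Calling `build_demo().launch()` would start the interface locally. A single `gr.Interface` avoids chained events and visibility toggles, which keeps the form simple.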
🔧 Technical Details
Computer Vision Pipeline
- Frame Extraction: OpenCV for video processing
- Similarity Detection: Structural Similarity Index (SSIM) for duplicate detection
- Image Processing: Preprocessing for improved OCR accuracy
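A minimal sketch of SSIM-based duplicate removal might look like the following. It assumes frames are compared only against the most recently kept slide and that a score of 0.9 or higher means "same slide"; both the threshold and the names `ssim_score` and `keep_unique` are illustrative assumptions rather than the app's actual settings.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def ssim_score(a, b) -> float:
    """SSIM between two equally sized BGR frames, computed on grayscale copies."""
    import cv2  # lazy imports: only needed when real images are compared
    from skimage.metrics import structural_similarity

    gray_a = cv2.cvtColor(a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(b, cv2.COLOR_BGR2GRAY)
    return float(structural_similarity(gray_a, gray_b))


def keep_unique(
    frames: Sequence[Frame],
    similarity: Callable[[Frame, Frame], float],
    threshold: float = 0.9,  # assumed cut-off, tune per video
) -> List[Frame]:
    """Keep a frame only when it differs enough from the last kept frame."""
    kept: List[Frame] = []
    for frame in frames:
        if not kept or similarity(kept[-1], frame) < threshold:
            kept.append(frame)
    return kept
```

With real frames, `keep_unique(frames, ssim_score)` would return one representative per run of near-identical frames. Comparing against the last kept frame rather than all kept frames is cheaper, at the cost of keeping a slide twice if the presenter returns to it later.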
OCR Technology
- Engine: Tesseract OCR with optimized configurations
- Preprocessing: Noise reduction and contrast enhancement
- Text Extraction: Configurable OCR parameters for best results
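The OCR stage could be sketched as below, assuming grayscale conversion, non-local-means denoising, and CLAHE contrast enhancement as the preprocessing steps. The specific parameter values, the default page segmentation mode, and the helper names are assumptions for illustration; the README does not state which settings the app uses.

```python
def tesseract_config(psm: int = 6, oem: int = 3) -> str:
    """Build a Tesseract option string (page segmentation and engine mode)."""
    return f"--oem {oem} --psm {psm}"


def preprocess_for_ocr(frame):
    """Convert a BGR frame to a denoised, contrast-enhanced grayscale image."""
    import cv2  # lazy import: only needed when images are processed

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    denoised = cv2.fastNlMeansDenoising(gray, None, 10)  # noise reduction
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(denoised)  # local contrast enhancement


def extract_text(frame, psm: int = 6) -> str:
    """Run Tesseract on a preprocessed frame and return the recognized text."""
    import pytesseract  # requires the Tesseract binary to be installed

    image = preprocess_for_ocr(frame)
    return pytesseract.image_to_string(image, config=tesseract_config(psm)).strip()
```

Page segmentation mode 6 treats the image as a single uniform block of text, which tends to suit simple slides; sparse or multi-column layouts may need a different mode.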
Supported Video Types
- Educational lectures and presentations
- Tutorial videos with slide content
- Webinars and online courses
- Conference presentations
⚠️ Limitations
- Video Length: Longer videos may take more time to process
- Quality Dependency: OCR accuracy depends on slide image quality
- Text-Heavy Content: Works best with slides containing clear text
- Processing Time: Typically 1-3 minutes for a 10-minute video
🛠️ Technology Stack
- Frontend: Gradio for the web interface
- Computer Vision: OpenCV and scikit-image
- OCR: Tesseract via pytesseract
- Video Processing: yt-dlp for YouTube downloads
- PDF Generation: ReportLab for document creation
- Image Processing: PIL/Pillow for image manipulation
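To show how the video download piece of this stack might be driven from Python, here is a hedged sketch using yt-dlp's `YoutubeDL` class. The 720p cap, the output template, and the function names `ytdlp_options` and `download_video` are assumptions chosen for the example.

```python
from pathlib import Path
from typing import Any, Dict


def ytdlp_options(out_dir: Path, max_height: int = 720) -> Dict[str, Any]:
    """Options for a single-video download capped at `max_height` pixels."""
    return {
        # Prefer an MP4 at or below the cap, then any format below it, then anything
        "format": f"best[height<={max_height}][ext=mp4]/best[height<={max_height}]/best",
        "outtmpl": str(Path(out_dir) / "%(id)s.%(ext)s"),
        "quiet": True,
        "noplaylist": True,  # ignore the playlist when the URL points into one
    }


def download_video(url: str, out_dir: Path) -> Path:
    """Download one video and return the path of the saved file."""
    from yt_dlp import YoutubeDL  # lazy import: only needed for real downloads

    with YoutubeDL(ytdlp_options(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        return Path(ydl.prepare_filename(info))
```

Capping the resolution keeps downloads small; slide text is usually still legible at 720p, though that depends on the source video.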
Performance Tips
- Shorter Videos: Process videos under 30 minutes for faster results
- High Quality: Use videos with clear, readable slides
- Stable Content: Works best with static presentation slides
- Good Lighting: Videos with good contrast produce better OCR results
Privacy & Security
- No Data Storage: Videos and slides are kept only for the duration of processing
- Local Processing: All processing happens on the server instance
- Automatic Cleanup: Temporary files are cleaned up after processing
- No User Tracking: No personal data is collected or stored
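One common way to get this kind of automatic cleanup is to run each job inside a `tempfile.TemporaryDirectory`. The sketch below illustrates the idea under that assumption; `with_workspace` is a hypothetical helper, and the app may clean up its files differently.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_workspace(job: Callable[[Path], T]) -> T:
    """Run `job` inside a temporary directory that is deleted afterwards."""
    with tempfile.TemporaryDirectory(prefix="slides_") as tmp:
        # Everything written under `tmp` is removed when the block exits,
        # even if `job` raises an exception.
        return job(Path(tmp))
```

Any file that must outlive the job, such as the final ZIP or PDF offered for download, would need to be copied out of the workspace before the function returns.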
License
MIT License - Feel free to use and modify for your projects.
Note: This tool is designed for educational purposes and respects YouTube's terms of service. Please ensure you have permission to download and process videos.