video-to-notes / README.md

dhananjay1006

Simplify Gradio interface for better compatibility - remove complex event chains and visibility logic

f544853 10 months ago

metadata

title: YouTube Slide Extractor
emoji: 🎓
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.20.0
app_file: app.py
pinned: false
license: mit

🎓 YouTube Slide Extractor

An AI-powered tool that automatically extracts slides from educational YouTube videos using computer vision and OCR technology.

🚀 Features

Automatic Slide Detection: Uses computer vision to identify unique slides in videos
OCR Text Extraction: Extracts text content from slides using Tesseract OCR
PDF Generation: Creates downloadable PDF documents from extracted slides
Smart Filtering: Removes duplicate frames and focuses on content changes
Batch Download: Download all results in a convenient ZIP file
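
The batch download could be packaged roughly as follows. This is a minimal sketch using only the standard library; the function name `package_results`, the `slides/` folder inside the archive, and the `metadata.json` filename are assumptions for illustration, not the layout the app necessarily uses.

```python
import json
import zipfile
from pathlib import Path


def package_results(slide_paths, zip_path, metadata, pdf_path=None):
    """Bundle slide images, an optional PDF, and metadata into one ZIP.

    Hypothetical helper: the archive layout below is an assumption.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in slide_paths:
            # Store each slide under slides/ using only its file name.
            zf.write(path, arcname=f"slides/{Path(path).name}")
        if pdf_path is not None:
            zf.write(pdf_path, arcname=Path(pdf_path).name)
        # Metadata is written straight from memory as JSON.
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
    return Path(zip_path)
```

Writing the metadata with `writestr` avoids creating an extra temporary file just to add it to the archive.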

🎯 How It Works

Video Download: Downloads the YouTube video using yt-dlp
Frame Extraction: Extracts frames at regular intervals (every 5 seconds)
Slide Detection: Uses structural similarity to identify unique slides
OCR Processing: Extracts text content from each slide
PDF Creation: Generates a PDF with slides and extracted text
Results Packaging: Creates a ZIP file with all slides, PDF, and metadata
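
The frame extraction step above (one frame every 5 seconds) can be sketched with OpenCV as shown here. The function names are hypothetical, and seeking with `CAP_PROP_POS_FRAMES` is one possible approach rather than a description of what `app.py` actually does.

```python
def sample_frame_indices(fps: float, frame_count: int, interval_s: float = 5.0) -> list[int]:
    """Return the frame indices to grab, one per `interval_s` seconds."""
    if fps <= 0 or frame_count <= 0:
        return []
    step = max(1, round(fps * interval_s))
    return list(range(0, frame_count, step))


def extract_frames(video_path: str, interval_s: float = 5.0):
    """Yield (timestamp_seconds, frame) pairs from a video file.

    Assumption: OpenCV (cv2) is installed; it is imported lazily so the
    index helper above stays usable without it.
    """
    import cv2

    cap = cv2.VideoCapture(video_path)
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        for idx in sample_frame_indices(fps, total, interval_s):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
            ok, frame = cap.read()
            if not ok:
                break
            yield idx / fps, frame
    finally:
        cap.release()
```

For a 10-minute video at 30 fps (18,000 frames), this samples every 150th frame, so only 120 frames reach the slide detection step.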

📋 Usage Instructions

Enter Video URL: Paste a YouTube video URL in the input field
Configure Options: Choose whether to generate a PDF (recommended)
Start Extraction: Click the "Start Extraction" button
Monitor Progress: Check the status updates as processing occurs
Download Results: Download the ZIP file and/or PDF when complete
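
The inputs and outputs described above map onto a simple Gradio interface. The sketch below is an assumption about how such a form could be wired up, not the contents of `app.py`: `run_extraction` is a hypothetical handler that only validates the URL, and the URL pattern covers just the two most common YouTube link shapes.

```python
import re

# Assumed pattern: standard watch links and youtu.be short links only.
YOUTUBE_URL = re.compile(
    r"^https?://(www\.)?(youtube\.com/watch\?v=|youtu\.be/)[\w-]{11}"
)


def is_youtube_url(url: str) -> bool:
    return bool(YOUTUBE_URL.match(url.strip()))


def run_extraction(url: str, make_pdf: bool):
    """Hypothetical handler returning (status, zip_file, pdf_file)."""
    if not is_youtube_url(url):
        return "Please enter a valid YouTube URL.", None, None
    # Placeholder: a real handler would run the extraction pipeline here
    # and return paths to the generated ZIP and PDF files.
    status = f"Accepted {url.strip()} (PDF: {'yes' if make_pdf else 'no'})"
    return status, None, None


def build_demo():
    import gradio as gr  # imported lazily; assumes Gradio is installed

    return gr.Interface(
        fn=run_extraction,
        inputs=[
            gr.Textbox(label="YouTube video URL"),
            gr.Checkbox(label="Generate PDF", value=True),
        ],
        outputs=[
            gr.Textbox(label="Status"),
            gr.File(label="ZIP archive"),
            gr.File(label="PDF"),
        ],
        title="YouTube Slide Extractor",
    )

# To serve the form: build_demo().launch()
```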

🔧 Technical Details

Computer Vision Pipeline

Frame Extraction: OpenCV for video processing
Similarity Detection: Structural Similarity Index (SSIM) for duplicate detection
Image Processing: Preprocessing for improved OCR accuracy
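
SSIM-based duplicate detection can be sketched as shown here. The 0.9 threshold and both function names are assumptions for illustration; the app's actual threshold is not stated in this README. Each frame is compared only with the last slide that was kept, so a slide shown again later in the video is kept a second time.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def select_unique_slides(
    frames: Iterable[T],
    similarity: Callable[[T, T], float],
    threshold: float = 0.9,  # assumed value, tune per video
) -> List[T]:
    """Keep a frame only when it differs enough from the last kept slide."""
    unique: List[T] = []
    for frame in frames:
        if not unique or similarity(unique[-1], frame) < threshold:
            unique.append(frame)
    return unique


def ssim_similarity(frame_a, frame_b) -> float:
    """SSIM between two same-sized BGR frames, computed on grayscale.

    Assumes OpenCV and scikit-image are installed (imported lazily).
    """
    import cv2
    from skimage.metrics import structural_similarity

    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
    return float(structural_similarity(gray_a, gray_b, data_range=255))
```

Passing the similarity function as a parameter keeps the selection logic independent of the image libraries: `select_unique_slides(frames, ssim_similarity)` is the intended call for real video frames.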

OCR Technology

Engine: Tesseract OCR with optimized configurations
Preprocessing: Noise reduction and contrast enhancement
Text Extraction: Configurable OCR parameters for best results
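
A possible shape for the preprocessing and OCR steps is sketched below. The specific choices are assumptions, since this README does not name them: non-local means denoising for noise reduction, CLAHE for contrast enhancement, and Tesseract page segmentation mode 6 (a single uniform block of text) with engine mode 3 (default).

```python
def tesseract_config(psm: int = 6, oem: int = 3) -> str:
    """Build the Tesseract option string (values are assumed defaults)."""
    return f"--oem {oem} --psm {psm}"


def preprocess_for_ocr(frame):
    """Grayscale, denoise, and boost contrast on a BGR frame.

    Assumes OpenCV is installed (imported lazily).
    """
    import cv2

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    denoised = cv2.fastNlMeansDenoising(gray, None, 10)  # h=10 filter strength
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(denoised)


def ocr_slide(frame, lang: str = "eng") -> str:
    """Run Tesseract on a preprocessed slide and return trimmed text.

    Assumes pytesseract and the Tesseract binary are installed.
    """
    import pytesseract

    text = pytesseract.image_to_string(
        preprocess_for_ocr(frame), lang=lang, config=tesseract_config()
    )
    return text.strip()
```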

Supported Video Types

Educational lectures and presentations
Tutorial videos with slide content
Webinars and online courses
Conference presentations

⚠️ Limitations

Video Length: Processing time grows with video length, since more frames must be sampled and compared
Quality Dependency: OCR accuracy depends on slide image quality
Text-Heavy Content: Works best with slides containing clear text
Processing Time: Typically 1-3 minutes for a 10-minute video

🛠️ Technology Stack

Frontend: Gradio for the web interface
Computer Vision: OpenCV and scikit-image
OCR: Tesseract via pytesseract
Video Processing: yt-dlp for YouTube downloads
PDF Generation: ReportLab for document creation
Image Processing: PIL/Pillow for image manipulation
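
As an illustration of the download step, here is a minimal sketch of calling yt-dlp from Python. The helper names, the 720p height cap, and the output filename template are assumptions chosen for the example; capping resolution is one way to keep downloads small while leaving slide text legible.

```python
import os


def build_ydl_options(output_dir: str, max_height: int = 720) -> dict:
    """Options for yt_dlp.YoutubeDL (hypothetical defaults)."""
    return {
        # Prefer a single MP4 file at or below max_height, then fall back.
        "format": (
            f"best[height<={max_height}][ext=mp4]"
            f"/best[height<={max_height}]/best"
        ),
        "outtmpl": os.path.join(output_dir, "%(id)s.%(ext)s"),
        "noplaylist": True,  # download only the single video
        "quiet": True,
    }


def download_video(url: str, output_dir: str) -> str:
    """Download one video and return the local file path.

    Assumes yt-dlp is installed (imported lazily).
    """
    import yt_dlp

    with yt_dlp.YoutubeDL(build_ydl_options(output_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        return ydl.prepare_filename(info)
```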

📊 Performance Tips

Shorter Videos: Process videos under 30 minutes for faster results
High Quality: Use videos with clear, readable slides
Stable Content: Works best with static presentation slides
Good Lighting: Videos with good contrast produce better OCR results

🔒 Privacy & Security

No Data Storage: Videos and slides are held only temporarily while they are processed
Server-Side Processing: All processing happens on the server instance hosting the app
Automatic Cleanup: Temporary files are cleaned up after processing
No User Tracking: No personal data is collected or stored
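
Automatic cleanup of this kind is commonly done with a temporary directory that is deleted when processing ends. The sketch below assumes that pattern; `process_in_temp_dir` is a hypothetical helper, not a function from this project.

```python
import tempfile
from pathlib import Path


def process_in_temp_dir(work):
    """Run `work(tmp_dir)` in a scratch directory that is always removed.

    The directory and everything in it are deleted when the `with` block
    exits, including when `work` raises an exception.
    """
    with tempfile.TemporaryDirectory(prefix="slides_") as tmp:
        return work(Path(tmp))
```

Anything the user should be able to download has to be copied out of the temporary directory before `work` returns, because the directory no longer exists afterwards.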

📄 License

MIT License - Feel free to use and modify for your projects.

Note: This tool is designed for educational purposes and respects YouTube's terms of service. Please ensure you have permission to download and process videos.