---
title: Lecture Whisperer
emoji: 🏆
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false
license: mit
short_description: The Lecture Whisperer is a multimodal AI study assistant.
---

# 🎓 The Lecture Whisperer

> **Turn any lecture recording + slides into a full AI-powered study toolkit — in minutes.**

[![Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-3776AB?logo=python&logoColor=white)](https://python.org)
[![Gradio 6.6](https://img.shields.io/badge/Gradio-6.6-orange)](https://gradio.app)
[![License: MIT](https://img.shields.io/badge/License-MIT-green)](LICENSE)

---

## 📸 Overview

The Lecture Whisperer is a multimodal AI app that takes a raw lecture audio file and PDF slides as input and produces:

- ✅ A full timestamped transcript
- ✅ Extracted key concepts from every slide
- ✅ A smart sync report mapping *what was said* → *which slide it belongs to*
- ✅ An interactive Q&A chatbot grounded in your lecture content
- ✅ A generated multiple-choice quiz for exam prep

All model inference runs via the **Hugging Face Inference API** — no GPU required on the Space itself.
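The sync report mentioned above is built from simple keyword overlap (detailed under "How It Works" below). As a taste of the idea, here is a minimal, hypothetical sketch of that scoring; the function and variable names are illustrative, not the app's actual code:

```python
import re

STOPWORDS = {"the", "a", "an", "of", "to", "that", "and", "is", "on"}

def word_set(text: str) -> set[str]:
    """Lowercase a string and split it into a set of content words."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}

def best_slide(segment: str, slides: dict[int, str]) -> tuple[int, int]:
    """Score a transcript segment against every slide by word overlap;
    return (slide_number, score) for the highest-overlap slide."""
    seg_words = word_set(segment)
    scores = {n: len(seg_words & word_set(text)) for n, text in slides.items()}
    winner = max(scores, key=scores.get)
    return winner, scores[winner]

# Toy example: two slides, one transcript segment
slides = {
    4: "Kinematics: velocity and acceleration",
    5: "Newton's Second Law: F = ma, force, mass, acceleration",
}
print(best_slide("Newton's Second Law states that force equals mass times acceleration", slides))
# prints (5, 6)
```

A real run indexes every slide once and scores every transcript segment against that index, but the core comparison is exactly this set intersection.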
---

## 🧠 Models Used

| Task | Model |
|---|---|
| Audio Transcription | [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3) |
| Slide Vision & Text Extraction | [`Qwen/Qwen2-VL-7B-Instruct`](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) |
| Quiz Generation & Q&A Chatbot | [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |

---

## ✨ Features

### Tab 1 — Upload & Process

- Upload an **MP3 or WAV** lecture recording
- Upload a **PDF** of lecture slides
- Hit **⚡ Process Lecture** to run the full pipeline
- View a live **Sync Report** showing which transcript segment maps to which slide

### Tab 2 — Dashboard

- **💬 Chatbot** — Ask any question about the lecture. Answers are grounded in the transcript and slide content (RAG-lite)
- **🖼️ Slide Gallery** — Browse all extracted slide images side by side

### Tab 3 — Mock Quiz

- Click **🧠 Generate Mock Quiz** to instantly produce 7 multiple-choice questions
- Questions are generated strictly from the lecture transcript — no hallucinated content

---

## 🔧 How It Works

### 1. Audio Transcription

Whisper Large v3 processes the audio file via the HF Inference API and returns timestamped sentence chunks:

```
[00:04] Welcome to today's lecture on classical mechanics.
[00:20] Newton's Second Law states that force equals mass times acceleration.
```

### 2. Slide Processing

Each PDF page is converted to an image using `pdf2image`. Each image is sent to Qwen2-VL with a prompt to extract all visible text, equations, bullet points, and concepts.

### 3. Sync Logic

A keyword-overlap engine indexes every slide's content into a word set. Each transcript segment is then scored against every slide — the highest overlap wins. Example output:

```
[04:20] Newton's Second Law states F = ma → Slide 5 (score: 4)
```

### 4. Chatbot Q&A

When you ask a question, the app:

1. Finds relevant transcript lines by keyword matching
2.
   Finds relevant slides by keyword matching
3. Stuffs both into a Llama-3 prompt as context
4. Returns a grounded answer

### 5. Quiz Generation

The full transcript is passed to Llama-3-8B with a strict instruction to generate MCQs *only* from the provided content — no external knowledge injected.

---

## 🚀 Running Locally

### Prerequisites

- Python 3.10+
- `poppler-utils` installed on your system:

```bash
# Ubuntu / Debian
sudo apt install poppler-utils

# macOS
brew install poppler
```

### Setup

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/lecture-whisperer
cd lecture-whisperer
pip install -r requirements.txt
```

### Set your HF Token

```bash
export HF_TOKEN=hf_your_token_here
```

### Run

```bash
python app.py
```

Then open [http://localhost:7860](http://localhost:7860) in your browser.

---

## 🔑 Required Secrets (for HF Spaces)

Go to your Space → **Settings → Variables and secrets** → add:

| Secret Name | Value |
|---|---|
| `HF_TOKEN` | Your Hugging Face API token (read access) |

Make sure you have accepted the terms for gated models:

- [Meta Llama 3](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) — click "Agree and access repository"
- [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) — click "Agree and access repository"

---

## 📁 Project Structure

```
lecture-whisperer/
├── app.py             # Main Gradio application
├── requirements.txt   # Python dependencies
├── packages.txt       # System dependencies (poppler-utils)
└── README.md          # This file
```

---

## 📦 Dependencies

```
gradio>=6.0.0
pdf2image>=1.17.0
Pillow>=10.0.0
requests>=2.31.0
```

System dependency (handled by `packages.txt` on HF Spaces):

```
poppler-utils
```

---

## ⚠️ Known Limitations

- **Processing time** — Whisper transcription via the free Inference API can take 2–5 minutes for a 1-hour lecture. The app includes automatic retry logic for cold-start delays.
- **Sync accuracy** — The current sync engine uses keyword-overlap scoring. It works well for technical content but may miss semantic matches (e.g. paraphrased concepts). Future versions will use sentence embeddings.
- **API rate limits** — The free HF Inference API has rate limits. For heavy usage, consider upgrading to a PRO token or running the models locally.
- **Gated models** — Llama-3 and Qwen2-VL require accepting license terms on the HF model page before your token can access them.

---

## 🗺️ Roadmap

- [ ] Sentence-embedding-based sync (replace keyword overlap with `all-MiniLM-L6-v2`)
- [ ] One-click lecture summary (5 bullet points)
- [ ] Export quiz as a downloadable PDF
- [ ] Speaker diarization (identify multiple speakers)
- [ ] Support for YouTube URLs as audio input
- [ ] Persistent chat history per session

---

## 🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.

---

## 📄 License

This project is licensed under the [MIT License](LICENSE).

---

## 🙏 Acknowledgements

- [OpenAI Whisper](https://github.com/openai/whisper) for the transcription model
- [Qwen Team at Alibaba](https://huggingface.co/Qwen) for Qwen2-VL
- [Meta AI](https://ai.meta.com) for Llama 3
- [Hugging Face](https://huggingface.co) for the Inference API and Spaces platform
- [Gradio](https://gradio.app) for the UI framework

---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
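For the curious, the "RAG-lite" retrieval described under "Chatbot Q&A" above can be sketched like this. This is a hypothetical illustration under the assumption that retrieval is plain keyword matching over transcript lines; the helper names are made up and this is not the app's actual code:

```python
import re

def top_context(question: str, transcript_lines: list[str], k: int = 2) -> list[str]:
    """Rank transcript lines by keyword overlap with the question and
    return the k best matches, to be stuffed into the LLM prompt as context."""
    q_words = set(re.findall(r"[a-z0-9']+", question.lower()))
    scored = sorted(
        transcript_lines,
        key=lambda line: len(q_words & set(re.findall(r"[a-z0-9']+", line.lower()))),
        reverse=True,
    )
    return scored[:k]

transcript = [
    "[00:04] Welcome to today's lecture on classical mechanics.",
    "[00:20] Newton's Second Law states that force equals mass times acceleration.",
    "[01:10] Friction opposes relative motion between surfaces.",
]
question = "What does Newton's Second Law say about force?"
context = top_context(question, transcript)

# Build a grounded prompt: the model is told to answer only from the retrieved lines
prompt = "Answer using ONLY this context:\n" + "\n".join(context) + "\n\nQ: " + question
```

The same overlap trick is applied to slide text, and both sets of hits go into the Llama-3 prompt together.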