---
title: LectureWhisperer
emoji: 🎓
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false
license: mit
short_description: The Lecture Whisperer is a multimodal AI study assistant.
---
# The Lecture Whisperer

Turn any lecture recording + slides into a full AI-powered study toolkit, in minutes.
## Overview

The Lecture Whisperer is a multimodal AI app that takes a raw lecture audio file and PDF slides as input and produces:
- A full timestamped transcript
- Extracted key concepts from every slide
- A smart sync report mapping each spoken segment to the slide it belongs to
- An interactive Q&A chatbot grounded in your lecture content
- A generated multiple-choice quiz for exam prep
All model inference runs via the Hugging Face Inference API, so no GPU is required on the Space itself.
## Models Used

| Task | Model |
|---|---|
| Audio Transcription | `openai/whisper-large-v3` |
| Slide Vision & Text Extraction | `Qwen/Qwen2-VL-7B-Instruct` |
| Quiz Generation & Q&A Chatbot | `meta-llama/Meta-Llama-3-8B-Instruct` |
## Features

### Tab 1: Upload & Process

- Upload an MP3 or WAV lecture recording
- Upload a PDF of lecture slides
- Hit **Process Lecture** to run the full pipeline
- View a live Sync Report showing which transcript segment maps to which slide

### Tab 2: Dashboard

- **Chatbot**: Ask any question about the lecture. Answers are grounded in the transcript and slide content (RAG-lite)
- **Slide Gallery**: Browse all extracted slide images side by side

### Tab 3: Mock Quiz

- Click **Generate Mock Quiz** to instantly produce 7 multiple-choice questions
- Questions are generated strictly from the lecture transcript, with no hallucinated content
## How It Works

### 1. Audio Transcription

Whisper Large v3 processes the audio file via the HF Inference API and returns timestamped sentence chunks:

```
[00:04] Welcome to today's lecture on classical mechanics.
[00:20] Newton's Second Law states that force equals mass times acceleration.
```
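Under the hood, an Inference API call is a plain HTTP POST with the audio bytes. A minimal sketch, assuming `HF_TOKEN` is set in the environment (the helper names and response handling are illustrative, not the app's exact code):

```python
import os


API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"


def transcribe(audio_path: str) -> dict:
    """POST raw audio bytes to the HF Inference API and return the JSON response."""
    import requests  # listed in requirements.txt

    headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
    with open(audio_path, "rb") as f:
        response = requests.post(API_URL, headers=headers, data=f.read())
    response.raise_for_status()
    return response.json()


def format_chunk(start_seconds: float, text: str) -> str:
    """Render one returned chunk as the [MM:SS] lines shown above."""
    minutes, seconds = divmod(int(start_seconds), 60)
    return f"[{minutes:02d}:{seconds:02d}] {text.strip()}"
```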
### 2. Slide Processing

Each PDF page is converted to an image using `pdf2image`. Each image is sent to Qwen2-VL with a prompt to extract all visible text, equations, bullet points, and concepts.
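A sketch of this step; the exact wording of the extraction prompt below is an illustrative assumption, not the app's actual prompt:

```python
# Illustrative prompt sent alongside each slide image to Qwen2-VL.
EXTRACTION_PROMPT = (
    "Extract all visible text, equations, bullet points, and key concepts "
    "from this lecture slide. Preserve the original structure."
)


def pdf_to_slide_images(pdf_path: str, dpi: int = 150):
    """Convert every page of the slide deck into a PIL image."""
    # pdf2image requires poppler-utils on the system (see Prerequisites).
    from pdf2image import convert_from_path

    return convert_from_path(pdf_path, dpi=dpi)
```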
### 3. Sync Logic

A keyword-overlap engine indexes every slide's content into a word set. Each transcript segment is then scored against every slide, and the highest overlap wins. Example output:

```
[04:20] Newton's Second Law states F = ma
  → Slide 5 (score: 4)
```
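The overlap scoring fits in a few lines. The stopword list and tokenization below are illustrative assumptions, not the app's exact implementation:

```python
import re

# A tiny illustrative stopword list; a real one would be larger.
STOPWORDS = {"the", "a", "an", "of", "to", "that", "is", "and", "in", "states"}


def word_set(text: str) -> set:
    """Lowercase word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9=]+", text.lower()) if w not in STOPWORDS}


def best_slide(segment: str, slides: list[str]) -> tuple[int, int]:
    """Return (1-based slide number, overlap score) for the best-matching slide."""
    seg_words = word_set(segment)
    scores = [len(seg_words & word_set(s)) for s in slides]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best]
```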
### 4. Chatbot Q&A
When you ask a question, the app:
- Finds relevant transcript lines by keyword matching
- Finds relevant slides by keyword matching
- Stuffs both into a Llama-3 prompt as context
- Returns a grounded answer
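The retrieval and prompt-stuffing steps above might look like this (function names and prompt wording are illustrative):

```python
def retrieve(question: str, corpus: list[str], top_k: int = 3) -> list[str]:
    """Rank corpus lines by how many keywords they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        corpus,
        key=lambda line: len(q_words & set(line.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, transcript_lines: list[str], slide_texts: list[str]) -> str:
    """Stuff the top-matching transcript lines and slides into a grounded prompt."""
    context_t = "\n".join(retrieve(question, transcript_lines))
    context_s = "\n".join(retrieve(question, slide_texts))
    return (
        "Answer using ONLY the context below.\n\n"
        f"Transcript:\n{context_t}\n\n"
        f"Slides:\n{context_s}\n\n"
        f"Question: {question}\nAnswer:"
    )
```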
### 5. Quiz Generation

The full transcript is passed to Llama-3-8B with a strict instruction to generate MCQs only from the provided content, with no external knowledge injected.
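A sketch of such an instruction, plus a parser for one possible output format. Both the prompt wording and the `Q: ... | options | Answer: X` line format are assumptions for illustration, not the app's actual protocol:

```python
# Illustrative instruction prepended to the transcript.
QUIZ_INSTRUCTION = (
    "Using ONLY the transcript below, write 7 multiple-choice questions. "
    "Format each as: Q: <question> | A) .. B) .. C) .. D) .. | Answer: <letter>"
)


def parse_question(line: str) -> dict:
    """Parse one 'Q: ... | <options> | Answer: X' line into a dict."""
    q_part, options, answer = line.split(" | ")
    return {
        "question": q_part.removeprefix("Q: "),
        "options": options,
        "answer": answer.removeprefix("Answer: ").strip(),
    }
```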
## Running Locally

### Prerequisites

- Python 3.10+
- `poppler-utils` installed on your system:

```shell
# Ubuntu / Debian
sudo apt install poppler-utils

# macOS
brew install poppler
```

### Setup

```shell
git clone https://huggingface.co/spaces/YOUR_USERNAME/lecture-whisperer
cd lecture-whisperer
pip install -r requirements.txt
```

### Set your HF token

```shell
export HF_TOKEN=hf_your_token_here
```

### Run

```shell
python app.py
```

Then open http://localhost:7860 in your browser.
## Required Secrets (for HF Spaces)

Go to your Space → **Settings** → **Variables and secrets** and add:

| Secret Name | Value |
|---|---|
| `HF_TOKEN` | Your Hugging Face API token (read access) |

Make sure you have accepted the terms for gated models:

- Meta Llama 3: click "Agree and access repository"
- Qwen2-VL: click "Agree and access repository"
## Project Structure

```
lecture-whisperer/
├── app.py             # Main Gradio application
├── requirements.txt   # Python dependencies
├── packages.txt       # System dependencies (poppler-utils)
└── README.md          # This file
```
## Dependencies

```
gradio>=6.0.0
pdf2image>=1.17.0
Pillow>=10.0.0
requests>=2.31.0
```

System dependency (handled by `packages.txt` on HF Spaces):

```
poppler-utils
```
## Known Limitations

- **Processing time**: Whisper transcription via the free Inference API can take 2–5 minutes for a 1-hour lecture. The app includes automatic retry logic for cold-start delays.
- **Sync accuracy**: The current sync engine uses keyword-overlap scoring. It works well for technical content but may miss semantic matches (e.g. paraphrased concepts). Future versions will use sentence embeddings.
- **API rate limits**: The HF free Inference API has rate limits. For heavy usage, consider upgrading to a PRO token or running models locally.
- **Gated models**: Llama-3 and Qwen2-VL require accepting license terms on the HF model page before your token can access them.
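The cold-start retry logic mentioned above can be sketched as exponential backoff around any API call. The function names, exception type, and retry parameters are illustrative:

```python
import time


def post_with_retry(call, max_retries: int = 5, base_delay: float = 2.0):
    """Retry `call` with exponential backoff while the endpoint is cold-loading.

    `call` is any zero-argument function that raises RuntimeError while the
    model is still loading (e.g. on a 503 response) and returns a result once
    the endpoint is warm.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))
```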
## Roadmap

- Sentence-embedding based sync (replace keyword overlap with `all-MiniLM-L6-v2`)
- One-click lecture summary (5 bullet points)
- Export quiz as downloadable PDF
- Speaker diarization (identify multiple speakers)
- Support for YouTube URLs as audio input
- Persistent chat history per session
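The embedding-based sync on the roadmap amounts to swapping word-set overlap for cosine similarity between sentence vectors. A minimal NumPy sketch of the scoring step, assuming some embedding function (e.g. backed by `all-MiniLM-L6-v2`) supplies the vectors:

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def best_slide_by_embedding(segment_vec: np.ndarray, slide_vecs: list) -> int:
    """Return the 0-based index of the slide whose embedding is closest to the segment's."""
    sims = [cosine(segment_vec, v) for v in slide_vecs]
    return int(np.argmax(sims))
```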
## Contributing
Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.
## License
This project is licensed under the MIT License.
## Acknowledgements
- OpenAI Whisper for the transcription model
- Qwen Team at Alibaba for Qwen2-VL
- Meta AI for Llama 3
- Hugging Face for the Inference API and Spaces platform
- Gradio for the UI framework
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference