---
title: Lecture Whisperer
emoji: 🎓
colorFrom: gray
colorTo: green
sdk: gradio
sdk_version: 6.6.0
app_file: app.py
pinned: false
license: mit
short_description: The Lecture Whisperer is a multimodal AI study assistant.
---
# The Lecture Whisperer

> **Turn any lecture recording + slides into a full AI-powered study toolkit, in minutes.**
---

## Overview

The Lecture Whisperer is a multimodal AI app that takes a raw lecture audio file and PDF slides as input and produces:

- A full timestamped transcript
- Extracted key concepts from every slide
- A sync report mapping *what was said* to *which slide it belongs to*
- An interactive Q&A chatbot grounded in your lecture content
- A generated multiple-choice quiz for exam prep

All model inference runs via the **Hugging Face Inference API**, so no GPU is required on the Space itself.

---
## Models Used

| Task | Model |
|---|---|
| Audio Transcription | [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3) |
| Slide Vision & Text Extraction | [`Qwen/Qwen2-VL-7B-Instruct`](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) |
| Quiz Generation & Q&A Chatbot | [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |

---
## Features

### Tab 1: Upload & Process

- Upload an **MP3 or WAV** lecture recording
- Upload a **PDF** of lecture slides
- Hit **Process Lecture** to run the full pipeline
- View a live **Sync Report** showing which transcript segment maps to which slide

### Tab 2: Dashboard

- **Chatbot**: Ask any question about the lecture. Answers are grounded in the transcript and slide content (RAG-lite)
- **Slide Gallery**: Browse all extracted slide images side by side

### Tab 3: Mock Quiz

- Click **Generate Mock Quiz** to instantly produce 7 multiple-choice questions
- Questions are generated strictly from the lecture transcript, with no hallucinated content

---
## How It Works

### 1. Audio Transcription

Whisper Large v3 processes the audio file via the HF Inference API and returns timestamped sentence chunks:

```
[00:04] Welcome to today's lecture on classical mechanics.
[00:20] Newton's Second Law states that force equals mass times acceleration.
```
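The formatting of those chunks into `[MM:SS]` lines can be sketched with a small helper. The chunk field names below (`timestamp`, `text`) are assumptions based on common Whisper-style response shapes, not necessarily what `app.py` uses:

```python
def format_chunks(chunks):
    """Render Whisper-style chunks as '[MM:SS] text' lines.

    Assumes each chunk looks like {"timestamp": (start, end), "text": ...};
    the real API response shape may differ.
    """
    lines = []
    for chunk in chunks:
        start = int(chunk["timestamp"][0])          # start time in seconds
        minutes, seconds = divmod(start, 60)        # split into MM and SS
        lines.append(f"[{minutes:02d}:{seconds:02d}] {chunk['text'].strip()}")
    return "\n".join(lines)
```

For example, `format_chunks([{"timestamp": (65.0, 70.0), "text": "Hello"}])` yields `[01:05] Hello`.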
### 2. Slide Processing

Each PDF page is converted to an image using `pdf2image`. Each image is sent to Qwen2-VL with a prompt to extract all visible text, equations, bullet points, and concepts.
### 3. Sync Logic

A keyword-overlap engine indexes every slide's content into a word set. Each transcript segment is then scored against every slide, and the highest overlap wins. Example output:

```
[04:20] Newton's Second Law states F = ma
   → Slide 5 (score: 4)
```
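The scoring step above can be sketched as follows. Function names, the stopword list, and the tokenizer are illustrative, not taken from `app.py`:

```python
import re

STOPWORDS = {"the", "a", "an", "of", "to", "that", "is", "and", "in", "on"}

def word_set(text):
    """Lowercase word set for a piece of text, minus common stopwords."""
    return set(re.findall(r"[a-z0-9']+", text.lower())) - STOPWORDS

def best_slide(segment, slides):
    """Score a transcript segment against every slide by word-set overlap;
    return (slide_number, score), 1-indexed, for the best match."""
    seg_words = word_set(segment)
    scores = [(i + 1, len(seg_words & word_set(s))) for i, s in enumerate(slides)]
    return max(scores, key=lambda pair: pair[1])

slides = [
    "Course overview and logistics",
    "Newton's Second Law: F = ma, force mass acceleration",
]
print(best_slide("Newton's Second Law states F = ma", slides))  # → (2, 5)
```

Ties go to whichever slide `max` sees first; a production version would likely also normalize scores by slide length.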
### 4. Chatbot Q&A

When you ask a question, the app:

1. Finds relevant transcript lines by keyword matching
2. Finds relevant slides by keyword matching
3. Stuffs both into a Llama-3 prompt as context
4. Returns a grounded answer
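Steps 1 through 3 can be sketched as below, assuming simple regex tokenization; the real app's matching logic and prompt template may differ:

```python
import re

def tokens(text):
    """Crude lowercase word tokenizer."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def top_matches(question, candidates, k=3):
    """Rank candidate lines by keyword overlap with the question;
    return the top k lines that share at least one word."""
    q = tokens(question)
    scored = sorted(
        ((len(q & tokens(c)), c) for c in candidates),
        key=lambda pair: -pair[0],
    )
    return [c for score, c in scored[:k] if score > 0]

def build_prompt(question, transcript_lines, slide_texts):
    """Stuff retrieved transcript and slide context into one grounded prompt."""
    context = "\n".join(
        top_matches(question, transcript_lines) + top_matches(question, slide_texts)
    )
    return (
        "Answer using ONLY the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

This is the "RAG-lite" idea: retrieval by lexical overlap rather than embeddings, with the retrieved lines pasted into the model's context window.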
### 5. Quiz Generation

The full transcript is passed to Llama-3-8B with a strict instruction to generate MCQs *only* from the provided content, with no external knowledge injected.
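The kind of strict instruction involved might look like this; the wording is illustrative, and the actual prompt lives in `app.py`:

```python
def quiz_prompt(transcript, n_questions=7):
    """Build a strict MCQ-generation prompt: the model may only use
    facts from the supplied transcript."""
    return (
        f"Write exactly {n_questions} multiple-choice questions, each with "
        "options A-D and one correct answer, using ONLY the transcript below. "
        "Do not use any outside knowledge.\n\n"
        "Transcript:\n" + transcript
    )
```

Constraining the model to the pasted transcript is what keeps the quiz grounded, though it does not fully eliminate hallucination risk.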
---

## Running Locally

### Prerequisites

- Python 3.10+
- `poppler-utils` installed on your system:

```bash
# Ubuntu / Debian
sudo apt install poppler-utils

# macOS
brew install poppler
```

### Setup

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/lecture-whisperer
cd lecture-whisperer
pip install -r requirements.txt
```

### Set your HF Token

```bash
export HF_TOKEN=hf_your_token_here
```

### Run

```bash
python app.py
```

Then open [http://localhost:7860](http://localhost:7860) in your browser.
---

## Required Secrets (for HF Spaces)

Go to your Space → **Settings → Variables and secrets** and add:

| Secret Name | Value |
|---|---|
| `HF_TOKEN` | Your Hugging Face API token (read access) |

Make sure you have accepted the terms for gated models:

- [Meta Llama 3](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct): click "Agree and access repository"
- [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct): click "Agree and access repository"

---
## Project Structure

```
lecture-whisperer/
├── app.py             # Main Gradio application
├── requirements.txt   # Python dependencies
├── packages.txt       # System dependencies (poppler-utils)
└── README.md          # This file
```

---
## Dependencies

```
gradio>=6.0.0
pdf2image>=1.17.0
Pillow>=10.0.0
requests>=2.31.0
```

System dependency (handled by `packages.txt` on HF Spaces):

```
poppler-utils
```

---
## Known Limitations

- **Processing time**: Whisper transcription via the free Inference API can take 2–5 minutes for a 1-hour lecture. The app includes automatic retry logic for cold-start delays.
- **Sync accuracy**: The current sync engine uses keyword-overlap scoring. It works well for technical content but may miss semantic matches (e.g. paraphrased concepts). Future versions will use sentence embeddings.
- **API rate limits**: The free HF Inference API is rate-limited. For heavy usage, consider upgrading to a PRO token or running the models locally.
- **Gated models**: Llama-3 and Qwen2-VL require accepting license terms on the HF model page before your token can access them.

---
## Roadmap

- [ ] Sentence-embedding-based sync (replace keyword overlap with `all-MiniLM-L6-v2`)
- [ ] One-click lecture summary (5 bullet points)
- [ ] Export quiz as downloadable PDF
- [ ] Speaker diarization (identify multiple speakers)
- [ ] Support for YouTube URLs as audio input
- [ ] Persistent chat history per session

---
## Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.

---

## License

This project is licensed under the [MIT License](LICENSE).

---
## Acknowledgements

- [OpenAI Whisper](https://github.com/openai/whisper) for the transcription model
- [Qwen Team at Alibaba](https://huggingface.co/Qwen) for Qwen2-VL
- [Meta AI](https://ai.meta.com) for Llama 3
- [Hugging Face](https://huggingface.co) for the Inference API and Spaces platform
- [Gradio](https://gradio.app) for the UI framework

---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference