---
title: Lecture Whisperer
emoji: 🎙️
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "6.0"
app_file: app.py
pinned: false
license: mit
short_description: The Lecture Whisperer is a multimodal AI study assistant.
---

# 🎙️ The Lecture Whisperer

> **Turn any lecture recording + slides into a full AI-powered study toolkit – in minutes.**

[Hugging Face Spaces](https://huggingface.co/spaces) · [Python 3.10+](https://python.org) · [Gradio](https://gradio.app) · [MIT License](LICENSE)

---
## 📸 Overview

The Lecture Whisperer is a multimodal AI app that takes a raw lecture audio file and PDF slides as input and produces:

- ✅ A full timestamped transcript
- ✅ Key concepts extracted from every slide
- ✅ A smart sync report mapping *what was said* → *which slide it belongs to*
- ✅ An interactive Q&A chatbot grounded in your lecture content
- ✅ A generated multiple-choice quiz for exam prep

All model inference runs via the **Hugging Face Inference API** – no GPU is required on the Space itself.

---
## 🧠 Models Used

| Task | Model |
|---|---|
| Audio transcription | [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3) |
| Slide vision & text extraction | [`Qwen/Qwen2-VL-7B-Instruct`](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) |
| Quiz generation & Q&A chatbot | [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |

---
## ✨ Features

### Tab 1 – Upload & Process
- Upload an **MP3 or WAV** lecture recording
- Upload a **PDF** of the lecture slides
- Hit **⚡ Process Lecture** to run the full pipeline
- View a live **Sync Report** showing which transcript segment maps to which slide

### Tab 2 – Dashboard
- **💬 Chatbot** – ask any question about the lecture; answers are grounded in the transcript and slide content (RAG-lite)
- **🖼️ Slide Gallery** – browse all extracted slide images side by side

### Tab 3 – Mock Quiz
- Click **🧠 Generate Mock Quiz** to instantly produce 7 multiple-choice questions
- Questions are generated strictly from the lecture transcript – no hallucinated content

---
## 🔧 How It Works

### 1. Audio Transcription
Whisper Large v3 processes the audio file via the HF Inference API and returns timestamped sentence chunks:
```
[00:04] Welcome to today's lecture on classical mechanics.
[00:20] Newton's Second Law states that force equals mass times acceleration.
```
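The timestamp formatting can be sketched as follows. This is a minimal illustration, not the app's actual code: the helper name is hypothetical, and the chunk shape (`{"timestamp": [start, end], "text": ...}`) is the one returned by `transformers` ASR pipelines with `return_timestamps=True`; the Inference API returns a similar structure.

```python
def format_chunks(chunks):
    """Render Whisper-style chunks as '[MM:SS] text' lines."""
    lines = []
    for chunk in chunks:
        start = chunk["timestamp"][0] or 0.0  # start may be None on a trailing chunk
        minutes, seconds = divmod(int(start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {chunk['text'].strip()}")
    return "\n".join(lines)
```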
### 2. Slide Processing
Each PDF page is converted to an image with `pdf2image`, and each image is sent to Qwen2-VL with a prompt asking it to extract all visible text, equations, bullet points, and concepts.
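A sketch of this step, with hypothetical helper names: the `convert_from_path` call in the usage note is pdf2image's real API, but the extraction prompt wording and the OpenAI-style message schema are assumptions about how the app talks to the vision endpoint.

```python
import base64

# Hypothetical extraction prompt, per the description above.
SLIDE_PROMPT = ("Extract all visible text, equations, bullet points, "
                "and key concepts from this lecture slide.")

def slide_to_message(png_bytes: bytes) -> dict:
    """Wrap one slide image (PNG bytes) into a chat message for a VLM."""
    data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": SLIDE_PROMPT},
        ],
    }

# Usage (requires poppler and pdf2image installed):
#   from pdf2image import convert_from_path
#   import io
#   for page in convert_from_path("slides.pdf"):
#       buf = io.BytesIO(); page.save(buf, format="PNG")
#       message = slide_to_message(buf.getvalue())
```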
### 3. Sync Logic
A keyword-overlap engine indexes each slide's content into a word set. Every transcript segment is then scored against every slide, and the highest overlap wins. Example output:
```
[04:20] Newton's Second Law states F = ma
    → Slide 5 (score: 4)
```
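The scoring described above can be sketched like this (function names and the stopword list are illustrative, not the app's actual code):

```python
import re

STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "that", "in", "on"}

def words(text):
    """Lowercased word set, minus common stopwords."""
    return set(re.findall(r"[a-z0-9']+", text.lower())) - STOPWORDS

def best_slide(segment, slides):
    """Return (1-based slide number, overlap score): the slide whose
    word set shares the most words with the transcript segment wins."""
    seg = words(segment)
    scores = [len(seg & words(s)) for s in slides]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1, scores[best]
```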
### 4. Chatbot Q&A
When you ask a question, the app:

1. Finds relevant transcript lines by keyword matching
2. Finds relevant slides by keyword matching
3. Inserts both into a Llama-3 prompt as context
4. Returns a grounded answer
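The steps above amount to assembling a grounded prompt. A minimal sketch, in which the helper name, the ranking heuristic, and the prompt wording are assumptions:

```python
def build_prompt(question, transcript_lines, slide_texts, top_k=3):
    """Rank context items by shared-word count with the question,
    then inline the top matches into a grounded prompt."""
    q_words = set(question.lower().split())

    def top(items):
        return sorted(items,
                      key=lambda t: len(q_words & set(t.lower().split())),
                      reverse=True)[:top_k]

    context = "\n".join(top(transcript_lines) + top(slide_texts))
    return ("Answer using ONLY the lecture context below. "
            "If the answer is not in the context, say so.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```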
### 5. Quiz Generation
The full transcript is passed to Llama-3-8B with a strict instruction to generate MCQs *only* from the provided content; no external knowledge is injected.
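The strict instruction can be sketched as a prompt template (the app's exact wording is not shown in this README, so this is an assumption about its shape):

```python
# Hypothetical template mirroring the "only from the provided content" rule.
QUIZ_INSTRUCTION = (
    "Write {n} multiple-choice questions (A-D, exactly one correct answer "
    "each) based STRICTLY on the transcript below. Do not use any outside "
    "knowledge; every question and answer must be verifiable from the text."
)

def quiz_prompt(transcript, n=7):
    """Build the quiz-generation prompt for the LLM."""
    return QUIZ_INSTRUCTION.format(n=n) + "\n\nTranscript:\n" + transcript
```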
---
## 🚀 Running Locally

### Prerequisites
- Python 3.10+
- `poppler-utils` installed on your system:
```bash
# Ubuntu / Debian
sudo apt install poppler-utils

# macOS
brew install poppler
```

### Setup
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/lecture-whisperer
cd lecture-whisperer

pip install -r requirements.txt
```

### Set your HF token
```bash
export HF_TOKEN=hf_your_token_here
```

### Run
```bash
python app.py
```

Then open [http://localhost:7860](http://localhost:7860) in your browser.

---
## 🔐 Required Secrets (for HF Spaces)

Go to your Space → **Settings → Variables and secrets** and add:

| Secret name | Value |
|---|---|
| `HF_TOKEN` | Your Hugging Face API token (read access) |

Make sure you have accepted the terms for the gated models:
- [Meta Llama 3](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) – click "Agree and access repository"
- [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) – click "Agree and access repository"

---
## 📁 Project Structure

```
lecture-whisperer/
├── app.py             # Main Gradio application
├── requirements.txt   # Python dependencies
├── packages.txt       # System dependencies (poppler-utils)
└── README.md          # This file
```

---
## 📦 Dependencies

```
gradio>=6.0.0
pdf2image>=1.17.0
Pillow>=10.0.0
requests>=2.31.0
```

System dependency (handled by `packages.txt` on HF Spaces):
```
poppler-utils
```

---
## ⚠️ Known Limitations

- **Processing time** – Whisper transcription via the free Inference API can take 2–5 minutes for a one-hour lecture. The app includes automatic retry logic for cold-start delays.
- **Sync accuracy** – the current sync engine uses keyword-overlap scoring. It works well for technical content but can miss semantic matches (e.g. paraphrased concepts). Future versions will use sentence embeddings.
- **API rate limits** – the free HF Inference API is rate-limited. For heavy usage, consider upgrading to a PRO token or running the models locally.
- **Gated models** – Llama-3 and Qwen2-VL require accepting their license terms on the HF model pages before your token can access them.
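Retry logic for cold starts can be sketched as exponential backoff on the 503 "model is loading" responses the Inference API returns while a model spins up (the function name and parameters here are illustrative, not the app's actual code):

```python
import time

def with_retries(call, attempts=5, base_delay=2.0, retry_on=(503,)):
    """Call `call()` (returning (status_code, payload)) until it succeeds,
    sleeping with exponential backoff on retryable statuses."""
    for attempt in range(attempts):
        status, payload = call()
        if status not in retry_on:
            return payload
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"still failing after {attempts} attempts")
```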
---
## 🗺️ Roadmap

- [ ] Sentence-embedding-based sync (replace keyword overlap with `all-MiniLM-L6-v2`)
- [ ] One-click lecture summary (5 bullet points)
- [ ] Export the quiz as a downloadable PDF
- [ ] Speaker diarization (identify multiple speakers)
- [ ] Support for YouTube URLs as audio input
- [ ] Persistent chat history per session

---
## 🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.

---
## 📄 License

This project is licensed under the [MIT License](LICENSE).

---
## 🙏 Acknowledgements

- [OpenAI Whisper](https://github.com/openai/whisper) for the transcription model
- [Qwen team at Alibaba](https://huggingface.co/Qwen) for Qwen2-VL
- [Meta AI](https://ai.meta.com) for Llama 3
- [Hugging Face](https://huggingface.co) for the Inference API and Spaces platform
- [Gradio](https://gradio.app) for the UI framework

---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference