Raminnit committed on
Commit 79bbd98 · verified · 1 Parent(s): 01df21e

Update README.md

extended readme file

Files changed (1)
  1. README.md +211 -0

README.md CHANGED
@@ -10,7 +10,218 @@ pinned: false
license: mit
short_description: The Lecture Whisperer is a multimodal AI study assistant.
---
title: Lecture Whisperer
emoji: 🎓
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "6.0"
app_file: app.py
pinned: false
license: mit
---

# 🎓 The Lecture Whisperer

> **Turn any lecture recording + slides into a full AI-powered study toolkit — in minutes.**

[![Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-3776AB?logo=python&logoColor=white)](https://python.org)
[![Gradio 6.0](https://img.shields.io/badge/Gradio-6.0-orange)](https://gradio.app)
[![License: MIT](https://img.shields.io/badge/License-MIT-green)](LICENSE)

---

## 📸 Overview

The Lecture Whisperer is a multimodal AI app that takes a raw lecture audio file and PDF slides as input and produces:

- ✅ A full timestamped transcript
- ✅ Extracted key concepts from every slide
- ✅ A smart sync report mapping *what was said* → *which slide it belongs to*
- ✅ An interactive Q&A chatbot grounded in your lecture content
- ✅ A generated multiple-choice quiz for exam prep

All model inference runs via the **Hugging Face Inference API** — no GPU required on the Space itself.

---

## 🧠 Models Used

| Task | Model |
|---|---|
| Audio Transcription | [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3) |
| Slide Vision & Text Extraction | [`Qwen/Qwen2-VL-7B-Instruct`](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) |
| Quiz Generation & Q&A Chatbot | [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |

---

## ✨ Features

### Tab 1 — Upload & Process
- Upload an **MP3 or WAV** lecture recording
- Upload a **PDF** of lecture slides
- Hit **⚡ Process Lecture** to run the full pipeline
- View a live **Sync Report** showing which transcript segment maps to which slide

### Tab 2 — Dashboard
- **💬 Chatbot** — Ask any question about the lecture. Answers are grounded in the transcript and slide content (RAG-lite)
- **🖼️ Slide Gallery** — Browse all extracted slide images side by side

### Tab 3 — Mock Quiz
- Click **🧠 Generate Mock Quiz** to instantly produce 7 multiple-choice questions
- Questions are generated strictly from the lecture transcript to minimize hallucinated content

---

## 🔧 How It Works

### 1. Audio Transcription
Whisper Large v3 processes the audio file via the HF Inference API and returns timestamped sentence chunks:
```
[00:04] Welcome to today's lecture on classical mechanics.
[00:20] Newton's Second Law states that force equals mass times acceleration.
```
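A minimal sketch of that call, assuming the standard HF Inference API endpoint and an `HF_TOKEN` environment variable (the function name is illustrative, not the app's actual code):

```python
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"

def transcribe(audio_path: str) -> dict:
    """POST raw audio bytes to the HF Inference API and return the parsed JSON."""
    headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
    with open(audio_path, "rb") as f:
        response = requests.post(API_URL, headers=headers, data=f.read())
    response.raise_for_status()
    return response.json()
```

In practice the call also needs retry logic for cold starts, as noted under Known Limitations.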
### 2. Slide Processing
Each PDF page is converted to an image using `pdf2image`. Each image is sent to Qwen2-VL with a prompt to extract all visible text, equations, bullet points, and concepts.
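The conversion step can be sketched with `pdf2image`'s `convert_from_path`, which shells out to poppler (the helper name and DPI value here are illustrative):

```python
def pdf_to_images(pdf_path: str, dpi: int = 150):
    """Render each PDF page to a PIL image; requires pdf2image plus poppler-utils."""
    from pdf2image import convert_from_path  # lazy import: module stays importable without poppler
    return convert_from_path(pdf_path, dpi=dpi)
```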
### 3. Sync Logic
A keyword-overlap engine indexes every slide's content into a word set. Each transcript segment is then scored against every slide — the highest overlap wins. Example output:
```
[04:20] Newton's Second Law states F = ma
→ Slide 5 (score: 4)
```
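The scoring above can be sketched in a few lines; the names and the short-word cutoff are illustrative, not the app's actual implementation:

```python
import re

def word_set(text: str) -> set[str]:
    """Lowercase word tokens, dropping very short tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2}

def best_slide(segment: str, slides: list[str]) -> tuple[int, int]:
    """Return (1-based slide number, overlap score) of the best-matching slide."""
    seg_words = word_set(segment)
    scores = [len(seg_words & word_set(s)) for s in slides]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx + 1, scores[idx]

slides = [
    "Kinematics: velocity and acceleration",
    "Newton's Second Law: force equals mass times acceleration",
]
print(best_slide("Newton's Second Law states force equals mass times acceleration", slides))
# prints (2, 8)
```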
### 4. Chatbot Q&A
When you ask a question, the app:
1. Finds relevant transcript lines by keyword matching
2. Finds relevant slides by keyword matching
3. Stuffs both into a Llama-3 prompt as context
4. Returns a grounded answer
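The four steps can be sketched as follows; `build_prompt` and the context format are illustrative, and the resulting string would be sent to the Llama-3 chat endpoint:

```python
def build_prompt(question: str, transcript_lines: list[str],
                 slide_texts: list[str], top_k: int = 3) -> str:
    """Pick the top_k transcript lines and slides sharing the most words with the question."""
    q_words = set(question.lower().split())

    def rank(texts: list[str]) -> list[str]:
        return sorted(texts, key=lambda t: len(q_words & set(t.lower().split())),
                      reverse=True)[:top_k]

    context = "\n".join(rank(transcript_lines) + rank(slide_texts))
    return (
        "Answer using ONLY the lecture context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```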
### 5. Quiz Generation
The full transcript is passed to Llama-3-8B with a strict instruction to generate MCQs *only* from the provided content — no external knowledge injected.
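As an illustration (not the app's exact wording), that strict instruction could look like this template:

```python
QUIZ_TEMPLATE = """You are a quiz writer. Using ONLY the lecture transcript below,
write {n} multiple-choice questions, each with options A-D and the correct
answer marked. Do not use any outside knowledge.

Transcript:
{transcript}
"""

def quiz_prompt(transcript: str, n: int = 7) -> str:
    """Fill the template; the result is sent to Llama-3-8B-Instruct."""
    return QUIZ_TEMPLATE.format(n=n, transcript=transcript)
```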

---

## 🚀 Running Locally

### Prerequisites
- Python 3.10+
- `poppler-utils` installed on your system:
```bash
# Ubuntu / Debian
sudo apt install poppler-utils

# macOS
brew install poppler
```
### Setup
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/lecture-whisperer
cd lecture-whisperer

pip install -r requirements.txt
```

### Set your HF Token
```bash
export HF_TOKEN=hf_your_token_here
```

### Run
```bash
python app.py
```

Then open [http://localhost:7860](http://localhost:7860) in your browser.

---

## 🔑 Required Secrets (for HF Spaces)

Go to your Space → **Settings → Variables and secrets** → add:

| Secret Name | Value |
|---|---|
| `HF_TOKEN` | Your Hugging Face API token (read access) |

Make sure you have accepted the terms for gated models:
- [Meta Llama 3](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) — click "Agree and access repository"
- [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) — click "Agree and access repository"

---

## 📁 Project Structure

```
lecture-whisperer/
├── app.py             # Main Gradio application
├── requirements.txt   # Python dependencies
├── packages.txt       # System dependencies (poppler-utils)
└── README.md          # This file
```

---

## 📦 Dependencies

```
gradio>=6.0.0
pdf2image>=1.17.0
Pillow>=10.0.0
requests>=2.31.0
```

System dependency (handled by `packages.txt` on HF Spaces):
```
poppler-utils
```

---

## ⚠️ Known Limitations

- **Processing time** — Whisper transcription via the free Inference API can take 2–5 minutes for a 1-hour lecture. The app includes automatic retry logic for cold-start delays.
- **Sync accuracy** — The current sync engine uses keyword-overlap scoring. It works well for technical content but may miss semantic matches (e.g. paraphrased concepts). Future versions will use sentence embeddings.
- **API rate limits** — The free HF Inference API is rate-limited. For heavy usage, consider upgrading to a PRO token or running the models locally.
- **Gated models** — Llama 3 and Qwen2-VL require accepting license terms on the HF model page before your token can access them.

---

## 🗺️ Roadmap

- [ ] Sentence-embedding-based sync (replace keyword overlap with `all-MiniLM-L6-v2`)
- [ ] One-click lecture summary (5 bullet points)
- [ ] Export quiz as downloadable PDF
- [ ] Speaker diarization (identify multiple speakers)
- [ ] Support for YouTube URLs as audio input
- [ ] Persistent chat history per session

---

## 🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.

---

## 📄 License

This project is licensed under the [MIT License](LICENSE).

---

## 🙏 Acknowledgements

- [OpenAI Whisper](https://github.com/openai/whisper) for the transcription model
- [Qwen Team at Alibaba](https://huggingface.co/Qwen) for Qwen2-VL
- [Meta AI](https://ai.meta.com) for Llama 3
- [Hugging Face](https://huggingface.co) for the Inference API and Spaces platform
- [Gradio](https://gradio.app) for the UI framework
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference