Spaces:

Raminnit
/

LectureWhisperer

Sleeping

App Files Files Community

LectureWhisperer / README.md

Raminnit

Update README.md

79bbd98 verified about 1 month ago

preview code

raw

history blame contribute delete

6.97 kB

	---
	title: LectureWhisperer
	emoji: 🏆
	colorFrom: gray
	colorTo: green
	sdk: gradio
	sdk_version: 6.6.0
	app_file: app.py
	pinned: false
	license: mit
	short_description: The Lecture Whisperer is a multimodal AI study assistant.

	---
	title: Lecture Whisperer
	emoji: 🎓
	colorFrom: indigo
	colorTo: purple
	sdk: gradio
	sdk_version: "6.0"
	app_file: app.py
	pinned: false
	license: mit
	---

	# 🎓 The Lecture Whisperer

	> Turn any lecture recording + slides into a full AI-powered study toolkit — in minutes.

	[![Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces)
	[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-3776AB?logo=python&logoColor=white)](https://python.org)
	[![Gradio 6.0](https://img.shields.io/badge/Gradio-6.0-orange)](https://gradio.app)
	[![License: MIT](https://img.shields.io/badge/License-MIT-green)](LICENSE)

	---

	## 📸 Overview

	The Lecture Whisperer is a multi-modal AI app that takes a raw lecture audio file and PDF slides as input and produces:

	- ✅ A full timestamped transcript
	- ✅ Extracted key concepts from every slide
	- ✅ A smart sync report mapping what was said → which slide it belongs to
	- ✅ An interactive Q&A chatbot grounded in your lecture content
	- ✅ A generated multiple-choice quiz for exam prep

	All model inference runs via the Hugging Face Inference API — no GPU required on the Space itself.

	---

	## 🧠 Models Used

	\| Task \| Model \|
	\|---\|---\|
	\| Audio Transcription \| [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3) \|
	\| Slide Vision & Text Extraction \| [`Qwen/Qwen2-VL-7B-Instruct`](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) \|
	\| Quiz Generation & Q&A Chatbot \| [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) \|

	---

	## ✨ Features

	### Tab 1 — Upload & Process
	- Upload an MP3 or WAV lecture recording
	- Upload a PDF of lecture slides
	- Hit ⚡ Process Lecture to run the full pipeline
	- View a live Sync Report showing which transcript segment maps to which slide

	### Tab 2 — Dashboard
	- 💬 Chatbot — Ask any question about the lecture. Answers are grounded in the transcript and slide content (RAG-lite)
	- 🖼️ Slide Gallery — Browse all extracted slide images side by side

	### Tab 3 — Mock Quiz
	- Click 🧠 Generate Mock Quiz to instantly produce 7 multiple-choice questions
	- Questions are generated strictly from the lecture transcript — no hallucinated content

	---

	## 🔧 How It Works

	### 1. Audio Transcription
	Whisper Large v3 processes the audio file via the HF Inference API and returns timestamped sentence chunks:
	```
	[00:04] Welcome to today's lecture on classical mechanics.
	[00:20] Newton's Second Law states that force equals mass times acceleration.
	```

	### 2. Slide Processing
	Each PDF page is converted to an image using `pdf2image`. Each image is sent to Qwen2-VL with a prompt to extract all visible text, equations, bullet points, and concepts.

	### 3. Sync Logic
	A keyword-overlap engine indexes every slide's content into a word set. Each transcript segment is then scored against every slide — the highest overlap wins. Example output:
	```
	[04:20] Newton's Second Law states F = ma
	→ Slide 5 (score: 4)
	```

	### 4. Chatbot Q&A
	When you ask a question, the app:
	1. Finds relevant transcript lines by keyword matching
	2. Finds relevant slides by keyword matching
	3. Stuffs both into a Llama-3 prompt as context
	4. Returns a grounded answer

	### 5. Quiz Generation
	The full transcript is passed to Llama-3-8B with a strict instruction to generate MCQs only from the provided content — no external knowledge injected.

	---

	## 🚀 Running Locally

	### Prerequisites
	- Python 3.10+
	- `poppler-utils` installed on your system:
	```bash
	# Ubuntu / Debian
	sudo apt install poppler-utils

	# macOS
	brew install poppler
	```

	### Setup
	```bash
	git clone https://huggingface.co/spaces/YOUR_USERNAME/lecture-whisperer
	cd lecture-whisperer

	pip install -r requirements.txt
	```

	### Set your HF Token
	```bash
	export HF_TOKEN=hf_your_token_here
	```

	### Run
	```bash
	python app.py
	```

	Then open [http://localhost:7860](http://localhost:7860) in your browser.

	---

	## 🔑 Required Secrets (for HF Spaces)

	Go to your Space → Settings → Variables and secrets → add:

	\| Secret Name \| Value \|
	\|---\|---\|
	\| `HF_TOKEN` \| Your Hugging Face API token (read access) \|

	Make sure you have accepted the terms for gated models:
	- [Meta Llama 3](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) — click "Agree and access repository"
	- [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) — click "Agree and access repository"

	---

	## 📁 Project Structure

	```
	lecture-whisperer/
	├── app.py # Main Gradio application
	├── requirements.txt # Python dependencies
	├── packages.txt # System dependencies (poppler-utils)
	└── README.md # This file
	```

	---

	## 📦 Dependencies

	```
	gradio>=6.0.0
	pdf2image>=1.17.0
	Pillow>=10.0.0
	requests>=2.31.0
	```

	System dependency (handled by `packages.txt` on HF Spaces):
	```
	poppler-utils
	```

	---

	## ⚠️ Known Limitations

	- Processing time — Whisper transcription via the free Inference API can take 2–5 minutes for a 1-hour lecture. The app includes automatic retry logic for cold-start delays.
	- Sync accuracy — The current sync engine uses keyword overlap scoring. It works well for technical content but may miss semantic matches (e.g. paraphrased concepts). Future versions will use sentence embeddings.
	- API rate limits — The HF free Inference API has rate limits. For heavy usage, consider upgrading to a PRO token or running models locally.
	- Gated models — Llama-3 and Qwen2-VL require accepting license terms on the HF model page before your token can access them.

	---

	## 🗺️ Roadmap

	- [ ] Sentence-embedding based sync (replace keyword overlap with `all-MiniLM-L6-v2`)
	- [ ] One-click lecture summary (5 bullet points)
	- [ ] Export quiz as downloadable PDF
	- [ ] Speaker diarization (identify multiple speakers)
	- [ ] Support for YouTube URLs as audio input
	- [ ] Persistent chat history per session

	---

	## 🤝 Contributing

	Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.

	---

	## 📄 License

	This project is licensed under the [MIT License](LICENSE).

	---

	## 🙏 Acknowledgements

	- [OpenAI Whisper](https://github.com/openai/whisper) for the transcription model
	- [Qwen Team at Alibaba](https://huggingface.co/Qwen) for Qwen2-VL
	- [Meta AI](https://ai.meta.com) for Llama 3
	- [Hugging Face](https://huggingface.co) for the Inference API and Spaces platform
	- [Gradio](https://gradio.app) for the UI framework
	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference