Raminnit committed on
Commit 79bbd98 · verified · 1 Parent(s): 01df21e

Update README.md

extended readme file

Files changed (1)
  1. README.md +211 -0

README.md CHANGED
@@ -10,7 +10,218 @@ pinned: false
license: mit
short_description: The Lecture Whisperer is a multimodal AI study assistant.
---
title: Lecture Whisperer
emoji: 🎓
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: "6.0"
app_file: app.py
pinned: false
license: mit
---

# 🎓 The Lecture Whisperer

> **Turn any lecture recording + slides into a full AI-powered study toolkit — in minutes.**

[![Hugging Face Spaces](https://img.shields.io/badge/🤗%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-3776AB?logo=python&logoColor=white)](https://python.org)
[![Gradio 6.0](https://img.shields.io/badge/Gradio-6.0-orange)](https://gradio.app)
[![License: MIT](https://img.shields.io/badge/License-MIT-green)](LICENSE)

---

## 📸 Overview

The Lecture Whisperer is a multimodal AI app that takes a raw lecture audio file and PDF slides as input and produces:

- ✅ A full timestamped transcript
- ✅ Extracted key concepts from every slide
- ✅ A smart sync report mapping *what was said* → *which slide it belongs to*
- ✅ An interactive Q&A chatbot grounded in your lecture content
- ✅ A generated multiple-choice quiz for exam prep

All model inference runs via the **Hugging Face Inference API** — no GPU required on the Space itself.

---

## 🧠 Models Used

| Task | Model |
|---|---|
| Audio Transcription | [`openai/whisper-large-v3`](https://huggingface.co/openai/whisper-large-v3) |
| Slide Vision & Text Extraction | [`Qwen/Qwen2-VL-7B-Instruct`](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) |
| Quiz Generation & Q&A Chatbot | [`meta-llama/Meta-Llama-3-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) |

---

## ✨ Features

### Tab 1 — Upload & Process
- Upload an **MP3 or WAV** lecture recording
- Upload a **PDF** of lecture slides
- Hit **⚡ Process Lecture** to run the full pipeline
- View a live **Sync Report** showing which transcript segment maps to which slide

### Tab 2 — Dashboard
- **💬 Chatbot** — Ask any question about the lecture. Answers are grounded in the transcript and slide content (RAG-lite)
- **🖼️ Slide Gallery** — Browse all extracted slide images side by side

### Tab 3 — Mock Quiz
- Click **🧠 Generate Mock Quiz** to instantly produce 7 multiple-choice questions
- Questions are generated strictly from the lecture transcript to minimize hallucinated content

---

## 🔧 How It Works

### 1. Audio Transcription
Whisper Large v3 processes the audio file via the HF Inference API and returns timestamped sentence chunks:
```
[00:04] Welcome to today's lecture on classical mechanics.
[00:20] Newton's Second Law states that force equals mass times acceleration.
```
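A minimal sketch of that call, assuming the standard HF Inference API endpoint and an `HF_TOKEN` environment variable (the function name is illustrative, not the app's actual code):

```python
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"

def transcribe(audio_path: str) -> dict:
    """POST raw audio bytes to the HF Inference API and return the parsed JSON."""
    headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}
    with open(audio_path, "rb") as f:
        response = requests.post(API_URL, headers=headers, data=f.read())
    response.raise_for_status()
    return response.json()
```

In practice the call also needs retry logic for cold starts, as noted under Known Limitations.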
### 2. Slide Processing
Each PDF page is converted to an image using `pdf2image`. Each image is sent to Qwen2-VL with a prompt to extract all visible text, equations, bullet points, and concepts.
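The conversion step can be sketched with `pdf2image`'s `convert_from_path`, which shells out to poppler (the helper name and DPI value here are illustrative):

```python
def pdf_to_images(pdf_path: str, dpi: int = 150):
    """Render each PDF page to a PIL image; requires pdf2image plus poppler-utils."""
    from pdf2image import convert_from_path  # lazy import: module stays importable without poppler
    return convert_from_path(pdf_path, dpi=dpi)
```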
### 3. Sync Logic
A keyword-overlap engine indexes every slide's content into a word set. Each transcript segment is then scored against every slide — the highest overlap wins. Example output:
```
[04:20] Newton's Second Law states F = ma
→ Slide 5 (score: 4)
```
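The scoring above can be sketched in a few lines; the names and the short-word cutoff are illustrative, not the app's actual implementation:

```python
import re

def word_set(text: str) -> set[str]:
    """Lowercase word tokens, dropping very short tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2}

def best_slide(segment: str, slides: list[str]) -> tuple[int, int]:
    """Return (1-based slide number, overlap score) of the best-matching slide."""
    seg_words = word_set(segment)
    scores = [len(seg_words & word_set(s)) for s in slides]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx + 1, scores[idx]

slides = [
    "Kinematics: velocity and acceleration",
    "Newton's Second Law: force equals mass times acceleration",
]
print(best_slide("Newton's Second Law states force equals mass times acceleration", slides))
# prints (2, 8)
```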
### 4. Chatbot Q&A
When you ask a question, the app:
1. Finds relevant transcript lines by keyword matching
2. Finds relevant slides by keyword matching
3. Stuffs both into a Llama-3 prompt as context
4. Returns a grounded answer
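The four steps can be sketched as follows; `build_prompt` and the context format are illustrative, and the resulting string would be sent to the Llama-3 chat endpoint:

```python
def build_prompt(question: str, transcript_lines: list[str],
                 slide_texts: list[str], top_k: int = 3) -> str:
    """Pick the top_k transcript lines and slides sharing the most words with the question."""
    q_words = set(question.lower().split())

    def rank(texts: list[str]) -> list[str]:
        return sorted(texts, key=lambda t: len(q_words & set(t.lower().split())),
                      reverse=True)[:top_k]

    context = "\n".join(rank(transcript_lines) + rank(slide_texts))
    return (
        "Answer using ONLY the lecture context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```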
### 5. Quiz Generation
The full transcript is passed to Llama-3-8B with a strict instruction to generate MCQs *only* from the provided content — no external knowledge injected.
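As an illustration (not the app's exact wording), that strict instruction could look like this template:

```python
QUIZ_TEMPLATE = """You are a quiz writer. Using ONLY the lecture transcript below,
write {n} multiple-choice questions, each with options A-D and the correct
answer marked. Do not use any outside knowledge.

Transcript:
{transcript}
"""

def quiz_prompt(transcript: str, n: int = 7) -> str:
    """Fill the template; the result is sent to Llama-3-8B-Instruct."""
    return QUIZ_TEMPLATE.format(n=n, transcript=transcript)
```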

---

## 🚀 Running Locally

### Prerequisites
- Python 3.10+
- `poppler-utils` installed on your system:
```bash
# Ubuntu / Debian
sudo apt install poppler-utils

# macOS
brew install poppler
```
### Setup
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/lecture-whisperer
cd lecture-whisperer

pip install -r requirements.txt
```

### Set your HF Token
```bash
export HF_TOKEN=hf_your_token_here
```

### Run
```bash
python app.py
```

Then open [http://localhost:7860](http://localhost:7860) in your browser.

---

## 🔑 Required Secrets (for HF Spaces)

Go to your Space → **Settings → Variables and secrets** → add:

| Secret Name | Value |
|---|---|
| `HF_TOKEN` | Your Hugging Face API token (read access) |

Make sure you have accepted the terms for gated models:
- [Meta Llama 3](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) — click "Agree and access repository"
- [Qwen2-VL](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct) — click "Agree and access repository"

---

## 📁 Project Structure

```
lecture-whisperer/
├── app.py             # Main Gradio application
├── requirements.txt   # Python dependencies
├── packages.txt       # System dependencies (poppler-utils)
└── README.md          # This file
```

---

## 📦 Dependencies

```
gradio>=6.0.0
pdf2image>=1.17.0
Pillow>=10.0.0
requests>=2.31.0
```

System dependency (handled by `packages.txt` on HF Spaces):
```
poppler-utils
```

---

## ⚠️ Known Limitations

- **Processing time** — Whisper transcription via the free Inference API can take 2–5 minutes for a 1-hour lecture. The app includes automatic retry logic for cold-start delays.
- **Sync accuracy** — The current sync engine uses keyword-overlap scoring. It works well for technical content but may miss semantic matches (e.g. paraphrased concepts). Future versions will use sentence embeddings.
- **API rate limits** — The free HF Inference API is rate-limited. For heavy usage, consider upgrading to a PRO token or running the models locally.
- **Gated models** — Llama 3 and Qwen2-VL require accepting license terms on the HF model page before your token can access them.

---

## 🗺️ Roadmap

- [ ] Sentence-embedding-based sync (replace keyword overlap with `all-MiniLM-L6-v2`)
- [ ] One-click lecture summary (5 bullet points)
- [ ] Export quiz as downloadable PDF
- [ ] Speaker diarization (identify multiple speakers)
- [ ] Support for YouTube URLs as audio input
- [ ] Persistent chat history per session

---

## 🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.

---

## 📄 License

This project is licensed under the [MIT License](LICENSE).

---

## 🙏 Acknowledgements

- [OpenAI Whisper](https://github.com/openai/whisper) for the transcription model
- [Qwen Team at Alibaba](https://huggingface.co/Qwen) for Qwen2-VL
- [Meta AI](https://ai.meta.com) for Llama 3
- [Hugging Face](https://huggingface.co) for the Inference API and Spaces platform
- [Gradio](https://gradio.app) for the UI framework
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference