---
title: 80,000 Hours RAG Q&A
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---
# 🎯 80,000 Hours Career Advice Q&A
A Retrieval-Augmented Generation (RAG) system that answers career-related questions using content from [80,000 Hours](https://80000hours.org/), with validated citations.
## Features
- 🔍 **Semantic Search**: Retrieves relevant content from 80,000 Hours articles
- 🤖 **AI-Powered Answers**: Uses GPT-4o to generate comprehensive responses
- ✅ **Citation Validation**: Automatically validates that quotes exist in source material
- 📚 **Source Attribution**: Every answer includes validated citations with URLs
## How It Works
1. Your question is converted to a vector embedding
2. Relevant article chunks are retrieved from the Qdrant vector database
3. GPT-4o generates an answer with citations
4. Citations are validated against source material
5. You get an answer with verified quotes and source links
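
Steps 1 and 2 amount to a nearest-neighbour search over embeddings. The sketch below illustrates the idea in plain Python, assuming made-up 3-dimensional vectors in place of real sentence-transformers embeddings and an in-memory list in place of Qdrant. The names `cosine_similarity`, `retrieve`, and `CHUNKS` are hypothetical and are not taken from this repo.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, chunks, top_k=2):
    """Return the top_k chunks ranked by similarity to the query vector."""
    scored = [(cosine_similarity(query_vector, c["vector"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


# Toy data (assumption): placeholder texts, URLs, and vectors.
CHUNKS = [
    {"text": "chunk about career capital", "url": "https://example.org/a", "vector": [1.0, 0.0, 0.0]},
    {"text": "chunk about problem selection", "url": "https://example.org/b", "vector": [0.0, 1.0, 0.0]},
    {"text": "chunk about personal fit", "url": "https://example.org/c", "vector": [0.0, 0.0, 1.0]},
]

top = retrieve([0.9, 0.1, 0.0], CHUNKS, top_k=1)
print(top[0]["text"])
```

In the real app the vector database performs this ranking, so the loop above only shows what "relevant" means in step 2.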
## Configuration for Hugging Face Spaces
To deploy this app, you need to configure the following **Secrets** in your Space settings:
1. Go to your Space → Settings → Variables and Secrets
2. Add these secrets:
- `QDRANT_URL`: Your Qdrant cloud instance URL
- `QDRANT_API_KEY`: Your Qdrant API key
- `OPENAI_API_KEY`: Your OpenAI API key
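
Spaces secrets reach the app as environment variables, so a startup check can fail early with a readable message when one is missing. This is a minimal sketch; `missing_secrets` and `require_secrets` are hypothetical helpers and may not match how `config.py` actually reads its settings.

```python
import os

# Secret names as listed in this README.
REQUIRED_SECRETS = ("QDRANT_URL", "QDRANT_API_KEY", "OPENAI_API_KEY")


def missing_secrets(env=None):
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name)]


def require_secrets(env=None):
    """Raise a clear error at startup if any required secret is missing."""
    missing = missing_secrets(env)
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
```

Calling `require_secrets()` once before building the interface would surface a misconfigured Space immediately rather than on the first question.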
## Local Development
### Setup
1. Install dependencies:
```bash
pip install -r requirements.txt
```
2. Create a `.env` file with:
```
QDRANT_URL=your_url
QDRANT_API_KEY=your_key
OPENAI_API_KEY=your_key
```
### First-Time Setup (run in order)
1. **Extract articles** → `python extract_articles_cli.py`
   - Scrapes 80,000 Hours articles from sitemap
   - Only needed once (or to refresh content)
2. **Chunk articles** → `python chunk_articles_cli.py`
   - Splits articles into semantic chunks
3. **Upload to Qdrant** → `python upload_to_qdrant_cli.py`
   - Generates embeddings and uploads to vector DB
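
To give a feel for the chunking step, here is a minimal sketch that groups paragraphs into size-bounded chunks. It is a simple stand-in and not the strategy used by `chunk_articles_cli.py`, which this README does not describe in detail. The function `chunk_text` and the `max_chars` parameter are assumptions made for illustration.

```python
def chunk_text(text, max_chars=500):
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first paragraph of a new chunk.
            current = candidate
        else:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which matters when chunks are later quoted as citations.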
### Running Locally
**Web Interface:**
```bash
python app.py
```
**Command Line:**
```bash
python rag_chat.py "your question here"
python rag_chat.py "your question" --show-context
```
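
A command line of this shape can be parsed with the standard library's `argparse`. The sketch below is an assumption about how such a CLI might be wired, not the actual contents of `rag_chat.py`. The help strings, including the reading of `--show-context` as "print the retrieved chunks", are guesses.

```python
import argparse


def build_parser():
    """Build a parser for: rag_chat.py "question" [--show-context]."""
    parser = argparse.ArgumentParser(
        description="Ask a career question from the command line."
    )
    parser.add_argument("question", help="Question to answer")
    parser.add_argument(
        "--show-context",
        action="store_true",
        help="Also print the retrieved chunks (assumed meaning)",
    )
    return parser


args = build_parser().parse_args(["your question", "--show-context"])
print(args.question, args.show_context)
```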
## Project Structure
- `app.py` - Main Gradio web interface
- `rag_chat.py` - RAG logic and CLI interface
- `citation_validator.py` - Citation validation system
- `extract_articles_cli.py` - Article scraper
- `chunk_articles_cli.py` - Article chunking
- `upload_to_qdrant_cli.py` - Vector DB uploader
- `config.py` - Shared configuration
## Tech Stack
- **Frontend**: Gradio (Space pinned to `sdk_version: 5.49.1`)
- **LLM**: OpenAI GPT-4o
- **Vector DB**: Qdrant Cloud
- **Embeddings**: sentence-transformers (all-MiniLM-L6-v2)
- **Citation Validation**: rapidfuzz for fuzzy matching
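
Fuzzy citation validation can be sketched as follows: normalize both texts, accept an exact substring, and otherwise score the quote against every same-length window of the source. To stay dependency-free, this sketch swaps in the standard library's `difflib.SequenceMatcher` for rapidfuzz; with rapidfuzz installed, `fuzz.partial_ratio` plays a similar role. The function names and the threshold of 85 are assumptions, not values from `citation_validator.py`.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def best_match_score(quote, source):
    """Best similarity (0-100) between the quote and any same-length window of the source."""
    quote, source = normalize(quote), normalize(source)
    if not quote or not source:
        return 0.0
    if quote in source:
        return 100.0
    window = len(quote)
    best = 0.0
    # Slide a quote-sized window across the source and keep the best score.
    for start in range(max(1, len(source) - window + 1)):
        ratio = SequenceMatcher(None, quote, source[start:start + window]).ratio()
        best = max(best, ratio * 100)
    return best


def is_valid_citation(quote, source, threshold=85.0):
    """Accept the quote if it (nearly) appears somewhere in the source."""
    return best_match_score(quote, source) >= threshold
```

The sliding window is quadratic in the worst case, which is acceptable for a sketch on short chunks but is one reason a dedicated library is the better choice in practice.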
## Credits
Content sourced from [80,000 Hours](https://80000hours.org/), a nonprofit that provides research and support to help people find careers that effectively tackle the world's most pressing problems.