---
title: 80,000 Hours RAG Q&A
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---
# 🎯 80,000 Hours Career Advice Q&A

A Retrieval-Augmented Generation (RAG) system that answers career-related questions using content from 80,000 Hours, with validated citations.
## Features

- 🔍 **Semantic Search**: Retrieves relevant content from 80,000 Hours articles
- 🤖 **AI-Powered Answers**: Uses GPT-4o-mini to generate comprehensive responses
- ✅ **Citation Validation**: Automatically validates that quotes exist in the source material
- 📚 **Source Attribution**: Every answer includes validated citations with URLs
## How It Works

1. Your question is converted to a vector embedding
2. Relevant article chunks are retrieved from the Qdrant vector database
3. GPT-4o-mini generates an answer with citations
4. Citations are validated against the source material
5. You receive an answer with verified quotes and source links
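The retrieval step (1–2) is nearest-neighbour search over embeddings. A minimal, self-contained sketch with toy vectors (the real app uses sentence-transformers embeddings stored in Qdrant; the names and vectors below are purely illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, chunks, top_k=2):
    """Return the top_k chunks ranked by cosine similarity to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:top_k]

# Toy corpus: in the real app these vectors come from
# sentence-transformers and live in a Qdrant collection.
chunks = [
    {"text": "How to choose a high-impact career", "vector": [0.9, 0.1, 0.0]},
    {"text": "A history of the organisation",      "vector": [0.1, 0.9, 0.0]},
    {"text": "Comparing career paths by impact",   "vector": [0.8, 0.2, 0.1]},
]
top = retrieve([1.0, 0.0, 0.0], chunks, top_k=2)
print([c["text"] for c in top])
```

The retrieved chunks are then passed to the LLM as context, which is what keeps answers grounded in 80,000 Hours content rather than the model's general knowledge.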
## Configuration for Hugging Face Spaces

To deploy this app, configure the following secrets in your Space settings:

1. Go to your Space → Settings → Variables and Secrets
2. Add these secrets:
   - `QDRANT_URL`: Your Qdrant cloud instance URL
   - `QDRANT_API_KEY`: Your Qdrant API key
   - `OPENAI_API_KEY`: Your OpenAI API key
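The app needs all three values at startup, so it helps to fail fast with a clear message when one is missing. A sketch of how a shared `config.py` might validate them (the function name and error message are illustrative, not necessarily the repo's actual code):

```python
import os

REQUIRED_KEYS = ("QDRANT_URL", "QDRANT_API_KEY", "OPENAI_API_KEY")

def load_settings(env=os.environ):
    """Return the required settings as a dict, raising a clear error
    if any secret is missing (e.g. not set in the Space settings)."""
    missing = [k for k in REQUIRED_KEYS if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required secrets: {', '.join(missing)}")
    return {k: env[k] for k in REQUIRED_KEYS}
```

Failing at import time with the names of the missing secrets is much easier to debug from the Space logs than a connection error deep inside the Qdrant or OpenAI client.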
## Local Development

### Setup

1. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

2. Create a `.env` file with:

   ```
   QDRANT_URL=your_url
   QDRANT_API_KEY=your_key
   OPENAI_API_KEY=your_key
   ```
### First-Time Setup (run in order)

1. **Extract articles**: `python extract_articles_cli.py`
   - Scrapes 80,000 Hours articles from the sitemap
   - Only needed once (or to refresh content)
2. **Chunk articles**: `python chunk_articles_cli.py`
   - Splits articles into semantic chunks
3. **Upload to Qdrant**: `python upload_to_qdrant_cli.py`
   - Generates embeddings and uploads them to the vector DB
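The chunking step can be pictured as an overlapping word-window splitter. This is a hedged illustration only; the repo's `chunk_articles_cli.py` may use a different, semantics-aware strategy, and the parameter values here are assumptions:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into windows of `chunk_size` words, each overlapping
    the previous chunk by `overlap` words so that quotes spanning a
    chunk boundary are still retrievable."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap matters for a citation-checking pipeline: without it, a sentence cut in half at a chunk boundary could never be matched verbatim against any single chunk.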
### Running Locally

**Web Interface:**

```
python app.py
```

**Command Line:**

```
python rag_chat.py "your question here"
python rag_chat.py "your question" --show-context
```
## Project Structure

- `app.py` - Main Gradio web interface
- `rag_chat.py` - RAG logic and CLI interface
- `citation_validator.py` - Citation validation system
- `extract_articles_cli.py` - Article scraper
- `chunk_articles_cli.py` - Article chunking
- `upload_to_qdrant_cli.py` - Vector DB uploader
- `config.py` - Shared configuration
## Tech Stack

- **Frontend**: Gradio 5.x
- **LLM**: OpenAI GPT-4o-mini
- **Vector DB**: Qdrant Cloud
- **Embeddings**: sentence-transformers (all-MiniLM-L6-v2)
- **Citation Validation**: rapidfuzz for fuzzy matching
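Fuzzy matching is what lets citation validation tolerate small differences in whitespace or punctuation between a generated quote and the source text. The app uses rapidfuzz for this; the same idea can be sketched with the standard library's `difflib` (thresholds, function name, and windowing strategy here are assumptions, not the repo's implementation):

```python
from difflib import SequenceMatcher

def validate_quote(quote, source_text, threshold=0.85):
    """Check whether `quote` appears approximately in `source_text`.
    Slides a window of the quote's length across the source, keeps the
    best similarity ratio, and returns (is_valid, best_score)."""
    quote = quote.lower().strip()
    source = source_text.lower()
    n = len(quote)
    best = 0.0
    for start in range(0, max(1, len(source) - n + 1), max(1, n // 4)):
        window = source[start:start + n]
        best = max(best, SequenceMatcher(None, quote, window).ratio())
        if best >= threshold:
            break
    return best >= threshold, best
```

A quote that scores below the threshold against every chunk it cites would be flagged or dropped, which is how the system guarantees that every quoted passage actually exists in the source material.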
## Credits
Content sourced from 80,000 Hours, a nonprofit that provides research and support to help people find careers that effectively tackle the world's most pressing problems.