---
title: 80,000 Hours RAG Q&A
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
---
# 🎯 80,000 Hours Career Advice Q&A

A Retrieval-Augmented Generation (RAG) system that answers career-related questions using content from 80,000 Hours, with validated citations.
## Features

- 🔍 **Semantic Search**: Retrieves relevant content from 80,000 Hours articles
- 🤖 **AI-Powered Answers**: Uses GPT-4o-mini to generate comprehensive responses
- ✅ **Citation Validation**: Automatically validates that quotes exist in the source material
- 📚 **Source Attribution**: Every answer includes validated citations with URLs
## How It Works

1. Your question is converted to a vector embedding
2. Relevant article chunks are retrieved from the Qdrant vector database
3. GPT-4o-mini generates an answer with citations
4. Citations are validated against the source material
5. You receive an answer with verified quotes and source links
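The retrieval step (1–2) is nearest-neighbour search over embeddings. A minimal, self-contained sketch with toy vectors (the real app uses sentence-transformers embeddings stored in Qdrant; the names and vectors below are purely illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, chunks, top_k=2):
    """Return the top_k chunks ranked by cosine similarity to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:top_k]

# Toy corpus: in the real app these vectors come from
# sentence-transformers and live in a Qdrant collection.
chunks = [
    {"text": "How to choose a high-impact career", "vector": [0.9, 0.1, 0.0]},
    {"text": "A history of the organisation",      "vector": [0.1, 0.9, 0.0]},
    {"text": "Comparing career paths by impact",   "vector": [0.8, 0.2, 0.1]},
]
top = retrieve([1.0, 0.0, 0.0], chunks, top_k=2)
print([c["text"] for c in top])
```

The retrieved chunks are then passed to the LLM as context, which is what keeps answers grounded in 80,000 Hours content rather than the model's general knowledge.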
## Configuration for Hugging Face Spaces

To deploy this app, configure the following secrets in your Space settings:

1. Go to your Space → Settings → Variables and Secrets
2. Add these secrets:
   - `QDRANT_URL`: Your Qdrant cloud instance URL
   - `QDRANT_API_KEY`: Your Qdrant API key
   - `OPENAI_API_KEY`: Your OpenAI API key
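The app needs all three values at startup, so it helps to fail fast with a clear message when one is missing. A sketch of how a shared `config.py` might validate them (the function name and error message are illustrative, not necessarily the repo's actual code):

```python
import os

REQUIRED_KEYS = ("QDRANT_URL", "QDRANT_API_KEY", "OPENAI_API_KEY")

def load_settings(env=os.environ):
    """Return the required settings as a dict, raising a clear error
    if any secret is missing (e.g. not set in the Space settings)."""
    missing = [k for k in REQUIRED_KEYS if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required secrets: {', '.join(missing)}")
    return {k: env[k] for k in REQUIRED_KEYS}
```

Failing at import time with the names of the missing secrets is much easier to debug from the Space logs than a connection error deep inside the Qdrant or OpenAI client.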
## Local Development

### Setup

1. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

2. Create a `.env` file with:

   ```
   QDRANT_URL=your_url
   QDRANT_API_KEY=your_key
   OPENAI_API_KEY=your_key
   ```
### First-Time Setup (run in order)

1. **Extract articles**: `python extract_articles_cli.py`
   - Scrapes 80,000 Hours articles from the sitemap
   - Only needed once (or to refresh content)
2. **Chunk articles**: `python chunk_articles_cli.py`
   - Splits articles into semantic chunks
3. **Upload to Qdrant**: `python upload_to_qdrant_cli.py`
   - Generates embeddings and uploads them to the vector DB
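The chunking step can be pictured as an overlapping word-window splitter. This is a hedged illustration only; the repo's `chunk_articles_cli.py` may use a different, semantics-aware strategy, and the parameter values here are assumptions:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into windows of `chunk_size` words, each overlapping
    the previous chunk by `overlap` words so that quotes spanning a
    chunk boundary are still retrievable."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks
```

The overlap matters for a citation-checking pipeline: without it, a sentence cut in half at a chunk boundary could never be matched verbatim against any single chunk.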
### Running Locally

**Web Interface:**

```
python app.py
```

**Command Line:**

```
python rag_chat.py "your question here"
python rag_chat.py "your question" --show-context
```
## Project Structure

- `app.py` - Main Gradio web interface
- `rag_chat.py` - RAG logic and CLI interface
- `citation_validator.py` - Citation validation system
- `extract_articles_cli.py` - Article scraper
- `chunk_articles_cli.py` - Article chunking
- `upload_to_qdrant_cli.py` - Vector DB uploader
- `config.py` - Shared configuration
## Tech Stack

- **Frontend**: Gradio 5.x
- **LLM**: OpenAI GPT-4o-mini
- **Vector DB**: Qdrant Cloud
- **Embeddings**: sentence-transformers (all-MiniLM-L6-v2)
- **Citation Validation**: rapidfuzz for fuzzy matching
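Fuzzy matching is what lets citation validation tolerate small differences in whitespace or punctuation between a generated quote and the source text. The app uses rapidfuzz for this; the same idea can be sketched with the standard library's `difflib` (thresholds, function name, and windowing strategy here are assumptions, not the repo's implementation):

```python
from difflib import SequenceMatcher

def validate_quote(quote, source_text, threshold=0.85):
    """Check whether `quote` appears approximately in `source_text`.
    Slides a window of the quote's length across the source, keeps the
    best similarity ratio, and returns (is_valid, best_score)."""
    quote = quote.lower().strip()
    source = source_text.lower()
    n = len(quote)
    best = 0.0
    for start in range(0, max(1, len(source) - n + 1), max(1, n // 4)):
        window = source[start:start + n]
        best = max(best, SequenceMatcher(None, quote, window).ratio())
        if best >= threshold:
            break
    return best >= threshold, best
```

A quote that scores below the threshold against every chunk it cites would be flagged or dropped, which is how the system guarantees that every quoted passage actually exists in the source material.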
## Credits
Content sourced from 80,000 Hours, a nonprofit that provides research and support to help people find careers that effectively tackle the world's most pressing problems.