metadata
title: 80,000 Hours RAG Q&A
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false

🎯 80,000 Hours Career Advice Q&A

A Retrieval-Augmented Generation (RAG) system that answers career-related questions using content from 80,000 Hours, with validated citations.

Features

  • 🔍 Semantic Search: Retrieves relevant content from 80,000 Hours articles
  • 🤖 AI-Powered Answers: Uses GPT-4o to generate comprehensive responses
  • ✅ Citation Validation: Automatically validates that quotes exist in source material
  • 📚 Source Attribution: Every answer includes validated citations with URLs

How It Works

  1. Your question is converted to a vector embedding
  2. Relevant article chunks are retrieved from the Qdrant vector database
  3. GPT-4o generates an answer with citations
  4. Citations are validated against source material
  5. You get an answer with verified quotes and source links
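
Steps 1 and 2 above can be illustrated with a toy, dependency-free sketch. This is not the app's code: the real system uses sentence-transformers embeddings and Qdrant, whereas here `embed` is a hypothetical bag-of-words stand-in and the "database" is an in-memory list of made-up chunks. It only shows the embed-then-rank shape of retrieval.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: plain word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


# Made-up example chunks for illustration, not real 80,000 Hours text.
CHUNKS = [
    "career capital means skills connections and credentials",
    "earning to give means donating part of a high salary",
    "personal fit matters when choosing a career path",
]

best = retrieve("what is career capital", CHUNKS, top_k=1)
```

In the real pipeline the retrieved chunks would then be passed to the LLM as context for steps 3 to 5.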

Configuration for Hugging Face Spaces

To deploy this app, you need to configure the following Secrets in your Space settings:

  1. Go to your Space → Settings → Variables and Secrets
  2. Add these secrets:
    • QDRANT_URL: Your Qdrant cloud instance URL
    • QDRANT_API_KEY: Your Qdrant API key
    • OPENAI_API_KEY: Your OpenAI API key
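
Spaces exposes secrets to the app as environment variables, so a startup check along these lines can fail fast when one is missing. This is a sketch only: `load_secrets` and `REQUIRED_SECRETS` are hypothetical names, and the project's own config.py may handle this differently.

```python
import os
from typing import Mapping

# The three secrets named in the list above.
REQUIRED_SECRETS = ("QDRANT_URL", "QDRANT_API_KEY", "OPENAI_API_KEY")


def load_secrets(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required secrets, raising if any is missing or empty."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}
```

Passing the environment in as a parameter keeps the check easy to exercise without touching the real process environment.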

Local Development

Setup

  1. Install dependencies:
pip install -r requirements.txt
  2. Create a .env file with:
QDRANT_URL=your_url
QDRANT_API_KEY=your_key
OPENAI_API_KEY=your_key

First Time Setup (run in order):

  1. Extract articles → python extract_articles_cli.py
    • Scrapes 80,000 Hours articles from sitemap
    • Only needed once (or to refresh content)
  2. Chunk articles → python chunk_articles_cli.py
    • Splits articles into semantic chunks
  3. Upload to Qdrant → python upload_to_qdrant_cli.py
    • Generates embeddings and uploads to vector DB
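
The chunking strategy used by chunk_articles_cli.py is not described here beyond "semantic chunks". As one plausible approach, the sketch below packs whole paragraphs into chunks up to a size limit, so that no chunk cuts a paragraph in half. The function name `chunk_paragraphs` and the 500-character default are assumptions for illustration.

```python
def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars becomes its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first paragraph of a new chunk.
            current = candidate
        else:
            # Adding this paragraph would overflow: close the chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be embedded and uploaded in step 3.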

Running Locally

Web Interface:

python app.py

Command Line:

python rag_chat.py "your question here"
python rag_chat.py "your question" --show-context
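
For readers curious how a command line like the one above is typically parsed, here is a minimal argparse sketch. It is an assumption about the interface, not the actual contents of rag_chat.py; in particular, the help text for --show-context reflects a guess that the flag prints the retrieved chunks.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for a positional question plus an optional --show-context flag."""
    parser = argparse.ArgumentParser(description="Ask a question from the command line.")
    parser.add_argument("question", help="the question to answer")
    parser.add_argument(
        "--show-context",
        action="store_true",
        help="(assumed behavior) also print the retrieved chunks",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the example is self-contained.
args = build_parser().parse_args(["your question", "--show-context"])
```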

Project Structure

  • app.py - Main Gradio web interface
  • rag_chat.py - RAG logic and CLI interface
  • citation_validator.py - Citation validation system
  • extract_articles_cli.py - Article scraper
  • chunk_articles_cli.py - Article chunking
  • upload_to_qdrant_cli.py - Vector DB uploader
  • config.py - Shared configuration

Tech Stack

  • Frontend: Gradio 4.0+
  • LLM: OpenAI GPT-4o
  • Vector DB: Qdrant Cloud
  • Embeddings: sentence-transformers (all-MiniLM-L6-v2)
  • Citation Validation: rapidfuzz for fuzzy matching
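
To give a feel for fuzzy citation validation, the sketch below checks whether a quote appears, exactly or approximately, in a source text. It deliberately uses the standard library's difflib as a stand-in for rapidfuzz so that it runs without extra dependencies. The function names and the 0.9 threshold are assumptions, and citation_validator.py may work differently.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def quote_score(quote: str, source: str) -> float:
    """Best similarity (0.0 to 1.0) between the quote and any same-length window of the source."""
    q, s = normalize(quote), normalize(source)
    if not q or not s:
        return 0.0
    if q in s:
        return 1.0  # exact match after normalization
    width = len(q)
    starts = range(max(1, len(s) - width + 1))
    return max(SequenceMatcher(None, q, s[i:i + width]).ratio() for i in starts)


def is_valid_citation(quote: str, source: str, threshold: float = 0.9) -> bool:
    """Accept the quote if it matches some part of the source closely enough."""
    return quote_score(quote, source) >= threshold


# Made-up source text for illustration.
SOURCE = "Career capital includes skills, connections, and credentials."
```

The sliding window is quadratic in the worst case, which is fine for a demonstration but is one reason a dedicated library such as rapidfuzz is a sensible choice for real workloads.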

Credits

Content sourced from 80,000 Hours, a nonprofit that provides research and support to help people find careers that effectively tackle the world's most pressing problems.