metadata
title: GitConnect FastAPI Service
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: 1.0.0
python_version: '3.12'
app_file: app.py
pinned: false

GitConnect FastAPI Service

FastAPI backend with two primary features:

  • Syllabus processing from PDF URLs with FAISS indexing and multilingual AI summaries.
  • Chatbot responses grounded via retrieval-augmented generation (RAG) in both syllabus content and student performance data.

Core Stack

  • API: FastAPI + Uvicorn
  • Embeddings: sentence-transformers/all-MiniLM-L6-v2
  • Vector Search: FAISS (IndexFlatIP with normalized vectors)
  • LLM generation/summarization: Gemini
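IndexFlatIP performs exact, brute-force inner-product search. When both stored and query vectors are L2-normalized, the inner product of two vectors equals their cosine similarity. The pure-Python sketch below illustrates that idea with made-up 3-dimensional vectors. It is not the FAISS API: all-MiniLM-L6-v2 produces 384-dimensional embeddings, and FAISS runs this search in optimized native code.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length (the step applied before indexing or querying)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave zero vectors unchanged to avoid division by zero
    return [x / norm for x in vec]


def flat_ip_search(index: list[list[float]], query: list[float], k: int) -> list[tuple[int, float]]:
    """Exact top-k inner-product search over every stored vector, like a flat IP index."""
    q = l2_normalize(query)
    scores = [
        (row, sum(a * b for a, b in zip(stored, q)))
        for row, stored in enumerate(index)
    ]
    # Highest inner product (= cosine similarity for unit vectors) first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy "embeddings" for three syllabus chunks (hypothetical values).
chunks = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 2.0]]
index = [l2_normalize(v) for v in chunks]

hits = flat_ip_search(index, [3.0, 4.0, 0.0], k=2)
print(hits)
```

Row 1 ranks first because it points in the same direction as the query, even though the raw vectors differ in magnitude. Normalization makes the score depend on direction only.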

RAG index persistence:

  • Runtime retrieval uses a per-semester FAISS index file plus the matching per-semester PKL records.
  • Optional cloud persistence to Neon (serverless Postgres) is enabled by setting RAG_INDEX_DB_URL.
  • When a syllabus is processed, the updated semester FAISS and PKL files are uploaded to Neon (if persistence is enabled).
  • On server restart, any missing local files are automatically restored (hydrated) from Neon.
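The upload-then-restore behaviour can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the standard-library sqlite3 module as a local stand-in for Neon (the real service connects through RAG_INDEX_DB_URL), and the rag_artifacts table, its columns, and the helper names are hypothetical rather than the project's actual schema.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema: one row per artifact file, keyed by file name.
SCHEMA = "CREATE TABLE IF NOT EXISTS rag_artifacts (name TEXT PRIMARY KEY, data BLOB NOT NULL)"


def artifact_names(semester: int) -> list[str]:
    """File names of the FAISS/PKL pair for one semester."""
    return [f"semester_{semester}.faiss", f"semester_{semester}.pkl"]


def upload_semester(conn: sqlite3.Connection, data_dir: Path, semester: int) -> None:
    """After syllabus processing: copy the semester's local files into the store."""
    conn.execute(SCHEMA)
    for name in artifact_names(semester):
        conn.execute(
            "INSERT OR REPLACE INTO rag_artifacts (name, data) VALUES (?, ?)",
            (name, (data_dir / name).read_bytes()),
        )
    conn.commit()


def hydrate_if_missing(conn: sqlite3.Connection, data_dir: Path, semester: int) -> list[str]:
    """On startup: restore missing local files from the store; return the names restored."""
    conn.execute(SCHEMA)
    restored = []
    for name in artifact_names(semester):
        target = data_dir / name
        if target.exists():
            continue  # local copy already present, nothing to do
        row = conn.execute("SELECT data FROM rag_artifacts WHERE name = ?", (name,)).fetchone()
        if row is not None:
            data_dir.mkdir(parents=True, exist_ok=True)
            target.write_bytes(row[0])
            restored.append(name)
    return restored


# Usage: simulate a container restart that lost its .faiss file.
conn = sqlite3.connect(":memory:")
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    for name in artifact_names(5):
        (data_dir / name).write_bytes(b"placeholder " + name.encode())
    upload_semester(conn, data_dir, 5)
    (data_dir / "semester_5.faiss").unlink()
    restored = hydrate_if_missing(conn, data_dir, 5)  # ['semester_5.faiss']
    restored_bytes = (data_dir / "semester_5.faiss").read_bytes()
    second_pass = hydrate_if_missing(conn, data_dir, 5)  # [] (nothing missing now)
conn.close()
```

Checking for the local file first keeps restarts cheap: the database is only read when a file is actually missing.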

About stored artifacts:

  • Required at RAG runtime: semester_x.faiss and semester_x.pkl (where x is the semester number).
  • Raw text files under data/raw_text are not required for search-time RAG.
  • Raw text is mainly useful for debugging/auditing ingestion outputs.
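At query time the two files work as a pair: the FAISS index returns row positions, and those positions are looked up in the PKL records to recover the chunk content. A minimal sketch, assuming the PKL holds a list whose i-th entry describes the i-th vector in the index; the record fields and the lookup helper are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical records: position i describes vector i in semester_5.faiss.
records = [
    {"course_code": "22CS501", "text": "Unit 1: ER model and relational algebra"},
    {"course_code": "22CS501", "text": "Unit 2: SQL, joins and normalization"},
]


def lookup(pkl_path: Path, hit_rows: list[int]) -> list[dict]:
    """Map row positions returned by the vector search back to their stored records."""
    # Only unpickle files you created yourself; pickle can execute code on load.
    with pkl_path.open("rb") as fh:
        stored = pickle.load(fh)
    # FAISS pads missing results with -1, so skip out-of-range positions.
    return [stored[row] for row in hit_rows if 0 <= row < len(stored)]


with tempfile.TemporaryDirectory() as tmp:
    pkl_path = Path(tmp) / "semester_5.pkl"
    with pkl_path.open("wb") as fh:
        pickle.dump(records, fh)
    results = lookup(pkl_path, [1, -1])

print([r["text"] for r in results])  # ['Unit 2: SQL, joins and normalization']
```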

Local Setup

  1. Create and activate a virtual environment.
  2. Install dependencies:
     pip install -r requirements.txt
  3. Create .env from .env.example and set GEMINI_API_KEY.
  4. Run the service:
     uvicorn app.main:app --reload

Endpoints

  • GET /health
  • POST /api/syllabus/process
  • POST /api/chat

Student performance source:

  • Configured with STUDENT_PERFORMANCE_URL_TEMPLATE. Default:
    https://git-connect-backend-v2.vercel.app/api/student/{student_id}/performance
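The {student_id} placeholder looks like Python str.format syntax. A minimal sketch of resolving the URL, assuming the setting is read from an environment variable of the same name and falls back to the documented default; the helper name performance_url is hypothetical.

```python
import os

DEFAULT_TEMPLATE = "https://git-connect-backend-v2.vercel.app/api/student/{student_id}/performance"


def performance_url(student_id: int) -> str:
    """Build the per-student performance URL from the configured (or default) template."""
    template = os.environ.get("STUDENT_PERFORMANCE_URL_TEMPLATE", DEFAULT_TEMPLATE)
    return template.format(student_id=student_id)


# With the variable unset, this prints the default URL for student 2.
print(performance_url(2))
```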

Hugging Face Spaces Deployment

This repository is set up for Docker Spaces deployment.

Deployment-critical files:

  • Dockerfile
  • requirements.txt
  • app/
  • .github/workflows/deploy-hf-space.yml

Files excluded from the git repository (never pushed):

  • .env and app/.env
  • generated data/ files
  • local caches and __pycache__/

GitHub Actions deployment workflow:

  • Trigger: push to main or manual run
  • Workflow: .github/workflows/deploy-hf-space.yml
  • Required repository secrets:
    • HF_TOKEN
    • HF_SPACE_REPO_ID (format: username/space-name)
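A misformatted HF_SPACE_REPO_ID is an easy way to break the deployment. A minimal sketch of checking the username/space-name format and deriving the Space's page URL (https://huggingface.co/spaces/<repo_id>). The character rules in the regular expression are a simplifying assumption, not Hugging Face's exact naming policy, and both helper names are hypothetical.

```python
import re

# Assumed rule: exactly one "/" separating two non-empty names made of
# letters, digits, "-", "_" or ".".
REPO_ID_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


def is_valid_repo_id(repo_id: str) -> bool:
    """True if the value looks like username/space-name."""
    return REPO_ID_PATTERN.fullmatch(repo_id) is not None


def space_url(repo_id: str) -> str:
    """Validate HF_SPACE_REPO_ID and return the Space's URL on huggingface.co."""
    if not is_valid_repo_id(repo_id):
        raise ValueError(f"HF_SPACE_REPO_ID must look like username/space-name, got {repo_id!r}")
    return f"https://huggingface.co/spaces/{repo_id}"


print(space_url("username/space-name"))  # https://huggingface.co/spaces/username/space-name
```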

Sample Requests

Syllabus processing (POST /api/syllabus/process):

[
  {
    "course_code": "22CS501",
    "name": "Database Management Systems",
    "course_type": "theory",
    "syllabus_url": "https://example.com/dbms.pdf",
    "semester": 5
  }
]
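The request body is a JSON array with one object per course. A minimal client-side sketch that builds and sanity-checks entries before sending, assuming the field names and types shown in the sample; the SyllabusItem dataclass and its checks are hypothetical, and the server's own validation rules are not documented here.

```python
import json
from dataclasses import asdict, dataclass
from urllib.parse import urlparse


@dataclass
class SyllabusItem:
    """One course entry, mirroring the sample request's fields (hypothetical client type)."""
    course_code: str
    name: str
    course_type: str
    syllabus_url: str
    semester: int

    def __post_init__(self) -> None:
        # Basic client-side checks only; they do not reflect server-side rules.
        if urlparse(self.syllabus_url).scheme not in ("http", "https"):
            raise ValueError(f"syllabus_url must be an http(s) URL: {self.syllabus_url!r}")
        if self.semester < 1:
            raise ValueError("semester must be a positive integer")


items = [
    SyllabusItem(
        course_code="22CS501",
        name="Database Management Systems",
        course_type="theory",
        syllabus_url="https://example.com/dbms.pdf",
        semester=5,
    )
]
# Serialize the list of entries into the JSON array the endpoint expects.
body = json.dumps([asdict(item) for item in items])
print(body)
```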

Chat (POST /api/chat):

{
  "query": "How can I improve attendance this semester?",
  "history": [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"}
  ],
  "student_id": 2,
  "lang_code": "en",
  "semester": 5
}
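A minimal standard-library client for the chat endpoint, assuming the service runs locally on uvicorn's default port 8000 and accepts a JSON body. The base URL and helper names are assumptions, and since the response format is not documented here, the sketch simply returns the decoded JSON.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumption: local `uvicorn app.main:app` on its default port


def build_chat_request(payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a POST /api/chat request with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {
    "query": "How can I improve attendance this semester?",
    "history": [
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello"},
    ],
    "student_id": 2,
    "lang_code": "en",
    "semester": 5,
}
request = build_chat_request(payload)
# print(send(request))  # uncomment once the server is running
```

Keeping request construction separate from sending makes it easy to inspect the exact body before talking to a live server.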

Configuration reference: