---
title: GitConnect FastAPI Service
emoji: "🚀"
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: "1.0.0"
python_version: "3.12"
app_file: app.py
pinned: false
---
# GitConnect FastAPI Service
FastAPI backend with two primary features:
- Syllabus processing from PDF URLs with FAISS indexing and multilingual AI summaries.
- Chatbot responses grounded with RAG from both syllabus content and student performance data.
## Core Stack
- API: FastAPI + Uvicorn
- Embeddings: sentence-transformers/all-MiniLM-L6-v2
- Vector Search: FAISS (IndexFlatIP with normalized vectors)
- LLM generation/summarization: Gemini
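The vector search entry above relies on a standard property: when vectors are L2-normalized, their inner product equals their cosine similarity, so an inner-product index ranks by cosine similarity. The following is a minimal, dependency-free sketch of that idea, not the service's code. The helper names and toy vectors are illustrative assumptions, and a real deployment would use FAISS and the embedding model instead.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, corpus, k=2):
    """Rank corpus vectors by inner product with the query, highest first."""
    q = l2_normalize(query)
    scored = [(inner_product(q, l2_normalize(v)), i) for i, v in enumerate(corpus)]
    scored.sort(reverse=True)
    return [(i, score) for score, i in scored[:k]]


# Toy 2-D "embeddings" (hypothetical data). After normalization, vector
# length no longer matters; only direction does.
corpus = [[1.0, 0.0], [10.0, 10.0], [0.0, -3.0]]
results = top_k([2.0, 2.0], corpus, k=2)
```

Here the second corpus vector points in the same direction as the query, so it ranks first despite its much larger magnitude.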
RAG index persistence:
- Runtime retrieval uses semester FAISS files plus semester PKL records.
- Optional cloud persistence to Neon is enabled by setting `RAG_INDEX_DB_URL`.
- On syllabus processing, the updated semester FAISS and PKL files are uploaded to Neon.
- On server restart, if local files are missing, the server hydrates them from Neon automatically.
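The hydration step described above could be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: `hydrate_missing` and `fetch_blob` are hypothetical names, the remote store is faked with a dictionary rather than a Neon connection, and the file names assume the `x` in `semester_x` is the semester number.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional


def hydrate_missing(
    data_dir: Path,
    semesters: Iterable[int],
    fetch_blob: Callable[[str], Optional[bytes]],
) -> list[str]:
    """Restore missing semester artifacts from a remote store.

    Existing local files are left untouched. Returns the restored file names.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    restored = []
    for semester in semesters:
        for suffix in (".faiss", ".pkl"):
            name = f"semester_{semester}{suffix}"  # assumed naming scheme
            target = data_dir / name
            if target.exists():
                continue  # local copy already present; nothing to hydrate
            blob = fetch_blob(name)  # hypothetical remote lookup
            if blob is not None:
                target.write_bytes(blob)
                restored.append(name)
    return restored


# Demo with a dictionary standing in for the remote store.
with tempfile.TemporaryDirectory() as tmp:
    local_dir = Path(tmp)
    (local_dir / "semester_5.faiss").write_bytes(b"local index")
    fake_store = {"semester_5.faiss": b"remote index", "semester_5.pkl": b"records"}
    restored = hydrate_missing(local_dir, [5], fake_store.get)
    kept_local = (local_dir / "semester_5.faiss").read_bytes()
```

Passing the fetch function as a parameter keeps the file logic testable without a database connection.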
About stored artifacts:
- Needed for RAG runtime: `semester_x.faiss` and `semester_x.pkl`.
- Raw text files under data/raw_text are not required for search-time RAG.
- Raw text is mainly useful for debugging/auditing ingestion outputs.
## Local Setup
1. Create and activate a virtual environment.
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Create `.env` from `.env.example` and set `GEMINI_API_KEY`.
4. Run the service:
```bash
uvicorn app.main:app --reload
```
## Endpoints
- `GET /health`
- `POST /api/syllabus/process`
- `POST /api/chat`
Student performance source:
- `STUDENT_PERFORMANCE_URL_TEMPLATE`, which defaults to `https://git-connect-backend-v2.vercel.app/api/student/{student_id}/performance`
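As a rough illustration of how the `{student_id}` placeholder in the template might be filled, here is a short sketch. The function name `performance_url` is hypothetical, and reading the setting from the process environment is an assumption about how the service consumes it.

```python
import os
from typing import Optional

# Default value as listed in this README.
DEFAULT_TEMPLATE = (
    "https://git-connect-backend-v2.vercel.app/api/student/{student_id}/performance"
)


def performance_url(student_id: int, template: Optional[str] = None) -> str:
    """Fill the {student_id} placeholder in the performance URL template."""
    if template is None:
        # Assumption: the override is read from an environment variable.
        template = os.environ.get("STUDENT_PERFORMANCE_URL_TEMPLATE", DEFAULT_TEMPLATE)
    return template.format(student_id=student_id)


example_url = performance_url(2, DEFAULT_TEMPLATE)
```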
## Hugging Face Spaces Deployment
This repository is set up for Docker Spaces deployment.
Deployment-critical files:
- `Dockerfile`
- `requirements.txt`
- `app/`
- `.github/workflows/deploy-hf-space.yml`
Files excluded from git push:
- `.env` and `app/.env`
- generated `data/` files
- local caches and `__pycache__/`
GitHub Action deployment workflow:
- Trigger: push to `main` or manual run
- Workflow: `.github/workflows/deploy-hf-space.yml`
- Required repository secrets:
  - `HF_TOKEN`
  - `HF_SPACE_REPO_ID` (format: `username/space-name`)
## Sample Requests
Syllabus processing:
```json
[
  {
    "course_code": "22CS501",
    "name": "Database Management Systems",
    "course_type": "theory",
    "syllabus_url": "https://example.com/dbms.pdf",
    "semester": 5
  }
]
```
Chat:
```json
{
  "query": "How can I improve attendance this semester?",
  "history": [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello"}
  ],
  "student_id": 2,
  "lang_code": "en",
  "semester": 5
}
```
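The sample payloads above can be sent from Python using only the standard library. The sketch below builds the two POST requests without sending them. `BASE_URL` assumes a local Uvicorn server on its default port 8000, and `build_post` is a hypothetical helper; adjust the URL for a deployed Space.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumption: local Uvicorn default port


def build_post(path: str, payload) -> urllib.request.Request:
    """Build a JSON POST request for the given endpoint path."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


syllabus_request = build_post(
    "/api/syllabus/process",
    [
        {
            "course_code": "22CS501",
            "name": "Database Management Systems",
            "course_type": "theory",
            "syllabus_url": "https://example.com/dbms.pdf",
            "semester": 5,
        }
    ],
)

chat_request = build_post(
    "/api/chat",
    {
        "query": "How can I improve attendance this semester?",
        "history": [
            {"role": "user", "content": "Hi"},
            {"role": "assistant", "content": "Hello"},
        ],
        "student_id": 2,
        "lang_code": "en",
        "semester": 5,
    },
)

# To send a request against a running server:
# with urllib.request.urlopen(chat_request) as response:
#     print(json.load(response))
```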
Configuration reference:
- https://huggingface.co/docs/hub/spaces-config-reference