---
title: EnggSS RAG ChatBot
emoji: ⚡
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.0.0"
app_file: app.py
pinned: false
license: other
---
# EnggSS RAG ChatBot
**Serving-only** HuggingFace Space: it reads a pre-built private dataset and does
no PDF processing at runtime. Build the dataset locally with
`preprocessing/create_dataset.py`, then deploy this Space to answer questions.
## How it works
```
Local machine (once)
  PDFs → create_dataset.py → BAAI/bge-large-en-v1.5 embeddings
                        │
                        ▼
           Private HuggingFace Dataset
                        │
  ┌─────────────────────┘
  ▼  (Space startup)
Load dataset → NumPy float32 matrix (L2-normalised)
  │
  ▼  (each query, ~20 ms)
Embed query → cosine scores → MMR top-3
  │
  ▼
Qwen2.5-7B-Instruct (HF Inference API) → answer
  │
  ▼
Gradio UI
```
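The retrieval step in the diagram can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in `app.py`. It assumes the chunk embeddings are already loaded as an L2-normalised `float32` matrix and that the query is embedded with the same model. It also uses a hypothetical `lambda_` weight (default 0.5) to trade relevance against diversity in MMR (maximal marginal relevance). Because every row has unit length, cosine similarity reduces to a plain dot product.

```python
import numpy as np


def l2_normalise(x) -> np.ndarray:
    """Scale each row (or a single vector) to unit length as float32."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against all-zero rows


def mmr_top_k(query_vec, doc_matrix, k=3, lambda_=0.5):
    """Return indices of k chunks chosen by maximal marginal relevance.

    query_vec:  (d,) query embedding; normalised here.
    doc_matrix: (n, d) chunk embeddings, assumed already L2-normalised.
    lambda_:    hypothetical weight; 1.0 = pure relevance, 0.0 = pure diversity.
    """
    q = l2_normalise(query_vec)
    relevance = doc_matrix @ q  # cosine score of every chunk against the query
    selected: list[int] = []
    candidates = list(range(doc_matrix.shape[0]))
    while candidates and len(selected) < k:
        if selected:
            # Highest similarity of each remaining candidate to any chunk already picked.
            redundancy = (doc_matrix[candidates] @ doc_matrix[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(candidates), dtype=np.float32)
        scores = lambda_ * relevance[candidates] - (1 - lambda_) * redundancy
        best = candidates[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy 2-D "embeddings": chunk 0 is a near-duplicate of chunk 1.
    docs = l2_normalise(np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]))
    query = np.array([1.0, 0.2])
    print(mmr_top_k(query, docs, k=2))               # [1, 2]: near-duplicate 0 skipped
    print(mmr_top_k(query, docs, k=2, lambda_=1.0))  # [1, 0]: pure relevance ranking
```

Raising `lambda_` towards 1.0 reproduces a plain top-k by cosine score; lowering it spreads the selected contexts across different passages, which helps when several chunks say nearly the same thing.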
## Tabs
| Tab | Purpose |
|-----|---------|
| 💬 Q&A | Ask questions; see top-3 retrieved contexts + generated answer |
| 📊 Analytics | Total chunks, documents processed, per-file breakdown |
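The Analytics figures can be derived directly from the chunk rows. A minimal sketch, assuming each row is a mapping with a hypothetical `source` field holding the originating PDF's file name; the real schema is set by `create_dataset.py` and may differ. Iterating a `datasets.Dataset` yields rows as dicts, so a loaded dataset could be passed in the same way.

```python
from collections import Counter


def corpus_stats(rows):
    """Summarise chunk rows into the numbers shown on the Analytics tab.

    Assumes each row has a hypothetical "source" key naming its PDF.
    """
    per_file = Counter(row["source"] for row in rows)
    return {
        "total_chunks": sum(per_file.values()),
        "documents_processed": len(per_file),
        # (file name, chunk count) pairs, most-chunked files first.
        "per_file": per_file.most_common(),
    }


rows = [
    {"source": "a.pdf", "text": "first chunk"},
    {"source": "a.pdf", "text": "second chunk"},
    {"source": "b.pdf", "text": "only chunk"},
]
print(corpus_stats(rows))
# {'total_chunks': 3, 'documents_processed': 2, 'per_file': [('a.pdf', 2), ('b.pdf', 1)]}
```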
## Required Space Secrets
Set in **Settings → Variables and Secrets**:
| Secret | Description |
|--------|-------------|
| `HF_TOKEN` | HuggingFace token with **read** access to the dataset repo |
| `HF_DATASET_REPO` | Dataset repo ID, e.g. `your-org/enggss-rag-dataset` (created by the preprocessing script) |
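Since the Space cannot serve anything without both values, it helps to fail fast with a readable message at startup. A minimal sketch of how an app could read and sanity-check them; the function name `read_space_config` and the `owner/name` format check are illustrative assumptions, not taken from `app.py`.

```python
import os


def read_space_config(env=os.environ):
    """Return (token, repo_id), or raise a readable error if a value is missing or malformed."""
    missing = [name for name in ("HF_TOKEN", "HF_DATASET_REPO") if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing Space secret(s): " + ", ".join(missing)
            + ". Set them in Settings → Variables and Secrets."
        )
    repo_id = env["HF_DATASET_REPO"].strip()
    # Repo IDs look like "owner/name", e.g. your-org/enggss-rag-dataset.
    if repo_id.count("/") != 1 or repo_id.startswith("/") or repo_id.endswith("/"):
        raise ValueError(f"HF_DATASET_REPO should look like 'owner/name', got {repo_id!r}")
    return env["HF_TOKEN"], repo_id
```

The returned pair could then be handed to a loader such as `datasets.load_dataset(repo_id, token=token)`.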
## Setup order
1. **Run preprocessing locally** (once, or whenever you add new PDFs):
   ```bash
   cd preprocessing
   pip install -r requirements.txt
   python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset
   ```
2. **Deploy this Space** by uploading `app.py`, `requirements.txt`, and `README.md`.
3. **Set the two secrets** above in **Settings → Variables and Secrets**.
4. The Space restarts, loads the dataset, and is then ready to answer questions.
To add new PDFs later without rebuilding everything:
```bash
python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset --update
```
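How `--update` decides what to re-embed is defined by `create_dataset.py`. The sketch below only illustrates the general idea of an incremental build, under the assumption that already-indexed PDFs can be recognised by file name; `new_pdfs` is a hypothetical helper, not part of the script.

```python
from pathlib import Path


def new_pdfs(pdf_dir, indexed_names):
    """Return PDFs in pdf_dir whose file names are not yet in the dataset.

    indexed_names: file names already present (for example, the per-file
    entries behind the Analytics breakdown). Matching by name is an
    assumption; a content hash would also catch edited files that kept
    the same name.
    """
    indexed = set(indexed_names)
    return sorted(
        p for p in Path(pdf_dir).iterdir()
        # Case-insensitive suffix check so "report.PDF" is picked up too.
        if p.is_file() and p.suffix.lower() == ".pdf" and p.name not in indexed
    )
```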
## Local development
```bash
git clone https://huggingface.co/spaces/your-org/enggss-rag-chatbot
cd enggss-rag-chatbot
pip install -r requirements.txt
# create a .env file defining HF_TOKEN and HF_DATASET_REPO
python app.py
```