---
title: EnggSS RAG ChatBot
emoji: ⚡
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: "5.0.0"
app_file: app.py
pinned: false
license: other
---

# EnggSS RAG ChatBot

**Serving-only** HuggingFace Space: it reads a pre-built private dataset and does no PDF processing at runtime. Build the dataset locally with `preprocessing/create_dataset.py`, then deploy this Space to answer questions.

## How it works

```
Local machine (once)
  PDFs → create_dataset.py → BAAI/bge-large-en-v1.5 embeddings
                                        │
                                        ▼
                          Private HuggingFace Dataset
                                        │
                  ┌─────────────────────┘
                  ▼  (Space startup)
  Load dataset → NumPy float32 matrix (L2-normalised)
                  │
                  ▼  (each query, ~20 ms)
  Embed query → cosine scores → MMR top-3
                  │
                  ▼
  Qwen2.5-7B-Instruct (HF Inference API) → answer
                  │
                  ▼
              Gradio UI
```
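
The retrieval step in the diagram (cosine scores followed by MMR top-3) can be sketched in plain NumPy. This is a minimal illustration, not the code in `app.py`: the function names `l2_normalise` and `mmr_top_k` and the `lambda_mult` weighting are assumptions made for the example, and the query vector is assumed to come from the same embedding model as the chunks.

```python
import numpy as np


def l2_normalise(matrix: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return (matrix / np.clip(norms, 1e-12, None)).astype(np.float32)


def mmr_top_k(query_vec, chunk_matrix, k=3, lambda_mult=0.7):
    """Pick k row indices by Maximal Marginal Relevance.

    Both inputs are assumed to be L2-normalised. lambda_mult (hypothetical
    parameter) trades relevance (1.0) against diversity (0.0).
    """
    relevance = chunk_matrix @ query_vec  # cosine score of every chunk
    selected = []
    candidates = list(range(len(relevance)))
    while candidates and len(selected) < k:
        if selected:
            # Highest similarity between each candidate and any chunk already picked
            redundancy = (chunk_matrix[candidates] @ chunk_matrix[selected].T).max(axis=1)
        else:
            redundancy = np.zeros(len(candidates), dtype=np.float32)
        mmr = lambda_mult * relevance[candidates] - (1.0 - lambda_mult) * redundancy
        best = candidates[int(np.argmax(mmr))]
        selected.append(best)
        candidates.remove(best)
    return selected
```

Because the matrix is normalised once at startup, each query costs one matrix-vector product plus a few small products for the MMR loop. Setting `lambda_mult=1.0` reduces the selection to plain top-k by cosine score.
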

## Tabs

| Tab | Purpose |
|-----|---------|
| 💬 Q&A | Ask questions; see top-3 retrieved contexts + generated answer |
| 📊 Analytics | Total chunks, documents processed, per-file breakdown |
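
The Analytics figures can be derived from the chunk records alone. The sketch below assumes each record is a dict with a `source` field naming its PDF; that field name and the `analytics_summary` helper are hypothetical, so adjust them to whatever `create_dataset.py` actually writes.

```python
from collections import Counter


def analytics_summary(chunks):
    """Summarise chunk records: total chunks, distinct documents, chunks per file."""
    # "source" is an assumed column name, not confirmed by the dataset schema
    per_file = Counter(chunk["source"] for chunk in chunks)
    return {
        "total_chunks": sum(per_file.values()),
        "documents": len(per_file),
        "per_file": dict(per_file),
    }
```
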

## Required Space Secrets

Set in **Settings → Variables and Secrets**:

| Secret | Description |
|--------|-------------|
| `HF_TOKEN` | HuggingFace token with **read** access to the dataset repo |
| `HF_DATASET_REPO` | Dataset repo ID, e.g. `your-org/enggss-rag-dataset` (created by the preprocessing script) |
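
Space secrets reach the app as environment variables, so a startup check can fail fast with a readable message when one is missing. This is a sketch under that assumption; the helper name `read_secrets` is hypothetical and may not match what `app.py` does.

```python
import os

REQUIRED_SECRETS = ("HF_TOKEN", "HF_DATASET_REPO")


def read_secrets(env=None):
    """Return the required settings, or raise naming whichever ones are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Space secret(s): " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_SECRETS}
```

The returned values could then be handed to the dataset loader, for example `datasets.load_dataset(secrets["HF_DATASET_REPO"], token=secrets["HF_TOKEN"])`, assuming the dataset was pushed in a format that `datasets` can read.
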

## Setup order

1. **Run preprocessing locally** (once, or whenever you add new PDFs):
   ```bash
   cd preprocessing
   pip install -r requirements.txt
   python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset
   ```
2. **Deploy this Space**: upload `app.py`, `requirements.txt`, and `README.md`.
3. **Set the two secrets** above in Settings → Variables and Secrets.
4. The Space restarts, loads the dataset, and is ready to answer questions.

To add new PDFs later without rebuilding everything:

```bash
python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset --update
```

## Local development

```bash
git clone https://huggingface.co/spaces/your-org/enggss-rag-chatbot
cd enggss-rag-chatbot
pip install -r requirements.txt
# create .env with HF_TOKEN and HF_DATASET_REPO
python app.py
```