abhivsh committed on
Commit 5ccd66d · verified · 1 Parent(s): 577eb82

Update README.md

Files changed (1)
  1. README.md +4 -75
README.md CHANGED
@@ -2,82 +2,11 @@
  title: EnggSS RAG ChatBot
  emoji: ⚡
  colorFrom: blue
- colorTo: indigo
+ colorTo: green
  sdk: gradio
- sdk_version: "5.0.0"
+ sdk_version: 5.0.0
  app_file: app.py
  pinned: false
  license: other
- ---
-
- # EnggSS RAG ChatBot
-
- **Serving-only** HuggingFace Space — reads a pre-built private dataset, no PDF
- processing at runtime. Build the dataset locally with
- `preprocessing/create_dataset.py`, then deploy this Space to answer questions.
-
- ## How it works
-
- ```
- Local machine (once)
- PDFs → create_dataset.py → BAAI/bge-large-en-v1.5 embeddings
-
-
- Private HuggingFace Dataset
-
- ┌─────────────────────┘
- ▼ (Space startup)
- Load dataset → NumPy float32 matrix (L2-normalised)
-
- ▼ (each query, ~20 ms)
- Embed query → cosine scores → MMR top-3
-
-
- Qwen2.5-7B-Instruct (HF Inference API) → answer
-
-
- Gradio UI
- ```
-
- ## Tabs
-
- | Tab | Purpose |
- |-----|---------|
- | 💬 Q&A | Ask questions; see top-3 retrieved contexts + generated answer |
- | 📊 Analytics | Total chunks, documents processed, per-file breakdown |
-
- ## Required Space Secrets
-
- Set in **Settings → Variables and Secrets**:
-
- | Secret | Description |
- |--------|-------------|
- | `HF_TOKEN` | HuggingFace token — needs **read** access to the dataset repo |
- | `HF_DATASET_REPO` | e.g. `your-org/enggss-rag-dataset` (created by preprocessing script) |
-
- ## Setup order
-
- 1. **Run preprocessing locally** (once, or when you add new PDFs):
- ```bash
- cd preprocessing
- pip install -r requirements.txt
- python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset
- ```
- 2. **Deploy this Space** — upload `app.py` + `requirements.txt` + `README.md`
- 3. **Set the two secrets** above in Space Settings → Secrets
- 4. Space restarts, loads the dataset, and is ready to answer questions
-
- To add new PDFs later without rebuilding everything:
- ```bash
- python create_dataset.py ./pdfs --repo your-org/enggss-rag-dataset --update
- ```
-
- ## Local development
-
- ```bash
- git clone https://huggingface.co/spaces/your-org/enggss-rag-chatbot
- cd enggss-rag-chatbot
- pip install -r requirements.txt
- # create .env with HF_TOKEN and HF_DATASET_REPO
- python app.py
- ```
+ python_version: 3.1
+ ---
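
The "How it works" section removed by this commit describes the per-query step only as "Embed query → cosine scores → MMR top-3" over an L2-normalised NumPy float32 matrix. For readers who no longer have that README, here is a minimal sketch of what such a step might look like. It is not taken from `app.py`: the function names `l2_normalise` and `mmr_top_k`, the `lambda_mult` trade-off parameter and its 0.5 default, and the toy 2-D vectors are all assumptions made for illustration.

```python
import numpy as np


def l2_normalise(matrix: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return (matrix / np.clip(norms, 1e-12, None)).astype(np.float32)


def mmr_top_k(query_vec, doc_matrix, k=3, lambda_mult=0.5):
    """Pick k chunk indices by maximal marginal relevance (MMR).

    Both inputs are assumed to be L2-normalised already.
    lambda_mult is a hypothetical knob: 1.0 ranks by relevance only,
    0.0 ranks by diversity only.
    """
    relevance = doc_matrix @ query_vec           # cosine score per chunk
    selected = [int(np.argmax(relevance))]       # start with the best match
    candidates = set(range(doc_matrix.shape[0])) - set(selected)
    while candidates and len(selected) < k:
        cand = sorted(candidates)
        # Highest similarity of each candidate to any chunk already chosen.
        redundancy = (doc_matrix[cand] @ doc_matrix[selected].T).max(axis=1)
        scores = lambda_mult * relevance[cand] - (1.0 - lambda_mult) * redundancy
        best = cand[int(np.argmax(scores))]
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (illustrative only): chunks 0 and 1 are near-duplicates.
docs = l2_normalise(np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.7, 0.7]]))
query = l2_normalise(np.array([[1.0, 0.2]]))[0]
top3 = mmr_top_k(query, docs, k=3)
print(top3)
```

In this toy case a plain cosine top-3 would return both near-duplicate chunks (0 and 1). The MMR penalty keeps only the closer of the two and fills the remaining slots with less redundant chunks, which is the usual reason for placing MMR after cosine scoring.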