# data/test_set/

Golden evaluation queries for ArunCore. This folder will contain a curated set of real questions used to validate retrieval quality and answer accuracy before the agent is deployed.

---

## Purpose
A test set must exist before the ingestion pipeline or the agent layer is written. Without it, there is no way to tell whether the retrieval system is working: whether the right chunks are returned for the right questions, and whether the LLM synthesises accurate answers from them.

---

## What Goes Here
A JSON file (`eval_set.json`) containing 30–50 questions across these categories:

| Category | Example Questions |
|---|---|
| **Identity** | "Who are you?" / "What do you do?" |
| **Project-specific** | "How does your legal RAG system handle exact section lookups?" |
| **Tech stack** | "What databases have you worked with?" |
| **Decision reasoning** | "Why did you use ChromaDB?" |
| **Personal background** | "How did you get into engineering?" |
| **Cross-project** | "Which projects use Groq?" |
| **Negative (out-of-scope)** | "What is your salary expectation?" ← should get a graceful fallback |

---

## Format (Planned)
```json
[
  {
    "id": "q001",
    "question": "What is your most complex project?",
    "expected_source": "legal_RAG_system/readme.md",
    "expected_topics": ["RAG", "legal documents", "ChromaDB"],
    "category": "project-specific"
  }
]
```
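
Since the format is still planned, a small validator can catch malformed entries early. The following is a minimal sketch, not the project's actual tooling: the function names, the category labels other than `project-specific`, and the rule that every entry carries all five fields are assumptions for illustration.

```python
import json
from collections import Counter

# Assumed category labels derived from the table above; only
# "project-specific" appears verbatim in the planned format.
CATEGORIES = {
    "identity", "project-specific", "tech-stack", "decision-reasoning",
    "personal-background", "cross-project", "negative",
}

# Assumed: every entry carries all five keys. Negative (out-of-scope)
# entries could set "expected_source" to null, since only the key is checked.
REQUIRED_FIELDS = {"id", "question", "expected_source", "expected_topics", "category"}


def load_eval_set(path="eval_set.json"):
    """Read the evaluation set from a JSON file and return it as a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def validate_eval_set(entries):
    """Return a list of problem descriptions; an empty list means no issues were found."""
    problems = []
    seen_ids = set()
    for i, entry in enumerate(entries):
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing fields {sorted(missing)}")
            continue
        if entry["id"] in seen_ids:
            problems.append(f"entry {i}: duplicate id {entry['id']!r}")
        seen_ids.add(entry["id"])
        if entry["category"] not in CATEGORIES:
            problems.append(f"entry {i}: unknown category {entry['category']!r}")
    return problems


def category_counts(entries):
    """Count questions per category to check coverage across the table above."""
    return Counter(entry.get("category") for entry in entries)
```

Calling `validate_eval_set(load_eval_set())` from the folder that holds `eval_set.json` would list any problems, and `category_counts` shows whether the 30–50 questions are spread across all categories.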

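One way the `expected_source` field could be used to score retrieval is a hit rate at k: the fraction of questions whose expected source shows up among the top k retrieved sources. This is a hedged sketch only. `retrieve` is a hypothetical callable that takes a question and returns a ranked list of source paths, `fake_retrieve` is a stand-in for illustration, and skipping entries without an expected source (such as out-of-scope questions) is an assumption.

```python
def hit_rate_at_k(entries, retrieve, k=5):
    """Fraction of scored questions whose expected_source is in the top-k results.

    `retrieve` is a hypothetical callable: question (str) -> ranked list of
    source paths. Entries with no expected_source are skipped (assumption).
    """
    scored = [e for e in entries if e.get("expected_source")]
    if not scored:
        return 0.0
    hits = sum(
        1 for e in scored
        if e["expected_source"] in retrieve(e["question"])[:k]
    )
    return hits / len(scored)


def fake_retrieve(question):
    """Stand-in retriever for illustration; a real one would query the vector store."""
    return ["legal_RAG_system/readme.md"]


sample = [
    {"id": "q001", "question": "What is your most complex project?",
     "expected_source": "legal_RAG_system/readme.md"},
    # Placeholder path, used only to show a miss.
    {"id": "q002", "question": "Which projects use Groq?",
     "expected_source": "placeholder_project/readme.md"},
]
score = hit_rate_at_k(sample, fake_retrieve, k=3)
```
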
---

## Current Status
🔲 Not yet built — to be created after the ingestion pipeline is complete.