Spaces:
Sleeping
Sleeping
Neural Arun
Identity Expansion: Integrated massive ArunCore documentation and master portfolio summary into live Vector DB
97f4848 | # data/test_set/ | |
| Golden evaluation queries for ArunCore. This folder will contain a curated set of real questions used to validate retrieval quality and answer accuracy before the agent is deployed. | |
| --- | |
| ## Purpose | |
| Before writing the ingestion pipeline or the agent layer, a test set must exist. Without it, there is no way to know if the retrieval system is working β whether the right chunks are being returned for the right questions, and whether the LLM is synthesising accurate answers from them. | |
| --- | |
| ## What Goes Here | |
| A JSON file (`eval_set.json`) containing 30β50 questions across these categories: | |
| | Category | Example Questions | | |
| |---|---| | |
| | **Identity** | "Who are you?" / "What do you do?" | | |
| | **Project-specific** | "How does your legal RAG system handle exact section lookups?" | | |
| | **Tech stack** | "What databases have you worked with?" | | |
| | **Decision reasoning** | "Why did you use ChromaDB?" | | |
| | **Personal background** | "How did you get into engineering?" | | |
| | **Cross-project** | "Which projects use Groq?" | | |
| | **Negative (out-of-scope)** | "What is your salary expectation?" β should get a graceful fallback | | |
| --- | |
| ## Format (Planned) | |
| ```json | |
| [ | |
| { | |
| "id": "q001", | |
| "question": "What is your most complex project?", | |
| "expected_source": "legal_RAG_system/readme.md", | |
| "expected_topics": ["RAG", "legal documents", "ChromaDB"], | |
| "category": "project-specific" | |
| } | |
| ] | |
| ``` | |
| --- | |
| ## Current Status | |
| π² Not yet built β to be created after the ingestion pipeline is complete. | |