---
title: DevDocs AI
emoji: πŸ€–
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: "5.9.1"
python_version: "3.10"
app_file: app.py
pinned: false
---

# DevDocs AI — Codebase RAG Assistant

A production-quality **Retrieval-Augmented Generation** system for querying codebases with natural language. Upload any ZIP archive, index it once, and ask questions about the code.
![DevDocs AI main interface](one.png)

## Architecture

```
User Query
    β”‚
    β–Ό
[Query Rewriter]  ← optional rule-based or LLM rewrite
    β”‚
    β–Ό
[Retriever]  ← similarity search OR MMR (configurable)
    β”‚         ChromaDB + HuggingFace all-MiniLM-L6-v2 embeddings
    β–Ό
[Retrieved Chunks]
    β”‚
    β”œβ”€β”€β†’ [LLM Generator]  β†’ Answer  (gpt-4.1-nano, 1 call)
    β”‚
    └──→ [Evaluator]
              β”œβ”€β”€ Retrieval Metrics (Recall@K, MRR, nDCG) β€” FREE
              └── LLM Judge (Accuracy, Completeness, Relevance, Groundedness) β€” 1 call
```
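
The rule-based rewrite step in the diagram can be sketched as a simple abbreviation expander. This is only an illustration of the idea; the `ABBREVIATIONS` table and `rewrite_query` name are hypothetical, not the actual `query_rewriter.py` API:

```python
# Hypothetical rule-based rewriter: expand common abbreviations
# token-by-token before the query reaches the retriever.
ABBREVIATIONS = {
    "db": "database",
    "auth": "authentication",
    "cfg": "configuration",
}

def rewrite_query(query: str) -> str:
    """Replace each token that matches a known abbreviation; leave the rest untouched."""
    return " ".join(ABBREVIATIONS.get(t.lower(), t) for t in query.split())
```

The LLM rewrite mode works on the same contract (query in, query out) but sends the query through a cheap model call instead.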

## Cost Model

| Operation            | Cost             |
|----------------------|------------------|
| Embedding (indexing) | **FREE** (local) |
| Embedding (query)    | **FREE** (local) |
| Answer generation    | ~$0.0001 / query |
| LLM judge evaluation | ~$0.0001 / query |
| Query rewriting (LLM)| ~$0.00005 / query|

> On a $5 budget you can run ~25,000 queries with full evaluation enabled (answer generation + LLM judge ≈ $0.0002 per query).


## Project Structure

```
devdocs-ai/
β”œβ”€β”€ app.py                    # Gradio UI (3 tabs)
β”œβ”€β”€ config.py                 # All configuration in one place
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ .env.example
β”‚
β”œβ”€β”€ ingestion/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ loader.py             # ZIP extraction + file reading
β”‚   β”œβ”€β”€ chunker.py            # AST-aware Python chunking + generic splitter
β”‚   └── indexer.py            # HuggingFace embeddings + ChromaDB persistence
β”‚
β”œβ”€β”€ retrieval/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ retriever.py          # Similarity + MMR search
β”‚   └── query_rewriter.py     # Rule-based + optional LLM rewrite
β”‚
β”œβ”€β”€ llm/
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── generator.py          # Grounded answer generation via litellm
β”‚
β”œβ”€β”€ evaluation/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ metrics.py            # Recall@K, MRR, nDCG (free, keyword-based)
β”‚   └── judge.py              # LLM-as-judge (Accuracy/Completeness/Relevance/Groundedness)
β”‚
β”œβ”€β”€ utils/
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── helpers.py            # Logging, display formatters
β”‚
└── data/
    β”œβ”€β”€ uploads/              # Extracted ZIP contents (auto-created)
    └── vector_db/            # ChromaDB persistent storage (auto-created)
```

## Quick Start

### 1. Clone / download the project

```bash
cd devdocs-ai
```

### 2. Create virtual environment

```bash
python -m venv venv
source venv/bin/activate        # Linux/macOS
# venv\Scripts\activate         # Windows
```

### 3. Install dependencies

```bash
pip install -r requirements.txt
```

> First run will download the `all-MiniLM-L6-v2` model (~90 MB) automatically.

### 4. Set your OpenAI API key

```bash
cp .env.example .env
# Edit .env and set OPENAI_API_KEY=sk-...
```

Or export directly:

```bash
export OPENAI_API_KEY="sk-your-key-here"
```

### 5. Launch the app

```bash
python app.py
```

Open **http://localhost:7860** in your browser.

---

## Usage Guide

### Tab 1 β€” Index Repository
![Index Repository tab](two.png)
1. Click **Upload ZIP file** and select your repository archive.
2. Click **πŸš€ Index Repository**.
3. Wait for the status message β€” indexing is one-time per repository.

> Re-indexing a new ZIP clears the previous index automatically.

### Tab 2 β€” Ask Questions

1. Type a natural language question.
2. Configure retrieval options:
   - **Top-K**: number of chunks to retrieve (default 5)
   - **Use MMR**: diversity-aware retrieval (avoids redundant chunks)
   - **Use query rewriting**: expands abbreviations before retrieval
   - **Run evaluation**: computes all metrics (costs 1 extra LLM call)
3. Click **πŸ” Ask**.
4. View the **Answer**, **Retrieved Chunks**, and **Metrics Panel**.
![Answer, retrieved chunks, and metrics panel](three.png)
 

### Tab 3 β€” Compare Modes

Run both **Similarity** and **MMR** retrieval side-by-side for the same question to compare answer quality and chunk diversity.
![Compare Modes tab](four.png)

---

## Configuration Reference

All parameters are in `config.py`:

| Parameter              | Default               | Description                              |
|------------------------|-----------------------|------------------------------------------|
| `EMBEDDING_MODEL`      | `all-MiniLM-L6-v2`   | HuggingFace sentence-transformer model   |
| `CHUNK_SIZE`           | `400` tokens          | Target chunk size                        |
| `CHUNK_OVERLAP`        | `60` tokens           | Overlap between consecutive chunks      |
| `DEFAULT_TOP_K`        | `5`                   | Chunks retrieved per query               |
| `MMR_FETCH_K`          | `20`                  | Candidate pool size for MMR              |
| `MMR_LAMBDA_MULT`      | `0.5`                 | MMR diversity/relevance balance (0–1)    |
| `LLM_MODEL`            | `openai/gpt-4.1-nano` | LLM for answer generation                |
| `LLM_MAX_TOKENS`       | `1024`                | Max tokens in LLM response               |
| `ALLOWED_EXTENSIONS`   | `.py .js .ts .md ...` | File types included in indexing          |
| `MAX_FILE_SIZE_MB`     | `2`                   | Files larger than this are skipped       |
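
To illustrate what `MMR_FETCH_K` and `MMR_LAMBDA_MULT` control, here is a standalone sketch of greedy MMR re-ranking over a candidate pool. The helper names are illustrative; the real retriever delegates MMR to the vector store rather than implementing it by hand:

```python
from math import sqrt

def _cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def mmr_select(query_vec, cand_vecs, k=5, lambda_mult=0.5):
    """Greedy MMR: from a pool of MMR_FETCH_K candidates, repeatedly pick the one
    maximising  lambda * relevance-to-query - (1 - lambda) * similarity-to-picked.
    lambda_mult=1.0 is pure relevance; lower values favour diversity."""
    relevance = [_cosine(query_vec, c) for c in cand_vecs]
    selected, remaining = [], list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            redundancy = max((_cosine(cand_vecs[i], cand_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected  # indices into cand_vecs, in selection order
```

With two near-duplicate candidates, lowering `lambda_mult` makes the second pick jump to a more diverse chunk instead of the near-duplicate.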

---

## Evaluation Metrics Explained

### Retrieval Metrics (free, keyword-based proxy)

| Metric     | Formula                                          | Range |
|------------|--------------------------------------------------|-------|
| Recall@K   | relevant retrieved / K                           | 0–1   |
| MRR        | 1 / rank of first relevant doc                   | 0–1   |
| nDCG@K     | DCG / IDCG using binary relevance                | 0–1   |

> Relevance is determined by keyword overlap between query and chunk (β‰₯2 shared tokens).
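
A minimal sketch of these proxy metrics, including the ≥2-shared-token relevance test. Function names here are illustrative, not the actual `metrics.py` API:

```python
from math import log2

def is_relevant(query: str, chunk: str, min_shared: int = 2) -> bool:
    """Keyword-overlap proxy: relevant if query and chunk share >= min_shared tokens."""
    return len(set(query.lower().split()) & set(chunk.lower().split())) >= min_shared

def recall_at_k(relevant_flags, k):
    """As defined in the table above: relevant retrieved / K."""
    return sum(relevant_flags[:k]) / k

def mrr(relevant_flags):
    """Reciprocal rank of the first relevant result; 0 if none is relevant."""
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(relevant_flags, k):
    """Binary-relevance DCG of the ranking divided by the ideal (sorted) DCG."""
    flags = relevant_flags[:k]
    dcg = sum(rel / log2(i + 2) for i, rel in enumerate(flags))
    idcg = sum(rel / log2(i + 2) for i, rel in enumerate(sorted(flags, reverse=True)))
    return dcg / idcg if idcg > 0 else 0.0
```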

### Answer Quality (LLM judge, 1 call)

| Dimension     | Meaning                                           | Scale |
|---------------|---------------------------------------------------|-------|
| Accuracy      | Every claim is factually correct given context    | 1–5   |
| Completeness  | All parts of the question are addressed           | 1–5   |
| Relevance     | Answer is focused and on-topic                    | 1–5   |
| Groundedness  | All claims are directly supported by context      | 1–5   |
| Overall       | Mean of the four scores                           | 1–5   |
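
A sketch of how such judge output could be turned into the table above, assuming a hypothetical `Dimension: N` reply format (`judge.py` may well use structured JSON instead):

```python
import re
from statistics import mean

DIMENSIONS = ("Accuracy", "Completeness", "Relevance", "Groundedness")

def parse_judge_scores(reply: str) -> dict:
    """Pull 'Dimension: N' scores (1-5) out of a judge reply and, when all four
    are present, add their mean as 'Overall'. Hypothetical reply format."""
    scores = {}
    for dim in DIMENSIONS:
        m = re.search(rf"{dim}\s*:\s*([1-5])", reply, re.IGNORECASE)
        if m:
            scores[dim] = int(m.group(1))
    if len(scores) == len(DIMENSIONS):
        scores["Overall"] = mean(scores.values())
    return scores
```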

---
![Evaluation results screenshot](<Screenshot 2026-03-28 113804.png>)
## Supported File Types

`.py` `.js` `.ts` `.jsx` `.tsx` `.md` `.txt` `.java` `.go` `.rs` `.cpp` `.c` `.h`

---

## Chunking Strategy

| File Type     | Strategy                                                        |
|---------------|-----------------------------------------------------------------|
| `.py`         | AST-based: one chunk per top-level function/class               |
| All others    | Recursive character splitter (400-token chunks, 60-token overlap)|

Python files that fail AST parsing (e.g. syntax errors) fall back to the generic splitter automatically.
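
The AST strategy with its fallback can be sketched with the standard-library `ast` module (a simplified illustration; `chunker.py` also handles overlap and module-level code between definitions):

```python
import ast

def chunk_python_source(source: str) -> list:
    """One chunk per top-level function/class; return the whole file as a single
    chunk when parsing fails, mirroring the fall-back to the generic splitter."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return [source]  # real pipeline hands this to the recursive splitter
    return [
        ast.get_source_segment(source, node)
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
```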

---

## Troubleshooting

**"Vector store is empty" error**
β†’ Index a repository first via Tab 1.

**Slow first query**
β†’ The embedding model is downloaded on first use (~90 MB). Subsequent runs are fast.

**"No API key" warnings**
β†’ Set `OPENAI_API_KEY` in `.env` or as an environment variable.

**ChromaDB dimension mismatch error**
β†’ Delete `data/vector_db/` and re-index. This happens if you switch embedding models mid-session.

```bash
rm -rf data/vector_db/
```

**Out of memory on large repos**
β†’ Lower `MAX_FILE_SIZE_MB` in `config.py` or reduce `CHUNK_SIZE`.