---
title: DevDocs AI
emoji: 🤖
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 5.9.1
python_version: '3.10'
app_file: app.py
pinned: false
---
# DevDocs AI – Codebase RAG Assistant
A production-quality Retrieval-Augmented Generation system for querying codebases with natural language. Upload any ZIP archive, index it once, and ask questions about the code.

## Architecture

```
User Query
     │
     ▼
[Query Rewriter]     ← optional rule-based or LLM rewrite
     │
     ▼
[Retriever]          ← similarity search OR MMR (configurable)
     │                 ChromaDB + HuggingFace all-MiniLM-L6-v2 embeddings
     ▼
[Retrieved Chunks]
     │
     ├──→ [LLM Generator] → Answer (gpt-4.1-nano, 1 call)
     │
     └──→ [Evaluator]
            ├── Retrieval Metrics (Recall@K, MRR, nDCG) – FREE
            └── LLM Judge (Accuracy, Completeness, Relevance, Groundedness) – 1 call
```
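In code, this flow amounts to one pass through four stages. A minimal glue-code sketch; the module paths mirror the project structure below, but the function names (`rewrite_query`, `retrieve`, `generate_answer`, `retrieval_metrics`, `judge_answer`) are illustrative assumptions, not the project's exact APIs:

```python
# Illustrative only: function names are assumed, not the project's real APIs.
from retrieval.query_rewriter import rewrite_query
from retrieval.retriever import retrieve
from llm.generator import generate_answer
from evaluation.metrics import retrieval_metrics
from evaluation.judge import judge_answer

def answer_question(query: str, top_k: int = 5, use_mmr: bool = False,
                    rewrite: bool = False, evaluate: bool = False) -> dict:
    if rewrite:
        query = rewrite_query(query)                # rule-based, or 1 LLM call
    chunks = retrieve(query, k=top_k, mmr=use_mmr)  # free: local embeddings
    answer = generate_answer(query, chunks)         # 1 LLM call
    result = {"answer": answer, "chunks": chunks}
    if evaluate:
        result["metrics"] = retrieval_metrics(query, chunks)   # free
        result["judge"] = judge_answer(query, answer, chunks)  # 1 extra LLM call
    return result
```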
Cost Model
| Operation | Cost |
|---|---|
| Embedding (indexing) | FREE (local) |
| Embedding (query) | FREE (local) |
| Answer generation | ~$0.0001 / query |
| LLM judge evaluation | ~$0.0001 / query |
| Query rewriting (LLM) | ~$0.00005 / query |
On a $5 budget you can run roughly 25,000 queries with full evaluation enabled; the arithmetic is below.
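A trivial worked check of that estimate, using the per-query figures from the table:

```python
# Per-query cost with full evaluation: answer generation + LLM judge.
cost_per_query = 0.0001 + 0.0001          # $0.0002
print(5.00 / cost_per_query)              # 25000.0 queries on a $5 budget

# With LLM query rewriting enabled as well:
print(5.00 / (cost_per_query + 0.00005))  # ~20000 queries
```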
## Project Structure
```
devdocs-ai/
├── app.py                 # Gradio UI (3 tabs)
├── config.py              # All configuration in one place
├── requirements.txt
├── .env.example
│
├── ingestion/
│   ├── __init__.py
│   ├── loader.py          # ZIP extraction + file reading
│   ├── chunker.py         # AST-aware Python chunking + generic splitter
│   └── indexer.py         # HuggingFace embeddings + ChromaDB persistence
│
├── retrieval/
│   ├── __init__.py
│   ├── retriever.py       # Similarity + MMR search
│   └── query_rewriter.py  # Rule-based + optional LLM rewrite
│
├── llm/
│   ├── __init__.py
│   └── generator.py       # Grounded answer generation via litellm
│
├── evaluation/
│   ├── __init__.py
│   ├── metrics.py         # Recall@K, MRR, nDCG (free, keyword-based)
│   └── judge.py           # LLM-as-judge (Accuracy/Completeness/Relevance/Groundedness)
│
├── utils/
│   ├── __init__.py
│   └── helpers.py         # Logging, display formatters
│
└── data/
    ├── uploads/           # Extracted ZIP contents (auto-created)
    └── vector_db/         # ChromaDB persistent storage (auto-created)
```
## Quick Start
**1. Clone / download the project**

```bash
cd devdocs-ai
```

**2. Create a virtual environment**

```bash
python -m venv venv
source venv/bin/activate    # Linux/macOS
# venv\Scripts\activate     # Windows
```

**3. Install dependencies**

```bash
pip install -r requirements.txt
```

The first run will download the `all-MiniLM-L6-v2` model (~90 MB) automatically.

**4. Set your OpenAI API key**

```bash
cp .env.example .env
# Edit .env and set OPENAI_API_KEY=sk-...
```

Or export directly:

```bash
export OPENAI_API_KEY="sk-your-key-here"
```

**5. Launch the app**

```bash
python app.py
```

Open http://localhost:7860 in your browser.
## Usage Guide
### Tab 1 – Index Repository

- Click **Upload ZIP file** and select your repository archive.
- Click **Index Repository**.
- Wait for the status message – indexing is one-time per repository.

Re-indexing a new ZIP clears the previous index automatically.
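Under the hood, indexing amounts to embedding the chunks locally and persisting them to ChromaDB. A rough sketch, assuming the LangChain `Chroma` and `HuggingFaceEmbeddings` wrappers (the actual `ingestion/indexer.py` may wire this differently):

```python
from langchain_core.documents import Document
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# Local embeddings: no API cost for indexing or querying.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Toy chunk; in the app these come from ingestion/chunker.py.
chunks = [Document(page_content="def add(a, b):\n    return a + b",
                   metadata={"source": "math.py"})]

# Drop any previous index, then embed and persist the new one.
store = Chroma(persist_directory="data/vector_db", embedding_function=embeddings)
store.delete_collection()
store = Chroma.from_documents(chunks, embeddings, persist_directory="data/vector_db")
```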
### Tab 2 – Ask Questions
- Type a natural language question.
- Configure retrieval options:
  - **Top-K**: number of chunks to retrieve (default 5)
  - **Use MMR**: diversity-aware retrieval (avoids redundant chunks)
  - **Use query rewriting**: expands abbreviations before retrieval (see the sketch below)
  - **Run evaluation**: computes all metrics (costs 1 extra LLM call)
- Click **Ask**.
- View the Answer, Retrieved Chunks, and Metrics Panel.
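The rule-based rewrite is simple dictionary-driven abbreviation expansion. A minimal sketch (this abbreviation table is hypothetical; the real mapping lives in `retrieval/query_rewriter.py`):

```python
# Hypothetical abbreviation table; see retrieval/query_rewriter.py for the real one.
ABBREVIATIONS = {
    "db": "database",
    "auth": "authentication",
    "repo": "repository",
}

def rewrite_query(query: str) -> str:
    """Expand known abbreviations so the query matches full words in the index."""
    return " ".join(ABBREVIATIONS.get(word.lower(), word) for word in query.split())

print(rewrite_query("where is the db auth logic"))
# where is the database authentication logic
```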

### Tab 3 – Compare Modes
Run both Similarity and MMR retrieval side-by-side for the same question to compare answer quality and chunk diversity.
## Configuration Reference

All parameters are in `config.py`:
| Parameter | Default | Description |
|---|---|---|
| `EMBEDDING_MODEL` | `all-MiniLM-L6-v2` | HuggingFace sentence-transformer model |
| `CHUNK_SIZE` | 400 tokens | Target chunk size |
| `CHUNK_OVERLAP` | 60 tokens | Overlap between consecutive chunks |
| `DEFAULT_TOP_K` | 5 | Chunks retrieved per query |
| `MMR_FETCH_K` | 20 | Candidate pool size for MMR |
| `MMR_LAMBDA_MULT` | 0.5 | MMR diversity/relevance balance (0–1) |
| `LLM_MODEL` | `openai/gpt-4.1-nano` | LLM for answer generation |
| `LLM_MAX_TOKENS` | 1024 | Max tokens in LLM response |
| `ALLOWED_EXTENSIONS` | `.py .js .ts .md ...` | File types included in indexing |
| `MAX_FILE_SIZE_MB` | 2 | Files larger than this are skipped |
## Evaluation Metrics Explained

### Retrieval Metrics (free, keyword-based proxy)
| Metric | Formula | Range |
|---|---|---|
| Recall@K | relevant retrieved / K | 0–1 |
| MRR | 1 / rank of first relevant doc | 0–1 |
| nDCG@K | DCG / IDCG using binary relevance | 0–1 |

Relevance is determined by keyword overlap between query and chunk (≥ 2 shared tokens).
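All three metrics follow directly from those binary labels. A minimal self-contained sketch consistent with the formulas above:

```python
import math

def is_relevant(query: str, chunk: str) -> bool:
    """Binary relevance proxy: at least 2 shared tokens between query and chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split())) >= 2

def recall_at_k(labels: list[bool]) -> float:
    """Relevant retrieved / K."""
    return sum(labels) / len(labels) if labels else 0.0

def mrr(labels: list[bool]) -> float:
    """1 / rank of the first relevant chunk (0 if none is relevant)."""
    return next((1.0 / rank for rank, rel in enumerate(labels, 1) if rel), 0.0)

def ndcg_at_k(labels: list[bool]) -> float:
    """DCG / IDCG with binary gains."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(labels))
    idcg = sum(rel / math.log2(i + 2)
               for i, rel in enumerate(sorted(labels, reverse=True)))
    return dcg / idcg if idcg else 0.0

# Chunks at ranks 2 and 3 are relevant, K = 5:
labels = [False, True, True, False, False]
print(recall_at_k(labels), mrr(labels), round(ndcg_at_k(labels), 3))  # 0.4 0.5 0.693
```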
### Answer Quality (LLM judge, 1 call)
| Dimension | Meaning | Scale |
|---|---|---|
| Accuracy | Every claim is factually correct given context | 1–5 |
| Completeness | All parts of the question are addressed | 1–5 |
| Relevance | Answer is focused and on-topic | 1–5 |
| Groundedness | All claims are directly supported by context | 1–5 |
| Overall | Mean of the four scores | 1–5 |
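A sketch of what the single judge call can look like through `litellm`; the prompt wording and JSON schema here are illustrative, not the exact contents of `evaluation/judge.py`:

```python
import json
import litellm

JUDGE_PROMPT = """Score the answer from 1-5 on each dimension, given only the context.
Question: {question}
Context: {context}
Answer: {answer}
Reply with JSON only: {{"accuracy": n, "completeness": n, "relevance": n, "groundedness": n}}"""

def judge_answer(question: str, context: str, answer: str) -> dict:
    response = litellm.completion(
        model="openai/gpt-4.1-nano",  # LLM_MODEL from config.py
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, context=context, answer=answer)}],
    )
    scores = json.loads(response.choices[0].message.content)
    scores["overall"] = sum(scores.values()) / len(scores)  # mean of the four
    return scores
```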
## Supported File Types

`.py` `.js` `.ts` `.jsx` `.tsx` `.md` `.txt` `.java` `.go` `.rs` `.cpp` `.c` `.h`
## Chunking Strategy

| File Type | Strategy |
|---|---|
| `.py` | AST-based: one chunk per top-level function/class |
| All others | Recursive character splitter (400-token chunks, 60-token overlap) |
Python files that fail AST parsing (e.g. syntax errors) fall back to the generic splitter automatically.
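A simplified sketch of that strategy using the standard-library `ast` module (`generic_split` here is a crude stand-in for the real recursive character splitter in `ingestion/chunker.py`):

```python
import ast

def generic_split(text: str, size: int = 1600, overlap: int = 240) -> list[str]:
    """Crude stand-in for the recursive splitter (~400 tokens ≈ 1600 chars)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text), 1), step)]

def chunk_python(source: str) -> list[str]:
    """One chunk per top-level function/class; fall back on syntax errors."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return generic_split(source)
    chunks = [ast.get_source_segment(source, node) for node in tree.body
              if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]
    return chunks or generic_split(source)
```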
## Troubleshooting

**"Vector store is empty" error** → Index a repository first via Tab 1.

**Slow first query** → The embedding model is downloaded on first use (~90 MB). Subsequent runs are fast.

**"No API key" warnings** → Set `OPENAI_API_KEY` in `.env` or as an environment variable.

**ChromaDB dimension mismatch error** → Delete `data/vector_db/` and re-index. This happens if you switch embedding models mid-session.

```bash
rm -rf data/vector_db/
```

**Out of memory on large repos** → Lower `MAX_FILE_SIZE_MB` in `config.py` or reduce `CHUNK_SIZE`.

