---
title: DeveloperDocs RAG
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_file: app.py
pinned: false
---
# DeveloperDocs RAG

A production-grade RAG system that answers developer questions using official tech-stack documentation (e.g., FastAPI).
## 🎯 What This Project Demonstrates

This is a production-style RAG (Retrieval-Augmented Generation) system that showcases:

- ✅ Professional documentation ingestion pipeline with chunking strategies
- ✅ Semantic search using vector embeddings (ChromaDB)
- ✅ Source attribution with clickable citations
- ✅ RAG evaluation metrics (RAGAS framework)
- ✅ Dockerized deployment ready for cloud platforms
- ✅ Production-grade error handling and logging
## 🏗️ Architecture

```
┌─────────────┐
│    User     │
│  Question   │
└──────┬──────┘
       │
       ▼
┌──────────────────────────────────────┐
│ 1. Query Embedding                   │
│    (sentence-transformers)           │
└──────────┬───────────────────────────┘
           │
           ▼
┌──────────────────────────────────────┐
│ 2. Vector Search (ChromaDB)          │
│    - Top 5 relevant chunks           │
│    - Metadata: source, section       │
└──────────┬───────────────────────────┘
           │
           ▼
┌──────────────────────────────────────┐
│ 3. Context Assembly                  │
│    - Format chunks                   │
│    - Add instructions                │
└──────────┬───────────────────────────┘
           │
           ▼
┌──────────────────────────────────────┐
│ 4. LLM Generation (HF Inference)     │
│    - Answer with citations           │
│    - Code examples preserved         │
└──────────┬───────────────────────────┘
           │
           ▼
┌──────────────────────────────────────┐
│ 5. Response + Source Links           │
└──────────────────────────────────────┘
```
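The flow above can be sketched end to end in plain Python. This is a toy version for illustration only: a bag-of-words count stands in for the sentence-transformers embedding, an in-memory list stands in for ChromaDB, and the LLM call (steps 4-5) is left out. All names here are illustrative, not the repo's actual API.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Step 1 (toy): bag-of-words counts stand in for a real embedding vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[dict], k: int = 5) -> list[dict]:
    # Step 2 (toy): rank stored chunks by similarity, keeping metadata
    # (source, section) attached so citations survive into the answer.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]

def assemble_context(chunks: list[dict]) -> str:
    # Step 3: format chunks with their sources so the LLM (step 4)
    # can answer with citations (step 5).
    return "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)

chunks = [
    {"text": "FastAPI path parameters are declared with Python type hints.",
     "source": "fastapi/path-params"},
    {"text": "Dependencies are injected with Depends().",
     "source": "fastapi/dependencies"},
]
top = retrieve("How do I declare path parameters?", chunks, k=1)
print(assemble_context(top))
```

In the real pipeline, `embed` and `retrieve` are backed by `src/embeddings.py` and `src/retriever.py`.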
## Local Setup

```bash
# Clone the repository
git clone https://github.com/aishwarya30998/DeveloperDocs-AI-Copilot-RAG.git
cd DeveloperDocs-AI-Copilot-RAG

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create .env and add your HF_TOKEN (see .env.example)

# Run the application
python app.py
```

Visit http://localhost:7860 in your browser.
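The `.env` step above only needs your HuggingFace token; a minimal file (value illustrative) looks like:

```
# .env -- never commit this file; copy .env.example as the template
HF_TOKEN=hf_your_token_here
```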
## 📦 Project Structure

```
fastapi-docs-copilot/
├── app.py                 # Gradio UI application
├── Dockerfile             # Container configuration
├── docker-compose.yml     # Local container orchestration
├── requirements.txt       # Python dependencies
├── .env.example           # Environment variables template
│
├── src/
│   ├── __init__.py
│   ├── config.py          # Configuration management
│   ├── chunking.py        # Document chunking strategies
│   ├── embeddings.py      # Embedding generation
│   ├── retriever.py       # Vector search logic
│   ├── rag_pipeline.py    # Main RAG orchestration
│   └── prompts.py         # Prompt templates
│
├── scripts/
│   ├── ingest_docs.py     # Documentation ingestion
│   ├── evaluate_rag.py    # RAG metrics evaluation
│   └── test_retrieval.py  # Test retrieval quality
│
├── data/
│   ├── raw/               # Downloaded documentation
│   ├── processed/         # Chunked documents
│   └── vectordb/          # ChromaDB storage
│
├── tests/
│   ├── test_chunking.py
│   ├── test_retriever.py
│   └── test_rag_pipeline.py
│
└── evals/
    ├── test_queries.json  # Evaluation dataset
    └── results/           # Evaluation outputs
```
## 🎯 Key Features

### 1. Smart Chunking
- Semantic chunking with overlap for context preservation
- Metadata enrichment (section titles, URLs, code blocks)
- Configurable chunk sizes (300-800 tokens)
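The overlap idea can be shown with a minimal sketch that slides a window over whitespace tokens (a stand-in for real tokenizer tokens). The function and defaults are illustrative, not the code in `src/chunking.py`.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 400, overlap: int = 50) -> list[list[str]]:
    """Split a token list into windows that share `overlap` tokens,
    so text cut at a chunk boundary survives intact in the next chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window already reached the end
    return chunks

tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_tokens(tokens, chunk_size=400, overlap=50)
# Consecutive chunks share their last/first 50 tokens.
```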
### 2. Retrieval Quality
- Hybrid search (semantic + keyword)
- Reranking for improved relevance
- Source attribution with confidence scores
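Hybrid search can be approximated by blending two scores. In this sketch, Jaccard word overlap stands in for a proper keyword scorer (e.g. BM25), the semantic score is passed in precomputed, and the weight `alpha` is an assumed tuning knob rather than a value from this repo.

```python
def keyword_score(query: str, doc: str) -> float:
    """Jaccard overlap of word sets -- a cheap stand-in for BM25."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

def hybrid_score(semantic: float, keyword: float, alpha: float = 0.7) -> float:
    """Blend semantic similarity with keyword overlap; alpha weights
    the semantic side (illustrative default, tune per corpus)."""
    return alpha * semantic + (1 - alpha) * keyword

score = hybrid_score(
    semantic=0.8,
    keyword=keyword_score("declare path parameters",
                          "path parameters are declared with type hints"),
)
```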
### 3. Answer Generation
- Code-aware formatting (preserves indentation)
- Inline citations with source links
- Fallback handling for low-confidence results
### 4. Production Features
- Health check endpoint (`/health`)
- Query logging for analytics
- Rate limiting (basic throttling)
- Error recovery with graceful degradation
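"Basic throttling" could be as simple as a per-client token bucket; the sketch below is an assumption about the approach, not the actual code in `app.py`.

```python
import time

class TokenBucket:
    """Allow `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)       # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                        # caller should return HTTP 429

bucket = TokenBucket(rate=2.0, capacity=5)
results = [bucket.allow() for _ in range(6)]  # burst of 6 back-to-back calls
```

The first five calls drain the burst capacity; the sixth is rejected until the bucket refills.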
## 📊 RAG Evaluation

We use the RAGAS framework to measure:

| Metric | Description | Target Score |
|---|---|---|
| Faithfulness | Answer accuracy vs. context | > 0.8 |
| Answer Relevancy | Response relevance to query | > 0.7 |
| Context Precision | Retrieval accuracy | > 0.75 |
| Context Recall | Context completeness | > 0.8 |

Run evaluations:
```bash
python scripts/evaluate_rag.py
```
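An entry in `evals/test_queries.json` might look like the following. The field names are a plausible schema for RAGAS-style evaluation (question, ground truth, expected sources), not necessarily the repo's actual format.

```json
[
  {
    "question": "How do I declare a path parameter in FastAPI?",
    "ground_truth": "Declare it as a typed function argument, e.g. def read_item(item_id: int).",
    "expected_sources": ["https://fastapi.tiangolo.com/tutorial/path-params/"]
  }
]
```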
## 🐳 Docker Deployment

Build and run locally:

```bash
docker build -t developerdocs-rag .
docker run -p 7860:7860 --name developerdocs-rag-container developerdocs-rag
```

Deploy to HuggingFace Spaces:

- Create a new Space on HuggingFace
- Enable the Docker SDK
- Push this repository
- Add `HF_TOKEN` as a Space secret
- Deployment happens automatically
## 🧪 Testing

```bash
# Run all tests
pytest tests/ -v

# Test chunking strategy
pytest tests/test_chunking.py -v

# Test retrieval quality
python scripts/test_retrieval.py
```
## 📈 Performance Benchmarks

On HuggingFace Spaces (free tier):

- Query latency: ~2-3 seconds
- Vector DB size: ~150MB (FastAPI docs)
- Memory usage: ~800MB
- Concurrent users: 5-10
## 🛠️ Technology Stack

| Component | Technology | Why? |
|---|---|---|
| Embeddings | `sentence-transformers/all-MiniLM-L6-v2` | Fast, lightweight, good quality |
| Vector DB | ChromaDB | Easy setup, persistent storage |
| LLM | HuggingFace Inference API (Mistral-7B) | Free tier, good code understanding |
| Framework | LangChain | Industry standard, modular |
| UI | Gradio | Rapid prototyping, HF integration |
| Deployment | Docker + HF Spaces | Free, scalable, shareable |
## 🔮 Future Enhancements

- Multi-documentation support (React, Django, etc.)
- Conversation memory for follow-up questions
- Advanced retrieval (HyDE, Multi-Query)
- User feedback loop for continuous improvement
- Analytics dashboard for query patterns
## 📄 License

MIT License - feel free to use it for your portfolio!

## 🤝 Contributing

This is a portfolio project, but suggestions are welcome via issues.

## 📧 Contact

Built by Aishwarya as a portfolio demonstration of production RAG systems.

- Portfolio: https://aishwarya30998.github.io/projects.html
- LinkedIn: https://www.linkedin.com/in/aishwarya-pentyala/

⭐ If this helped you understand production RAG, give it a star!