---
title: DeveloperDocs RAG
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_file: app.py
pinned: false
---

> Production-grade RAG system that answers questions using official tech stack documentation (e.g., FastAPI)

[![Deployed on HuggingFace](https://img.shields.io/badge/🤗-HuggingFace%20Spaces-blue)](https://huggingface.co/spaces)
[![Docker](https://img.shields.io/badge/Docker-Ready-2496ED?logo=docker&logoColor=white)](https://www.docker.com/)
[![Python 3.10+](https://img.shields.io/badge/Python-3.10+-3776AB?logo=python&logoColor=white)](https://www.python.org/)

## 🎯 What This Project Demonstrates

This is a **production-style RAG (Retrieval-Augmented Generation)** system that showcases:

- ✅ **Professional documentation ingestion pipeline** with chunking strategies
- ✅ **Semantic search** using vector embeddings (ChromaDB)
- ✅ **Source attribution** with clickable citations
- ✅ **RAG evaluation metrics** (RAGAS framework)
- ✅ **Dockerized deployment** ready for cloud platforms
- ✅ **Production-grade error handling** and logging

## 🏗️ Architecture

```
┌─────────────┐
│    User     │
│  Question   │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────┐
│  1. Query Embedding                 │
│     (sentence-transformers)         │
└──────────┬──────────────────────────┘
           │
           ▼
┌─────────────────────────────────────┐
│  2. Vector Search (ChromaDB)        │
│     - Top 5 relevant chunks         │
│     - Metadata: source, section     │
└──────────┬──────────────────────────┘
           │
           ▼
┌─────────────────────────────────────┐
│  3. Context Assembly                │
│     - Format chunks                 │
│     - Add instructions              │
└──────────┬──────────────────────────┘
           │
           ▼
┌─────────────────────────────────────┐
│  4. LLM Generation (HF Inference)   │
│     - Answer with citations         │
│     - Code examples preserved       │
└──────────┬──────────────────────────┘
           │
           ▼
┌─────────────────────────────────────┐
│  5. Response + Source Links         │
└─────────────────────────────────────┘
```

### Local Setup

```bash
# Clone the repository
git clone https://github.com/aishwarya30998/DeveloperDocs-AI-Copilot-RAG.git
cd DeveloperDocs-AI-Copilot-RAG

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create .env from the template, then add your HF_TOKEN to it
cp .env.example .env

# Run the application
python app.py
```

Visit `http://localhost:7860` in your browser.
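
To make the five stages in the architecture diagram concrete, here is a minimal, dependency-free sketch of the retrieve-then-prompt flow. It is not the code in `src/rag_pipeline.py`. Every name in it (`Chunk`, `embed`, `retrieve`, `build_prompt`, `answer`) is hypothetical, as are the `example.com` URLs. The bag-of-words `embed` function stands in for a sentence-transformers model, the in-memory list stands in for ChromaDB, and the `generate` callable stands in for the HuggingFace Inference call.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Chunk:
    text: str
    source: str   # URL of the documentation page (hypothetical metadata field)
    section: str  # section title within that page (hypothetical metadata field)


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[Chunk], top_k: int = 5) -> list[Chunk]:
    """Steps 1-2: embed the query and return the top_k most similar chunks."""
    query_vec = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c.text)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, retrieved: list[Chunk]) -> str:
    """Step 3: number each chunk so the model can cite it as [n]."""
    context = "\n\n".join(
        f"[{i}] ({c.section}) {c.text}" for i, c in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the context below and cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


def answer(
    question: str,
    chunks: list[Chunk],
    generate: Callable[[str], str],
    top_k: int = 5,
) -> dict:
    """Steps 4-5: call the LLM and return its answer with the source links."""
    retrieved = retrieve(question, chunks, top_k)
    return {
        "answer": generate(build_prompt(question, retrieved)),
        "sources": [c.source for c in retrieved],
    }


# Usage with a stubbed LLM; the documents and URLs are made up for illustration.
docs = [
    Chunk("Use Depends to declare a dependency.", "https://example.com/deps", "Dependencies"),
    Chunk("Path parameters use curly braces.", "https://example.com/path", "Path Parameters"),
]
result = answer("How do I declare a dependency?", docs, generate=lambda p: "stub [1]", top_k=1)
print(result["sources"])
```

Passing the generator in as a callable keeps retrieval and prompt assembly testable without network access; a real deployment would swap in the embedding model, vector store, and inference client listed in the Technology Stack table.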

## 📦 Project Structure

```
DeveloperDocs-AI-Copilot-RAG/
├── app.py                    # Gradio UI application
├── Dockerfile                # Container configuration
├── docker-compose.yml        # Local container orchestration
├── requirements.txt          # Python dependencies
├── .env.example              # Environment variables template
│
├── src/
│   ├── __init__.py
│   ├── config.py             # Configuration management
│   ├── chunking.py           # Document chunking strategies
│   ├── embeddings.py         # Embedding generation
│   ├── retriever.py          # Vector search logic
│   ├── rag_pipeline.py       # Main RAG orchestration
│   └── prompts.py            # Prompt templates
│
├── scripts/
│   ├── ingest_docs.py        # Documentation ingestion
│   ├── evaluate_rag.py       # RAG metrics evaluation
│   └── test_retrieval.py     # Test retrieval quality
│
├── data/
│   ├── raw/                  # Downloaded documentation
│   ├── processed/            # Chunked documents
│   └── vectordb/             # ChromaDB storage
│
├── tests/
│   ├── test_chunking.py
│   ├── test_retriever.py
│   └── test_rag_pipeline.py
│
└── evals/
    ├── test_queries.json     # Evaluation dataset
    └── results/              # Evaluation outputs
```

## 🎯 Key Features

### 1. Smart Chunking

- **Semantic chunking** with overlap for context preservation
- **Metadata enrichment** (section titles, URLs, code blocks)
- **Configurable chunk sizes** (300-800 tokens)

### 2. Retrieval Quality

- **Hybrid search** (semantic + keyword)
- **Reranking** for improved relevance
- **Source attribution** with confidence scores

### 3. Answer Generation

- **Code-aware formatting** (preserves indentation)
- **Inline citations** with source links
- **Fallback handling** for low-confidence results

### 4. Production Features

- **Health check endpoint** (`/health`)
- **Query logging** for analytics
- **Rate limiting** (basic throttling)
- **Error recovery** with graceful degradation

## 📊 RAG Evaluation

We use the **RAGAS** framework to measure:

| Metric                | Description                                                  | Target Score |
| --------------------- | ------------------------------------------------------------ | ------------ |
| **Faithfulness**      | Whether the answer's claims are supported by the context     | > 0.8        |
| **Answer Relevancy**  | How directly the response addresses the query                | > 0.7        |
| **Context Precision** | Whether the relevant retrieved chunks are ranked at the top  | > 0.75       |
| **Context Recall**    | Whether the retrieved context covers what the answer needs   | > 0.8        |

Run evaluations:

```bash
python scripts/evaluate_rag.py
```

## 🐳 Docker Deployment

### Build and run locally:

```bash
# The trailing "." sets the build context to the repository root
docker build -t developerdocs-rag .
docker run -p 7860:7860 --name developerdocs-rag-container developerdocs-rag
```

### Deploy to HuggingFace Spaces:

1. Create a new Space on HuggingFace
2. Select Docker as the Space SDK
3. Push this repository
4. Add `HF_TOKEN` as a Space secret
5. The Space builds and deploys automatically

## 🧪 Testing

```bash
# Run all tests
pytest tests/ -v

# Test chunking strategy
pytest tests/test_chunking.py -v

# Test retrieval quality
python scripts/test_retrieval.py
```

## 📈 Performance Benchmarks

On HuggingFace Spaces (free tier):

- **Query latency**: ~2-3 seconds
- **Vector DB size**: ~150MB (FastAPI docs)
- **Memory usage**: ~800MB
- **Concurrent users**: 5-10

## 🛠️ Technology Stack

| Component      | Technology                               | Why?                               |
| -------------- | ---------------------------------------- | ---------------------------------- |
| **Embeddings** | `sentence-transformers/all-MiniLM-L6-v2` | Fast, lightweight, good quality    |
| **Vector DB**  | ChromaDB                                 | Easy setup, persistent storage     |
| **LLM**        | HuggingFace Inference API (Mistral-7B)   | Free tier, good code understanding |
| **Framework**  | LangChain                                | Industry standard, modular         |
| **UI**         | Gradio                                   | Rapid prototyping, HF integration  |
| **Deployment** | Docker + HF Spaces                       | Free, scalable, shareable          |

## 🔮 Future Enhancements

- [ ] Multi-documentation support (React, Django, etc.)
- [ ] Conversation memory for follow-up questions
- [ ] Advanced retrieval (HyDE, Multi-Query)
- [ ] User feedback loop for continuous improvement
- [ ] Analytics dashboard for query patterns

## 📝 License

MIT License - feel free to use for your portfolio!

## 🤝 Contributing

This is a portfolio project, but suggestions are welcome via issues.

## 📧 Contact

Built by Aishwarya as a portfolio demonstration of production RAG systems.

- Portfolio: https://aishwarya30998.github.io/projects.html
- LinkedIn: https://www.linkedin.com/in/aishwarya-pentyala/

---

⭐ If this helped you understand production RAG, give it a star!