---
title: DeveloperDocs RAG
emoji: 🧠
colorFrom: blue
colorTo: green
sdk: docker
app_file: app.py
pinned: false
---

# DeveloperDocs RAG

A production-grade RAG system that answers developer questions from official tech-stack documentation (e.g., FastAPI).

Deployed on Hugging Face Spaces (Docker SDK) · Python 3.10+

## 🎯 What This Project Demonstrates

This is a production-style RAG (Retrieval-Augmented Generation) system that showcases:

- ✅ Professional documentation ingestion pipeline with chunking strategies
- ✅ Semantic search using vector embeddings (ChromaDB)
- ✅ Source attribution with clickable citations
- ✅ RAG evaluation metrics (RAGAS framework)
- ✅ Dockerized deployment ready for cloud platforms
- ✅ Production-grade error handling and logging

๐Ÿ—๏ธ Architecture

```
┌─────────────┐
│    User     │
│  Question   │
└──────┬──────┘
       │
       ▼
┌─────────────────────────────────────┐
│  1. Query Embedding                 │
│     (sentence-transformers)         │
└──────────┬──────────────────────────┘
           │
           ▼
┌─────────────────────────────────────┐
│  2. Vector Search (ChromaDB)        │
│     - Top 5 relevant chunks         │
│     - Metadata: source, section     │
└──────────┬──────────────────────────┘
           │
           ▼
┌─────────────────────────────────────┐
│  3. Context Assembly                │
│     - Format chunks                 │
│     - Add instructions              │
└──────────┬──────────────────────────┘
           │
           ▼
┌─────────────────────────────────────┐
│  4. LLM Generation (HF Inference)   │
│     - Answer with citations         │
│     - Code examples preserved       │
└──────────┬──────────────────────────┘
           │
           ▼
┌─────────────────────────────────────┐
│  5. Response + Source Links         │
└─────────────────────────────────────┘
```
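As a rough illustration of the flow, here is a self-contained sketch of steps 2 and 3 in plain Python. It substitutes toy bag-of-words vectors for sentence-transformers embeddings and skips the LLM call; `retrieve` and `assemble_context` are illustrative names, not the repo's actual API.

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words counts (stand-in for sentence-transformers).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = lambda v: math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

# Step 2: vector search over a tiny corpus, with source metadata attached.
docs = [
    {"text": "FastAPI path parameters are declared with Python type hints.",
     "source": "fastapi/path-params"},
    {"text": "Dependency injection in FastAPI uses the Depends function.",
     "source": "fastapi/dependencies"},
]

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return ranked[:k]

# Step 3: context assembly with numbered citation markers.
def assemble_context(chunks):
    return "\n".join(f"[{i+1}] ({c['source']}) {c['text']}"
                     for i, c in enumerate(chunks))

hits = retrieve("How do I declare path parameters?")
context = assemble_context(hits)
print(context)
```

In the real pipeline, `context` would then be wrapped in a prompt template (`src/prompts.py`) and sent to the LLM in step 4.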

## Local Setup

```bash
# Clone the repository
git clone https://github.com/aishwarya30998/DeveloperDocs-AI-Copilot-RAG.git
cd DeveloperDocs-AI-Copilot-RAG

# Create a virtual environment
python -m venv venv
source venv/bin/activate
# On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create .env and add your HF_TOKEN
cp .env.example .env

# Run the application
python app.py
```

Visit http://localhost:7860 in your browser.

## 📦 Project Structure

```
fastapi-docs-copilot/
├── app.py                      # Gradio UI application
├── Dockerfile                  # Container configuration
├── docker-compose.yml          # Local container orchestration
├── requirements.txt            # Python dependencies
├── .env.example                # Environment variables template
│
├── src/
│   ├── __init__.py
│   ├── config.py               # Configuration management
│   ├── chunking.py             # Document chunking strategies
│   ├── embeddings.py           # Embedding generation
│   ├── retriever.py            # Vector search logic
│   ├── rag_pipeline.py         # Main RAG orchestration
│   └── prompts.py              # Prompt templates
│
├── scripts/
│   ├── ingest_docs.py          # Documentation ingestion
│   ├── evaluate_rag.py         # RAG metrics evaluation
│   └── test_retrieval.py       # Test retrieval quality
│
├── data/
│   ├── raw/                    # Downloaded documentation
│   ├── processed/              # Chunked documents
│   └── vectordb/               # ChromaDB storage
│
├── tests/
│   ├── test_chunking.py
│   ├── test_retriever.py
│   └── test_rag_pipeline.py
│
└── evals/
    ├── test_queries.json       # Evaluation dataset
    └── results/                # Evaluation outputs
```

## 🎯 Key Features

### 1. Smart Chunking

- Semantic chunking with overlap for context preservation
- Metadata enrichment (section titles, URLs, code blocks)
- Configurable chunk sizes (300-800 tokens)
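A minimal sketch of overlapping chunking, using word count as a rough proxy for tokens (the real implementation in `src/chunking.py` may differ):

```python
def chunk_words(words, size=300, overlap=50):
    """Split a word list into chunks of `size` words with `overlap` words shared
    between consecutive chunks, so context at chunk boundaries is preserved."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end
    return chunks

# 700 words with size=300 / overlap=50 gives chunks covering
# words 0-300, 250-550, and 500-700.
text = "word " * 700
chunks = chunk_words(text.split(), size=300, overlap=50)
```

In practice the chunker would also carry metadata (section title, URL, whether the chunk contains a code block) alongside each chunk.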

### 2. Retrieval Quality

- Hybrid search (semantic + keyword)
- Reranking for improved relevance
- Source attribution with confidence scores
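Hybrid search can be sketched as a linear blend of a semantic score and a keyword-overlap score. The `alpha` weight and both scoring functions below are illustrative assumptions, not the repo's actual implementation:

```python
def keyword_score(query, text):
    # Fraction of query terms that appear verbatim in the candidate text.
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def hybrid_rank(query, candidates, alpha=0.5):
    """candidates: list of (text, semantic_score) pairs.
    Blend semantic and keyword scores, highest combined score first."""
    return sorted(
        candidates,
        key=lambda c: alpha * c[1] + (1 - alpha) * keyword_score(query, c[0]),
        reverse=True,
    )

# The exact-keyword match outranks the slightly higher semantic score alone.
candidates = [
    ("FastAPI dependency injection with Depends", 0.62),
    ("Declaring query parameters in FastAPI", 0.58),
]
top = hybrid_rank("query parameters", candidates)
```

A reranker (e.g. a cross-encoder) would then rescore this shortlist before the chunks reach the prompt.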

### 3. Answer Generation

- Code-aware formatting (preserves indentation)
- Inline citations with source links
- Fallback handling for low-confidence results
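Fallback handling might look like the sketch below: chunks under a confidence threshold are dropped, and a polite refusal is returned when nothing survives. The threshold, field names, and message are assumptions, not the repo's actual values:

```python
CONFIDENCE_FLOOR = 0.35  # illustrative threshold, not the repo's actual value

def answer_or_fallback(chunks):
    """chunks: dicts with 'text', 'url', 'score'.
    Return cited lines, or a fallback message if no chunk is confident enough."""
    confident = [c for c in chunks if c["score"] >= CONFIDENCE_FLOOR]
    if not confident:
        return "I couldn't find this in the indexed docs. Try rephrasing your question."
    return "\n".join(f"{c['text']} [source]({c['url']})" for c in confident)

good = answer_or_fallback(
    [{"text": "Use Depends for injection.", "url": "https://example.com/deps", "score": 0.8}]
)
bad = answer_or_fallback(
    [{"text": "Unrelated chunk.", "url": "https://example.com/x", "score": 0.1}]
)
```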

### 4. Production Features

- Health check endpoint (`/health`)
- Query logging for analytics
- Rate limiting (basic throttling)
- Error recovery with graceful degradation
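Basic throttling can be implemented as a token bucket using only the standard library; this is a generic sketch, not the repo's actual rate limiter:

```python
import time

class TokenBucket:
    """Allow at most `rate` requests per `per` seconds (illustrative only)."""

    def __init__(self, rate, per):
        self.capacity = rate
        self.tokens = float(rate)   # start full
        self.per = per
        self.last = time.monotonic()

    def allow(self):
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last) * self.capacity / self.per,
        )
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# 5 requests per minute: the first 5 pass, the 6th is throttled.
bucket = TokenBucket(rate=5, per=60)
results = [bucket.allow() for _ in range(6)]
```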

## 📊 RAG Evaluation

We use the RAGAS framework to measure:

| Metric | Description | Target Score |
|--------|-------------|--------------|
| Faithfulness | Answer accuracy vs. context | > 0.8 |
| Answer Relevancy | Response relevance to query | > 0.7 |
| Context Precision | Retrieval accuracy | > 0.75 |
| Context Recall | Context completeness | > 0.8 |

Run evaluations:

```bash
python scripts/evaluate_rag.py
```
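For intuition: with binary relevance labels, context precision and recall reduce to set overlap. Note that RAGAS itself computes these with LLM-based judgments, so the version below is only a simplified illustration:

```python
def context_precision(retrieved, relevant):
    """Fraction of retrieved chunks that are actually relevant (simplified)."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

def context_recall(retrieved, relevant):
    """Fraction of relevant chunks that were retrieved (simplified)."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

# One of three retrieved chunks is relevant; one of two relevant chunks was found.
retrieved = ["chunk_a", "chunk_b", "chunk_c"]
relevant = ["chunk_a", "chunk_d"]
p = context_precision(retrieved, relevant)
r = context_recall(retrieved, relevant)
```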

๐Ÿณ Docker Deployment

Build and run locally:

```bash
docker build -t developerdocs-rag .
docker run -p 7860:7860 --name developerdocs-rag-container developerdocs-rag
```

Deploy to Hugging Face Spaces:

1. Create a new Space on Hugging Face
2. Select the Docker SDK
3. Push this repository
4. Add `HF_TOKEN` as a Space secret
5. The Space builds and deploys automatically

## 🧪 Testing

```bash
# Run all tests
pytest tests/ -v

# Test chunking strategy
pytest tests/test_chunking.py -v

# Test retrieval quality
python scripts/test_retrieval.py
```

## 📈 Performance Benchmarks

On Hugging Face Spaces (free tier):

- Query latency: ~2-3 seconds
- Vector DB size: ~150 MB (FastAPI docs)
- Memory usage: ~800 MB
- Concurrent users: 5-10

## 🛠️ Technology Stack

| Component | Technology | Why? |
|-----------|------------|------|
| Embeddings | `sentence-transformers/all-MiniLM-L6-v2` | Fast, lightweight, good quality |
| Vector DB | ChromaDB | Easy setup, persistent storage |
| LLM | Hugging Face Inference API (Mistral-7B) | Free tier, good code understanding |
| Framework | LangChain | Industry standard, modular |
| UI | Gradio | Rapid prototyping, HF integration |
| Deployment | Docker + HF Spaces | Free, scalable, shareable |

## 🔮 Future Enhancements

- Multi-documentation support (React, Django, etc.)
- Conversation memory for follow-up questions
- Advanced retrieval (HyDE, Multi-Query)
- User feedback loop for continuous improvement
- Analytics dashboard for query patterns

## 📝 License

MIT License - feel free to use for your portfolio!

## 🤝 Contributing

This is a portfolio project, but suggestions are welcome via issues.

## 📧 Contact

Built by Aishwarya as a portfolio demonstration of production RAG systems.


⭐ If this helped you understand production RAG, give it a star!