Gemini RAG Backend System (FastAPI)
Production-grade Retrieval-Augmented Generation (RAG) backend built with FastAPI, FAISS (ANN), and Google Gemini, featuring hybrid retrieval, HNSW indexing, cross-encoder reranking, evaluation logging, and analytics.
This repository demonstrates patterns used in production AI backends: hybrid retrieval, reranking, grounded generation, and observability.
What This Project Is
This is a full RAG backend system that:
Ingests large PDF/TXT documents
Builds vector indexes with Approximate Nearest Neighbor (ANN) search
Answers questions using grounded LLM responses
Tracks confidence, known/unknown answers, and usage analytics
Supports production constraints (file limits, caching, logging)
The project evolved from RAG v1 to RAG v2, adding real-world scalability and observability.
Key Features (RAG v2)
Document Ingestion
Upload PDF and TXT files
Sentence-aware chunking with overlap
Page-level metadata for citations
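The ingestion steps above can be sketched in a few lines. This is a minimal illustration, not the repository's actual code: the function names, the regex-based sentence splitter, and the `max_chars` and `overlap_sentences` parameters are all assumptions made for the example.

```python
import re

def chunk_sentences(text, max_chars=200, overlap_sentences=1):
    """Group whole sentences into chunks of roughly max_chars, repeating the
    last `overlap_sentences` sentences at the start of the next chunk."""
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

def chunk_pages(pages, **kwargs):
    """pages: list of page texts. Returns dicts tagged with 1-based page
    numbers so that answers can cite the page a chunk came from."""
    return [
        {"page": number, "text": chunk}
        for number, page_text in enumerate(pages, start=1)
        for chunk in chunk_sentences(page_text, **kwargs)
    ]
```

Chunking per page keeps every chunk attributable to a single page, at the cost of never spanning a page break.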
Retrieval (Hybrid + ANN)
FAISS HNSW ANN index for scalable similarity search
Cosine similarity via normalized embeddings
Keyword boosting for lexical relevance
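To illustrate the scoring idea, the sketch below combines cosine similarity with a small keyword bonus in pure Python. For unit-length vectors the inner product equals cosine similarity, which is why normalizing embeddings before indexing works. The function names and the `boost` weight of 0.05 are hypothetical; the project's real weighting is not documented here.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def hybrid_score(query_vec, chunk_vec, query_text, chunk_text, boost=0.05):
    """Cosine similarity plus a bonus for each query word present in the chunk."""
    # Dot product of unit vectors equals cosine similarity.
    cosine = sum(q * c for q, c in zip(normalize(query_vec), normalize(chunk_vec)))
    # Lexical signal (assumed form): count of shared lowercase words.
    shared = set(query_text.lower().split()) & set(chunk_text.lower().split())
    return cosine + boost * len(shared)
```

The keyword bonus helps with exact terms such as identifiers or product names that embeddings can blur together.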
Reranking (Quality Boost)
Cross-Encoder (ms-marco-MiniLM) reranking
Improves relevance beyond raw vector similarity
Mimics production search stacks (retrieve → rerank)
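The two-stage pattern can be sketched as follows. The example is an assumption-laden illustration: `retrieve_then_rerank` and `toy_overlap_score` are hypothetical names, and the toy scorer only stands in for a cross-encoder, which would score each (query, chunk) pair jointly with a neural model.

```python
def retrieve_then_rerank(query, candidates, score_fn, top_k=3):
    """candidates: chunk texts returned by the first-stage retriever.
    score_fn(query, text) -> float is a stand-in for a cross-encoder."""
    scored = [(score_fn(query, text), text) for text in candidates]
    # Highest relevance first; the sort is stable, so ties keep retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]

def toy_overlap_score(query, text):
    """Demo-only scorer: fraction of query words that appear in the text."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & set(text.lower().split())) / len(query_terms)
```

Because the second stage only sees the first stage's candidates, the expensive scorer runs on a handful of chunks rather than the whole corpus.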
LLM Generation
Google Gemini 2.5 Flash
Strict grounding: answers only from retrieved context
Honest fallback: "I don't know" when unsupported
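One way to enforce grounding is in the prompt itself, with a short-circuit when retrieval returns nothing. The sketch below is a guess at the shape of such code, not the project's actual prompt: the wording, the chunk dictionary keys, and the `generate` callable (a placeholder for whatever wraps the Gemini API) are all assumptions.

```python
FALLBACK = "I don't know"

def build_grounded_prompt(question, chunks):
    """chunks: list of dicts with 'text' and 'page' keys (assumed shape)."""
    context = "\n\n".join(f"[page {c['page']}] {c['text']}" for c in chunks)
    return (
        "Answer the question using ONLY the context below.\n"
        f"If the context does not contain the answer, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def answer_or_fallback(question, chunks, generate):
    """generate(prompt) -> str is a hypothetical wrapper around the LLM call."""
    # Skip the LLM entirely when there is nothing to ground the answer on.
    if not chunks:
        return FALLBACK
    return generate(build_grounded_prompt(question, chunks))
```

Returning the fallback without calling the model also saves API quota on queries the corpus cannot answer.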
Evaluation & Monitoring
Logs every query:
retrieved chunk count
confidence score
known vs unknown answers
JSONL logs for offline analysis
Built-in analytics dashboard
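Per-query JSONL logging could look roughly like this. The field names (`retrieved_chunks`, `confidence`, `known`) and the `log_query` helper are illustrative assumptions rather than the project's real schema.

```python
import json
import time

def log_query(path, question, retrieved_chunks, confidence, known):
    """Append one JSON object per line so the log can be streamed offline."""
    record = {
        "timestamp": time.time(),
        "question": question,
        "retrieved_chunks": retrieved_chunks,  # number of chunks retrieved
        "confidence": confidence,              # heuristic score, not a probability
        "known": known,                        # False when the fallback was returned
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Appending one self-contained line per query means a crash mid-write corrupts at most the last record.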
Analytics Dashboard
Total queries
Knowledge rate
Average confidence
Unknown query tracking
Recent query history
Dark / Light mode UI
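The dashboard metrics can be derived from logged queries with a small aggregation. This sketch assumes each log line is a JSON object with `question`, `confidence`, and `known` fields, and that "knowledge rate" means the share of queries answered without the fallback; both are assumptions made for the example.

```python
import json

def summarize(jsonl_lines, recent=5):
    """Aggregate dashboard metrics from an iterable of JSONL log lines."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    total = len(records)
    known = sum(1 for r in records if r["known"])
    return {
        "total_queries": total,
        # Assumed definition: fraction of queries with a grounded answer.
        "knowledge_rate": known / total if total else 0.0,
        "average_confidence": (
            sum(r["confidence"] for r in records) / total if total else 0.0
        ),
        "unknown_queries": [r["question"] for r in records if not r["known"]],
        "recent_queries": [r["question"] for r in records[-recent:]],
    }
```

The list of unknown queries is often the most actionable output, since it shows which documents are missing from the corpus.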
Production Safeguards
File upload size limits (configurable)
API quota handling
Caching to reduce LLM calls
Clean error handling
Persistent vector store
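Two of these safeguards, the upload size limit and the answer cache, can be illustrated with the standard library alone. The 10 MB limit, the `AnswerCache` class, and the cache key format are hypothetical; in a FastAPI handler the size error would typically be mapped to an HTTP 413 response.

```python
import hashlib

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed default; the real limit is configurable

def check_upload_size(data: bytes, limit: int = MAX_UPLOAD_BYTES) -> None:
    """Reject uploads larger than the limit before any parsing happens."""
    if len(data) > limit:
        raise ValueError(f"upload is {len(data)} bytes, limit is {limit}")

class AnswerCache:
    """In-memory cache keyed by a hash of the question and its context."""

    def __init__(self):
        self._store = {}

    def get_or_compute(self, question, context, compute):
        key = hashlib.sha256(f"{question}\n{context}".encode("utf-8")).hexdigest()
        if key not in self._store:
            # Only call the (expensive) LLM on a cache miss.
            self._store[key] = compute()
        return self._store[key]
```

Including the context in the key prevents a stale answer from being served after the underlying documents change.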
System Architecture
Frontend (HTML / JS) ↓
FastAPI Backend ↓
Document Ingestion (PDF / TXT) ↓
Sentence Chunking + Metadata ↓
Embeddings (SentenceTransformers) ↓
FAISS ANN Index (HNSW) ↓
Hybrid Retrieval (Vector + Keyword) ↓
Cross-Encoder Reranking ↓
Prompt Assembly ↓
Google Gemini LLM ↓
Answer + Confidence + Citations ↓
Evaluation Logging + Analytics
Core Concepts Demonstrated
Retrieval-Augmented Generation (RAG)
Why pure LLMs hallucinate
How grounding fixes factual accuracy
Vector search vs keyword search
Hybrid retrieval strategies
Approximate Nearest Neighbor (ANN)
Why brute-force search fails at scale
HNSW indexing for fast similarity search
efConstruction vs efSearch trade-offs
Reranking
Why top-K vectors ≠ best answers
Cross-encoder reranking for relevance
Industry-standard retrieval pipelines
Evaluation & Observability
Measuring known vs unknown
Confidence as a heuristic, not truth
Logging for iterative improvement
Analytics-driven RAG tuning
Real Backend Engineering
API limits & retries
Persistent storage
Clean Git hygiene
Incremental system evolution
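As an example of the "API limits & retries" item, a retry helper with exponential backoff might look like the sketch below. The helper name, its parameters, and the delay schedule are assumptions; the repository's actual quota handling may differ.

```python
import time

def call_with_retries(fn, max_attempts=4, base_delay=1.0,
                      sleep=time.sleep, retry_on=(Exception,)):
    """Call fn(); on a retryable error wait base_delay * 2**(attempt - 1)
    seconds and try again, re-raising after max_attempts failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise
            # Delays grow as 1x, 2x, 4x, ... of base_delay.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting, and `retry_on` can be narrowed to the specific quota or rate-limit exception raised by the client library in use.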
Tech Stack
Backend
Python
FastAPI
FAISS (HNSW ANN)
SentenceTransformers
Cross-Encoder (MS MARCO)
Google Gemini API
PyPDF
python-dotenv
Frontend
HTML
CSS
Vanilla JavaScript (Fetch API)
Tooling & Platform
VS Code
Git & GitHub
Docker
Hugging Face Spaces (deployment)
Virtual Environments (venv)
Setup & Run Locally
1. Clone Repository
git clone https://github.com/LVVignesh/gemini-rag-fastapi.git
cd gemini-rag-fastapi
2. Create Virtual Environment
python -m venv venv
On Windows:
venv\Scripts\activate
On macOS or Linux:
source venv/bin/activate
3. Install Dependencies
pip install -r requirements.txt
4. Configure Environment Variables
GEMINI_API_KEY=your_api_key_here
5. Run Server
uvicorn main:app --reload
Known Limitations
Scanned/image-only PDFs require OCR (not included)
Confidence score is a heuristic, not a guarantee of correctness
Very large corpora may require:
batch ingestion
sharding
background workers
Live Demo
Hugging Face Spaces: https://huggingface.co/spaces/lvvignesh2122/Gemini-Rag-Fastapi-Pro
License
MIT License