| π Gemini RAG Backend System (FastAPI) | |
| Production-grade Retrieval-Augmented Generation (RAG) backend built with FastAPI, FAISS (ANN), and Google Gemini β featuring hybrid retrieval, HNSW indexing, cross-encoder reranking, evaluation logging, and analytics. | |
| This repository demonstrates how modern AI backend systems are actually built in industry. | |
| π What This Project Is | |
| This is a full RAG backend system that: | |
| Ingests large PDF/TXT documents | |
| Builds vector indexes with Approximate Nearest Neighbor (ANN) search | |
| Answers questions using grounded LLM responses | |
| Tracks confidence, known/unknown answers, and usage analytics | |
| Supports production constraints (file limits, caching, logging) | |
| The project evolved from RAG v1 β RAG v2, adding real-world scalability and observability. | |
| β¨ Key Features (RAG v2) | |
| π₯ Document Ingestion | |
| Upload PDF and TXT files | |
| Sentence-aware chunking with overlap | |
| Page-level metadata for citations | |
| π Retrieval (Hybrid + ANN) | |
| FAISS HNSW ANN index for scalable similarity search | |
| Cosine similarity via normalized embeddings | |
| Keyword boosting for lexical relevance | |
| π§ Reranking (Quality Boost) | |
| Cross-Encoder (ms-marco-MiniLM) reranking | |
| Improves relevance beyond raw vector similarity | |
| Mimics production search stacks (retrieve β rerank) | |
| π€ LLM Generation | |
| Google Gemini 2.5 Flash | |
| Strict grounding: answers only from retrieved context | |
| Honest fallback: "I don't know" when unsupported | |
| π Evaluation & Monitoring | |
| Logs every query: | |
| retrieved chunk count | |
| confidence score | |
| known vs unknown answers | |
| JSONL logs for offline analysis | |
| Built-in analytics dashboard | |
| π Analytics Dashboard | |
| Total queries | |
| Knowledge rate | |
| Average confidence | |
| Unknown query tracking | |
| Recent query history | |
| Dark / Light mode UI | |
| π‘οΈ Production Safeguards | |
| File upload size limits (configurable) | |
| API quota handling | |
| Caching to reduce LLM calls | |
| Clean error handling | |
| Persistent vector store | |
| ποΈ System Architecture | |
| Frontend (HTML / JS) | |
| β | |
| FastAPI Backend | |
| β | |
| Document Ingestion (PDF / TXT) | |
| β | |
| Sentence Chunking + Metadata | |
| β | |
| Embeddings (SentenceTransformers) | |
| β | |
| FAISS ANN Index (HNSW) | |
| β | |
| Hybrid Retrieval (Vector + Keyword) | |
| β | |
| Cross-Encoder Reranking | |
| β | |
| Prompt Assembly | |
| β | |
| Google Gemini LLM | |
| β | |
| Answer + Confidence + Citations | |
| β | |
| Evaluation Logging + Analytics | |
| π§ Core Concepts Demonstrated | |
| Retrieval-Augmented Generation (RAG) | |
| Why pure LLMs hallucinate | |
| How grounding fixes factual accuracy | |
| Vector search vs keyword search | |
| Hybrid retrieval strategies | |
| Approximate Nearest Neighbor (ANN) | |
| Why brute-force search fails at scale | |
| HNSW indexing for fast similarity search | |
| efConstruction vs efSearch trade-offs | |
| Reranking | |
| Why top-K vectors β best answers | |
| Cross-encoder reranking for relevance | |
| Industry-standard retrieval pipelines | |
| Evaluation & Observability | |
| Measuring known vs unknown | |
| Confidence as a heuristic, not truth | |
| Logging for iterative improvement | |
| Analytics-driven RAG tuning | |
| Real Backend Engineering | |
| API limits & retries | |
| Persistent storage | |
| Clean Git hygiene | |
| Incremental system evolution | |
| π οΈ Tech Stack | |
| Backend | |
| Python | |
| FastAPI | |
| FAISS (HNSW ANN) | |
| SentenceTransformers | |
| Cross-Encoder (MS MARCO) | |
| Google Gemini API | |
| PyPDF | |
| python-dotenv | |
| Frontend | |
| HTML | |
| CSS | |
| Vanilla JavaScript (Fetch API) | |
| Tooling & Platform | |
| VS Code | |
| Git & GitHub | |
| Docker | |
| Hugging Face Spaces (deployment) | |
| Virtual Environments (venv) | |
| βοΈ Setup & Run Locally | |
| 1οΈβ£ Clone Repository | |
| git clone https://github.com/LVVignesh/gemini-rag-fastapi.git | |
| cd gemini-rag-fastapi | |
| 2οΈβ£ Create Virtual Environment | |
| python -m venv venv | |
| venv\Scripts\activate | |
| 3οΈβ£ Install Dependencies | |
| pip install -r requirements.txt | |
| 4οΈβ£ Configure Environment Variables | |
| GEMINI_API_KEY=your_api_key_here | |
| 5οΈβ£ Run Server | |
| uvicorn main:app --reload | |
| β οΈ Known Limitations | |
| Scanned/image-only PDFs require OCR (not included) | |
| Confidence score is heuristic | |
| Very large corpora may require: | |
| batch ingestion | |
| sharding | |
| background workers | |
| π Live Demo | |
| π Hugging Face Spaces | |
| https://huggingface.co/spaces/lvvignesh2122/Gemini-Rag-Fastapi-Pro | |
| π License | |
| MIT License | |