	📄 Gemini RAG Backend System (FastAPI)

	Production-grade Retrieval-Augmented Generation (RAG) backend built with FastAPI, FAISS (ANN), and Google Gemini — featuring hybrid retrieval, HNSW indexing, cross-encoder reranking, evaluation logging, and analytics.

	This repository demonstrates the retrieve → rerank → generate pattern used in production RAG backends, together with the logging and safeguards such systems need.

	🚀 What This Project Is

	This is a full RAG backend system that:

	Ingests large PDF/TXT documents

	Builds vector indexes with Approximate Nearest Neighbor (ANN) search

	Answers questions using grounded LLM responses

	Tracks confidence, known/unknown answers, and usage analytics

	Supports production constraints (file limits, caching, logging)

	The project evolved from RAG v1 → RAG v2, adding real-world scalability and observability.

	✨ Key Features (RAG v2)

	📥 Document Ingestion

	Upload PDF and TXT files

	Sentence-aware chunking with overlap

	Page-level metadata for citations
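
	The ingestion steps above can be illustrated with a minimal sketch. It is not the project's actual chunker: the function names, the regex-based sentence splitter, and the `max_chars` / `overlap` parameters are assumptions chosen for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_sentences(sentences, max_chars=500, overlap=1):
    """Group whole sentences into chunks of roughly max_chars characters.

    The last `overlap` sentences of each chunk are repeated at the start
    of the next one, so context is not lost at chunk boundaries.
    """
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def chunk_page(text, page, max_chars=500, overlap=1):
    """Attach the page number to every chunk so answers can cite it."""
    sentences = split_sentences(text)
    return [
        {"page": page, "text": chunk}
        for chunk in chunk_sentences(sentences, max_chars, overlap)
    ]
```

	Chunks never cut a sentence in half, and a chunk may slightly exceed `max_chars` when the overlap plus one sentence is already longer than the limit.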

	🔍 Retrieval (Hybrid + ANN)

	FAISS HNSW ANN index for scalable similarity search

	Cosine similarity via normalized embeddings

	Keyword boosting for lexical relevance
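
	As a rough illustration of the two scoring ideas above: the dot product of L2-normalized vectors equals their cosine similarity, which is why normalized embeddings can be searched with an inner-product index. The keyword boost below is a hypothetical additive term; the project's actual weighting is not documented here, so `keyword_boost`, `hybrid_score`, and `weight` are assumptions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    """Dot product of unit vectors equals cosine similarity."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def keyword_boost(query, text, weight=0.1):
    """Add `weight` for each distinct query term that appears in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return weight * len(terms & words)


def hybrid_score(query, query_vec, chunk_text, chunk_vec, weight=0.1):
    """Vector similarity plus a lexical bonus (assumed combination rule)."""
    return cosine(query_vec, chunk_vec) + keyword_boost(query, chunk_text, weight)
```

	In a real pipeline the cosine part would come from the ANN index, and only the boost would be computed per candidate.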

	🧠 Reranking (Quality Boost)

	Cross-Encoder (ms-marco-MiniLM) reranking

	Improves relevance beyond raw vector similarity

	Mimics production search stacks (retrieve → rerank)
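
	The retrieve → rerank step can be sketched as a function that re-orders candidates with a pairwise scorer. To keep the example dependency-free, `overlap_score` is a simple stand-in and not a cross-encoder; in the real system a cross-encoder model would score each (query, passage) pair instead. All names here are hypothetical.

```python
def rerank(query, candidates, score_pair, top_n=3):
    """Re-order retrieved candidates by a pairwise relevance score."""
    scored = [(score_pair(query, text), text) for text in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored[:top_n]]


def overlap_score(query, text):
    """Stand-in scorer: fraction of query terms found in the text."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    return len(terms & set(text.lower().split())) / len(terms)


candidates = [
    "faiss builds an index",
    "gemini generates answers",
    "hnsw index tuning in faiss",
]
best = rerank("faiss hnsw index", candidates, overlap_score, top_n=2)
```

	The design point is the split itself: a cheap first stage fetches many candidates, and a slower, more accurate scorer only sees that short list.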

	🤖 LLM Generation

	Google Gemini 2.5 Flash

	Strict grounding: answers only from retrieved context

	Honest fallback: "I don't know" when unsupported
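
	A grounded prompt along these lines might look like the sketch below. The prompt wording, `build_prompt`, and `is_unknown` are assumptions for illustration, not the project's actual template; the chunk format (a dict with `page` and `text`) is also assumed.

```python
FALLBACK = "I don't know"


def build_prompt(question, chunks):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(
        f"[{i}] (page {c['page']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the context below.\n"
        f'If the context does not contain the answer, reply exactly: "{FALLBACK}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def is_unknown(answer):
    """Detect the fallback so the query can be counted as 'unknown'."""
    return answer.strip().lower().startswith(FALLBACK.lower())
```

	Numbering the chunks and including the page lets the model cite its sources, and a fixed fallback string makes unknown answers easy to detect in code.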

	📊 Evaluation & Monitoring

	Logs every query:

	retrieved chunk count

	confidence score

	known vs unknown answers

	JSONL logs for offline analysis

	Built-in analytics dashboard
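
	Per-query JSONL logging could be as simple as the following sketch. The field names and the `log_query` / `read_log` helpers are hypothetical; the project's real log schema may differ.

```python
import json
import time


def log_query(path, question, retrieved, confidence, known):
    """Append one query record as a single JSON line."""
    record = {
        "timestamp": time.time(),
        "question": question,
        "retrieved_chunks": retrieved,
        "confidence": confidence,
        "known": known,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def read_log(path):
    """Load all records back for offline analysis."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

	One record per line means the file can be appended to safely and processed line by line without loading a single large JSON document.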

	📈 Analytics Dashboard

	Total queries

	Knowledge rate (share of queries that received a known answer)

	Average confidence

	Unknown query tracking

	Recent query history

	Dark / Light mode UI
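
	The dashboard numbers above reduce to a few aggregates over the query records. This is a sketch under the assumption that each record is a dict with hypothetical `question`, `confidence`, and `known` fields; `summarize` is not a function from the project.

```python
def summarize(records, recent=5):
    """Compute dashboard metrics from a list of query records."""
    total = len(records)
    known = sum(1 for r in records if r["known"])
    confidence_sum = sum(r["confidence"] for r in records)
    return {
        "total_queries": total,
        # Share of queries that were answered rather than returned as unknown.
        "knowledge_rate": known / total if total else 0.0,
        "average_confidence": confidence_sum / total if total else 0.0,
        "unknown_queries": [r["question"] for r in records if not r["known"]],
        "recent": records[-recent:],
    }


records = [
    {"question": "What is HNSW?", "confidence": 0.75, "known": True},
    {"question": "Who won in 2030?", "confidence": 0.25, "known": False},
]
stats = summarize(records)
```

	The list of unknown queries is often the most useful output, since it shows which documents are missing from the corpus.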

	🛡️ Production Safeguards

	File upload size limits (configurable)

	API quota handling

	Caching to reduce LLM calls

	Clean error handling

	Persistent vector store
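
	Two of these safeguards, the upload size limit and answer caching, can be sketched as follows. The 10 MB default, the `AnswerCache` class, and the in-memory dictionary are assumptions; the project may cache differently.

```python
import hashlib

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumed default; meant to be configurable


def check_upload_size(data, limit=MAX_UPLOAD_BYTES):
    """Reject uploads larger than the configured limit."""
    if len(data) > limit:
        raise ValueError(f"file is {len(data)} bytes; limit is {limit}")


class AnswerCache:
    """In-memory cache keyed by a normalized form of the question."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(question):
        # Lowercase and collapse whitespace so trivial variants share a key.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_compute(self, question, compute):
        """Return a cached answer, calling `compute` only on a miss."""
        key = self._key(question)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        answer = compute(question)
        self._store[key] = answer
        return answer
```

	A cache like this must be cleared whenever new documents are ingested, otherwise stale answers would be served.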


	🏗️ System Architecture


	Frontend (HTML / JS)
	↓

	FastAPI Backend
	↓

	Document Ingestion (PDF / TXT)
	↓

	Sentence Chunking + Metadata
	↓

	Embeddings (SentenceTransformers)
	↓

	FAISS ANN Index (HNSW)
	↓

	Hybrid Retrieval (Vector + Keyword)
	↓

	Cross-Encoder Reranking
	↓

	Prompt Assembly
	↓

	Google Gemini LLM
	↓

	Answer + Confidence + Citations
	↓

	Evaluation Logging + Analytics



	🧠 Core Concepts Demonstrated

	Retrieval-Augmented Generation (RAG)

	Why pure LLMs hallucinate

	How grounding fixes factual accuracy

	Vector search vs keyword search

	Hybrid retrieval strategies

	Approximate Nearest Neighbor (ANN)

	Why brute-force search fails at scale

	HNSW indexing for fast similarity search

	efConstruction vs efSearch trade-offs

	Reranking

	Why top-K vectors ≠ best answers

	Cross-encoder reranking for relevance

	Industry-standard retrieval pipelines

	Evaluation & Observability

	Measuring known vs unknown

	Confidence as a heuristic, not truth

	Logging for iterative improvement

	Analytics-driven RAG tuning

	Real Backend Engineering

	API limits & retries

	Persistent storage

	Clean Git hygiene

	Incremental system evolution
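
	The "API limits & retries" item can be illustrated with exponential backoff. `QuotaError` is a hypothetical stand-in for whatever rate-limit exception the LLM client raises, and the attempt count and delays are assumed values.

```python
import time


class QuotaError(Exception):
    """Hypothetical stand-in for a rate-limit or quota error."""


def call_with_retries(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on QuotaError with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except QuotaError:
            if attempt == max_attempts:
                raise  # give up and surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

	Passing `sleep` as a parameter keeps the function testable without real waiting.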


	🛠️ Tech Stack

	Backend

	Python

	FastAPI

	FAISS (HNSW ANN)

	SentenceTransformers

	Cross-Encoder (MS MARCO)

	Google Gemini API

	PyPDF

	python-dotenv

	Frontend

	HTML

	CSS

	Vanilla JavaScript (Fetch API)

	Tooling & Platform

	VS Code

	Git & GitHub

	Docker

	Hugging Face Spaces (deployment)

	Virtual Environments (venv)



	⚙️ Setup & Run Locally

	1️⃣ Clone Repository

	git clone https://github.com/LVVignesh/gemini-rag-fastapi.git

	cd gemini-rag-fastapi

	2️⃣ Create Virtual Environment

	python -m venv venv

	Windows:

	venv\Scripts\activate

	macOS / Linux:

	source venv/bin/activate

	3️⃣ Install Dependencies

	pip install -r requirements.txt

	4️⃣ Configure Environment Variables

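	Set your Gemini API key as an environment variable, for example in a .env file in the project root (python-dotenv is part of the stack):

	GEMINI_API_KEY=your_api_key_here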

	5️⃣ Run Server

	uvicorn main:app --reload



	⚠️ Known Limitations

	Scanned/image-only PDFs require OCR (not included)

	Confidence score is heuristic

	Very large corpora may require:

	batch ingestion

	sharding

	background workers



	🚀 Live Demo

	👉 Hugging Face Spaces
	https://huggingface.co/spaces/lvvignesh2122/Gemini-Rag-Fastapi-Pro

	📜 License

	MIT License