Gemini RAG Backend System (FastAPI)
Production-grade Retrieval-Augmented Generation (RAG) backend built with FastAPI, FAISS (ANN), and Google Gemini, featuring hybrid retrieval, HNSW indexing, cross-encoder reranking, evaluation logging, and analytics.
This repository demonstrates a retrieve, rerank, and generate pipeline together with the logging and safeguards that production AI backends typically need.
What This Project Is
This is a full RAG backend system that:
Ingests large PDF/TXT documents
Builds vector indexes with Approximate Nearest Neighbor (ANN) search
Answers questions using grounded LLM responses
Tracks confidence, known/unknown answers, and usage analytics
Supports production constraints (file limits, caching, logging)
The project evolved from RAG v1 → RAG v2, adding real-world scalability and observability.
Key Features (RAG v2)
Document Ingestion
Upload PDF and TXT files
Sentence-aware chunking with overlap
Page-level metadata for citations
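Sentence-aware chunking with overlap can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function names, the regex-based sentence split, and the `max_chars` and `overlap` parameters are all assumptions.

```python
import re


def chunk_sentences(text, max_chars=200, overlap=1):
    """Group whole sentences into chunks of roughly max_chars characters.

    The last `overlap` sentences of each finished chunk are repeated at the
    start of the next one, so context is not cut at a chunk boundary.
    """
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap:] if overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def chunk_pages(pages, max_chars=200, overlap=1):
    """Chunk each page separately and tag every chunk with its page number."""
    return [
        {"page": number, "text": chunk}
        for number, page_text in enumerate(pages, start=1)
        for chunk in chunk_sentences(page_text, max_chars, overlap)
    ]
```

Chunking page by page keeps the page number attached to every chunk, which is what makes page-level citations possible later in the pipeline.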
Retrieval (Hybrid + ANN)
FAISS HNSW ANN index for scalable similarity search
Cosine similarity via normalized embeddings
Keyword boosting for lexical relevance
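The scoring idea behind the retrieval features above can be shown without FAISS. In this sketch, the inner product of L2-normalized vectors gives cosine similarity, and a small keyword term is added on top. The function names, the term-overlap measure, and the `boost` weight are assumptions for illustration; the repository may combine the signals differently.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def keyword_overlap(query, text):
    """Fraction of query terms that also appear in the text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def hybrid_score(query, query_vec, chunk_text, chunk_vec, boost=0.2):
    """Cosine similarity plus a weighted lexical bonus (hypothetical weighting)."""
    # For unit vectors the inner product equals the cosine of the angle.
    cosine = dot(normalize(query_vec), normalize(chunk_vec))
    return cosine + boost * keyword_overlap(query, chunk_text)
```

Normalizing once at indexing time is what lets an inner-product index return cosine rankings without extra work per query.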
Reranking (Quality Boost)
Cross-Encoder (ms-marco-MiniLM) reranking
Improves relevance beyond raw vector similarity
Mimics production search stacks (retrieve → rerank)
LLM Generation
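The retrieve-then-rerank step can be sketched with a pluggable scoring function. The `rerank` helper and `toy_pair_score` are hypothetical names; the toy scorer only counts shared words and stands in for a real cross-encoder so the example runs without downloading a model.

```python
def rerank(query, candidates, score_fn, top_k=3):
    """Score each (query, text) pair jointly and keep the top_k texts."""
    scored = [(score_fn(query, text), text) for text in candidates]
    # Sort by score, highest first; ties keep their retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


def toy_pair_score(query, text):
    """Stand-in scorer: number of query words that occur in the text."""
    query_terms = set(query.lower().split())
    return len(query_terms & set(text.lower().split()))
```

In a real pipeline the scorer would be a cross-encoder, for example through the `CrossEncoder.predict` method in sentence-transformers, which scores query and passage together instead of comparing two independently computed embeddings.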
Google Gemini 2.5 Flash
Strict grounding: answers only from retrieved context
Honest fallback: "I don't know" when unsupported
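Strict grounding is mostly a matter of how the prompt is assembled. The sketch below is one plausible shape, not the prompt used in this repository: the wording, the `build_grounded_prompt` name, and the chunk dictionary keys are assumptions.

```python
FALLBACK = "I don't know"


def build_grounded_prompt(question, chunks):
    """Build a prompt that restricts the model to the retrieved context.

    chunks: list of dicts with 'text' and 'page' keys (hypothetical shape).
    """
    # Number each chunk so the answer can cite its sources.
    context = "\n\n".join(
        f"[{i}] (page {c['page']}) {c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n"
        f'If the context does not contain the answer, reply exactly: "{FALLBACK}"\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Asking for an exact fallback string makes unsupported answers easy to detect in code, which is useful when separating known from unknown answers.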
Evaluation & Monitoring
Logs every query:
retrieved chunk count
confidence score
known vs unknown answers
JSONL logs for offline analysis
Built-in analytics dashboard
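Per-query JSONL logging needs very little code. The following sketch assumes hypothetical field names (`retrieved_count`, `confidence`, `known`) based on the items listed above; the actual log schema in the repository may differ.

```python
import json
import time


def log_query(path, question, retrieved_count, confidence, known):
    """Append one query record to a JSONL file (one JSON object per line)."""
    record = {
        "timestamp": time.time(),
        "question": question,
        "retrieved_count": retrieved_count,
        "confidence": confidence,
        "known": known,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def read_log(path):
    """Load all records back for offline analysis."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Append-only JSONL keeps each write independent, so a partially written line affects at most one record.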
Analytics Dashboard
Total queries
Knowledge rate
Average confidence
Unknown query tracking
Recent query history
Dark / Light mode UI
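The dashboard figures can be derived from a list of query records. This is a sketch under assumptions: each record is taken to be a dict with `question`, `confidence`, and `known` keys, and "knowledge rate" is interpreted here as the share of queries that received a grounded answer.

```python
def summarize(records, recent=5):
    """Aggregate query records into dashboard-style metrics."""
    total = len(records)
    known = sum(1 for r in records if r["known"])
    confidence_sum = sum(r["confidence"] for r in records)
    return {
        "total_queries": total,
        # Assumed definition: answered-from-context queries / all queries.
        "knowledge_rate": known / total if total else 0.0,
        "average_confidence": confidence_sum / total if total else 0.0,
        "unknown_queries": [r["question"] for r in records if not r["known"]],
        "recent_queries": [r["question"] for r in records[-recent:]],
    }
```

The list of unknown queries is often the most actionable output, since it shows which topics the indexed documents do not cover.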
Production Safeguards
File upload size limits (configurable)
API quota handling
Caching to reduce LLM calls
Clean error handling
Persistent vector store
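Two of the safeguards above, upload size limits and answer caching, can be sketched in plain Python. The 10 MB limit, the `AnswerCache` class, and the `doc_version` argument are assumptions for illustration and are not taken from the repository.

```python
import hashlib

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical default; meant to be configurable


def check_upload_size(data: bytes, limit: int = MAX_UPLOAD_BYTES) -> None:
    """Reject uploads larger than the configured limit."""
    if len(data) > limit:
        raise ValueError(f"file is {len(data)} bytes; limit is {limit}")


class AnswerCache:
    """In-memory cache keyed by normalized question and document version."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(question, doc_version):
        # Collapse case and whitespace so trivial variants share one entry.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(f"{doc_version}:{normalized}".encode("utf-8")).hexdigest()

    def get_or_compute(self, question, doc_version, compute):
        key = self._key(question, doc_version)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        answer = compute(question)  # only reached on a cache miss
        self._store[key] = answer
        return answer
```

Including a document version in the key means cached answers stop matching as soon as the indexed corpus changes.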
System Architecture
Frontend (HTML / JS)
↓
FastAPI Backend
↓
Document Ingestion (PDF / TXT)
↓
Sentence Chunking + Metadata
↓
Embeddings (SentenceTransformers)
↓
FAISS ANN Index (HNSW)
↓
Hybrid Retrieval (Vector + Keyword)
↓
Cross-Encoder Reranking
↓
Prompt Assembly
↓
Google Gemini LLM
↓
Answer + Confidence + Citations
↓
Evaluation Logging + Analytics
Core Concepts Demonstrated
Retrieval-Augmented Generation (RAG)
Why pure LLMs hallucinate
How grounding fixes factual accuracy
Vector search vs keyword search
Hybrid retrieval strategies
Approximate Nearest Neighbor (ANN)
Why brute-force search fails at scale
HNSW indexing for fast similarity search
efConstruction vs efSearch trade-offs
Reranking
Why top-K vectors ≠ best answers
Cross-encoder reranking for relevance
Industry-standard retrieval pipelines
Evaluation & Observability
Measuring known vs unknown
Confidence as a heuristic, not truth
Logging for iterative improvement
Analytics-driven RAG tuning
Real Backend Engineering
API limits & retries
Persistent storage
Clean Git hygiene
Incremental system evolution
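The "API limits & retries" item in the list above usually comes down to retrying with exponential backoff. The helper below is a generic sketch, not the repository's code: `call_with_retries` and its parameters are hypothetical, and a real integration would retry only on the specific quota or rate-limit exception raised by the API client.

```python
import time


def call_with_retries(fn, max_attempts=3, base_delay=1.0,
                      sleep=time.sleep, retry_on=(Exception,)):
    """Call fn(), retrying on failure with delays of base_delay * 2**n."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the helper testable, since a test can record the delays instead of actually waiting.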
Tech Stack
Backend
Python
FastAPI
FAISS (HNSW ANN)
SentenceTransformers
Cross-Encoder (MS MARCO)
Google Gemini API
PyPDF
python-dotenv
Frontend
HTML
CSS
Vanilla JavaScript (Fetch API)
Tooling & Platform
VS Code
Git & GitHub
Docker
Hugging Face Spaces (deployment)
Virtual Environments (venv)
Setup & Run Locally
1. Clone Repository
git clone https://github.com/LVVignesh/gemini-rag-fastapi.git
cd gemini-rag-fastapi
2. Create Virtual Environment
python -m venv venv
On Windows:
venv\Scripts\activate
On macOS/Linux:
source venv/bin/activate
3. Install Dependencies
pip install -r requirements.txt
4. Configure Environment Variables
Set the API key in the environment, for example in a .env file (python-dotenv is part of the stack):
GEMINI_API_KEY=your_api_key_here
5. Run Server
uvicorn main:app --reload
Known Limitations
Scanned/image-only PDFs require OCR (not included)
Confidence score is a heuristic, not a calibrated probability
Very large corpora may require:
batch ingestion
sharding
background workers
Live Demo
Hugging Face Spaces:
https://huggingface.co/spaces/lvvignesh2122/Gemini-Rag-Fastapi-Pro
License
MIT License