lvvignesh2122 committed on
Commit
a7badf3
·
unverified ·
1 Parent(s): 4af310b

Update README.md

Files changed (1)
  1. README.md +142 -109
README.md CHANGED
@@ -1,116 +1,171 @@
1
- 📄 Gemini RAG Assistant (FastAPI)
2
 
3
- A production-style Retrieval-Augmented Generation (RAG) application built with FastAPI, Google Gemini, and FAISS, capable of answering questions and generating summaries from uploaded documents (PDF/TXT) with grounded responses, citations, and confidence scoring.
4
 
5
- This project evolved iteratively from a simple FastAPI API into a robust, end-to-end AI system, covering real-world challenges like PDF ingestion, vector search, LLM rate limits, and Git hygiene.
6
 
7
- 🚀 Features
8
 
9
- 📤 Upload PDF and TXT documents
10
 
11
- 🔍 Retrieval-Augmented Q&A using FAISS
12
 
13
- 🧠 Grounded answers powered by Google Gemini
14
 
15
- 📝 Document summarization using the same RAG pipeline
16
 
17
- 📚 Page-level citations for transparency
18
 
19
- 📊 Confidence scoring based on retrieval strength
20
 
21
- Async FastAPI backend (non-blocking I/O)
22
 
23
- 🧪 Mock mode for UI testing when API quota is exhausted
 
24
 
25
- 🧹 Clean Git history with generated files ignored
26
 
27
- 🏗️ Architecture Overview
28
- Frontend (HTML + JS)
29
 
30
  FastAPI Backend
31
 
32
  Document Ingestion (PDF / TXT)
33
 
 
 
34
  Embeddings (SentenceTransformers)
35
 
36
- FAISS Vector Store
37
 
38
- Retriever (Top-K Similarity Search)
 
 
39
 
40
  Prompt Assembly
41
 
42
  Google Gemini LLM
43
 
44
- Grounded Response + Citations + Confidence
45
-
46
- 🧠 Key Concepts Learned
47
- 1. FastAPI Fundamentals
48
-
49
- GET and POST endpoints
50
-
51
- Request/response lifecycle
52
-
53
- Input validation using Pydantic models
54
-
55
- Async endpoints for non-blocking LLM calls
56
-
57
- 2. Real LLM Integration
58
-
59
- Secure API key handling via environment variables
60
-
61
- Structured prompts for strict input/output control
62
-
63
- Handling rate limits and safety-filtered responses
64
-
65
- Graceful error handling and fallbacks
66
-
67
- 3. Retrieval-Augmented Generation (RAG)
68
 
69
- Why LLMs alone are unreliable for factual answers
 
70
 
71
- Converting documents into embeddings
72
 
73
- Similarity search using FAISS
74
 
75
- Injecting retrieved context into prompts for grounded answers
76
 
77
- 4. Document Ingestion Reality
78
 
79
- Not all PDFs are text-based
80
 
81
- Scanned/screenshot PDFs require OCR
82
 
83
- RAG quality depends on data quality
84
 
85
- Silent failures often come from missing extractable text
86
 
87
- 5. Summarization vs Q&A
88
 
89
- Summarization is not the same as question answering
90
 
91
- Naive summarization can fail due to token limits
92
 
93
- Simpler pipelines are often more stable for small documents
94
 
95
- 6. Confidence & Trust
96
 
97
- Confidence score reflects retrieval strength, not “truth”
98
 
99
- Honest responses (“I don’t know”) improve trust
100
 
101
- Citations are critical for verification
102
 
103
- 7. Engineering Best Practices
104
 
105
- Start with a stable baseline before adding complexity
106
 
107
- Mock LLM responses during development
108
 
109
- Handle API quotas and rate limits explicitly
110
 
111
- Keep generated files out of Git (.gitignore)
112
 
113
- Resolve Git branch divergence safely using rebase
114
 
115
  🛠️ Tech Stack
116
  Backend
@@ -119,10 +174,12 @@ Python
119
 
120
  FastAPI
121
 
122
- FAISS
123
 
124
  SentenceTransformers
125
 
 
 
126
  Google Gemini API
127
 
128
  PyPDF
@@ -137,73 +194,49 @@ CSS
137
 
138
  Vanilla JavaScript (Fetch API)
139
 
140
- Platform & Tooling
141
 
142
  VS Code
143
 
144
  Git & GitHub
145
 
 
 
146
  Hugging Face Spaces (deployment)
147
 
148
  Virtual Environments (venv)
149
 
150
- ⚙️ Setup Instructions
151
- 1️⃣ Clone the repository
152
- git clone https://github.com/your-username/your-repo-name.git
153
- cd your-repo-name
154
 
155
- 2️⃣ Create & activate virtual environment
 
 
156
  python -m venv venv
157
- source venv/bin/activate # Linux/Mac
158
- venv\Scripts\activate # Windows
159
-
160
- 3️⃣ Install dependencies
161
  pip install -r requirements.txt
162
-
163
- 4️⃣ Set environment variables
164
-
165
- Create a .env file:
166
-
167
  GEMINI_API_KEY=your_api_key_here
168
-
169
- 5️⃣ Run the server
170
  uvicorn main:app --reload
171
 
172
-
173
- Open in browser:
174
-
175
- http://127.0.0.1:8000
176
-
177
- Test and use my RAG project on Hugging Face: https://huggingface.co/spaces/lvvignesh2122/Gemini-Rag-Fastapi-Pro
178
-
179
- 🧪 Mock Mode (Development)
180
-
181
- To test the UI without consuming Gemini API quota:
182
-
183
- Enable mock responses in main.py
184
-
185
- Allows frontend and flow testing without LLM calls
186
-
187
- This mirrors real production workflows.
188
-
189
  ⚠️ Known Limitations
190
 
191
- Scanned/image-based PDFs are not supported (OCR required)
 
 
192
 
193
- Confidence score is heuristic, not a guarantee of correctness
194
 
195
- Large documents may require map-reduce summarization (future work)
196
 
197
- 🔮 Future Improvements
198
 
199
- OCR integration for scanned PDFs
200
 
201
- Chunk-based retrieval for large documents
202
 
203
- Streaming LLM responses
 
204
 
205
- Evaluation metrics for answer quality
206
 
207
- Multi-document cross-referencing
208
 
209
- Auth & user-specific document stores
 
1
+ 📄 Gemini RAG Backend System (FastAPI)
2
 
3
+ Production-grade Retrieval-Augmented Generation (RAG) backend built with FastAPI, FAISS (ANN), and Google Gemini, featuring hybrid retrieval, HNSW indexing, cross-encoder reranking, evaluation logging, and analytics.
4
 
5
+ This repository demonstrates how modern AI backend systems are actually built in industry, not toy demos.
6
 
7
+ 🚀 What This Project Is
8
 
9
+ This is a full RAG backend system that:
10
 
11
+ Ingests large PDF/TXT documents
12
 
13
+ Builds vector indexes with Approximate Nearest Neighbor (ANN) search
14
 
15
+ Answers questions using grounded LLM responses
16
 
17
+ Tracks confidence, known/unknown answers, and usage analytics
18
 
19
+ Supports production constraints (file limits, caching, logging)
20
 
21
+ The project evolved from RAG v1 → RAG v2, adding real-world scalability and observability.
22
 
23
+ Key Features (RAG v2)
24
+ 📥 Document Ingestion
25
 
26
+ Upload PDF and TXT files
27
 
28
+ Sentence-aware chunking with overlap
29
+
30
+ Page-level metadata for citations
31
+
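The ingestion bullets above (sentence-aware chunking with overlap, page-level metadata) can be sketched as follows. This is a minimal illustration and not the repository's actual code: the function name `chunk_sentences`, the `max_chars` and `overlap` parameters, and the regex-based sentence splitter are all assumptions made for the example.

```python
import re

def chunk_sentences(text, page, max_chars=200, overlap=1):
    """Group whole sentences into chunks of roughly max_chars characters.

    The last `overlap` sentences of each chunk are repeated at the start of
    the next one, and every chunk carries its page number for citations.
    All names and defaults here are hypothetical.
    """
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk and seed the next one with the overlap.
            chunks.append({"text": " ".join(current), "page": page})
            current = current[-overlap:] if overlap else []
        current.append(sentence)
    if current:
        chunks.append({"text": " ".join(current), "page": page})
    return chunks
```

Because the overlap is counted in sentences rather than characters, `max_chars` acts as a soft limit: a chunk can exceed it when the carried-over sentences plus one new sentence are already longer than the limit.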
32
+ 🔍 Retrieval (Hybrid + ANN)
33
+
34
+ FAISS HNSW ANN index for scalable similarity search
35
+
36
+ Cosine similarity via normalized embeddings
37
+
38
+ Keyword boosting for lexical relevance
39
+
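To make the retrieval bullets above concrete, here is a small pure-Python sketch of hybrid scoring. It relies on one standard fact: for L2-normalized vectors, the inner product equals cosine similarity. The function names, the `keyword_weight` parameter, and the whitespace-based term overlap are assumptions for illustration; the README does not specify how the keyword boost is actually computed.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def hybrid_score(query_vec, chunk_vec, query, chunk_text, keyword_weight=0.1):
    """Cosine similarity plus a small boost for shared keywords.

    keyword_weight is a hypothetical tuning knob, not a value from the repo.
    """
    q, c = normalize(query_vec), normalize(chunk_vec)
    cosine = sum(a * b for a, b in zip(q, c))  # inner product of unit vectors
    query_terms = set(query.lower().split())
    chunk_terms = set(chunk_text.lower().split())
    # Fraction of query terms that also appear in the chunk.
    overlap = len(query_terms & chunk_terms) / len(query_terms) if query_terms else 0.0
    return cosine + keyword_weight * overlap
```

In a real pipeline the vector part would come from the ANN index and only the top candidates would be rescored this way; the sketch only shows how the two signals can be combined.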
40
+ 🧠 Reranking (Quality Boost)
41
+
42
+ Cross-Encoder (ms-marco-MiniLM) reranking
43
+
44
+ Improves relevance beyond raw vector similarity
45
+
46
+ Mimics production search stacks (retrieve → rerank)
47
+
48
+ 🤖 LLM Generation
49
+
50
+ Google Gemini 2.5 Flash
51
+
52
+ Strict grounding: answers only from retrieved context
53
+
54
+ Honest fallback: "I don't know" when unsupported
55
+
56
+ 📊 Evaluation & Monitoring
57
+
58
+ Logs every query:
59
+
60
+ retrieved chunk count
61
+
62
+ confidence score
63
+
64
+ known vs unknown answers
65
+
66
+ JSONL logs for offline analysis
67
+
68
+ Built-in analytics dashboard
69
+
70
+ 📈 Analytics Dashboard
71
+
72
+ Total queries
73
+
74
+ Knowledge rate
75
+
76
+ Average confidence
77
+
78
+ Unknown query tracking
79
+
80
+ Recent query history
81
+
82
+ Dark / Light mode UI
83
+
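The logging and dashboard bullets above suggest a simple pattern: append one JSON object per query to a JSONL file, then aggregate it offline. The sketch below assumes a hypothetical record layout (`question`, `retrieved_chunks`, `confidence`, `known`) and hypothetical function names; the project's real log schema is not shown in this README.

```python
import json

def log_query(path, question, retrieved_chunks, confidence, known):
    """Append one query record as a single JSON line (field names are assumed)."""
    record = {
        "question": question,
        "retrieved_chunks": retrieved_chunks,
        "confidence": confidence,
        "known": known,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

def summarize(path):
    """Compute the dashboard-style metrics from a JSONL log file."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    total = len(records)
    known = sum(1 for r in records if r["known"])
    return {
        "total_queries": total,
        "knowledge_rate": known / total if total else 0.0,
        "avg_confidence": sum(r["confidence"] for r in records) / total if total else 0.0,
        "unknown_queries": [r["question"] for r in records if not r["known"]],
    }
```

JSONL is convenient here because each write is a single append and a partially written file is still readable line by line.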
84
+ 🛡️ Production Safeguards
85
+
86
+ File upload size limits (configurable)
87
+
88
+ API quota handling
89
+
90
+ Caching to reduce LLM calls
91
+
92
+ Clean error handling
93
+
94
+ Persistent vector store
95
+
96
+
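Two of the safeguards listed above, upload size limits and caching to reduce LLM calls, can be illustrated with a short sketch. Everything here is an assumption for illustration: the 10 MB default, the `check_upload_size` helper, and the in-memory `AnswerCache` keyed by a hash of question plus context are not taken from the repository.

```python
import hashlib

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical default; meant to be configurable

def check_upload_size(data, limit=MAX_UPLOAD_BYTES):
    """Reject uploads larger than the configured limit."""
    if len(data) > limit:
        raise ValueError(f"file too large: {len(data)} bytes (limit {limit})")

class AnswerCache:
    """In-memory cache so identical question/context pairs skip the LLM call."""

    def __init__(self):
        self._store = {}

    def _key(self, question, context):
        # Hash question and context together so a changed context misses the cache.
        return hashlib.sha256(f"{question}\n{context}".encode("utf-8")).hexdigest()

    def get_or_compute(self, question, context, generate):
        """Return the cached answer, calling generate(question, context) on a miss."""
        key = self._key(question, context)
        if key not in self._store:
            self._store[key] = generate(question, context)
        return self._store[key]
```

Here `generate` stands in for whatever function wraps the Gemini call. Keying on the retrieved context as well as the question means the cache is bypassed automatically after new documents change what is retrieved.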
97
+ 🏗️ System Architecture
98
+
99
+ Frontend (HTML / JS)
100
 
101
  FastAPI Backend
102
 
103
  Document Ingestion (PDF / TXT)
104
 
105
+ Sentence Chunking + Metadata
106
+
107
  Embeddings (SentenceTransformers)
108
 
109
+ FAISS ANN Index (HNSW)
110
 
111
+ Hybrid Retrieval (Vector + Keyword)
112
+
113
+ Cross-Encoder Reranking
114
 
115
  Prompt Assembly
116
 
117
  Google Gemini LLM
118
 
119
+ Answer + Confidence + Citations
120
+
121
+ Evaluation Logging + Analytics
122
 
123
+ 🧠 Core Concepts Demonstrated
124
+ Retrieval-Augmented Generation (RAG)
125
 
126
+ Why pure LLMs hallucinate
127
 
128
+ How grounding fixes factual accuracy
129
 
130
+ Vector search vs keyword search
131
 
132
+ Hybrid retrieval strategies
133
 
134
+ Approximate Nearest Neighbor (ANN)
135
 
136
+ Why brute-force search fails at scale
137
 
138
+ HNSW indexing for fast similarity search
139
 
140
+ efConstruction vs efSearch trade-offs
141
 
142
+ Reranking
143
 
144
+ Why top-K vectors ≠ best answers
145
 
146
+ Cross-encoder reranking for relevance
147
 
148
+ Industry-standard retrieval pipelines
149
 
150
+ Evaluation & Observability
151
 
152
+ Measuring known vs unknown
153
 
154
+ Confidence as a heuristic, not truth
155
 
156
+ Logging for iterative improvement
157
 
158
+ Analytics-driven RAG tuning
159
 
160
+ Real Backend Engineering
161
 
162
+ API limits & retries
163
 
164
+ Persistent storage
165
 
166
+ Clean Git hygiene
167
 
168
+ Incremental system evolution
169
 
170
  🛠️ Tech Stack
171
  Backend
 
174
 
175
  FastAPI
176
 
177
+ FAISS (HNSW ANN)
178
 
179
  SentenceTransformers
180
 
181
+ Cross-Encoder (MS MARCO)
182
+
183
  Google Gemini API
184
 
185
  PyPDF
 
194
 
195
  Vanilla JavaScript (Fetch API)
196
 
197
+ Tooling & Platform
198
 
199
  VS Code
200
 
201
  Git & GitHub
202
 
203
+ Docker
204
+
205
  Hugging Face Spaces (deployment)
206
 
207
  Virtual Environments (venv)
208
 
209
+ ⚙️ Setup & Run Locally
 
 
 
210
 
211
+ 1️⃣ Clone Repository
212
+ git clone https://github.com/LVVignesh/gemini-rag-fastapi.git
213
+ cd gemini-rag-fastapi
214
  python -m venv venv
215
+ venv\Scripts\activate
 
 
 
216
  pip install -r requirements.txt
 
 
 
 
 
217
  GEMINI_API_KEY=your_api_key_here
 
 
218
  uvicorn main:app --reload
219
 
220
  ⚠️ Known Limitations
221
 
222
+ Scanned/image-only PDFs require OCR (not included)
223
+
224
+ Confidence score is heuristic
225
 
226
+ Very large corpora may require:
227
 
228
+ batch ingestion
229
 
230
+ sharding
231
 
232
+ background workers
233
 
234
+ 🚀 Live Demo
235
 
236
+ 👉 Hugging Face Spaces
237
+ https://huggingface.co/spaces/lvvignesh2122/Gemini-Rag-Fastapi-Pro
238
 
239
+ 📜 License
240
 
241
+ MIT License
242