lvvignesh2122 committed b435981 (unverified) · 1 parent: 9e36463

Update README.md

Files changed (1):

1. README.md +207 -28
README.md CHANGED

    @@ -1,28 +1,207 @@
    - ---
    - title: Gemini Rag FastAPI Pro
    - emoji: 🤖
    - colorFrom: purple
    - colorTo: red
    - sdk: docker
    - pinned: false
    - license: mit
    - short_description: Document-based Q&A using FastAPI, FAISS, and Google Gemini
    - ---
    -
    - # Gemini RAG FastAPI Pro
    -
    - Production-ready Retrieval-Augmented Generation backend built with **FastAPI**, **FAISS**, and **Google Gemini**.
    -
    - ## Tech Stack
    - - FastAPI
    - - FAISS
    - - Google Gemini API
    - - Docker
    - - Hugging Face Spaces
    -
    - ## Usage
    - 1. Upload documents
    - 2. Ask questions
    - 3. Get grounded answers
    -
    - Built with ❤️ by Vignesh LV

All 28 lines above were removed; the 207-line README that follows replaces them.
# 📄 Gemini RAG Assistant (FastAPI)

A production-style Retrieval-Augmented Generation (RAG) application built with FastAPI, Google Gemini, and FAISS. It answers questions about uploaded documents (PDF/TXT) and generates summaries of them, returning grounded responses with citations and confidence scores.

This project evolved iteratively from a simple FastAPI API into an end-to-end AI system, working through real-world challenges such as PDF ingestion, vector search, LLM rate limits, and Git hygiene.
## 🚀 Features

- 📤 Upload PDF and TXT documents
- 🔍 Retrieval-augmented Q&A using FAISS
- 🧠 Grounded answers powered by Google Gemini
- 📝 Document summarization using the same RAG pipeline
- 📚 Page-level citations for transparency
- 📊 Confidence scoring based on retrieval strength
- Async FastAPI backend (non-blocking I/O)
- 🧪 Mock mode for UI testing when API quota is exhausted
- 🧹 Clean Git history with generated files ignored
## 🏗️ Architecture Overview

1. Frontend (HTML + JS)
2. FastAPI Backend
3. Document Ingestion (PDF / TXT)
4. Embeddings (SentenceTransformers)
5. FAISS Vector Store
6. Retriever (Top-K Similarity Search)
7. Prompt Assembly
8. Google Gemini LLM
9. Grounded Response + Citations + Confidence
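Stages 4 to 6 (embeddings, vector store, retriever) boil down to: embed each passage once, embed the query, and return the k most similar passages. The following is a minimal, dependency-free sketch of that idea, assuming a toy bag-of-words embedding in place of SentenceTransformers and an exhaustive cosine search in place of a FAISS index; `embed`, `cosine`, `top_k` and the sample pages are hypothetical, not the project's code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy embedding (stand-in for SentenceTransformers): L2-normalised word counts."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit-length, so the dot product equals the cosine similarity.
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


def top_k(query: str, pages: list[str], k: int = 3) -> list[tuple[int, float]]:
    """Exhaustive search over all pages, the same idea as a flat FAISS
    inner-product index over normalised vectors. Returns (page_no, score)."""
    index = [embed(p) for p in pages]  # in a real app, built once at upload time
    q = embed(query)
    scored = [(page_no, cosine(q, vec)) for page_no, vec in enumerate(index, start=1)]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    pages = [
        "FastAPI is a Python web framework for building APIs.",
        "FAISS performs efficient similarity search over dense vectors.",
        "Gemini is a family of large language models from Google.",
    ]
    # The FAISS page (page 2) ranks first.
    print(top_k("Which library does similarity search over vectors?", pages, k=1))
```

Swapping in a real embedding model and a FAISS index changes the vectors and the search speed, not the shape of this loop: query in, ranked `(page, score)` pairs out, which later feed citations and the confidence score.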
## 🧠 Key Concepts Learned

### 1. FastAPI Fundamentals

- GET and POST endpoints
- Request/response lifecycle
- Input validation using Pydantic models
- Async endpoints for non-blocking LLM calls
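Async endpoints only help if the slow LLM call does not block the event loop. FastAPI already runs plain `def` endpoints in a threadpool; inside an `async def` endpoint, a blocking SDK call can be pushed to a worker thread with `asyncio.to_thread` (Python 3.9+). The stdlib-only sketch below illustrates the effect; `blocking_llm_call` is a hypothetical stand-in that just sleeps, not the Gemini client.

```python
import asyncio
import time


def blocking_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a synchronous LLM SDK call."""
    time.sleep(0.3)  # simulate network latency
    return f"answer to: {prompt}"


async def answer(prompt: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free
    # to serve other requests in the meantime.
    return await asyncio.to_thread(blocking_llm_call, prompt)


async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    # Two "requests" handled concurrently: roughly 0.3 s in total, not 0.6 s.
    results = await asyncio.gather(answer("q1"), answer("q2"))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, f"{elapsed:.2f}s")
```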
### 2. Real LLM Integration

- Secure API key handling via environment variables
- Structured prompts for strict input/output control
- Handling rate limits and safety-filtered responses
- Graceful error handling and fallbacks
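Key handling, rate limits and fallbacks can be combined in one small wrapper: read the key from the environment, retry rate-limited calls with exponential backoff, and return a safe message once retries run out. This is a minimal sketch under stated assumptions: `RateLimitError` and the `call_llm` callable are hypothetical placeholders (the real exception type depends on the Gemini client library in use), and the delays are illustrative.

```python
import os
import time
from typing import Callable


class RateLimitError(Exception):
    """Hypothetical: stands in for the client library's quota / HTTP 429 error."""


FALLBACK = "The model is temporarily unavailable. Please try again later."


def get_api_key() -> str:
    # Never hard-code the key; read it from the environment (loaded from .env in development).
    key = os.environ.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return key


def call_with_retries(
    call_llm: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Retry on rate limits with exponential backoff (1 s, 2 s, 4 s, ...)."""
    for attempt in range(max_attempts):
        try:
            return call_llm(prompt)
        except RateLimitError:
            if attempt == max_attempts - 1:
                break  # out of retries: degrade gracefully instead of crashing
            sleep(base_delay * 2 ** attempt)
    return FALLBACK


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_llm(prompt: str) -> str:
        # Fake client: rate-limited twice, then succeeds.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RateLimitError()
        return f"ok: {prompt}"

    print(call_with_retries(flaky_llm, "hello", sleep=lambda s: None))  # ok: hello
```

Injecting `sleep` keeps the wrapper testable without real waiting.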
### 3. Retrieval-Augmented Generation (RAG)

- Why LLMs alone are unreliable for factual answers
- Converting documents into embeddings
- Similarity search using FAISS
- Injecting retrieved context into prompts for grounded answers
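Grounding works by putting the retrieved passages into the prompt and instructing the model to answer only from them. A minimal sketch of prompt assembly, assuming each retrieved chunk carries its page number for citations; `Chunk`, `build_prompt` and the instruction wording are illustrative, not the project's actual prompt.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int  # page number, used for citations
    text: str  # extracted text of the retrieved passage


def build_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble a grounded prompt: numbered context blocks plus strict instructions."""
    context = "\n\n".join(
        f"[{i}] (page {c.page})\n{c.text.strip()}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using ONLY the context below.\n"
        "Cite sources as [n] after each claim.\n"
        'If the context does not contain the answer, reply "I don\'t know."\n\n'
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = [
        Chunk(3, "FAISS stores the document embeddings."),
        Chunk(7, "Answers include page-level citations."),
    ]
    print(build_prompt("Where are embeddings stored?", chunks))
```

Numbering the context blocks lets the answer's `[n]` markers be mapped back to page numbers for the citation list.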
### 4. Document Ingestion Reality

- Not all PDFs are text-based
- Scanned/screenshot PDFs require OCR
- RAG quality depends on data quality
- Silent failures often come from missing extractable text
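Since scanned or screenshot PDFs yield little or no extractable text, checking extraction results before indexing turns a silent failure into an explicit one. A minimal sketch, assuming per-page text has already been extracted (for example with pypdf's `PdfReader(path).pages` and `page.extract_text()`); `pages_needing_ocr`, `check_document` and the 20-character threshold are hypothetical choices.

```python
from typing import List, Optional


def pages_needing_ocr(page_texts: List[Optional[str]], min_chars: int = 20) -> List[int]:
    """Return 1-based page numbers whose extracted text is (nearly) empty,
    a typical sign of a scanned or image-only page."""
    return [
        page_no
        for page_no, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def check_document(page_texts: List[Optional[str]]) -> List[int]:
    """Fail loudly when nothing is extractable; otherwise report weak pages."""
    weak = pages_needing_ocr(page_texts)
    if len(weak) == len(page_texts):
        raise ValueError("No extractable text found: this PDF probably needs OCR.")
    return weak


if __name__ == "__main__":
    print(check_document(["A text-based page with plenty of content.", ""]))  # [2]
```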
### 5. Summarization vs Q&A

- Summarization is not the same as question answering
- Naive summarization can fail due to token limits
- Simpler pipelines are often more stable for small documents
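Whether a document can be summarized in one call depends on whether it fits the model's context budget. A minimal sketch of that decision, assuming a rough 4-characters-per-token estimate and a hypothetical `summarize` callable standing in for the Gemini call: small documents take the single-pass path, while larger ones are split, summarized piece by piece, and the partial summaries summarized once more (the map-reduce approach listed as future work under Known Limitations).

```python
import math
from typing import Callable, List

CHARS_PER_TOKEN = 4  # rough heuristic for English text, not a real tokenizer


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def split_text(text: str, max_tokens: int) -> List[str]:
    """Split into pieces of at most max_tokens (estimated), on character boundaries."""
    size = max_tokens * CHARS_PER_TOKEN
    return [text[i:i + size] for i in range(0, len(text), size)]


def summarize_document(text: str, summarize: Callable[[str], str], max_tokens: int = 8000) -> str:
    if estimate_tokens(text) <= max_tokens:
        return summarize(text)  # single pass: simplest and most stable for small docs
    # Map: summarize each piece. Reduce: summarize the joined partial summaries
    # (assumes the partial summaries fit in one call).
    partials = [summarize(piece) for piece in split_text(text, max_tokens)]
    return summarize("\n".join(partials))
```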
### 6. Confidence & Trust

- The confidence score reflects retrieval strength, not “truth”
- Honest responses (“I don’t know”) improve trust
- Citations are critical for verification
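Because the confidence score reflects retrieval strength, it can be derived from the similarity scores of the retrieved passages, and a low score can trigger an honest “I don’t know” instead of a guess. The formula here (mean of the top-k cosine similarities, clipped to [0, 1]) and the 0.3 threshold are illustrative assumptions, not the project's actual heuristic.

```python
from typing import List


def confidence_from_scores(similarities: List[float]) -> float:
    """Heuristic confidence: mean top-k cosine similarity, clipped to [0, 1].
    Measures how well the retrieved text matches the query, not whether the answer is true."""
    if not similarities:
        return 0.0
    mean = sum(similarities) / len(similarities)
    return max(0.0, min(1.0, mean))


def answer_or_abstain(answer: str, citations: List[int], similarities: List[float],
                      threshold: float = 0.3) -> dict:
    confidence = confidence_from_scores(similarities)
    if confidence < threshold:
        # Weak retrieval: prefer an honest abstention over a confident guess.
        return {"answer": "I don't know based on the uploaded documents.",
                "citations": [], "confidence": confidence}
    return {"answer": answer, "citations": citations, "confidence": confidence}
```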
### 7. Engineering Best Practices

- Start with a stable baseline before adding complexity
- Mock LLM responses during development
- Handle API quotas and rate limits explicitly
- Keep generated files out of Git (`.gitignore`)
- Resolve Git branch divergence safely using rebase
## 🛠️ Tech Stack

### Backend

- Python
- FastAPI
- FAISS
- SentenceTransformers
- Google Gemini API
- PyPDF
- python-dotenv

### Frontend

- HTML
- CSS
- Vanilla JavaScript (Fetch API)

### Platform & Tooling

- VS Code
- Git & GitHub
- Hugging Face Spaces (deployment)
- Virtual Environments (venv)
## ⚙️ Setup Instructions

### 1️⃣ Clone the repository

    git clone https://github.com/your-username/your-repo-name.git
    cd your-repo-name

### 2️⃣ Create and activate a virtual environment

    python -m venv venv
    source venv/bin/activate   # Linux/macOS
    venv\Scripts\activate      # Windows

### 3️⃣ Install dependencies

    pip install -r requirements.txt

### 4️⃣ Set environment variables

Create a `.env` file:

    GEMINI_API_KEY=your_api_key_here

### 5️⃣ Run the server

    uvicorn main:app --reload

Then open http://127.0.0.1:8000 in your browser.
## 🧪 Mock Mode (Development)

To test the UI without consuming Gemini API quota, enable mock responses in `main.py`. The frontend and the full request flow can then be exercised without any LLM calls, which mirrors how LLM-backed features are commonly developed in production.
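One way to implement such a switch is an environment flag checked before any LLM call, so the same request path returns a canned response during UI work. A minimal sketch, assuming a hypothetical `MOCK_LLM` environment variable and a placeholder `call_gemini` function; the README only says mock responses are enabled in `main.py`, not how.

```python
import os


def mock_enabled() -> bool:
    # Hypothetical flag: MOCK_LLM=1 (or true/yes) turns mock mode on.
    return os.environ.get("MOCK_LLM", "").strip().lower() in {"1", "true", "yes"}


def call_gemini(prompt: str) -> str:
    """Placeholder for the real Gemini call (deliberately not implemented here)."""
    raise NotImplementedError


def generate_answer(prompt: str) -> dict:
    if mock_enabled():
        # Same keys as the real path, so the frontend renders it unchanged.
        return {"answer": f"[mock] Received {len(prompt)} characters.", "mock": True}
    return {"answer": call_gemini(prompt), "mock": False}
```

Keeping the mock inside the same function as the real call means the frontend, routing and error handling are all exercised; only the model call itself is skipped.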
## ⚠️ Known Limitations

- Scanned/image-based PDFs are not supported (OCR required)
- The confidence score is a heuristic, not a guarantee of correctness
- Large documents may require map-reduce summarization (future work)
## 🔮 Future Improvements

- OCR integration for scanned PDFs
- Chunk-based retrieval for large documents
- Streaming LLM responses
- Evaluation metrics for answer quality
- Multi-document cross-referencing
- Auth & user-specific document stores