Space: lvvignesh2122/Gemini-Rag-Fastapi-Pro

lvvignesh2122 committed on Jan 7

Commit a7badf3 (unverified), 1 parent: 4af310b

Update README.md

Files changed (1)

README.md +142 -109

@@ -1,116 +1,171 @@
-📄 Gemini RAG Assistant (FastAPI)
-A production-style Retrieval-Augmented Generation (RAG) application built with FastAPI, Google Gemini, and FAISS, capable of answering questions and generating summaries from uploaded documents (PDF/TXT) with grounded responses, citations, and confidence scoring.
-This project evolved iteratively from a simple FastAPI API into a robust, end-to-end AI system, covering real-world challenges like PDF ingestion, vector search, LLM rate limits, and Git hygiene.
-🚀 Features
-📤 Upload PDF and TXT documents
-🔍 Retrieval-Augmented Q&A using FAISS
-🧠 Grounded answers powered by Google Gemini
-📝 Document summarization using the same RAG pipeline
-📚 Page-level citations for transparency
-📊 Confidence scoring based on retrieval strength
-⚡ Async FastAPI backend (non-blocking I/O)
-🧪 Mock mode for UI testing when API quota is exhausted
-🧹 Clean Git history with generated files ignored
-🏗️ Architecture Overview
-Frontend (HTML + JS)
         ↓
 FastAPI Backend
         ↓
 Document Ingestion (PDF / TXT)
         ↓
 Embeddings (SentenceTransformers)
         ↓
-FAISS Vector Store
         ↓
-Retriever (Top-K Similarity Search)
         ↓
 Prompt Assembly
         ↓
 Google Gemini LLM
         ↓
-Grounded Response + Citations + Confidence
-🧠 Key Concepts Learned
-1. FastAPI Fundamentals
-GET and POST endpoints
-Request/response lifecycle
-Input validation using Pydantic models
-Async endpoints for non-blocking LLM calls
-2. Real LLM Integration
-Secure API key handling via environment variables
-Structured prompts for strict input/output control
-Handling rate limits and safety-filtered responses
-Graceful error handling and fallbacks
-3. Retrieval-Augmented Generation (RAG)
-Why LLMs alone are unreliable for factual answers
-Converting documents into embeddings
-Similarity search using FAISS
-Injecting retrieved context into prompts for grounded answers
-4. Document Ingestion Reality
-Not all PDFs are text-based
-Scanned/screenshot PDFs require OCR
-RAG quality depends on data quality
-Silent failures often come from missing extractable text
-5. Summarization vs Q&A
-Summarization is not the same as question answering
-Naive summarization can fail due to token limits
-Simpler pipelines are often more stable for small documents
-6. Confidence & Trust
-Confidence score reflects retrieval strength, not “truth”
-Honest responses (“I don’t know”) improve trust
-Citations are critical for verification
-7. Engineering Best Practices
-Start with a stable baseline before adding complexity
-Mock LLM responses during development
-Handle API quotas and rate limits explicitly
-Keep generated files out of Git (.gitignore)
-Resolve Git branch divergence safely using rebase
 🛠️ Tech Stack
 Backend
@@ -119,10 +174,12 @@ Python
 FastAPI
-FAISS
 SentenceTransformers
 Google Gemini API
 PyPDF
@@ -137,73 +194,49 @@ CSS
 Vanilla JavaScript (Fetch API)
-Platform & Tooling
 VS Code
 Git & GitHub
 Hugging Face Spaces (deployment)
 Virtual Environments (venv)
-⚙️ Setup Instructions
-1️⃣ Clone the repository
-git clone https://github.com/your-username/your-repo-name.git
-cd your-repo-name
-2️⃣ Create & activate virtual environment
 python -m venv venv
-source venv/bin/activate  # Linux/Mac
-venv\Scripts\activate     # Windows
-3️⃣ Install dependencies
 pip install -r requirements.txt
-4️⃣ Set environment variables
-Create a .env file:
 GEMINI_API_KEY=your_api_key_here
-5️⃣ Run the server
 uvicorn main:app --reload
-Open in browser:
-http://127.0.0.1:8000
-Test and use my RAG project on Hugging face : https://huggingface.co/spaces/lvvignesh2122/Gemini-Rag-Fastapi-Pro
-🧪 Mock Mode (Development)
-To test the UI without consuming Gemini API quota:
-Enable mock responses in main.py
-Allows frontend and flow testing without LLM calls
-This mirrors real production workflows.
 ⚠️ Known Limitations
-Scanned/image-based PDFs are not supported (OCR required)
-Confidence score is heuristic, not a guarantee of correctness
-Large documents may require map-reduce summarization (future work)
-🔮 Future Improvements
-OCR integration for scanned PDFs
-Chunk-based retrieval for large documents
-Streaming LLM responses
-Evaluation metrics for answer quality
-Multi-document cross-referencing
-Auth & user-specific document stores

+📄 Gemini RAG Backend System (FastAPI)
+Production-grade Retrieval-Augmented Generation (RAG) backend built with FastAPI, FAISS (ANN), and Google Gemini — featuring hybrid retrieval, HNSW indexing, cross-encoder reranking, evaluation logging, and analytics.
+This repository demonstrates how modern AI backend systems are actually built in industry, not toy demos.
+🚀 What This Project Is
+This is a full RAG backend system that:
+Ingests large PDF/TXT documents
+Builds vector indexes with Approximate Nearest Neighbor (ANN) search
+Answers questions using grounded LLM responses
+Tracks confidence, known/unknown answers, and usage analytics
+Supports production constraints (file limits, caching, logging)
+The project evolved from RAG v1 → RAG v2, adding real-world scalability and observability.
+✨ Key Features (RAG v2)
+📥 Document Ingestion
+Upload PDF and TXT files
+Sentence-aware chunking with overlap
+Page-level metadata for citations
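Note: the README names sentence-aware chunking with overlap and page-level metadata but does not show how the repository implements them. The sketch below is one possible reading, not the project's actual code. The `Chunk` class, the `chunk_page` helper, the regex sentence splitter, and the `max_chars` and `overlap` parameters are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int  # page number carried along so answers can cite it


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_page(text: str, page: int, max_chars: int = 200, overlap: int = 1) -> list[Chunk]:
    """Group whole sentences into chunks of roughly max_chars characters.

    The last `overlap` sentences of a chunk are repeated at the start of
    the next one, so context is not lost at chunk boundaries.
    """
    chunks: list[Chunk] = []
    current: list[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk and seed the next with the overlap.
            chunks.append(Chunk(" ".join(current), page))
            current = current[-overlap:] if overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(Chunk(" ".join(current), page))
    return chunks
```

Because sentences are never split, a chunk can exceed `max_chars` slightly when the overlap sentence plus the next sentence is longer than the limit. A real pipeline would likely size chunks in tokens rather than characters.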
+🔍 Retrieval (Hybrid + ANN)
+FAISS HNSW ANN index for scalable similarity search
+Cosine similarity via normalized embeddings
+Keyword boosting for lexical relevance
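Note: "cosine similarity via normalized embeddings" relies on the fact that the dot product of two unit-length vectors equals their cosine similarity. The README does not say how the keyword boost is combined with the vector score, so the additive formula, the `boost` weight, and every function name below are assumptions. The sketch uses plain Python lists in place of real embeddings and a FAISS index.

```python
import math


def normalize(vec):
    # Scale a vector to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine(a, b):
    # Dot product of unit vectors equals cosine similarity.
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def keyword_overlap(query, text):
    # Fraction of distinct query words that also appear in the chunk text.
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def hybrid_score(query, query_vec, chunk_text, chunk_vec, boost=0.2):
    # Hypothetical combination: vector similarity plus a weighted lexical bonus.
    return cosine(query_vec, chunk_vec) + boost * keyword_overlap(query, chunk_text)
```

With an index, the stored vectors would be normalized once at ingestion time and the query vector once per request, so the search itself only computes inner products.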
+🧠 Reranking (Quality Boost)
+Cross-Encoder (ms-marco-MiniLM) reranking
+Improves relevance beyond raw vector similarity
+Mimics production search stacks (retrieve → rerank)
+🤖 LLM Generation
+Google Gemini 2.5 Flash
+Strict grounding: answers only from retrieved context
+Honest fallback: "I don't know" when unsupported
+📊 Evaluation & Monitoring
+Logs every query:
+retrieved chunk count
+confidence score
+known vs unknown answers
+JSONL logs for offline analysis
+Built-in analytics dashboard
+📈 Analytics Dashboard
+Total queries
+Knowledge rate
+Average confidence
+Unknown query tracking
+Recent query history
+Dark / Light mode UI
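Note: the evaluation log and the dashboard metrics listed above can be connected with very little code. The sketch below is an assumption about how such logging might look, not the repository's implementation. The field names (`question`, `retrieved`, `confidence`, `known`) and the functions `log_query` and `summarize` are hypothetical.

```python
import json


def log_query(stream, question, retrieved, confidence, known):
    # Append one JSON object per line (JSONL) to any writable text stream.
    record = {
        "question": question,
        "retrieved": retrieved,    # number of chunks retrieved
        "confidence": confidence,  # heuristic score, not a truth guarantee
        "known": known,            # False when the answer was "I don't know"
    }
    stream.write(json.dumps(record) + "\n")


def summarize(lines):
    # Aggregate JSONL lines into the dashboard-style metrics.
    records = [json.loads(line) for line in lines if line.strip()]
    total = len(records)
    if total == 0:
        return {"total_queries": 0, "knowledge_rate": 0.0,
                "avg_confidence": 0.0, "unknown_questions": []}
    known = sum(1 for r in records if r["known"])
    return {
        "total_queries": total,
        "knowledge_rate": known / total,
        "avg_confidence": sum(r["confidence"] for r in records) / total,
        "unknown_questions": [r["question"] for r in records if not r["known"]],
    }
```

In practice `stream` would be a file opened in append mode, and `summarize` would read that file's lines when the dashboard is requested.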
+🛡️ Production Safeguards
+File upload size limits (configurable)
+API quota handling
+Caching to reduce LLM calls
+Clean error handling
+Persistent vector store
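Note: "caching to reduce LLM calls" is not described further in the README. One common approach, sketched here under that assumption, is to key answers by a hash of the normalized question plus the retrieved context, so a repeated question over the same context never reaches the model. `AnswerCache` and the `generate` callable are hypothetical names, and the cache is in-memory only.

```python
import hashlib


class AnswerCache:
    def __init__(self, generate):
        # generate: any callable(question, context) -> str, e.g. an LLM wrapper.
        self._generate = generate
        self._store = {}

    def _key(self, question, context):
        # Lowercase and collapse whitespace so trivially different
        # phrasings of the same question share one cache entry.
        normalized = " ".join(question.lower().split())
        return hashlib.sha256(f"{normalized}\n{context}".encode("utf-8")).hexdigest()

    def answer(self, question, context):
        key = self._key(question, context)
        if key not in self._store:
            # Cache miss: call the model once and remember the result.
            self._store[key] = self._generate(question, context)
        return self._store[key]
```

Including the context in the key matters: if the index changes and different chunks are retrieved, the stale answer is not reused.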
+🏗️ System Architecture
+Frontend (HTML / JS)
         ↓
 FastAPI Backend
         ↓
 Document Ingestion (PDF / TXT)
         ↓
+Sentence Chunking + Metadata
+        ↓
 Embeddings (SentenceTransformers)
         ↓
+FAISS ANN Index (HNSW)
         ↓
+Hybrid Retrieval (Vector + Keyword)
+        ↓
+Cross-Encoder Reranking
         ↓
 Prompt Assembly
         ↓
 Google Gemini LLM
         ↓
+Answer + Confidence + Citations
+        ↓
+Evaluation Logging + Analytics
+🧠 Core Concepts Demonstrated
+Retrieval-Augmented Generation (RAG)
+Why pure LLMs hallucinate
+How grounding fixes factual accuracy
+Vector search vs keyword search
+Hybrid retrieval strategies
+Approximate Nearest Neighbor (ANN)
+Why brute-force search fails at scale
+HNSW indexing for fast similarity search
+efConstruction vs efSearch trade-offs
+Reranking
+Why top-K vectors ≠ best answers
+Cross-encoder reranking for relevance
+Industry-standard retrieval pipelines
+Evaluation & Observability
+Measuring known vs unknown
+Confidence as a heuristic, not truth
+Logging for iterative improvement
+Analytics-driven RAG tuning
+Real Backend Engineering
+API limits & retries
+Persistent storage
+Clean Git hygiene
+Incremental system evolution
 🛠️ Tech Stack
 Backend
 FastAPI
+FAISS (HNSW ANN)
 SentenceTransformers
+Cross-Encoder (MS MARCO)
 Google Gemini API
 PyPDF
 Vanilla JavaScript (Fetch API)
+Tooling & Platform
 VS Code
 Git & GitHub
+Docker
 Hugging Face Spaces (deployment)
 Virtual Environments (venv)
+⚙️ Setup & Run Locally
+1️⃣ Clone Repository
+git clone https://github.com/LVVignesh/gemini-rag-fastapi.git
+cd gemini-rag-fastapi
 python -m venv venv
+venv\Scripts\activate
 pip install -r requirements.txt
 GEMINI_API_KEY=your_api_key_here
 uvicorn main:app --reload
 ⚠️ Known Limitations
+Scanned/image-only PDFs require OCR (not included)
+Confidence score is heuristic
+Very large corpora may require:
+batch ingestion
+sharding
+background workers
+🚀 Live Demo
+👉 Hugging Face Spaces
+https://huggingface.co/spaces/lvvignesh2122/Gemini-Rag-Fastapi-Pro
+📜 License
+MIT License