Mrigank005 committed on
Commit 0e99494 · verified
Parent(s): 8f84afa

Upload 12 files
.gitignore ADDED
@@ -0,0 +1,14 @@
+ # Environment
+ .env
+ .venv/
+ venv/
+ __pycache__/
+ *.pyc
+
+ # IDE
+ .vscode/
+ .idea/
+
+ # OS
+ .DS_Store
+ Thumbs.db
README.md CHANGED
@@ -1,11 +1,228 @@
- ---
- title: Portfolio Backend
- emoji: 📚
- colorFrom: purple
- colorTo: yellow
- sdk: docker
- pinned: false
- license: mit
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Portfolio Chatbot API
+
+ A Retrieval-Augmented Generation (RAG) chatbot API for Mrigank Singh's portfolio, powered by FastAPI, Google Gemini, and Pinecone.
+
+ ## 🚀 Features
+
+ - **FastAPI Backend**: High-performance async API with automatic OpenAPI documentation
+ - **RAG Architecture**: Combines semantic search with LLM generation for accurate, context-aware responses
+ - **Vector Search**: Uses Pinecone for efficient similarity search over the portfolio knowledge base
+ - **Google Gemini Integration**: Uses Gemini 2.5 Flash Lite for responses and a Gemini embedding model for retrieval
+ - **CORS Enabled**: Ready for integration with an allow-listed frontend origin
+ - **Production Ready**: Deployed on Render with a health check endpoint
+
+ ## 📋 Prerequisites
+
+ - Python 3.9+
+ - Pinecone account and API key
+ - Google AI Studio API key
+ - (Optional) Render account for deployment
+
+ ## 🛠️ Installation
+
+ 1. **Clone the repository**
+    ```bash
+    git clone <repository-url>
+    cd Portfolio-Backend
+    ```
+
+ 2. **Create a virtual environment**
+    ```bash
+    python -m venv venv
+    source venv/bin/activate  # On Windows: venv\Scripts\activate
+    ```
+
+ 3. **Install dependencies**
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 4. **Set up environment variables**
+
+    Create a `.env` file in the root directory:
+    ```env
+    GOOGLE_API_KEY=your_google_api_key_here
+    PINECONE_API_KEY=your_pinecone_api_key_here
+    ```
+
+ ## 📊 Project Structure
+
+ ```
+ Portfolio-Backend/
+ ├── app/
+ │   ├── __init__.py          # Application initialization
+ │   ├── main.py              # FastAPI app and endpoints
+ │   └── utils.py             # RAG pipeline and helper functions
+ ├── data/
+ │   └── knowledge_base.txt   # Portfolio information source
+ ├── scripts/
+ │   └── ingest.py            # Data ingestion script for Pinecone
+ ├── render.yaml              # Render deployment configuration
+ ├── requirements.txt         # Python dependencies
+ └── README.md                # This file
+ ```
+
+ ## 🔧 Configuration
+
+ ### Pinecone Setup
+
+ 1. Create a Pinecone index named `portfolio-chat`
+ 2. Configure the index with:
+    - Dimension: 768 (matches the Gemini embedding dimension)
+    - Metric: Cosine similarity
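The index can also be created programmatically. A minimal sketch with the `pinecone` v5 SDK, assuming a serverless index (the `cloud`/`region` values are placeholders, not taken from this repo):

```python
# Sketch: create the Pinecone index the app expects.
# Assumptions: pinecone-client v5 SDK, serverless spec; adjust cloud/region.
import os

INDEX_NAME = "portfolio-chat"
INDEX_CONFIG = {
    "dimension": 768,   # must match the Gemini embedding dimension
    "metric": "cosine",
}


def create_index() -> None:
    """Create the index if it does not already exist."""
    # Imported lazily so the config above can be inspected without the SDK.
    from pinecone import Pinecone, ServerlessSpec

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    if INDEX_NAME not in pc.list_indexes().names():
        pc.create_index(
            name=INDEX_NAME,
            dimension=INDEX_CONFIG["dimension"],
            metric=INDEX_CONFIG["metric"],
            spec=ServerlessSpec(cloud="aws", region="us-east-1"),
        )
```

Run `create_index()` once before the ingestion step; it is idempotent because of the `list_indexes()` check.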
+
+ ### Knowledge Base
+
+ The knowledge base is stored in [data/knowledge_base.txt](data/knowledge_base.txt). The file should contain information about the portfolio owner, structured in chunks separated by double newlines.
+
+ ## 📤 Data Ingestion
+
+ Before running the API, you need to ingest the knowledge base into Pinecone:
+
+ ```bash
+ python scripts/ingest.py
+ ```
+
+ This script will:
+ 1. Load and chunk the knowledge base text file
+ 2. Generate embeddings using Google Gemini
+ 3. Upsert vectors to Pinecone in batches
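The chunking in step 1 is a plain split on blank lines, as in `scripts/ingest.py`. A standalone sketch (the `sample` text is illustrative):

```python
# Minimal sketch of the chunking used by scripts/ingest.py:
# chunks are paragraphs separated by double newlines.

def load_and_chunk_text(content: str) -> list[str]:
    """Split raw text into non-empty chunks on blank lines."""
    return [chunk.strip() for chunk in content.split("\n\n") if chunk.strip()]


sample = (
    "### PROJECT_MASTER_INDEX\nFull project list...\n"
    "\n"
    "### GENERAL_FAQ_AND_FACTS\nAvailability and contact..."
)
chunks = load_and_chunk_text(sample)
# Two chunks, one per knowledge-base section
```

Each resulting chunk becomes one Pinecone vector, with the chunk text stored in the vector's metadata.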
+
+ ## 🚀 Running the Application
+
+ ### Development
+
+ ```bash
+ uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
+ ```
+
+ The API will be available at `http://localhost:8000`.
+
+ ### Production
+
+ ```bash
+ uvicorn app.main:app --host 0.0.0.0 --port 10000
+ ```
+
+ ## 📡 API Endpoints
+
+ ### Health Check
+ ```http
+ GET /
+ ```
+ Returns the API status.
+
+ **Response:**
+ ```json
+ {
+   "status": "ok"
+ }
+ ```
+
+ ### Chat
+ ```http
+ POST /chat
+ ```
+
+ Send a message to the chatbot and receive a context-aware response.
+
+ **Request Body:**
+ ```json
+ {
+   "message": "Tell me about Mrigank's projects"
+ }
+ ```
+
+ **Response:**
+ ```json
+ {
+   "response": "Mrigank has worked on several impressive projects including..."
+ }
+ ```
+
+ **Error Responses:**
+ - `400 Bad Request`: Empty message
+ - `500 Internal Server Error`: Error generating the response
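A small client sketch for this endpoint using only the standard library (the `API_URL` is an assumption for a local run; point it at your deployment):

```python
# Example client for POST /chat. API_URL assumes a local dev server.
import json
import urllib.request

API_URL = "http://localhost:8000/chat"


def build_request(message: str) -> urllib.request.Request:
    """Build a POST request matching the ChatRequest schema."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(message: str) -> str:
    """Send a message and return the chatbot's reply text."""
    with urllib.request.urlopen(build_request(message)) as resp:
        return json.loads(resp.read())["response"]
```

Usage: `print(ask("Tell me about Mrigank's projects"))`.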
+
+ ## 🧠 How RAG Works
+
+ 1. **Query Embedding**: The user's question is converted to a 768-dimensional vector using Gemini
+ 2. **Semantic Search**: The most relevant chunks (top 10) are retrieved from Pinecone
+ 3. **Context Assembly**: Retrieved chunks are combined into a context string
+ 4. **LLM Generation**: The context and query are sent to Gemini 2.5 Flash Lite for response generation
+ 5. **Response**: A contextually accurate answer is returned to the user
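Steps 3 and 4 can be sketched in isolation (the prompt text here is a simplified stand-in for the full system prompt in `app/utils.py`):

```python
# Sketch of context assembly and prompt construction, mirroring app/utils.py.

def assemble_context(chunks: list[str]) -> str:
    """Join retrieved chunks with a visible separator, as in get_rag_response."""
    return "\n\n---\n\n".join(chunks)


def build_prompt(context_text: str, query: str) -> str:
    """Combine the grounding context with the user's question (simplified)."""
    return (
        "Answer ONLY based on the CONTEXT below.\n\n"
        f"### CONTEXT FROM KNOWLEDGE BASE:\n{context_text}\n\n"
        f"User Question: {query}"
    )


context = assemble_context(["Chunk about DASES.", "Chunk about LexiBot."])
prompt = build_prompt(context, "What has Mrigank built?")
```

The separator keeps chunk boundaries visible to the LLM, which helps it avoid blending facts from unrelated chunks.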
+
+ ## 🌐 Deployment
+
+ This project is configured for deployment on Render. The [render.yaml](render.yaml) file contains the deployment configuration.
+
+ ### Deploy to Render
+
+ 1. Connect your GitHub repository to Render
+ 2. Render will automatically detect the `render.yaml` configuration
+ 3. Add the environment variables in the Render dashboard:
+    - `GOOGLE_API_KEY`
+    - `PINECONE_API_KEY`
+ 4. Deploy!
+
+ ## 🔑 Environment Variables
+
+ | Variable | Description | Required |
+ |----------|-------------|----------|
+ | `GOOGLE_API_KEY` | Google AI Studio API key | Yes |
+ | `PINECONE_API_KEY` | Pinecone API key | Yes |
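Locally, these variables come from the `.env` file. The app reads it with python-dotenv's `load_dotenv()`; the tiny parser below is purely illustrative of the expected `KEY=value` format, not the library's implementation:

```python
# Illustrative only: shows the KEY=value format the .env file uses.
# The real code relies on python-dotenv's load_dotenv().

def parse_env_file(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


sample = """\
# API credentials
GOOGLE_API_KEY=your_google_api_key_here
PINECONE_API_KEY=your_pinecone_api_key_here
"""

config = parse_env_file(sample)
```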
+
+ ## 📦 Dependencies
+
+ - **FastAPI** (0.115.0): Modern web framework for building APIs
+ - **Uvicorn** (0.30.6): ASGI server for running FastAPI
+ - **Google GenAI** (1.0.0): Google's generative AI client library
+ - **Pinecone Client** (5.0.1): Vector database client
+ - **Pydantic** (2.9.2): Data validation using Python type annotations
+ - **python-dotenv** (1.0.1): Environment variable management
+
+ ## 📝 API Documentation
+
+ Once the server is running, visit:
+ - Swagger UI: `http://localhost:8000/docs`
+ - ReDoc: `http://localhost:8000/redoc`
+
+ ## 🛡️ Security Considerations
+
+ - Store API keys securely in environment variables; never commit them to version control
+ - In production, restrict CORS origins to trusted domains
+ - Consider rate limiting for the `/chat` endpoint
+ - Implement authentication for sensitive deployments
+
+ ## 🐛 Troubleshooting
+
+ ### Common Issues
+
+ **Issue**: `PINECONE_API_KEY not found`
+ - **Solution**: Ensure the `.env` file exists and contains the required API keys
+
+ **Issue**: `Index 'portfolio-chat' not found`
+ - **Solution**: Create the Pinecone index before running the ingestion script
+
+ **Issue**: `No context chunks found`
+ - **Solution**: Run the ingestion script to populate the Pinecone index
+
+ ## 🤝 Contributing
+
+ 1. Fork the repository
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
+ 3. Commit your changes (`git commit -m 'Add amazing feature'`)
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
+ 5. Open a Pull Request
+
+ ## 📄 License
+
+ This project is part of a personal portfolio. All rights reserved.
+
+ ## 📧 Contact
+
+ For questions or collaboration opportunities, reach out to Mrigank Singh through his portfolio website.
+
+ ---
+
+ **Built with** ❤️ **using FastAPI, Google Gemini, and Pinecone**
app/__init__.py ADDED
@@ -0,0 +1,8 @@
+ """
+ Portfolio Chatbot Backend
+
+ A RAG-based chatbot API using FastAPI, Google Gemini, and Pinecone.
+ """
+
+ __version__ = "1.0.0"
+ __author__ = "Mrigank Singh"
app/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (332 Bytes)
app/__pycache__/main.cpython-312.pyc ADDED
Binary file (2.19 kB)
app/__pycache__/utils.cpython-312.pyc ADDED
Binary file (5.24 kB)
app/main.py ADDED
@@ -0,0 +1,49 @@
+ from fastapi import FastAPI, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel
+
+ from app.utils import get_rag_response
+
+ app = FastAPI(
+     title="Portfolio Chatbot API",
+     description="RAG-based chatbot for Mrigank Singh's portfolio",
+     version="1.0.0"
+ )
+
+ # CORS middleware. Browsers send the Origin header without a trailing slash,
+ # so only the bare origin is needed here.
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["https://mrigank-portfolio-website.vercel.app"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+
+ class ChatRequest(BaseModel):
+     message: str
+
+
+ class ChatResponse(BaseModel):
+     response: str
+
+
+ @app.get("/")
+ async def health_check():
+     """Health check endpoint for Render."""
+     return {"status": "ok"}
+
+
+ @app.post("/chat", response_model=ChatResponse)
+ async def chat(request: ChatRequest):
+     """
+     Chat endpoint - processes the user message through the RAG pipeline.
+     """
+     if not request.message.strip():
+         raise HTTPException(status_code=400, detail="Message cannot be empty")
+
+     try:
+         response = get_rag_response(request.message)
+         return ChatResponse(response=response)
+     except Exception as e:
+         raise HTTPException(status_code=500, detail=f"Error generating response: {str(e)}")
app/utils.py ADDED
@@ -0,0 +1,111 @@
+ import os
+ from dotenv import load_dotenv
+ from google import genai
+ from google.genai import types
+ from pinecone import Pinecone
+
+ load_dotenv()
+
+ # Initialize Pinecone
+ pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
+ index = pc.Index("portfolio-chat")
+
+ # Initialize Google GenAI client
+ client = genai.Client(api_key=os.getenv("GOOGLE_API_KEY"))
+
+ # Constants
+ EMBEDDING_MODEL = "gemini-embedding-001"
+ LLM_MODEL = "gemini-2.5-flash-lite"
+ EMBEDDING_DIMENSION = 768
+
+
+ def get_embedding(text: str) -> list[float]:
+     """Generate an embedding for the given text using the Gemini embedding model."""
+     try:
+         response = client.models.embed_content(
+             model=EMBEDDING_MODEL,
+             contents=text,
+             config=types.EmbedContentConfig(
+                 output_dimensionality=EMBEDDING_DIMENSION
+             )
+         )
+         return response.embeddings[0].values
+     except Exception as e:
+         print(f"Error generating embedding: {e}")
+         return []
+
+
+ def get_rag_response(query: str) -> str:
+     """
+     RAG pipeline: embed the query, retrieve context from Pinecone, generate a response.
+     """
+     try:
+         # Step 1: Embed the query
+         query_embedding = get_embedding(query)
+         if not query_embedding:
+             return "I'm having a little trouble accessing my brain right now. Please try again!"
+
+         # Step 2: Query Pinecone for the top 10 matches
+         results = index.query(
+             vector=query_embedding,
+             top_k=10,
+             include_metadata=True
+         )
+
+         # Step 3: Extract context from the matches
+         context_chunks = []
+         for match in results.matches:
+             if match.metadata and "text" in match.metadata:
+                 context_chunks.append(match.metadata["text"])
+
+         # Handle the case where no context is found
+         if not context_chunks:
+             return "I couldn't find any specific details about that in Mrigank's portfolio, but feel free to ask about his patents, DASES, or other projects!"
+
+         # Join the chunks to create the context text
+         context_text = "\n\n---\n\n".join(context_chunks)
+
+         # Step 4: Construct the system prompt
+         system_prompt = f"""You are the Advanced AI Assistant for **Mrigank Singh**, a Full Stack AI Developer and Innovator.
+ Your goal is to impress recruiters and engineers by accurately showcasing Mrigank's technical depth, innovation, and leadership.
+
+ ### CORE INSTRUCTIONS:
+ 1. **Identity:** You are NOT Mrigank. You are his digital assistant. Refer to him as "Mrigank" or "he".
+ 2. **Tone:** Professional, confident, and technically precise. Sound like a Software Engineer, not a marketing brochure.
+ 3. **Formatting:** Use **Markdown** to make answers readable.
+    - Use **bold** for key technologies or metrics.
+    - Use `bullet points` for lists (skills, projects).
+    - Do not output large walls of text; break it up.
+ 4. **Source of Truth:** Answer ONLY based on the "CONTEXT" provided below. Do not make up facts.
+    - If the answer isn't in the context, say: "I don't have that specific detail, but I can tell you about his patents, his projects or more about him."
+
+ ### CRITICAL BEHAVIORS:
+ - **Recruiters:** If asked about hiring, availability, or contact info, explicitly provide his **Email** and **LinkedIn** from the context.
+ - **Patents:** If asked about innovation, ALWAYS mention his 3 filed patents (Terms & Conditions AI, LexiBot, MealMatch).
+ - **Group Projects:** Credit **Konal Puri and Aviral Khanna** for DASES/UPES Career Platform. Specify Mrigank's role (Mobile App/Frontend).
+ - **Technical Depth:** Mention specific algorithms (e.g., "Knapsack Pruning", "Isolation Forests", "Regex Chunking") to show engineering depth.
+
+ ### CONTEXT FROM KNOWLEDGE BASE:
+ {context_text}
+ """
+
+         # Step 5: Generate the response using Gemini
+         response = client.models.generate_content(
+             model=LLM_MODEL,
+             contents=[
+                 types.Content(
+                     role="user",
+                     parts=[
+                         types.Part.from_text(text=system_prompt + "\n\nUser Question: " + query)
+                     ]
+                 )
+             ],
+             config=types.GenerateContentConfig(
+                 temperature=0.7,
+                 max_output_tokens=500
+             )
+         )
+
+         return response.text
+
+     except Exception as e:
+         print(f"Error in RAG pipeline: {e}")
+         return "I'm encountering a temporary issue connecting to the knowledge base. Please try again in a moment."
data/knowledge_base.txt ADDED
@@ -0,0 +1,117 @@
+ ### PROJECT_MASTER_INDEX
+ **Full Project List for Mrigank Singh:**
+
+ **SOLO PROJECTS (Built Individually):**
+ 1. **MealMatch AI:** A serverless food ordering app using Knapsack algorithms for budget/calorie optimization.
+ 2. **JobFit:** An AI Agentic pipeline for resume-job matching using LangGraph and Gemini.
+ 3. **LexiBot:** A RAG-based legal assistant for Indian Law using Intelligent Chunking.
+ 4. **F&B Process Anomaly Detection:** An industrial ML pipeline using Isolation Forests and Autoencoders.
+ 5. **Better LinkedIn:** A frontend architectural redesign focusing on performance and UX.
+
+ **GROUP PROJECTS (Collaborations):**
+ 1. **DASES (Descriptive Answer Sheet Evaluation System):** Built with mentors Konal Puri & Aviral Khanna. Mrigank built the Mobile App (React Native) and contributed to the Web Frontend.
+ 2. **UPES Career Services Platform:** Built with Konal Puri & Aviral Khanna. Mrigank built the Frontend (React/Vite) and AI Prompts.
+
+ **PATENTS (3 Filed):**
+ 1. AI-Assisted Terms & Conditions System.
+ 2. LexiBot (Legal Assistant).
+ 3. MealMatch AI (Food Optimization).
+
+ ### GENERAL_FAQ_AND_FACTS
+ **Availability & Contact:**
+ - **Status:** Actively seeking **Summer Internships for 2026**. Open to **Remote roles** (if compatible with college hours) and **Domestic Relocation**.
+ - **Contact:** Email: `mriganksingh005@gmail.com` | Phone: `+91 82734 37398`
+ - **Socials:** LinkedIn: `linkedin.com/in/mrigank005` | GitHub: `github.com/Mrigank005`
+ - **Location:** Dehradun/Kanpur, India (Timezone: IST +5:30).
+ - **Graduation:** May/June 2028.
+
+ **Technical Snapshot:**
+ - **Strongest Stack:** React (Frontend) + FastAPI (Backend) + Supabase (DB/Auth).
+ - **Languages:** Python, JavaScript, TypeScript. (Uses C/C++ and Java for DSA.)
+ - **AI/ML:** Gemini API, LangChain, LangGraph, TensorFlow, PyTorch, RAG Pipelines.
+ - **Databases:** PostgreSQL, MongoDB, Pinecone, Qdrant, Supabase.
+ - **Tools:** Docker, Git, VS Code.
+
+ **Current Focus:**
+ - **Learning:** Mastering Advanced System Design, Agentic AI Workflows, and Open Source contributions.
+ - **Certifications:** Currently pursuing the Machine Learning Specialization on DeepLearning.AI.
+ - **Hobbies:** Table Tennis, Cricket, Gaming (BGMI), and Music.
+
+ ### PROJECT_1_DEEP_DIVE: DASES (Flagship Project)
+ **Full Name:** Descriptive Answer Sheet Evaluation System
+ **Type:** Group Project (Teammates: Konal Puri, Aviral Khanna).
+ **Mrigank's Role:** Built the **Mobile App** from scratch (React Native/Expo) and contributed to the Web Frontend.
+ **Status:** Mobile App is in final stages; Web App deployed at `dases.esun.solutions`.
+ **Summary:** An AI-driven system using OCR and LLMs to grade handwritten descriptive answers 90x faster than manual methods with 98% accuracy.
+ **Key Technical Features (Mobile App):**
+ - **Locked Exam Mode:** Implemented a "High Security" environment. Uses `AppState` listeners to detect background switching. If a student leaves the app for >15s, the exam auto-submits.
+ - **Secure Scanning:** Custom camera interface (VisionKit/LMS) that disables gallery uploads, forcing live capture to prevent cheating.
+ - **Auth Persistence:** Built a custom `SecureStoreAdapter` to bridge Supabase Auth with the device's encrypted Keychain/Keystore, solving standard localStorage security risks on mobile.
+ **Impact:** Reduced grading cost from ₹25/sheet (Cloud) to ₹2/sheet (In-house GPU).
+
+ ### PROJECT_2_DEEP_DIVE: LexiBot (AI Legal Assistant)
+ **Type:** Solo Project (Patent Filed).
+ **Deployed Link:** `lexibot-ai.vercel.app`
+ **Summary:** A Telegram chatbot for Indian legal queries (Consumer, Traffic, Harassment law) using RAG.
+ **Technical "Flex":**
+ - **Intelligent Chunking:** Does NOT use fixed-size chunks. Uses an LLM + Regex pipeline to split documents at logical semantic boundaries (e.g., "Article 21", "Section 4"), preserving legal context.
+ - **Infrastructure:** Fully dockerized. Uses a `healthcheck` in Docker Compose to ensure the Qdrant Vector DB is fully ready before the bot application starts.
+ - **Memory:** Uses `ConversationBufferWindowMemory` (k=5) to handle follow-up questions ("What is the penalty for *that*?").
+ - **Safety:** Prevents hallucinations by strictly grounding answers in retrieved context.
+
+ ### PROJECT_3_DEEP_DIVE: MealMatch AI
+ **Type:** Solo Project (Patent Filed).
+ **Deployed Link:** `mealmatch-ai.vercel.app`
+ **Summary:** Serverless food ordering app that generates meal combos strictly fitting *both* Budget & Calorie limits.
+ **Commercial Potential:** Mrigank believes this has the highest commercial potential due to mass consumer appeal.
+ **Technical "Flex":**
+ - **Algorithm:** Uses a "Knapsack-style" optimization algorithm with **Heuristic Pruning**. It pre-sorts items by "efficiency" (calories/price) and stops recursion after finding the top 6 combos to prevent UI freezes (solving the O(n³) complexity issue).
+ - **Compatibility Matrix:** Implemented a rule-based system to prevent culinary clashes (e.g., ensuring it doesn't suggest Rice + Pasta in the same combo).
+ - **Architecture:** Client-side only (Serverless). Logic runs entirely in the browser.
+
+ ### PROJECT_4_DEEP_DIVE: JobFit (Resume Analyzer)
+ **Type:** Solo Project.
+ **Deployed Link:** `jobfit-analysis-ai.vercel.app`
+ **Summary:** An agentic AI pipeline that screens resumes against job descriptions.
+ **Technical "Flex":**
+ - **Architecture:** Uses **LangGraph** to model the analysis as a State Machine (Extraction -> Job Analysis -> Profiling -> Compatibility Scoring).
+ - **Security:** Implements "Direct-to-S3" uploads using AWS Presigned URLs with 5-minute expiry to bypass server bottlenecks and ensure security.
+ - **Handling Hallucinations:** Uses Regex-based JSON extraction (`safe_parse_json`) to clean LLM outputs before rendering.
+
+ ### PROJECT_5_DEEP_DIVE: F&B Process Anomaly Detection
+ **Type:** Solo Project (Industrial ML).
+ **Repo:** `github.com/Mrigank005/F-B-Process-Anomaly-Detection-System`
+ **Summary:** Industrial ML pipeline to detect defects in food batches (analyzing 1500+ batches across 11 parameters).
+ **Technical "Flex":**
+ - **Ensemble Model:** Combines 4 algorithms (Isolation Forest, One-Class SVM, Local Outlier Factor, Autoencoder). A batch is flagged only if ≥2 models agree (Consensus Voting).
+ - **Explainability:** Integrated **SHAP** (Shapley Additive Explanations) to tell operators exactly *which* sensor (e.g., "Humidity") caused the alarm.
+ - **Auto-Thresholding:** The Autoencoder dynamically sets its error threshold based on the 95th percentile of reconstruction error.
+
+ ### PROJECT_6_DEEP_DIVE: UPES Career Services Platform
+ **Type:** Group Project (Teammates: Konal Puri, Aviral Khanna).
+ **Mrigank's Role:** Built the Frontend (React/Vite) and designed AI Prompts.
+ **Deployed Link:** `upes-samarth-internship.vercel.app`
+ **Technical Details:**
+ - **Attendance:** Interfaces with `navigator.mediaDevices` to capture live camera frames for attendance verification.
+ - **AI Assessments:** Uses "Few-Shot Chain-of-Thought" prompts to synthesize a student's daily reports into tailored interview questions.
+ - **UX:** Implemented "Mock" async behavior (simulated latency) to polish loading states before backend integration.
+
+ ### PROJECT_7_DEEP_DIVE: Better LinkedIn
+ **Type:** Solo Project.
+ **Deployed Link:** `better-linked-in.vercel.app`
+ **Technical "Flex":**
+ - **Performance:** Uses `react-window` for list virtualization (rendering only visible posts) to handle infinite feeds without lag.
+ - **Custom Hooks:** Built `useStickySidebar` to mathematically calculate sticky positioning when CSS `position: sticky` fails in complex layouts.
+
+ ### LEADERSHIP_AND_SOFT_SKILLS
+ **Leadership:**
+ - **Role:** Joint Events Head at UPES ACM-W Student Chapter.
+ - **Impact:** Organized 10+ events. Co-Convener for "Prodigy'25" Tech Fest, managing 1400+ participants and introducing 500+ freshers to tech.
+ - **Philosophy:** "Everyone wants to be heard and feel like they are contributing to a goal."
+ - **Conflict Resolution:** Approaches disagreements by presenting a logical case with supporting reasons.
+
+ **Internship Experience:**
+ - **Shramik Bharti NGO:** Maintained the website and internal tools. Gained appreciation for grassroots social impact.
+
+ **Why 7.8 CGPA?**
+ - Mrigank prioritized hands-on innovation (building 7+ projects, filing 3 patents) over rote academic memorization.
models.py ADDED
@@ -0,0 +1,17 @@
+ import os
+ from dotenv import load_dotenv
+ from google import genai
+
+ load_dotenv()
+
+ client = genai.Client(api_key=os.getenv("GOOGLE_API_KEY"))
+
+ print("Fetching available models...")
+ try:
+     # List all models and just print their names
+     # (the SDK returns an iterator, so we loop through it)
+     for m in client.models.list():
+         print(f"found: {m.name}")
+
+ except Exception as e:
+     print(f"❌ Error: {e}")
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ fastapi==0.115.0
+ uvicorn==0.30.6
+ python-dotenv==1.0.1
+ google-genai==1.0.0
+ pinecone-client==5.0.1
+ pydantic==2.9.2
scripts/ingest.py ADDED
@@ -0,0 +1,95 @@
+ import os
+ import sys
+ from pathlib import Path
+
+ # Add the parent directory to the path for imports
+ sys.path.insert(0, str(Path(__file__).parent.parent))
+
+ from dotenv import load_dotenv
+ from google import genai
+ from pinecone import Pinecone
+
+ load_dotenv()
+
+ # Initialize clients
+ pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
+ index = pc.Index("portfolio-chat")
+ client = genai.Client(api_key=os.getenv("GOOGLE_API_KEY"))
+
+ # Constants
+ EMBEDDING_MODEL = "gemini-embedding-001"
+ EMBEDDING_DIMENSION = 768
+ DATA_FILE = Path(__file__).parent.parent / "data" / "knowledge_base.txt"
+
+
+ def get_embedding(text: str) -> list[float]:
+     """Generate an embedding for the given text."""
+     response = client.models.embed_content(
+         model=EMBEDDING_MODEL,
+         contents=text,
+         config={
+             "output_dimensionality": EMBEDDING_DIMENSION
+         }
+     )
+     return response.embeddings[0].values
+
+
+ def load_and_chunk(file_path: Path) -> list[str]:
+     """Load a text file and split it into chunks on double newlines."""
+     with open(file_path, "r", encoding="utf-8") as f:
+         content = f.read()
+
+     # Split on double newlines
+     chunks = [chunk.strip() for chunk in content.split("\n\n") if chunk.strip()]
+     return chunks
+
+
+ def main():
+     print("=" * 50)
+     print("Portfolio Knowledge Base Ingestion Script")
+     print("=" * 50)
+
+     # Step 1: Load and chunk the data
+     print(f"\n[1/3] Loading data from: {DATA_FILE}")
+
+     if not DATA_FILE.exists():
+         print(f"ERROR: File not found: {DATA_FILE}")
+         sys.exit(1)
+
+     chunks = load_and_chunk(DATA_FILE)
+     print(f"      Loaded {len(chunks)} chunks")
+
+     # Step 2: Generate embeddings and prepare the vectors
+     print("\n[2/3] Generating embeddings...")
+     vectors = []
+
+     for i, chunk in enumerate(chunks):
+         print(f"      Processing chunk {i + 1}/{len(chunks)}...", end="\r")
+
+         embedding = get_embedding(chunk)
+         vectors.append({
+             "id": str(i),
+             "values": embedding,
+             "metadata": {"text": chunk}
+         })
+
+     print(f"      Generated {len(vectors)} embeddings" + " " * 20)
+
+     # Step 3: Upsert to Pinecone
+     print("\n[3/3] Upserting to Pinecone...")
+
+     # Upsert in batches of 100 (Pinecone best practice)
+     batch_size = 100
+     for i in range(0, len(vectors), batch_size):
+         batch = vectors[i:i + batch_size]
+         index.upsert(vectors=batch)
+         print(f"      Upserted batch {i // batch_size + 1}")
+
+     print("\n" + "=" * 50)
+     print("SUCCESS: Knowledge base ingested!")
+     print(f"Total vectors: {len(vectors)}")
+     print("=" * 50)
+
+
+ if __name__ == "__main__":
+     main()