sliitguy committed on
Commit
9c17f52
·
0 Parent(s):

updated corrections

Files changed (10)
  1. .github/workflows/main.yaml +46 -0
  2. .gitignore +4 -0
  3. Dockerfile +14 -0
  4. README.md +10 -0
  5. README2.md +116 -0
  6. app.py +467 -0
  7. max.py +199 -0
  8. model.py +6 -0
  9. nivakaran.pdf +0 -0
  10. requirements.txt +15 -0
.github/workflows/main.yaml ADDED
@@ -0,0 +1,46 @@
+ name: Sync to Hugging Face Space
+
+ on:
+   push:
+     branches:
+       - main
+       - master
+   workflow_dispatch:
+
+ jobs:
+   sync-to-hub:
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Checkout repository
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 0
+           lfs: true
+
+       - name: Setup Git LFS
+         run: |
+           git lfs install
+           git lfs pull
+
+       - name: Configure Git
+         run: |
+           git config --global user.email "github-actions[bot]@users.noreply.github.com"
+           git config --global user.name "github-actions[bot]"
+
+       - name: Push to Hugging Face Space
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+           HF_USERNAME: nivakaran
+           HF_SPACE: max
+         run: |
+           git remote add space https://$HF_USERNAME:$HF_TOKEN@huggingface.co/spaces/$HF_USERNAME/$HF_SPACE || true
+           git push --force space HEAD:main
+
+       - name: Verify Sync
+         if: success()
+         run: echo "Successfully synced to Hugging Face Space!"
+
+       - name: Sync Failed
+         if: failure()
+         run: echo "Failed to sync to Hugging Face Space. Check logs above."
.gitignore ADDED
@@ -0,0 +1,4 @@
+ .env
+ venv
+ venv/
+ portfolio.db/
Dockerfile ADDED
@@ -0,0 +1,14 @@
+ FROM python:3.10-slim
+
+ WORKDIR /app
+
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ COPY . .
+
+ ENV TRANSFORMERS_CACHE=/tmp
+
+ EXPOSE 7860
+
+ CMD ["uvicorn", "max:app", "--host", "0.0.0.0", "--port", "7860"]
README.md ADDED
@@ -0,0 +1,10 @@
+ ---
+ title: Max
+ emoji: 📈
+ colorFrom: gray
+ colorTo: purple
+ sdk: docker
+ pinned: false
+ license: mit
+ short_description: This is my portfolio assistant
+ ---
README2.md ADDED
@@ -0,0 +1,116 @@
+ # Max: AI-Powered Developer Portfolio Chatbot
+
+ ![Python](https://img.shields.io/badge/python-3.8%2B-blue) ![FastAPI](https://img.shields.io/badge/FastAPI-%5E0.75-green) ![LangChain](https://img.shields.io/badge/LangChain-latest-orange) ![License](https://img.shields.io/badge/license-MIT-lightgrey)
+
+ ---
+
+ ## 🚀 Project Overview
+
+ **Max Portfolio Assistant** is an advanced, production-ready AI chatbot API designed to deliver *contextual, precise, and dynamic* answers about Nivakaran’s professional portfolio. It combines retrieval-augmented generation (RAG) with vector-based semantic search: the system interprets each user query and fetches the relevant passages directly from portfolio documents, including detailed PDFs.
+
+ This is not just another chatbot; it’s a *smart assistant* engineered to showcase developer expertise with a real-world, scalable architecture.
+
+ ---
+
+ ## 🔥 Why This Project Matters
+
+ - **Real-world AI Integration:** Implements LLM orchestration using LangChain with HuggingFace embeddings and Groq’s hosted large language model.
+ - **Production-Grade API:** Built on FastAPI with clear REST endpoints, session management, and CORS support for seamless frontend integration.
+ - **Multi-turn Dialogue:** Maintains chat history context for fluid conversations, with no robotic one-off answers.
+ - **Efficient Document Retrieval:** Processes and indexes large PDFs into vector embeddings, enabling fast semantic search.
+ - **Clean Code & Logging:** Structured with error handling and logging, ready for maintenance and scaling.
+ - **Portfolio Showcase:** Serves as a unique interactive gateway into the developer’s skills, projects, and professional story.
+
+ ---
+
+ ## 🛠 Technical Highlights
+
+ | Feature                    | Details                                                                            |
+ |----------------------------|------------------------------------------------------------------------------------|
+ | Language Model             | Groq ChatGroq (Deepseek-R1-Distill-Llama-70b)                                      |
+ | Embeddings                 | HuggingFace `all-MiniLM-L6-v2` for semantic vector search                          |
+ | Vector Store               | Chroma DB for fast, persistent vector retrieval                                    |
+ | Document Loader & Splitter | `PyPDFLoader` + `RecursiveCharacterTextSplitter` for handling large portfolio PDFs |
+ | API Framework              | FastAPI with async support, CORS, and automatic Swagger docs                       |
+ | Session Management         | Session-aware chat history using LangChain's `ChatMessageHistory`                  |
+ | Environment & Config       | `.env` managed tokens and paths for secure, flexible deployment                    |
+ | Logging & Error Handling   | Python `logging` with clear error responses for production debugging               |
+
+ ---
+
+ ## 📦 Installation & Setup
+
+ Set up the project:
+ ```bash
+ git clone https://github.com/yourusername/max-portfolio-assistant.git
+ cd max-portfolio-assistant
+ python -m venv venv
+ source venv/bin/activate # Windows: venv\Scripts\activate
+ pip install -r requirements.txt
+ ```
+
+ Create a `.env` file with the following variables:
+ ```bash
+ HF_TOKEN=your_huggingface_api_token
+ GROQ_API_KEY=your_groq_api_key
+ PDF_PATH=path/to/your/portfolio.pdf
+ HOST=0.0.0.0
+ PORT=5000
+ ```
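As a rough illustration of how these variables might be read and validated at startup, here is a minimal standard-library sketch. The `Settings` dataclass and the `load_settings` helper are hypothetical names introduced for this example; the project itself reads the values with `os.getenv` after calling `load_dotenv()`. The defaults mirror the sample values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values listed in the .env file."""
    hf_token: str
    groq_api_key: str
    pdf_path: str
    host: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, failing early when a required key is missing."""
    missing = [key for key in ("HF_TOKEN", "GROQ_API_KEY") if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        hf_token=env["HF_TOKEN"],
        groq_api_key=env["GROQ_API_KEY"],
        pdf_path=env.get("PDF_PATH", "nivakaran.pdf"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "5000")),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in tests without touching the real environment.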
+
+ Run the server (the FastAPI app lives in `max.py`; the host and port match the `.env` sample):
+ ```bash
+ uvicorn max:app --host 0.0.0.0 --port 5000 --reload
+ ```
+
+ ## 🔍 How to Use
+ - Ask the assistant: Send a POST request to `/ask` with a JSON body:
+ ```json
+ {
+   "session_id": "unique-session-uuid",
+   "question": "What are Nivakaran's main technical skills?"
+ }
+ ```
+ - Get answers: The AI responds with precise, context-aware answers sourced directly from the portfolio.
+ - Maintain sessions: Reuse the same `session_id` across requests to keep chat history context intact.
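A client for the request described above could look like the following sketch, which relies only on the standard library. The helper names `build_ask_request` and `ask` are hypothetical, and the base URL is an assumption (a local server on the sample port); the `answer` field comes from the API's response model.

```python
import json
import urllib.request


def build_ask_request(base_url: str, session_id: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /ask endpoint."""
    payload = json.dumps({"session_id": session_id, "question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, session_id: str, question: str) -> str:
    """Send the question and return the "answer" field of the JSON response."""
    request = build_ask_request(base_url, session_id, question)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["answer"]
```

With the server running locally, `ask("http://localhost:5000", "my-session", "What are Nivakaran's main technical skills?")` would return the assistant's reply as a string; calling it again with the same session ID continues the conversation.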
+
+ ## 🧠 Architecture Overview
+ 1. PDF Processing: The portfolio PDF is loaded and split into manageable, overlapping chunks.
+ 2. Vectorization: Text chunks are converted to semantic vectors via HuggingFace embeddings.
+ 3. Indexing: A Chroma database stores the vectors for similarity search.
+ 4. Query Handling: Incoming questions are reformulated based on chat history.
+ 5. Retrieval & Generation: The system retrieves relevant document chunks and generates an answer using the Groq LLM.
+ 6. Session Management: Multi-turn dialogue history is tracked to keep conversations coherent.
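Steps 1 to 5 above can be illustrated with a toy, dependency-free sketch. It is not the project's implementation: word counts stand in for the HuggingFace embeddings, a sorted list stands in for Chroma, and every function name here is hypothetical. Only the overall shape (split, vectorize, rank by similarity, keep the top `k`) follows the pipeline described above.

```python
import math
import re
from collections import Counter
from typing import List


def split_text(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Cut text into fixed-size windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


def vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], k: int = 4) -> List[str]:
    """Return the k chunks most similar to the question."""
    query = vectorize(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(query, vectorize(chunk)), reverse=True)
    return ranked[:k]
```

In the real service the retrieved chunks are then passed to the LLM as context; the overlap between neighbouring chunks reduces the chance that a sentence relevant to the question is cut in half at a chunk boundary.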
+
+ ## 🎯 Impact & Use Cases
+ - Personal Branding: Transform your static portfolio into an interactive, AI-powered experience.
+ - Recruiter Friendly: Instant access to precise answers about skills and projects, with no browsing required.
+ - Tech Demonstration: Showcases expertise in AI integration, API design, and modern NLP pipelines.
+ - Scalable Architecture: Easily extend to multiple domains or add new data sources.
+
+ ## 📚 Technologies & Tools
+ - Python 3.8+
+ - FastAPI (ASGI web framework)
+ - LangChain (LLM orchestration)
+ - Groq ChatGroq (LLM inference)
+ - HuggingFace Embeddings (all-MiniLM-L6-v2)
+ - Chroma Vector Database
+ - PyPDFLoader & RecursiveCharacterTextSplitter
+ - Pydantic for request validation
+ - Uvicorn ASGI server
+ - Python Logging
+
+ ## 🤝 Contributing
+ Contributions and improvements are welcome! Feel free to open issues or submit PRs for:
+ - Adding new document types
+ - Enhancing conversation flow
+ - Improving deployment (Docker/K8s)
+ - Optimizing vector search performance
+
+
+ ## ⚡ Final Notes
+ Max Portfolio Assistant is not just a chatbot; it’s a showcase of how to leverage modern AI, NLP, and backend engineering skills to create real, usable developer portfolio experiences that recruiters notice and remember.
+
app.py ADDED
@@ -0,0 +1,467 @@
+ import os
+ import re
+ import logging
+ from typing import List
+ from fastapi import FastAPI, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel
+ from dotenv import load_dotenv
+ from langchain.chains import create_history_aware_retriever, create_retrieval_chain
+ from langchain.chains.combine_documents import create_stuff_documents_chain
+ from langchain_mongodb.chat_message_histories import MongoDBChatMessageHistory
+ from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
+ from langchain_core.documents import Document
+ from langchain_groq import ChatGroq
+ from langchain_huggingface import HuggingFaceEmbeddings
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
+ from langchain_community.document_loaders import PyPDFLoader
+ from langchain_chroma import Chroma
+ from pymongo import MongoClient
+
+
+ # Alternative PDF libraries for fallback
+ try:
+     from pypdf import PdfReader
+     PYPDF_AVAILABLE = True
+ except ImportError:
+     PYPDF_AVAILABLE = False
+
+ try:
+     import fitz  # PyMuPDF
+     PYMUPDF_AVAILABLE = True
+ except ImportError:
+     PYMUPDF_AVAILABLE = False
+
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+
+ # Load environment variables
+ load_dotenv()
+ HF_TOKEN = os.getenv("HF_TOKEN")
+ GROQ_API_KEY = os.getenv("GROQ_API_KEY")
+ MONGODB_URL = os.getenv("MONGODB_URL")
+ MONGODB_DATABASE = os.getenv("MONGODB_DATABASE", "test")
+
+ # Parse collections as a list from comma-separated string in .env
+ collections_env = os.getenv("MONGODB_COLLECTION", "blogs")
+ MONGODB_COLLECTIONS = [col.strip() for col in collections_env.split(",") if col.strip()]
+
+ HOST = os.getenv("HOST", "0.0.0.0")
+ PORT = int(os.getenv("PORT", 5000))
+ PDF_PATH = os.getenv("PDF_PATH", "./nivakaran.pdf")
+
+
+ # Validate environment variables
+ if not all([HF_TOKEN, GROQ_API_KEY, PDF_PATH, MONGODB_URL]):
+     logger.error("Missing required environment variables")
+     raise RuntimeError("Environment variables not set. Please check HF_TOKEN, GROQ_API_KEY, PDF_PATH, and MONGODB_URL")
+
+
+ # Initialize MongoDB client and fail fast if the server is unreachable
+ try:
+     mongo_client = MongoClient(MONGODB_URL)
+     mongo_client.admin.command('ping')
+     logger.info("MongoDB connection successful")
+ except Exception as e:
+     logger.error(f"Failed to connect to MongoDB: {str(e)}")
+     raise RuntimeError("MongoDB connection failed") from e
+
+
+ # Initialize FastAPI app
+ app = FastAPI(
+     title="Portfolio API",
+     description="Chatbot for Nivakaran's Portfolio.",
+     version="1.0.0",
+ )
+
+
+ # Configure CORS
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],  # Restrict to specific origins in production
+     allow_credentials=True,
+     allow_methods=["GET", "POST", "DELETE"],
+     allow_headers=["*"],
+ )
+
+
+ # Initialize RAG components
+ embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
+ llm = ChatGroq(model_name="openai/gpt-oss-20b")
+
+
+ def extract_text_with_pypdf(file_path: str) -> List[Document]:
+     """Extract text using pypdf library directly"""
+     try:
+         reader = PdfReader(file_path)
+         documents = []
+
+         for page_num, page in enumerate(reader.pages):
+             text = page.extract_text()
+             if text.strip():  # Only add non-empty pages
+                 doc = Document(
+                     page_content=text,
+                     metadata={"source": file_path, "page": page_num}
+                 )
+                 documents.append(doc)
+
+         logger.info(f"pypdf extracted text from {len(documents)} pages")
+         return documents
+     except Exception as e:
+         logger.error(f"pypdf extraction failed: {str(e)}")
+         return []
+
+
+ def extract_text_with_pymupdf(file_path: str) -> List[Document]:
+     """Extract text using PyMuPDF (fitz) library - often better for complex PDFs"""
+     try:
+         doc = fitz.open(file_path)
+         documents = []
+
+         for page_num in range(len(doc)):
+             page = doc.load_page(page_num)
+             text = page.get_text()
+             if text.strip():  # Only add non-empty pages
+                 document = Document(
+                     page_content=text,
+                     metadata={"source": file_path, "page": page_num}
+                 )
+                 documents.append(document)
+
+         doc.close()
+         logger.info(f"PyMuPDF extracted text from {len(documents)} pages")
+         return documents
+     except Exception as e:
+         logger.error(f"PyMuPDF extraction failed: {str(e)}")
+         return []
+
+
+ def process_pdf(file_path: str):
+     """Process PDF with multiple fallback methods for robust text extraction"""
+     try:
+         # Check if file exists
+         if not os.path.exists(file_path):
+             raise FileNotFoundError(f"PDF file not found at: {file_path}")
+
+         logger.info(f"Processing PDF from: {file_path}")
+         documents = []
+
+         # Method 1: Try LangChain's PyPDFLoader (uses pypdf internally)
+         try:
+             logger.info("Attempting extraction with PyPDFLoader...")
+             loader = PyPDFLoader(file_path)
+             documents = loader.load()
+
+             if documents and any(doc.page_content.strip() for doc in documents):
+                 logger.info(f"PyPDFLoader successfully loaded {len(documents)} pages")
+             else:
+                 documents = []
+                 logger.warning("PyPDFLoader returned empty documents")
+         except Exception as e:
+             logger.warning(f"PyPDFLoader failed: {str(e)}")
+
+         # Method 2: Try direct pypdf if available and previous method failed
+         if not documents and PYPDF_AVAILABLE:
+             logger.info("Attempting extraction with pypdf directly...")
+             documents = extract_text_with_pypdf(file_path)
+
+         # Method 3: Try PyMuPDF as fallback (often best for complex PDFs)
+         if not documents and PYMUPDF_AVAILABLE:
+             logger.info("Attempting extraction with PyMuPDF (fitz)...")
+             documents = extract_text_with_pymupdf(file_path)
+
+         # Validate that documents were loaded
+         if not documents:
+             raise ValueError(
+                 "Failed to extract text from PDF with all available methods. "
+                 "The PDF might be:\n"
+                 "1. Empty or corrupted\n"
+                 "2. Password-protected\n"
+                 "3. Scanned images without OCR (consider using pytesseract)\n"
+                 "4. Using unsupported encryption"
+             )
+
+         # Check if any text was actually extracted
+         total_text = "".join([doc.page_content for doc in documents])
+         if not total_text.strip():
+             raise ValueError("No text content found in PDF. It may contain only images.")
+
+         logger.info(f"Successfully extracted {len(total_text)} characters from {len(documents)} pages")
+
+         # Split documents into chunks
+         text_splitter = RecursiveCharacterTextSplitter(
+             chunk_size=5000,
+             chunk_overlap=500,
+             length_function=len,
+             separators=["\n\n", "\n", ". ", " ", ""]
+         )
+         splits = text_splitter.split_documents(documents)
+
+         # Filter out empty chunks
+         splits = [doc for doc in splits if doc.page_content.strip()]
+
+         if not splits:
+             raise ValueError("Text splitting resulted in zero valid chunks.")
+
+         logger.info(f"Created {len(splits)} text chunks for vectorization")
+
+         # Create vectorstore.
+         # Note: the directory is persistent, so calling this on every startup
+         # adds the same chunks to the stored collection again.
+         vectorstore = Chroma.from_documents(
+             documents=splits,
+             embedding=embeddings,
+             persist_directory="./portfolio.db"
+         )
+
+         logger.info("Vectorstore created successfully")
+         return vectorstore
+
+     except FileNotFoundError as e:
+         logger.error(f"File not found: {str(e)}")
+         raise RuntimeError(f"PDF file not found: {str(e)}")
+     except ValueError as e:
+         logger.error(f"Invalid PDF content: {str(e)}")
+         raise RuntimeError(f"PDF processing failed: {str(e)}")
+     except Exception as e:
+         logger.error(f"Unexpected error processing PDF: {str(e)}", exc_info=True)
+         raise RuntimeError(f"PDF processing failed: {str(e)}")
+
+
+ def get_session_histories(session_id: str) -> List[MongoDBChatMessageHistory]:
+     """Get list of MongoDB chat message histories for a session from all collections"""
+     histories = []
+     for col in MONGODB_COLLECTIONS:
+         history = MongoDBChatMessageHistory(
+             connection_string=MONGODB_URL,
+             session_id=session_id,
+             database_name=MONGODB_DATABASE,
+             collection_name=col,
+             create_index=True
+         )
+         histories.append(history)
+     return histories
+
+
+ def merge_histories(histories: List[MongoDBChatMessageHistory]) -> List:
+     """Merge messages from multiple histories, ordered by 'created_at' when that attribute exists"""
+     all_messages = []
+     for history in histories:
+         all_messages.extend(history.messages)
+     # sort() is stable: messages without a 'created_at' attribute all get key 0
+     # and therefore keep their per-collection insertion order
+     all_messages.sort(key=lambda msg: getattr(msg, 'created_at', 0))
+     return all_messages
+
+
+ # Initialize vectorstore
+ try:
+     logger.info(f"Initializing vectorstore from PDF: {PDF_PATH}")
+     vectorstore = process_pdf(PDF_PATH)
+     retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
+     logger.info("Vectorstore initialized successfully")
+ except Exception as e:
+     logger.error(f"Vectorstore initialization failed: {str(e)}")
+     logger.error("\nTroubleshooting steps:")
+     logger.error("1. Verify PDF file exists at the specified path")
+     logger.error("2. Ensure PDF contains extractable text (not just scanned images)")
+     logger.error("3. Check if PDF is password-protected")
+     logger.error("4. Try opening the PDF manually to verify it's not corrupted")
+     logger.error("\nInstall additional libraries for better PDF support:")
+     logger.error(" pip install pypdf pymupdf")
+     raise RuntimeError(f"Vectorstore initialization failed: {str(e)}")
+
+
+ class QuestionRequest(BaseModel):
+     session_id: str
+     question: str
+
+
+ class QuestionResponse(BaseModel):
+     answer: str
+
+
+ class SessionHistoryRequest(BaseModel):
+     session_id: str
+
+
+ class SessionHistoryResponse(BaseModel):
+     session_id: str
+     message_count: int
+     messages: List[dict]
+
+
+ @app.post(
+     "/ask",
+     response_model=QuestionResponse,
+     summary="Ask Nivakaran's portfolio assistant",
+     description="Submit a question about Nivakaran's skills, projects, and experience."
+ )
+ async def ask_question(request: QuestionRequest):
+     """Handle question and maintain chat history in MongoDB across multiple collections"""
+     session_id = request.session_id
+     question = request.question
+     logger.info(f"Received question for session {session_id}: {question}")
+
+     try:
+         # Get chat histories from all collections
+         histories = get_session_histories(session_id)
+         all_messages = merge_histories(histories)
+
+         # Keep last 6 messages for chat history context
+         last_messages = all_messages[-6:]
+
+         # Extract full session context text from all messages
+         session_context_text = "\n".join(
+             [msg.content for msg in all_messages if hasattr(msg, "content") and msg.content.strip()]
+         )
+
+         # The {context} variable is filled with the retrieved documents by the chain
+         system_prompt = """You are Max, a friendly and professional chatbot designed to
+         assist visitors to Nivakaran’s portfolio website. Your primary goal
+         is to provide accurate, clear, and helpful information about Nivakaran, based
+         on the following context:
+
+         {context}
+
+         Your responses should be:
+         1. Informative and relevant, directly addressing the visitor’s questions about Nivakaran’s skills,
+         projects, experience, and background.
+         2. Concise but thorough enough to give visitors a clear understanding of Nivakaran’s expertise.
+         3. Engaging and approachable, maintaining a professional yet conversational tone.
+         4. Honest about what is available in the provided context; if you don’t know an answer, politely
+         say so and suggest the visitor explore other sections of the portfolio or contact Nivakaran directly.
+         5. Focused on helping visitors understand Nivakaran’s capabilities and what makes him stand out
+         as a developer and professional.
+         6. Ready to provide examples, explanations, or links to portfolio projects when relevant.
+
+         Avoid providing generic or unrelated information. Always tailor your answers to
+         highlight Nivakaran’s strengths and the unique value he brings.
+         """
+
+         # Create ChatPromptTemplate with variables {context} and {input}, plus chat_history placeholder
+         qa_prompt = ChatPromptTemplate.from_messages([
+             ("system", system_prompt),
+             MessagesPlaceholder("chat_history"),
+             ("human", "{input}")
+         ])
+
+         question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
+
+         history_aware_retriever = create_history_aware_retriever(
+             llm, retriever, ChatPromptTemplate.from_messages([
+                 ("system", "Rephrase the user's question considering the chat history to provide better context."),
+                 MessagesPlaceholder("chat_history"),
+                 ("human", "{input}")
+             ])
+         )
+
+         rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
+
+         # Invoke RAG chain with the question and the last 6 chat messages.
+         # Note: create_retrieval_chain assigns the retrieved documents to "context",
+         # so the session text passed here is replaced before it reaches the prompt.
+         result = rag_chain.invoke({
+             "input": question,
+             "context": session_context_text,
+             "chat_history": last_messages
+         })
+         raw_answer = result["answer"]
+
+         # Clean answer by removing any <think>...</think> blocks
+         cleaned_answer = re.sub(r"<think>.*?</think>\s*", "", raw_answer, flags=re.DOTALL).strip()
+
+         # Add user question and AI response to all histories (all collections)
+         for history in histories:
+             history.add_user_message(question)
+             history.add_ai_message(cleaned_answer)
+
+         logger.info(f"Response saved to MongoDB for session {session_id}")
+         return QuestionResponse(answer=cleaned_answer)
+
+     except Exception as e:
+         logger.error(f"Error processing question: {str(e)}")
+         raise HTTPException(status_code=500, detail=f"Processing failed: {str(e)}")
+
+
+ @app.post("/history", response_model=SessionHistoryResponse)
+ async def get_history(request: SessionHistoryRequest):
+     """Retrieve chat history for a session from all collections"""
+     try:
+         histories = get_session_histories(request.session_id)
+         all_messages = merge_histories(histories)
+         messages_dict = [{"type": msg.type, "content": msg.content} for msg in all_messages]
+         return SessionHistoryResponse(
+             session_id=request.session_id,
+             message_count=len(all_messages),
+             messages=messages_dict
+         )
+     except Exception as e:
+         logger.error(f"Error retrieving history: {str(e)}")
+         raise HTTPException(status_code=500, detail=f"Failed to retrieve history: {str(e)}")
+
+
+ @app.delete("/history/{session_id}")
+ async def clear_history(session_id: str):
+     """Clear chat history for a session from all collections"""
+     try:
+         histories = get_session_histories(session_id)
+         for history in histories:
+             history.clear()
+         logger.info(f"Cleared history for session {session_id}")
+         return {"message": f"History cleared for session {session_id}"}
+     except Exception as e:
+         logger.error(f"Error clearing history: {str(e)}")
+         raise HTTPException(status_code=500, detail=f"Failed to clear history: {str(e)}")
+
+
+ @app.get("/health")
+ async def health_check():
+     """Health check endpoint"""
+     try:
+         mongo_client.admin.command('ping')
+         mongo_status = "connected"
+     except Exception as e:
+         mongo_status = f"disconnected: {str(e)}"
+
+     return {
+         "status": "healthy",
+         "app": "Nivakaran's Portfolio Assistant",
+         "mongodb": mongo_status,
+         "vectorstore": "initialized" if vectorstore else "not initialized",
+         "pdf_libraries": {
+             "pypdf": PYPDF_AVAILABLE,
+             "pymupdf": PYMUPDF_AVAILABLE
+         }
+     }
+
+
+ @app.get("/")
+ async def root():
+     return {
+         "message": "Welcome to Nivakaran's Portfolio API",
+         "description": "Ask questions about Nivakaran's skills, projects, and experience",
+         "endpoints": {
+             "ask_question": "/ask",
+             "get_history": "/history",
+             "clear_history": "/history/{session_id}",
+             "health_check": "/health",
+             "documentation": "/docs"
+         }
+     }
+
+
+ @app.on_event("shutdown")
+ async def shutdown_event():
+     """Close MongoDB connection"""
+     mongo_client.close()
+     logger.info("MongoDB connection closed")
+
+
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(app, host=HOST, port=PORT)
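The extraction strategy in `process_pdf` above (try one backend, fall through to the next when it fails or returns nothing) can be expressed generically. The following is only a sketch under that reading: `first_successful` and the `Extractor` alias are hypothetical names, and pages are modelled as plain strings rather than LangChain `Document` objects.

```python
import logging
from typing import Callable, List, Sequence, Tuple

logger = logging.getLogger(__name__)

# An extractor takes a file path and returns one string per page.
Extractor = Callable[[str], List[str]]


def first_successful(path: str, extractors: Sequence[Tuple[str, Extractor]]) -> List[str]:
    """Run extractors in order and return the first non-empty result."""
    for name, extract in extractors:
        try:
            pages = [page for page in extract(path) if page.strip()]
        except Exception as exc:  # a failing backend should not stop the chain
            logger.warning("%s failed: %s", name, exc)
            continue
        if pages:
            logger.info("%s extracted %d pages", name, len(pages))
            return pages
        logger.warning("%s returned no text", name)
    raise ValueError("No extractor produced text")
```

Keeping the backends in an ordered list makes it straightforward to add another one (for example an OCR step for scanned PDFs) without growing a chain of `if not documents` checks.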
max.py ADDED
@@ -0,0 +1,199 @@
+ import os
+ import re
+ import logging
+ from fastapi import FastAPI, HTTPException
+ from fastapi.middleware.cors import CORSMiddleware
+ from pydantic import BaseModel
+ from dotenv import load_dotenv
+ from langchain.chains import create_history_aware_retriever, create_retrieval_chain
+ from langchain.chains.combine_documents import create_stuff_documents_chain
+ from langchain_community.chat_message_histories import ChatMessageHistory
+ from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
+ from langchain_groq import ChatGroq
+ from langchain_huggingface import HuggingFaceEmbeddings
+ from langchain_text_splitters import RecursiveCharacterTextSplitter
+ from langchain_community.document_loaders import PyPDFLoader
+ from langchain_chroma import Chroma
+
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Load environment variables
+ load_dotenv()
+ HF_TOKEN = os.getenv("HF_TOKEN")
+ GROQ_API_KEY = os.getenv("GROQ_API_KEY")
+ HOST = os.getenv("HOST", "0.0.0.0")
+ PORT = int(os.getenv("PORT", 5000))
+ PDF_PATH = os.getenv("PDF_PATH", "nivakaran.pdf")
+
+ # Validate environment variables
+ if not all([HF_TOKEN, GROQ_API_KEY, PDF_PATH]):
+     logger.error("Missing required environment variables")
+     raise RuntimeError("Environment variables not set")
+
+ # Initialize FastAPI app
+ app = FastAPI(
+     title="Portfolio API",
+     description="API for Nivakaran's portfolio",
+     version="1.0.0",
+ )
+
+ # Configure CORS
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],  # Restrict to specific origins in production
+     allow_credentials=True,
+     allow_methods=["GET", "POST"],
+     allow_headers=["*"],
+ )
+
+ # Initialize RAG components
+ # "./local_model" is the directory written by model.py; run that script first
+ embeddings = HuggingFaceEmbeddings(model_name="./local_model")
+ llm = ChatGroq(model_name="Deepseek-R1-Distill-Llama-70b")
+ # In-memory session store: chat histories are lost when the process restarts
+ session_store = {}
+
+ def process_pdf(file_path: str):
+     try:
+         loader = PyPDFLoader(file_path)
+         documents = loader.load()
+         text_splitter = RecursiveCharacterTextSplitter(chunk_size=5000, chunk_overlap=500)
+         splits = text_splitter.split_documents(documents)
+         vectorstore = Chroma.from_documents(
+             documents=splits,
+             embedding=embeddings,
+             persist_directory="./portfolio.db"
+         )
+         logger.info(f"PDF {file_path} processed successfully")
+         return vectorstore
+     except Exception as e:
+         logger.error(f"Failed to process PDF: {str(e)}")
+         raise RuntimeError("PDF processing failed")
+
+ # Initialize vectorstore
+ try:
+     vectorstore = process_pdf(PDF_PATH)
+     retriever = vectorstore.as_retriever()
+     logger.info("Vectorstore initialized successfully")
+ except Exception as e:
+     logger.error(f"Vectorstore initialization failed: {str(e)}")
+     raise RuntimeError("Vectorstore initialization failed")
+
+
+ class QuestionRequest(BaseModel):
+     session_id: str
+     question: str
+
+ class QuestionResponse(BaseModel):
+     answer: str
+
+
+ @app.post(
+     "/ask",
+     response_model=QuestionResponse,
+     summary="Ask the portfolio assistant",
+     description="Submit a question to get a reply from Max, the portfolio chatbot."
+ )
+ async def ask_question(request: QuestionRequest):
+     session_id = request.session_id
+     question = request.question
+     logger.info(f"Received question for session {session_id}: {question}")
+
+     try:
+         if session_id not in session_store:
+             session_store[session_id] = {
+                 "history": ChatMessageHistory(),
+                 "retriever": retriever
+             }
+
+         session = session_store[session_id]
+         history = session["history"]
+         last_messages = history.messages[-6:]
+
+         # RAG processing
+         contextualize_q_prompt = ChatPromptTemplate.from_messages([
+             ("system", "Rephrase questions considering chat history."),
+             MessagesPlaceholder("chat_history"),
+             ("human", "{input}")
+         ])
+
+         history_aware_retriever = create_history_aware_retriever(
+             llm, session["retriever"], contextualize_q_prompt
+         )
+
+         system_prompt = """You are Max, a friendly and professional chatbot designed to
+         assist visitors to Nivakaran’s portfolio website. Your primary goal
+         is to provide accurate, clear, and helpful information about Nivakaran, based
+         on the following context:
+
+         {context}
+
+         Your responses should be:
+         1. Informative and relevant, directly addressing the visitor’s questions about Nivakaran’s skills,
+         projects, experience, and background.
+         2. Concise but thorough enough to give visitors a clear understanding of Nivakaran’s expertise.
+         3. Engaging and approachable, maintaining a professional yet conversational tone.
+         4. Honest about what is available in the provided context; if you don’t know an answer, politely
+         say so and suggest the visitor explore other sections of the portfolio or contact Nivakaran directly.
+         5. Focused on helping visitors understand Nivakaran’s capabilities and what makes him stand out
+         as a developer and professional.
+         6. Ready to provide examples, explanations, or links to portfolio projects when relevant.
+
+         Avoid providing generic or unrelated information. Always tailor your answers to
+         highlight Nivakaran’s strengths and the unique value he brings.
+         """
+
+         qa_prompt = ChatPromptTemplate.from_messages([
+             ("system", system_prompt),
+             MessagesPlaceholder("chat_history"),
+             ("human", "{input}")
+         ])
+
+         question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
+         rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)
+
+         # Get and process response
+         result = rag_chain.invoke({
+             "input": question,
+             "chat_history": last_messages
+         })
+         raw_answer = result["answer"]
+
+         # Remove <think>...</think> block from answer
+         cleaned_answer = re.sub(r"<think>.*?</think>\s*", "", raw_answer, flags=re.DOTALL).strip()
+
+         # Update history
+         history.add_user_message(question)
+         history.add_ai_message(cleaned_answer)
+
+         logger.info(f"Cleaned response for session {session_id}: {cleaned_answer[:100]}...")
+         return QuestionResponse(answer=cleaned_answer)
+
+     except Exception as e:
+         logger.error(f"Error processing question for session {session_id}: {str(e)}")
+         raise HTTPException(status_code=500, detail=f"Processing failed: {str(e)}")
+
+
+ # Root endpoint
+ @app.get("/")
+ async def root():
+     return {
+         "message": "Welcome to the Portfolio API",
+         "endpoints": {
+             "portfolio_assistant": "/ask",
+             "docs": "/docs"
+         }
+     }
+
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(app, host=HOST, port=PORT)
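Two small steps in `ask_question` above are easy to check in isolation: trimming the history to the most recent messages and removing the model's `<think>...</think>` reasoning block before the answer is stored. The sketch below reproduces both with the standard library; the helper names `strip_think_blocks` and `last_turns` are hypothetical, while the regular expression and the window of six messages are taken from the code above.

```python
import re
from typing import List

# Same pattern as in the endpoint: DOTALL lets "." span line breaks inside the block.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think_blocks(raw_answer: str) -> str:
    """Remove every <think>...</think> block, including multi-line ones."""
    return THINK_BLOCK.sub("", raw_answer).strip()


def last_turns(messages: List[str], limit: int = 6) -> List[str]:
    """Keep at most the `limit` most recent messages."""
    return messages[-limit:] if limit > 0 else []
```

The non-greedy `.*?` matters when an answer contains more than one block: a greedy match would also delete the visible text between the first opening tag and the last closing tag.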
model.py ADDED
@@ -0,0 +1,6 @@
+ # Code to save the sentence transformers locally
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
+ model.save("local_model")
+
nivakaran.pdf ADDED
Binary file (54.3 kB).
requirements.txt ADDED
@@ -0,0 +1,15 @@
+ fastapi
+ uvicorn
+ langchain>=0.1.0,<1.0
+ langchain_groq
+ langchain_core
+ langchain_community
+ langchain_chroma
+ langchain_huggingface
+ python-dotenv
+ pillow
+ pydantic
+ sentence_transformers
+ pypdf
+ langchain_mongodb