[
{
"text": "[HEADER] Krishna Vamsi Dhulipalla is a Software Engineer specializing in generic workflows and AI platforms. He currently works at **Cloud Systems LLC**, where he architects LangGraph-based agents to automate data auditing. Previously, he served as a Machine Learning Engineer at **Virginia Tech**, optimizing genomic models with LoRA/soft prompting, and as a Software Engineer at **UJR Technologies**, building ML SDKs and CI/CD pipelines. He holds an M.S. in Computer Science from Virginia Tech (Dec 2024) and has significant expertise in **LangGraph**, **Kubernetes**, **PyTorch**, and **MLOps**.\n\nKrishna Vamsi Dhulipalla is a Software Engineer specializing in generic workflows and AI platforms. He currently works at **Cloud Systems LLC**, where he architects LangGraph-based agents to automate data auditing. Previously, he served as a Machine Learning Engineer at **Virginia Tech**, optimizing genomic models with LoRA/soft prompting, and as a Software Engineer at **UJR Technologies**, building ML SDKs and CI/CD pipelines. He holds an M.S. in Computer Science from Virginia Tech (Dec 2024) and has significant expertise in **LangGraph**, **Kubernetes**, **PyTorch**, and **MLOps**.",
"metadata": {
"source": "bio.md",
"header": "Krishna Vamsi Dhulipalla is a Software Engineer specializing in generic workflows and AI platforms. He currently works at **Cloud Systems LLC**, where he architects LangGraph-based agents to automate data auditing. Previously, he served as a Machine Learning Engineer at **Virginia Tech**, optimizing genomic models with LoRA/soft prompting, and as a Software Engineer at **UJR Technologies**, building ML SDKs and CI/CD pipelines. He holds an M.S. in Computer Science from Virginia Tech (Dec 2024) and has significant expertise in **LangGraph**, **Kubernetes**, **PyTorch**, and **MLOps**.",
"chunk_id": "bio.md_#0_9ac3944c",
"has_header": false,
"word_count": 83
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\nThis document details the architecture of **Krishna Vamsi Dhulipallaโ€™s** personal AI assistant, implemented with **LangGraph** for orchestrated state management and tool execution. The system is designed for **retrieval-augmented, memory-grounded, and multi-turn conversational intelligence**, integrating **OpenAI GPT-4o**, **Hugging Face embeddings**, and **cross-encoder reranking**.\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#0_f30fc781",
"has_header": true,
"word_count": 52
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# # ๐Ÿงฑ Core Components\n\n## # 1. **Models & Their Roles**\n\n| Purpose | Model Name | Role Description |\n| -------------------------- | ---------------------------------------- | ------------------------------------------------ |\n| **Main Chat Model** | `gpt-4o` | Handles conversation, tool calls, and reasoning |\n| **Retriever Embeddings** | `sentence-transformers/all-MiniLM-L6-v2` | Embedding generation for FAISS vector search |\n| **Cross-Encoder Reranker** | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Reranks retrieval results for semantic relevance |\n| **BM25 Retriever** | (LangChain BM25Retriever) | Keyword-based search complementing vector search |\n\nAll models are bound to LangGraph **StateGraph** nodes for structured execution.\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#1_eb402d95",
"has_header": true,
"word_count": 93
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# # ๐Ÿ” Retrieval System\n\n## # โœ… **Hybrid Retrieval**\n\n- **FAISS Vector Search** with normalized embeddings\n- **BM25Retriever** for lexical keyword matching\n- Combined using **Reciprocal Rank Fusion (RRF)**\n\n## # ๐Ÿ“Š **Reranking & Diversity**\n\n1. Initial retrieval with FAISS & BM25 (top-K per retriever)\n2. Fusion via RRF scoring\n3. **Cross-Encoder reranking** (top-N candidates)\n4. **Maximal Marginal Relevance (MMR)** selection for diversity\n\n## # ๐Ÿ”Ž Retriever Tool (`@tool retriever`)\n\n- Returns top passages with minimal duplication\n- Used in-system prompt to fetch accurate facts about Krishna\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#2_cab54fdc",
"has_header": true,
"word_count": 89
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# # ๐Ÿง  Memory System\n\n## # Long-Term Memory\n\n- **FAISS-based memory vector store** stored at `backend/data/memory_faiss`\n- Stores conversation summaries per thread ID\n\n## # Memory Search Tool (`@tool memory_search`)\n\n- Retrieves relevant conversation snippets by semantic similarity\n- Supports **thread-scoped** search for contextual continuity\n\n## # Memory Write Node\n\n- After each AI response, stores `[Q]: ... [A]: ...` summary\n- Autosaves after every `MEM_AUTOSAVE_EVERY` turns or on thread end\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#3_b0899bfc",
"has_header": true,
"word_count": 73
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# # ๐Ÿงญ Orchestration Flow (LangGraph)\n\n```mermaid\ngraph TD\n A[START] --> B[agent node]\n B -->|tool call| C[tools node]\n B -->|no tool| D[memory_write]\n C --> B\n D --> E[END]\n```\n\n## # **Nodes**:\n\n- **agent**: Calls main LLM with conversation window + system prompt\n- **tools**: Executes retriever or memory search tools\n- **memory_write**: Persists summaries to long-term memory\n\n## # **Conditional Edges**:\n\n- From **agent** โ†’ `tools` if tool call detected\n- From **agent** โ†’ `memory_write` if no tool call\n\n---\n\n# # ๐Ÿ’ฌ System Prompt\n\nThe assistant:\n\n- Uses retriever and memory search tools to gather facts about Krishna\n- Avoids fabrication and requests clarification when needed\n- Responds humorously when off-topic but steers back to Krishnaโ€™s expertise\n- Formats with Markdown, headings, and bullet points\n\nEmbedded **Krishnaโ€™s Bio** provides static grounding context.\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#4_c58f0c4c",
"has_header": true,
"word_count": 135
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# # ๐ŸŒ API & Streaming\n\n- **Backend**: FastAPI (`backend/api.py`)\n - `/chat` SSE endpoint streams tokens in real-time\n - Passes `thread_id` & `is_final` to LangGraph for stateful conversations\n- **Frontend**: React + Tailwind (custom chat UI)\n - Threaded conversation storage in browser `localStorage`\n - Real-time token rendering via `EventSource`\n - Features: new chat, clear chat, delete thread, suggestions\n\n---\n\n# # ๐Ÿงฉ Design Improvements\n\n- **LangGraph StateGraph** ensures explicit control of message flow\n- **Thread-scoped memory** enables multi-session personalization\n- **Hybrid RRF + Cross-Encoder + MMR** retrieval pipeline improves relevance & diversity\n- **SSE streaming** for low-latency feedback\n- **Decoupled retrieval** and **memory** as separate tools for modularity",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#5_50777b59",
"has_header": true,
"word_count": 108
}
},
{
"text": "[HEADER] # Education\n\n# Education\n\n# # Virginia Tech | Master of Science, Computer Science\n\n**Dec 2024**\n\n- **GPA**: 3.9/4.0\n\n# # Vel Tech University | Bachelor of Technology, Computer Science and Engineering\n\n- **GPA**: 8.24/10",
"metadata": {
"source": "education.md",
"header": "# Education",
"chunk_id": "education.md_#0_ac6a482e",
"has_header": true,
"word_count": 33
}
},
{
"text": "[HEADER] # Professional Experience\n\n# # Software Engineer - AI Platform | Cloud Systems LLC | US Remote\n\n**Jul 2025 - Present**\n\n- **Agentic Workflow Automation**: Architected an agentic workflow using **LangGraph** and **ReAct** to automate SQL generation. This system automated **~65%** of ad-hoc data-auditing requests from internal stakeholders, reducing the average response time from **4 hours to under 2 minutes**.\n- **ETL Optimization**: Optimized data ingestion performance by rebuilding ETL pipelines with batched I/O, incremental refresh logic, and dependency pruning, cutting daily execution runtime by **25%**.\n- **Infrastructure & Reliability**: Improved production reliability by shipping the agent service on **Kubernetes** with autoscaling and rolling deploys, adding alerts and rollback steps for failed releases.\n- **Contract Testing**: Improved cross-service reliability by implementing **Pydantic** schema validation and contract tests, preventing multiple breaking changes from reaching production.",
"metadata": {
"source": "experience.md",
"header": "# Professional Experience",
"chunk_id": "experience.md_#1_33df1d29",
"has_header": true,
"word_count": 131
}
},
{
"text": "[HEADER] # Professional Experience\n\n# # Machine Learning Engineer | Virginia Tech, Dept. of Plant Sciences | Blacksburg, VA\n\n**Aug 2024 - Jul 2025**\n\n- **Model Optimization**: Increased genomics sequence classification throughput by **32%** by applying **LoRA** and **soft prompting** methods. Packaged repeatable **PyTorch** pipelines that cut per-experiment training time by **4.5 hours**.\n- **HPC Orchestration**: Developed an ML orchestration layer for distributed GPU training on HPC clusters. Engineered checkpoint-resume logic that handled preemptive node shutdowns, optimizing resource utilization and reducing compute waste by **15%**.\n- **MLOps**: Reduced research environment setup time from hours to minutes by containerizing fine-tuned models with **Docker** and managing the experimental lifecycle (versions, hyperparameters, and weights) via **MLflow**.",
"metadata": {
"source": "experience.md",
"header": "# Professional Experience",
"chunk_id": "experience.md_#2_a1069896",
"has_header": true,
"word_count": 109
}
},
{
"text": "[HEADER] # Professional Experience\n\n# # Software Engineer | UJR Technologies Pvt Ltd | Hyderabad, India\n\n**Jul 2021 - Dec 2022**\n\n- **API & SDK Development**: Designed and maintained standardized **REST APIs** and **Python-based SDKs** to streamline the ML development lifecycle, reducing cross-team integration defects by **40%**.\n- **Model Serving**: Engineered model-serving endpoints with automated input validation and deployment health checks, lowering prediction-related failures by **30%** for ML-driven features.\n- **CI/CD Pipeline**: Automated CI/CD pipelines via **GitHub Actions** with comprehensive test coverage and scripted rollback procedures, decreasing release failures by **20%** across production environments.",
"metadata": {
"source": "experience.md",
"header": "# Professional Experience",
"chunk_id": "experience.md_#3_f6eb60f5",
"has_header": true,
"word_count": 90
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# ๐ŸŒŸ Personal and Professional Goals\n\n# # โœ… Short-Term Goals (0โ€“6 months)\n\n1. **Deploy Multi-Agent Personal Chatbot**\n\n - Integrate RAG-based retrieval, tool calling, and Open Source LLMs\n - Use LangChain, FAISS, BM25, and Gradio UI\n\n2. **Publish Second Bioinformatics Paper**\n\n - Focus: TF Binding prediction using HyenaDNA and plant genomics data\n - Venue: Submitted to MLCB\n\n3. **Transition Toward Production Roles**\n\n - Shift from academic research to applied roles in data engineering or ML infrastructure\n - Focus on backend, pipeline, and deployment readiness\n\n4. **Accelerate Job Search**\n\n - Apply to 3+ targeted roles per week (platform/data engineering preferred)\n - Tailor applications for visa-friendly, high-impact companies\n\n5. **R Shiny App Enhancement**\n\n - Debug gene co-expression heatmap issues and add new annotation features\n\n6. **Learning & Certifications**\n - Deepen knowledge in Kubernetes for ML Ops\n - Follow NVIDIAโ€™s RAG Agent curriculum weekly\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#0_337b6890",
"has_header": true,
"word_count": 142
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # โณ Mid-Term Goals (6โ€“12 months)\n\n1. **Launch Open-Source Project**\n\n - Create or contribute to ML/data tools (e.g., genomic toolkit, chatbot agent framework)\n\n2. **Scale Personal Bot Capabilities**\n\n - Add calendar integration, document-based Q&A, semantic memory\n\n3. **Advance CI/CD and Observability Skills**\n\n - Implement cloud-native monitoring and testing workflows\n\n4. **Secure Full-Time Role**\n - Land a production-facing role with a U.S. company offering sponsorship support\n\n---\n\n# # ๐Ÿš€ Long-Term Goals (1โ€“3 years)\n\n1. **Become a Senior Data/ML Infrastructure Engineer**\n\n - Work on LLM orchestration, agent systems, scalable infrastructure\n\n2. **Continue Academic Contributions**\n\n - Publish in bioinformatics and AI (focus: genomics + transformers)\n\n3. **Launch a Research-Centered Product/Framework**\n - Build an open-source or startup framework connecting genomics, LLMs, and real-time ML pipelines\n\n---\n\n# ๐Ÿ’ฌ Example Conversations",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#1_bc53463d",
"has_header": true,
"word_count": 128
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# ๐Ÿ’ฌ Example Conversations\n\n# # Q: _What interests you in data engineering?_\n\n**A:** I enjoy architecting scalable data systems that generate real-world insights. From optimizing ETL pipelines to deploying real-time frameworks like the genomic systems at Virginia Tech, I thrive at the intersection of automation and impact.\n\n---\n\n# # Q: _Describe a pipeline you've built._\n\n**A:** One example is a real-time IoT pipeline I built at VT. It processed 10,000+ sensor readings using Kafka, Airflow, and Snowflake, feeding into GPT-4 for forecasting with 91% accuracy. This reduced energy costs by 15% and improved dashboard reporting by 30%.\n\n---\n\n# # Q: _What was your most difficult debugging experience?_\n\n**A:** Debugging duplicate ingestion in a Kafka/Spark pipeline at UJR. I isolated misconfigurations in consumer groups, optimized Spark executors, and applied idempotent logic to reduce latency by 30%.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#2_05b5827c",
"has_header": true,
"word_count": 139
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # Q: _How do you handle data cleaning?_\n\n**A:** I ensure schema consistency, identify missing values and outliers, and use Airflow + dbt for scalable automation. For larger datasets, I optimize transformations using batch jobs or parallel compute.\n\n---\n\n# # Q: _Describe a strong collaboration experience._\n\n**A:** While working on cross-domain NER at Virginia Tech, I collaborated with infrastructure engineers on EC2 deployment while handling model tuning. Together, we reduced latency by 30% and improved F1-scores by 8%.\n\n---\n\n# # Q: _What tools do you use most often?_\n\n**A:** Python, Spark, Airflow, dbt, Kafka, and SageMaker are daily drivers. I also rely on Docker, CloudWatch, and Looker for observability and visualizations.\n\n---\n\n# # Q: _Whatโ€™s a strength and weakness of yours?_\n\n**A:**\n\n- **Strength**: Turning complexity into clean, usable data flows.\n- **Weakness**: Over-polishing outputs, though Iโ€™m learning to better balance speed with quality.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#3_e7c4a2f9",
"has_header": true,
"word_count": 149
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # Q: _Whatโ€™s a strength and weakness of yours?_\n\n**A:**\n\n- **Strength**: Turning complexity into clean, usable data flows.\n- **Weakness**: Over-polishing outputs, though Iโ€™m learning to better balance speed with quality.\n\n---\n\n# # Q: _What do you want to work on next?_\n\n**A:** I want to deepen my skills in production ML workflowsโ€”especially building intelligent agents and scalable pipelines that serve live products and cross-functional teams.\n\n# # How did you automate preprocessing for 1M+ biological samples?\n\nA: Sure! The goal was to streamline raw sequence processing at scale, so I used Biopython for parsing genomic formats and dbt to standardize and transform the data in a modular way. Everything was orchestrated through Apache Airflow, which let us automate the entire workflow end-to-end โ€” from ingestion to feature extraction. We parallelized parts of the process and optimized SQL logic, which led to a 40% improvement in throughput.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#4_ffdd8b09",
"has_header": true,
"word_count": 151
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # What kind of semantic search did you build using LangChain and Pinecone?\n\nA: We built a vector search pipeline tailored to genomic research papers and sequence annotations. I used LangChain to create embeddings and chain logic, and stored those in Pinecone for fast similarity-based retrieval. It supported both question-answering over domain-specific documents and similarity search, helping researchers find related sequences or studies efficiently.\n\n---\n\n# # Can you describe the deployment process using Docker and SageMaker?\n\nA: Definitely. We started by containerizing our models using Docker โ€” bundling dependencies and model weights โ€” and then deployed them as SageMaker endpoints. It made model versioning and scaling super manageable. We monitored everything using CloudWatch for logs and metrics, and used MLflow for tracking experiments and deployments.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#5_a4b0fd49",
"has_header": true,
"word_count": 128
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # Why did you migrate from batch to real-time ETL? What problems did that solve?\n\nA: Our batch ETL jobs were lagging in freshness โ€” not ideal for decision-making. So, we moved to a Kafka + Spark streaming setup, which helped us process data as it arrived. That shift reduced latency by around 30%, enabling near real-time dashboards and alerts for operational teams.\n\n---\n\n# # How did you improve Snowflake performance with materialized views?\n\nA: We had complex analytical queries hitting large datasets. To optimize that, I designed materialized views that pre-aggregated common query patterns, like user summaries or event groupings. We also revised schema layouts to reduce joins. Altogether, query performance improved by roughly 40%.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#6_029a317d",
"has_header": true,
"word_count": 119
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # What kind of monitoring and alerting did you set up in production?\n\nA: We used CloudWatch extensively โ€” custom metrics, alarms for failure thresholds, and real-time dashboards for service health. This helped us maintain 99.9% uptime by detecting and responding to issues early. I also integrated alerting into our CI/CD flow for rapid rollback if needed.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#7_03a65b27",
"has_header": true,
"word_count": 59
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # Tell me more about your IoT-based forecasting project โ€” what did you build, and how is it useful?\n\nA: It was a real-time analytics pipeline simulating 10,000+ IoT sensor readings. I used Kafka for streaming, Airflow for orchestration, and S3 with lifecycle policies to manage cost โ€” that alone reduced storage cost by 40%. We also trained time series models, including LLaMA 2, which outperformed ARIMA and provided more accurate forecasts. Everything was visualized through Looker dashboards, removing the need for manual reporting.",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#8_badb31b7",
"has_header": true,
"word_count": 85
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\nI stored raw and processed data in Amazon S3 buckets. Then I configured lifecycle policies to:\nโ€ข Automatically move older data to Glacier (cheaper storage)\nโ€ข Delete temporary/intermediate files after a certain period\nThis helped lower storage costs without compromising data access, especially since older raw data wasnโ€™t queried often.\nโ€ข Schema enforcement: I used tools like Kafka Schema Registry (via Avro) to define a fixed format for sensor data. This avoided issues with malformed or inconsistent data entering the system.\nโ€ข Checksum verification: I added simple checksum validation at ingestion to verify that each message hadnโ€™t been corrupted or tampered with. If the checksum didnโ€™t match, the message was flagged and dropped/logged.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#9_29c1f1fe",
"has_header": false,
"word_count": 114
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # IntelliMeet looks interesting โ€” how did you ensure privacy and decentralization?\n\nA: We designed it with federated learning so user data stayed local while models trained collaboratively. For privacy, we implemented end-to-end encryption across all video and audio streams. On top of that, we used real-time latency tuning (sub-200ms) and Transformer-based NLP for summarizing meetings โ€” it made collaboration both private and smart.\n\n---\n\n๐Ÿ’ก Other Likely Questions:\n\n# # Which tools or frameworks do you feel most comfortable with in production workflows?\n\nA: Iโ€™m most confident with Python and SQL, and regularly use tools like Airflow, Kafka, dbt, Docker, and AWS/GCP for production-grade workflows. Iโ€™ve also used Spark, Pinecone, and LangChain depending on the use case.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#10_087c9446",
"has_header": true,
"word_count": 120
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # Whatโ€™s one project youโ€™re especially proud of, and why?\n\nA: Iโ€™d say the real-time IoT forecasting project. It brought together multiple moving parts โ€” streaming, predictive modeling, storage optimization, and automation. It felt really satisfying to see a full-stack data pipeline run smoothly, end-to-end, and make a real operational impact.\n\n---\n\n# # Have you had to learn any tools quickly? How did you approach that?\n\nA: Yes โ€” quite a few! I had to pick up LangChain and Pinecone from scratch while building the semantic search pipeline, and even dove into R and Shiny for a gene co-expression app. I usually approach new tools by reverse-engineering examples, reading docs, and shipping small proofs-of-concept early to learn by doing.",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#11_0709a337",
"has_header": true,
"word_count": 121
}
},
{
"text": "[HEADER] # Projects\n\n# Projects\n\n# # Autonomous Multi-Agent Web UI Automation System\n\n- **Overview**: Developed a multi-agent system using **LangGraph** and **Playwright** to navigate non-deterministic UI changes across 5 high-complexity SaaS platforms.\n- **Impact**: Increased task completion success rate from **68% to 94%** by implementing a two-stage verification loop with step-level assertions and exponential backoff for dynamic DOM states.\n- **Observability**: Integrated **LangSmith** traces for observability, reducing the mean-time-to-debug for broken selectors by **14 minutes per incident**.",
"metadata": {
"source": "projects.md",
"header": "# Projects",
"chunk_id": "projects.md_#0_89043f3a",
"has_header": true,
"word_count": 75
}
},
{
"text": "[HEADER] # Projects\n\n# # Proxy TuNER: Advancing Cross-Domain Named Entity Recognition through Proxy Tuning\n\n- **Overview**: Improved cross-domain NER F1-score by **8%** by implementing a proxy-tuning approach for **LLaMA 2 models** (7B, 7B-Chat, 13B) using logit ensembling and gradient reversal.\n- **Optimization**: Optimized inference performance by **30%** and reduced training costs by **70%** through distributed execution and model path optimizations in **PyTorch**.\n\n# # IntelliMeet: AI-Enabled Decentralized Video Conferencing App\n\n- **Overview**: Architected a secure, decentralized video platform using **WebRTC** and **federated learning** to maintain data privacy while sharing only aggregated model updates.\n- **Reliability**: Reduced call dropouts by **25%** by engineering network recovery logic and on-device **RetinaFace** attention detection for client-side quality adaptation.",
"metadata": {
"source": "projects.md",
"header": "# Projects",
"chunk_id": "projects.md_#1_48eb2f1e",
"has_header": true,
"word_count": 112
}
},
{
"text": "[HEADER] # Publications\n\n# Publications\n\n- **Predicting Circadian Transcription in mRNAs and lncRNAs**, IEEE BIBM 2024\n- **DNA Foundation Models for Cross-Species TF Binding Prediction**, NeurIPS ML in CompBio 2025\n- **Multi-omics atlas of the plant nuclear envelope**, Science Advances (under review) 2025, University of California, Berkeley",
"metadata": {
"source": "publications.md",
"header": "# Publications",
"chunk_id": "publications.md_#0_3ab998c3",
"has_header": true,
"word_count": 44
}
},
{
"text": "[HEADER] # Technical Skills\n\n# Technical Skills\n\n# # Languages\n\n- Python, SQL, TypeScript, JavaScript, MongoDB\n\n# # ML & AI Frameworks\n\n- PyTorch, Transformers, LangChain, LangGraph, LoRA, RAG, NLP, SKLearn, XGBoost\n\n# # Data & Infrastructure\n\n- Docker, Kubernetes, Apache Airflow, MLflow, Redis, FAISS, AWS, GCP, Git\n\n# # Tools & Observability\n\n- LangSmith, Grafana, CI/CD (GitHub Actions/Jenkins), Weights & Biases, Linux",
"metadata": {
"source": "skills.md",
"header": "# Technical Skills",
"chunk_id": "skills.md_#0_8e74dc40",
"has_header": true,
"word_count": 59
}
},
{
"text": "[HEADER] ## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions\n\n## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions\n\nHereโ€™s what keeps me energized and curious outside of work:\n\n- **๐Ÿฅพ Hiking & Outdoor Adventures** โ€” Nothing clears my mind like a good hike.\n- **๐ŸŽฌ Marvel Fan for Life** โ€” Iโ€™ve seen every Marvel movie, and Iโ€™d probably give my life for the MCU (Team Iron Man, always).\n- **๐Ÿ Cricket Enthusiast** โ€” Whether it's IPL or gully cricket, I'm all in.\n- **๐Ÿš€ Space Exploration Buff** โ€” Obsessed with rockets, Mars missions, and the future of interplanetary travel.\n- **๐Ÿณ Cooking Explorer** โ€” I enjoy experimenting with recipes, especially fusion dishes.\n- **๐Ÿ•น๏ธ Gaming & Reverse Engineering** โ€” I love diving into game logic and breaking things down just to rebuild them better.\n- **๐Ÿง‘โ€๐Ÿคโ€๐Ÿง‘ Time with Friends** โ€” Deep conversations, spontaneous trips, or chill eveningsโ€”friends keep me grounded.\n\n---",
"metadata": {
"source": "xPersonal_Interests_Cleaned.md",
"header": "## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions",
"chunk_id": "xPersonal_Interests_Cleaned.md_#0_1dbed23b",
"has_header": true,
"word_count": 138
}
},
{
"text": "[HEADER] ## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions\n\n# # ๐ŸŒ Cultural Openness\n\n- **Origin**: Iโ€™m proudly from **India**, a land of festivals, diversity, and flavors.\n- **Festivals**: I enjoy not only Indian festivals like **Diwali**, **Holi**, and **Ganesh Chaturthi**, but also love embracing global celebrations like **Christmas**, **Hallowean**, and **Thanksgiving**.\n- **Cultural Curiosity**: Whether itโ€™s learning about rituals, history, or cuisine, I enjoy exploring and respecting all cultural backgrounds.\n\n---\n\n# # ๐Ÿฝ๏ธ Favorite Foods\n\nIf you want to bond with me over food, hereโ€™s what hits my soul:\n\n- **๐Ÿฅ˜ Mutton Biryani from Hyderabad** โ€” The gold standard of comfort food.\n- **๐Ÿฌ Indian Milk Sweets** โ€” Especially Rasgulla and Kaju Katli.\n- **๐Ÿ” Classic Burger** โ€” The messier, the better.\n- **๐Ÿ› Puri with Aloo Sabzi** โ€” A perfect nostalgic breakfast.\n- **๐Ÿฎ Gulab Jamun** โ€” Always room for dessert.\n\n---",
"metadata": {
"source": "xPersonal_Interests_Cleaned.md",
"header": "## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions",
"chunk_id": "xPersonal_Interests_Cleaned.md_#1_3fb21b0c",
"has_header": true,
"word_count": 136
}
},
{
"text": "[HEADER] ## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions\n\n# # ๐ŸŽ‰ Fun Facts\n\n- I sometimes pause Marvel movies just to admire the visuals.\n- I've explored how video game stories are built and love experimenting with alternate paths.\n- I can tell if biryani is authentic based on the layering of the rice.\n- I once helped organize a cricket tournament on a weekโ€™s notice and we pulled it off with 12 teams!\n- I enjoy solving puzzles, even if they're frustrating sometimes.\n\n---\n\nThis side of me helps fuel the creativity, discipline, and joy I bring into my projects. Letโ€™s connect over ideas _and_ biryani!",
"metadata": {
"source": "xPersonal_Interests_Cleaned.md",
"header": "## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions",
"chunk_id": "xPersonal_Interests_Cleaned.md_#2_42616ef4",
"has_header": true,
"word_count": 99
}
}
]