[
{
"text": "[HEADER] Krishna Vamsi Dhulipalla is a Software Engineer specializing in generic workflows and AI platforms. He currently works at **Cloud Systems LLC**, where he architects LangGraph-based agents to automate data auditing. Previously, he served as a Machine Learning Engineer at **Virginia Tech**, optimizing genomic models with LoRA/soft prompting, and as a Software Engineer at **UJR Technologies**, building ML SDKs and CI/CD pipelines. He holds an M.S. in Computer Science from Virginia Tech (Dec 2024) and has significant expertise in **LangGraph**, **Kubernetes**, **PyTorch**, and **MLOps**.\n\nKrishna Vamsi Dhulipalla is a Software Engineer specializing in generic workflows and AI platforms. He currently works at **Cloud Systems LLC**, where he architects LangGraph-based agents to automate data auditing. Previously, he served as a Machine Learning Engineer at **Virginia Tech**, optimizing genomic models with LoRA/soft prompting, and as a Software Engineer at **UJR Technologies**, building ML SDKs and CI/CD pipelines. He holds an M.S. in Computer Science from Virginia Tech (Dec 2024) and has significant expertise in **LangGraph**, **Kubernetes**, **PyTorch**, and **MLOps**.",
"metadata": {
"source": "bio.md",
"header": "Krishna Vamsi Dhulipalla is a Software Engineer specializing in generic workflows and AI platforms. He currently works at **Cloud Systems LLC**, where he architects LangGraph-based agents to automate data auditing. Previously, he served as a Machine Learning Engineer at **Virginia Tech**, optimizing genomic models with LoRA/soft prompting, and as a Software Engineer at **UJR Technologies**, building ML SDKs and CI/CD pipelines. He holds an M.S. in Computer Science from Virginia Tech (Dec 2024) and has significant expertise in **LangGraph**, **Kubernetes**, **PyTorch**, and **MLOps**.",
"chunk_id": "bio.md_#0_9ac3944c",
"has_header": false,
"word_count": 83
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\nThis document details the architecture of **Krishna Vamsi Dhulipallaโ€™s** personal AI assistant, implemented with **LangGraph** for orchestrated state management and tool execution. The system is designed for **retrieval-augmented, memory-grounded, and multi-turn conversational intelligence**, integrating **OpenAI GPT-4o**, **Hugging Face embeddings**, and **cross-encoder reranking**.\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#0_f30fc781",
"has_header": true,
"word_count": 52
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# # ๐Ÿงฑ Core Components\n\n## # 1. **Models & Their Roles**\n\n| Purpose | Model Name | Role Description |\n| -------------------------- | ---------------------------------------- | ------------------------------------------------ |\n| **Main Chat Model** | `gpt-4o` | Handles conversation, tool calls, and reasoning |\n| **Retriever Embeddings** | `sentence-transformers/all-MiniLM-L6-v2` | Embedding generation for FAISS vector search |\n| **Cross-Encoder Reranker** | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Reranks retrieval results for semantic relevance |\n| **BM25 Retriever** | (LangChain BM25Retriever) | Keyword-based search complementing vector search |\n\nAll models are bound to LangGraph **StateGraph** nodes for structured execution.\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#1_eb402d95",
"has_header": true,
"word_count": 93
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# # ๐Ÿ” Retrieval System\n\n## # โœ… **Hybrid Retrieval**\n\n- **FAISS Vector Search** with normalized embeddings\n- **BM25Retriever** for lexical keyword matching\n- Combined using **Reciprocal Rank Fusion (RRF)**\n\n## # ๐Ÿ“Š **Reranking & Diversity**\n\n1. Initial retrieval with FAISS & BM25 (top-K per retriever)\n2. Fusion via RRF scoring\n3. **Cross-Encoder reranking** (top-N candidates)\n4. **Maximal Marginal Relevance (MMR)** selection for diversity\n\n## # ๐Ÿ”Ž Retriever Tool (`@tool retriever`)\n\n- Returns top passages with minimal duplication\n- Used in-system prompt to fetch accurate facts about Krishna\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#2_cab54fdc",
"has_header": true,
"word_count": 89
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# # ๐Ÿง  Memory System\n\n## # Long-Term Memory\n\n- **FAISS-based memory vector store** stored at `backend/data/memory_faiss`\n- Stores conversation summaries per thread ID\n\n## # Memory Search Tool (`@tool memory_search`)\n\n- Retrieves relevant conversation snippets by semantic similarity\n- Supports **thread-scoped** search for contextual continuity\n\n## # Memory Write Node\n\n- After each AI response, stores `[Q]: ... [A]: ...` summary\n- Autosaves after every `MEM_AUTOSAVE_EVERY` turns or on thread end\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#3_b0899bfc",
"has_header": true,
"word_count": 73
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# # ๐Ÿงญ Orchestration Flow (LangGraph)\n\n```mermaid\ngraph TD\n A[START] --> B[agent node]\n B -->|tool call| C[tools node]\n B -->|no tool| D[memory_write]\n C --> B\n D --> E[END]\n```\n\n## # **Nodes**:\n\n- **agent**: Calls main LLM with conversation window + system prompt\n- **tools**: Executes retriever or memory search tools\n- **memory_write**: Persists summaries to long-term memory\n\n## # **Conditional Edges**:\n\n- From **agent** โ†’ `tools` if tool call detected\n- From **agent** โ†’ `memory_write` if no tool call\n\n---\n\n# # ๐Ÿ’ฌ System Prompt\n\nThe assistant:\n\n- Uses retriever and memory search tools to gather facts about Krishna\n- Avoids fabrication and requests clarification when needed\n- Responds humorously when off-topic but steers back to Krishnaโ€™s expertise\n- Formats with Markdown, headings, and bullet points\n\nEmbedded **Krishnaโ€™s Bio** provides static grounding context.\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#4_c58f0c4c",
"has_header": true,
"word_count": 135
}
},
{
"text": "[HEADER] # ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# # ๐ŸŒ API & Streaming\n\n- **Backend**: FastAPI (`backend/api.py`)\n - `/chat` SSE endpoint streams tokens in real-time\n - Passes `thread_id` & `is_final` to LangGraph for stateful conversations\n- **Frontend**: React + Tailwind (custom chat UI)\n - Threaded conversation storage in browser `localStorage`\n - Real-time token rendering via `EventSource`\n - Features: new chat, clear chat, delete thread, suggestions\n\n---\n\n# # ๐Ÿงฉ Design Improvements\n\n- **LangGraph StateGraph** ensures explicit control of message flow\n- **Thread-scoped memory** enables multi-session personalization\n- **Hybrid RRF + Cross-Encoder + MMR** retrieval pipeline improves relevance & diversity\n- **SSE streaming** for low-latency feedback\n- **Decoupled retrieval** and **memory** as separate tools for modularity",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# ๐Ÿค– Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#5_50777b59",
"has_header": true,
"word_count": 108
}
},
{
"text": "[HEADER] # Education\n\n# Education\n\n# # Virginia Tech | Master of Science, Computer Science\n\n**Dec 2024**\n\n- **GPA**: 3.9/4.0\n\n# # Vel Tech University | Bachelor of Technology, Computer Science and Engineering\n\n- **GPA**: 8.24/10",
"metadata": {
"source": "education.md",
"header": "# Education",
"chunk_id": "education.md_#0_ac6a482e",
"has_header": true,
"word_count": 33
}
},
{
"text": "[HEADER] # Professional Experience\n\n# # Software Engineer - AI Platform | Cloud Systems LLC | US Remote\n\n**Jul 2025 - Present**\n\n- **Agentic Workflow Automation**: Architected an agentic workflow using **LangGraph** and **ReAct** to automate SQL generation. This system automated **~65%** of ad-hoc data-auditing requests from internal stakeholders, reducing the average response time from **4 hours to under 2 minutes**.\n- **ETL Optimization**: Optimized data ingestion performance by rebuilding ETL pipelines with batched I/O, incremental refresh logic, and dependency pruning, cutting daily execution runtime by **25%**.\n- **Infrastructure & Reliability**: Improved production reliability by shipping the agent service on **Kubernetes** with autoscaling and rolling deploys, adding alerts and rollback steps for failed releases.\n- **Contract Testing**: Improved cross-service reliability by implementing **Pydantic** schema validation and contract tests, preventing multiple breaking changes from reaching production.",
"metadata": {
"source": "experience.md",
"header": "# Professional Experience",
"chunk_id": "experience.md_#1_33df1d29",
"has_header": true,
"word_count": 131
}
},
{
"text": "[HEADER] # Professional Experience\n\n# # Machine Learning Engineer | Virginia Tech, Dept. of Plant Sciences | Blacksburg, VA\n\n**Aug 2024 - Jul 2025**\n\n- **Model Optimization**: Increased genomics sequence classification throughput by **32%** by applying **LoRA** and **soft prompting** methods. Packaged repeatable **PyTorch** pipelines that cut per-experiment training time by **4.5 hours**.\n- **HPC Orchestration**: Developed an ML orchestration layer for distributed GPU training on HPC clusters. Engineered checkpoint-resume logic that handled preemptive node shutdowns, optimizing resource utilization and reducing compute waste by **15%**.\n- **MLOps**: Reduced research environment setup time from hours to minutes by containerizing fine-tuned models with **Docker** and managing the experimental lifecycle (versions, hyperparameters, and weights) via **MLflow**.",
"metadata": {
"source": "experience.md",
"header": "# Professional Experience",
"chunk_id": "experience.md_#2_a1069896",
"has_header": true,
"word_count": 109
}
},
{
"text": "[HEADER] # Professional Experience\n\n# # Software Engineer | UJR Technologies Pvt Ltd | Hyderabad, India\n\n**Jul 2021 - Dec 2022**\n\n- **API & SDK Development**: Designed and maintained standardized **REST APIs** and **Python-based SDKs** to streamline the ML development lifecycle, reducing cross-team integration defects by **40%**.\n- **Model Serving**: Engineered model-serving endpoints with automated input validation and deployment health checks, lowering prediction-related failures by **30%** for ML-driven features.\n- **CI/CD Pipeline**: Automated CI/CD pipelines via **GitHub Actions** with comprehensive test coverage and scripted rollback procedures, decreasing release failures by **20%** across production environments.",
"metadata": {
"source": "experience.md",
"header": "# Professional Experience",
"chunk_id": "experience.md_#3_f6eb60f5",
"has_header": true,
"word_count": 90
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# ๐ŸŒŸ Personal and Professional Goals\n\n# # โœ… Short-Term Goals (0โ€“6 months)\n\n1. **Deploy Multi-Agent Personal Chatbot**\n\n - Integrate RAG-based retrieval, tool calling, and Open Source LLMs\n - Use LangChain, FAISS, BM25, and Gradio UI\n\n2. **Publish Second Bioinformatics Paper**\n\n - Focus: TF Binding prediction using HyenaDNA and plant genomics data\n - Venue: Submitted to MLCB\n\n3. **Transition Toward Production Roles**\n\n - Shift from academic research to applied roles in data engineering or ML infrastructure\n - Focus on backend, pipeline, and deployment readiness\n\n4. **Accelerate Job Search**\n\n - Apply to 3+ targeted roles per week (platform/data engineering preferred)\n - Tailor applications for visa-friendly, high-impact companies\n\n5. **R Shiny App Enhancement**\n\n - Debug gene co-expression heatmap issues and add new annotation features\n\n6. **Learning & Certifications**\n - Deepen knowledge in Kubernetes for ML Ops\n - Follow NVIDIAโ€™s RAG Agent curriculum weekly\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#0_337b6890",
"has_header": true,
"word_count": 142
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # โณ Mid-Term Goals (6โ€“12 months)\n\n1. **Launch Open-Source Project**\n\n - Create or contribute to ML/data tools (e.g., genomic toolkit, chatbot agent framework)\n\n2. **Scale Personal Bot Capabilities**\n\n - Add calendar integration, document-based Q&A, semantic memory\n\n3. **Advance CI/CD and Observability Skills**\n\n - Implement cloud-native monitoring and testing workflows\n\n4. **Secure Full-Time Role**\n - Land a production-facing role with a U.S. company offering sponsorship support\n\n---\n\n# # ๐Ÿš€ Long-Term Goals (1โ€“3 years)\n\n1. **Become a Senior Data/ML Infrastructure Engineer**\n\n - Work on LLM orchestration, agent systems, scalable infrastructure\n\n2. **Continue Academic Contributions**\n\n - Publish in bioinformatics and AI (focus: genomics + transformers)\n\n3. **Launch a Research-Centered Product/Framework**\n - Build an open-source or startup framework connecting genomics, LLMs, and real-time ML pipelines\n\n---\n\n# ๐Ÿ’ฌ Example Conversations",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#1_bc53463d",
"has_header": true,
"word_count": 128
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# ๐Ÿ’ฌ Example Conversations\n\n# # Q: _What interests you in data engineering?_\n\n**A:** I enjoy architecting scalable data systems that generate real-world insights. From optimizing ETL pipelines to deploying real-time frameworks like the genomic systems at Virginia Tech, I thrive at the intersection of automation and impact.\n\n---\n\n# # Q: _Describe a pipeline you've built._\n\n**A:** One example is a real-time IoT pipeline I built at VT. It processed 10,000+ sensor readings using Kafka, Airflow, and Snowflake, feeding into GPT-4 for forecasting with 91% accuracy. This reduced energy costs by 15% and improved dashboard reporting by 30%.\n\n---\n\n# # Q: _What was your most difficult debugging experience?_\n\n**A:** Debugging duplicate ingestion in a Kafka/Spark pipeline at UJR. I isolated misconfigurations in consumer groups, optimized Spark executors, and applied idempotent logic to reduce latency by 30%.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#2_05b5827c",
"has_header": true,
"word_count": 139
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # Q: _How do you handle data cleaning?_\n\n**A:** I ensure schema consistency, identify missing values and outliers, and use Airflow + dbt for scalable automation. For larger datasets, I optimize transformations using batch jobs or parallel compute.\n\n---\n\n# # Q: _Describe a strong collaboration experience._\n\n**A:** While working on cross-domain NER at Virginia Tech, I collaborated with infrastructure engineers on EC2 deployment while handling model tuning. Together, we reduced latency by 30% and improved F1-scores by 8%.\n\n---\n\n# # Q: _What tools do you use most often?_\n\n**A:** Python, Spark, Airflow, dbt, Kafka, and SageMaker are daily drivers. I also rely on Docker, CloudWatch, and Looker for observability and visualizations.\n\n---\n\n# # Q: _Whatโ€™s a strength and weakness of yours?_\n\n**A:**\n\n- **Strength**: Turning complexity into clean, usable data flows.\n- **Weakness**: Over-polishing outputs, though Iโ€™m learning to better balance speed with quality.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#3_e7c4a2f9",
"has_header": true,
"word_count": 149
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # Q: _Whatโ€™s a strength and weakness of yours?_\n\n**A:**\n\n- **Strength**: Turning complexity into clean, usable data flows.\n- **Weakness**: Over-polishing outputs, though Iโ€™m learning to better balance speed with quality.\n\n---\n\n# # Q: _What do you want to work on next?_\n\n**A:** I want to deepen my skills in production ML workflowsโ€”especially building intelligent agents and scalable pipelines that serve live products and cross-functional teams.\n\n# # How did you automate preprocessing for 1M+ biological samples?\n\nA: Sure! The goal was to streamline raw sequence processing at scale, so I used Biopython for parsing genomic formats and dbt to standardize and transform the data in a modular way. Everything was orchestrated through Apache Airflow, which let us automate the entire workflow end-to-end โ€” from ingestion to feature extraction. We parallelized parts of the process and optimized SQL logic, which led to a 40% improvement in throughput.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#4_ffdd8b09",
"has_header": true,
"word_count": 151
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # What kind of semantic search did you build using LangChain and Pinecone?\n\nA: We built a vector search pipeline tailored to genomic research papers and sequence annotations. I used LangChain to create embeddings and chain logic, and stored those in Pinecone for fast similarity-based retrieval. It supported both question-answering over domain-specific documents and similarity search, helping researchers find related sequences or studies efficiently.\n\n---\n\n# # Can you describe the deployment process using Docker and SageMaker?\n\nA: Definitely. We started by containerizing our models using Docker โ€” bundling dependencies and model weights โ€” and then deployed them as SageMaker endpoints. It made model versioning and scaling super manageable. We monitored everything using CloudWatch for logs and metrics, and used MLflow for tracking experiments and deployments.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#5_a4b0fd49",
"has_header": true,
"word_count": 128
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # Why did you migrate from batch to real-time ETL? What problems did that solve?\n\nA: Our batch ETL jobs were lagging in freshness โ€” not ideal for decision-making. So, we moved to a Kafka + Spark streaming setup, which helped us process data as it arrived. That shift reduced latency by around 30%, enabling near real-time dashboards and alerts for operational teams.\n\n---\n\n# # How did you improve Snowflake performance with materialized views?\n\nA: We had complex analytical queries hitting large datasets. To optimize that, I designed materialized views that pre-aggregated common query patterns, like user summaries or event groupings. We also revised schema layouts to reduce joins. Altogether, query performance improved by roughly 40%.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#6_029a317d",
"has_header": true,
"word_count": 119
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # What kind of monitoring and alerting did you set up in production?\n\nA: We used CloudWatch extensively โ€” custom metrics, alarms for failure thresholds, and real-time dashboards for service health. This helped us maintain 99.9% uptime by detecting and responding to issues early. I also integrated alerting into our CI/CD flow for rapid rollback if needed.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#7_03a65b27",
"has_header": true,
"word_count": 59
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # Tell me more about your IoT-based forecasting project โ€” what did you build, and how is it useful?\n\nA: It was a real-time analytics pipeline simulating 10,000+ IoT sensor readings. I used Kafka for streaming, Airflow for orchestration, and S3 with lifecycle policies to manage cost โ€” that alone reduced storage cost by 40%. We also trained time series models, including LLaMA 2, which outperformed ARIMA and provided more accurate forecasts. Everything was visualized through Looker dashboards, removing the need for manual reporting.",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#8_badb31b7",
"has_header": true,
"word_count": 85
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\nI stored raw and processed data in Amazon S3 buckets. Then I configured lifecycle policies to:\nโ€ข Automatically move older data to Glacier (cheaper storage)\nโ€ข Delete temporary/intermediate files after a certain period\nThis helped lower storage costs without compromising data access, especially since older raw data wasnโ€™t queried often.\nโ€ข Schema enforcement: I used tools like Kafka Schema Registry (via Avro) to define a fixed format for sensor data. This avoided issues with malformed or inconsistent data entering the system.\nโ€ข Checksum verification: I added simple checksum validation at ingestion to verify that each message hadnโ€™t been corrupted or tampered with. If the checksum didnโ€™t match, the message was flagged and dropped/logged.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#9_29c1f1fe",
"has_header": false,
"word_count": 114
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # IntelliMeet looks interesting โ€” how did you ensure privacy and decentralization?\n\nA: We designed it with federated learning so user data stayed local while models trained collaboratively. For privacy, we implemented end-to-end encryption across all video and audio streams. On top of that, we used real-time latency tuning (sub-200ms) and Transformer-based NLP for summarizing meetings โ€” it made collaboration both private and smart.\n\n---\n\n๐Ÿ’ก Other Likely Questions:\n\n# # Which tools or frameworks do you feel most comfortable with in production workflows?\n\nA: Iโ€™m most confident with Python and SQL, and regularly use tools like Airflow, Kafka, dbt, Docker, and AWS/GCP for production-grade workflows. Iโ€™ve also used Spark, Pinecone, and LangChain depending on the use case.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#10_087c9446",
"has_header": true,
"word_count": 120
}
},
{
"text": "[HEADER] # ๐ŸŒŸ Personal and Professional Goals\n\n# # Whatโ€™s one project youโ€™re especially proud of, and why?\n\nA: Iโ€™d say the real-time IoT forecasting project. It brought together multiple moving parts โ€” streaming, predictive modeling, storage optimization, and automation. It felt really satisfying to see a full-stack data pipeline run smoothly, end-to-end, and make a real operational impact.\n\n---\n\n# # Have you had to learn any tools quickly? How did you approach that?\n\nA: Yes โ€” quite a few! I had to pick up LangChain and Pinecone from scratch while building the semantic search pipeline, and even dove into R and Shiny for a gene co-expression app. I usually approach new tools by reverse-engineering examples, reading docs, and shipping small proofs-of-concept early to learn by doing.",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# ๐ŸŒŸ Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#11_0709a337",
"has_header": true,
"word_count": 121
}
},
{
"text": "[HEADER] # Projects\n\n# Projects\n\n# # Autonomous Multi-Agent Web UI Automation System\n\n- **Overview**: Developed a multi-agent system using **LangGraph** and **Playwright** to navigate non-deterministic UI changes across 5 high-complexity SaaS platforms.\n- **Impact**: Increased task completion success rate from **68% to 94%** by implementing a two-stage verification loop with step-level assertions and exponential backoff for dynamic DOM states.\n- **Observability**: Integrated **LangSmith** traces for observability, reducing the mean-time-to-debug for broken selectors by **14 minutes per incident**.",
"metadata": {
"source": "projects.md",
"header": "# Projects",
"chunk_id": "projects.md_#0_89043f3a",
"has_header": true,
"word_count": 75
}
},
{
"text": "[HEADER] # Projects\n\n# # Proxy TuNER: Advancing Cross-Domain Named Entity Recognition through Proxy Tuning\n\n- **Overview**: Improved cross-domain NER F1-score by **8%** by implementing a proxy-tuning approach for **LLaMA 2 models** (7B, 7B-Chat, 13B) using logit ensembling and gradient reversal.\n- **Optimization**: Optimized inference performance by **30%** and reduced training costs by **70%** through distributed execution and model path optimizations in **PyTorch**.\n\n# # IntelliMeet: AI-Enabled Decentralized Video Conferencing App\n\n- **Overview**: Architected a secure, decentralized video platform using **WebRTC** and **federated learning** to maintain data privacy while sharing only aggregated model updates.\n- **Reliability**: Reduced call dropouts by **25%** by engineering network recovery logic and on-device **RetinaFace** attention detection for client-side quality adaptation.",
"metadata": {
"source": "projects.md",
"header": "# Projects",
"chunk_id": "projects.md_#1_48eb2f1e",
"has_header": true,
"word_count": 112
}
},
{
"text": "[HEADER] # Publications\n\n# Publications\n\n- **Predicting Circadian Transcription in mRNAs and lncRNAs**, IEEE BIBM 2024\n- **DNA Foundation Models for Cross-Species TF Binding Prediction**, NeurIPS ML in CompBio 2025\n- **Multi-omics atlas of the plant nuclear envelope**, Science Advances (under review) 2025, University of California, Berkeley",
"metadata": {
"source": "publications.md",
"header": "# Publications",
"chunk_id": "publications.md_#0_3ab998c3",
"has_header": true,
"word_count": 44
}
},
{
"text": "[HEADER] # Technical Skills\n\n# Technical Skills\n\n# # Languages\n\n- Python, SQL, TypeScript, JavaScript, MongoDB\n\n# # ML & AI Frameworks\n\n- PyTorch, Transformers, LangChain, LangGraph, LoRA, RAG, NLP, SKLearn, XGBoost\n\n# # Data & Infrastructure\n\n- Docker, Kubernetes, Apache Airflow, MLflow, Redis, FAISS, AWS, GCP, Git\n\n# # Tools & Observability\n\n- LangSmith, Grafana, CI/CD (GitHub Actions/Jenkins), Weights & Biases, Linux",
"metadata": {
"source": "skills.md",
"header": "# Technical Skills",
"chunk_id": "skills.md_#0_8e74dc40",
"has_header": true,
"word_count": 59
}
},
{
"text": "[HEADER] ## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions\n\n## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions\n\nHereโ€™s what keeps me energized and curious outside of work:\n\n- **๐Ÿฅพ Hiking & Outdoor Adventures** โ€” Nothing clears my mind like a good hike.\n- **๐ŸŽฌ Marvel Fan for Life** โ€” Iโ€™ve seen every Marvel movie, and Iโ€™d probably give my life for the MCU (Team Iron Man, always).\n- **๐Ÿ Cricket Enthusiast** โ€” Whether it's IPL or gully cricket, I'm all in.\n- **๐Ÿš€ Space Exploration Buff** โ€” Obsessed with rockets, Mars missions, and the future of interplanetary travel.\n- **๐Ÿณ Cooking Explorer** โ€” I enjoy experimenting with recipes, especially fusion dishes.\n- **๐Ÿ•น๏ธ Gaming & Reverse Engineering** โ€” I love diving into game logic and breaking things down just to rebuild them better.\n- **๐Ÿง‘โ€๐Ÿคโ€๐Ÿง‘ Time with Friends** โ€” Deep conversations, spontaneous trips, or chill eveningsโ€”friends keep me grounded.\n\n---",
"metadata": {
"source": "xPersonal_Interests_Cleaned.md",
"header": "## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions",
"chunk_id": "xPersonal_Interests_Cleaned.md_#0_1dbed23b",
"has_header": true,
"word_count": 138
}
},
{
"text": "[HEADER] ## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions\n\n# # ๐ŸŒ Cultural Openness\n\n- **Origin**: Iโ€™m proudly from **India**, a land of festivals, diversity, and flavors.\n- **Festivals**: I enjoy not only Indian festivals like **Diwali**, **Holi**, and **Ganesh Chaturthi**, but also love embracing global celebrations like **Christmas**, **Hallowean**, and **Thanksgiving**.\n- **Cultural Curiosity**: Whether itโ€™s learning about rituals, history, or cuisine, I enjoy exploring and respecting all cultural backgrounds.\n\n---\n\n# # ๐Ÿฝ๏ธ Favorite Foods\n\nIf you want to bond with me over food, hereโ€™s what hits my soul:\n\n- **๐Ÿฅ˜ Mutton Biryani from Hyderabad** โ€” The gold standard of comfort food.\n- **๐Ÿฌ Indian Milk Sweets** โ€” Especially Rasgulla and Kaju Katli.\n- **๐Ÿ” Classic Burger** โ€” The messier, the better.\n- **๐Ÿ› Puri with Aloo Sabzi** โ€” A perfect nostalgic breakfast.\n- **๐Ÿฎ Gulab Jamun** โ€” Always room for dessert.\n\n---",
"metadata": {
"source": "xPersonal_Interests_Cleaned.md",
"header": "## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions",
"chunk_id": "xPersonal_Interests_Cleaned.md_#1_3fb21b0c",
"has_header": true,
"word_count": 136
}
},
{
"text": "[HEADER] ## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions\n\n# # ๐ŸŽ‰ Fun Facts\n\n- I sometimes pause Marvel movies just to admire the visuals.\n- I've explored how video game stories are built and love experimenting with alternate paths.\n- I can tell if biryani is authentic based on the layering of the rice.\n- I once helped organize a cricket tournament on a weekโ€™s notice and we pulled it off with 12 teams!\n- I enjoy solving puzzles, even if they're frustrating sometimes.\n\n---\n\nThis side of me helps fuel the creativity, discipline, and joy I bring into my projects. Letโ€™s connect over ideas _and_ biryani!",
"metadata": {
"source": "xPersonal_Interests_Cleaned.md",
"header": "## ๐Ÿง—โ€โ™‚๏ธ Hobbies & Passions",
"chunk_id": "xPersonal_Interests_Cleaned.md_#2_42616ef4",
"has_header": true,
"word_count": 99
}
}
]