[
{
"text": "[HEADER] Krishna Vamsi Dhulipalla is a Software Engineer specializing in agentic workflows and AI platforms. He currently works at **Cloud Systems LLC**, where he architects LangGraph-based agents to automate data auditing. Previously, he served as a Machine Learning Engineer at **Virginia Tech**, optimizing genomic models with LoRA/soft prompting, and as a Software Engineer at **UJR Technologies**, building ML SDKs and CI/CD pipelines. He holds an M.S. in Computer Science from Virginia Tech (Dec 2024) and has significant expertise in **LangGraph**, **Kubernetes**, **PyTorch**, and **MLOps**.\n\nKrishna Vamsi Dhulipalla is a Software Engineer specializing in agentic workflows and AI platforms. He currently works at **Cloud Systems LLC**, where he architects LangGraph-based agents to automate data auditing. Previously, he served as a Machine Learning Engineer at **Virginia Tech**, optimizing genomic models with LoRA/soft prompting, and as a Software Engineer at **UJR Technologies**, building ML SDKs and CI/CD pipelines. He holds an M.S. in Computer Science from Virginia Tech (Dec 2024) and has significant expertise in **LangGraph**, **Kubernetes**, **PyTorch**, and **MLOps**.",
"metadata": {
"source": "bio.md",
"header": "Krishna Vamsi Dhulipalla is a Software Engineer specializing in agentic workflows and AI platforms. He currently works at **Cloud Systems LLC**, where he architects LangGraph-based agents to automate data auditing. Previously, he served as a Machine Learning Engineer at **Virginia Tech**, optimizing genomic models with LoRA/soft prompting, and as a Software Engineer at **UJR Technologies**, building ML SDKs and CI/CD pipelines. He holds an M.S. in Computer Science from Virginia Tech (Dec 2024) and has significant expertise in **LangGraph**, **Kubernetes**, **PyTorch**, and **MLOps**.",
"chunk_id": "bio.md_#0_9ac3944c",
"has_header": false,
"word_count": 83
}
},
{
"text": "[HEADER] # 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n# 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\nThis document details the architecture of **Krishna Vamsi Dhulipalla's** personal AI assistant, implemented with **LangGraph** for orchestrated state management and tool execution. The system is designed for **retrieval-augmented, memory-grounded, and multi-turn conversational intelligence**, integrating **OpenAI GPT-4o**, **Hugging Face embeddings**, and **cross-encoder reranking**.\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#0_f30fc781",
"has_header": true,
"word_count": 52
}
},
{
"text": "[HEADER] # 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n## 🧱 Core Components\n\n### 1. **Models & Their Roles**\n\n| Purpose | Model Name | Role Description |\n| -------------------------- | ---------------------------------------- | ------------------------------------------------ |\n| **Main Chat Model** | `gpt-4o` | Handles conversation, tool calls, and reasoning |\n| **Retriever Embeddings** | `sentence-transformers/all-MiniLM-L6-v2` | Embedding generation for FAISS vector search |\n| **Cross-Encoder Reranker** | `cross-encoder/ms-marco-MiniLM-L-6-v2` | Reranks retrieval results for semantic relevance |\n| **BM25 Retriever** | (LangChain BM25Retriever) | Keyword-based search complementing vector search |\n\nAll models are bound to LangGraph **StateGraph** nodes for structured execution.\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#1_eb402d95",
"has_header": true,
"word_count": 93
}
},
{
"text": "[HEADER] # 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n## 🔍 Retrieval System\n\n### ✅ **Hybrid Retrieval**\n\n- **FAISS Vector Search** with normalized embeddings\n- **BM25Retriever** for lexical keyword matching\n- Combined using **Reciprocal Rank Fusion (RRF)**\n\n### 🔁 **Reranking & Diversity**\n\n1. Initial retrieval with FAISS & BM25 (top-K per retriever)\n2. Fusion via RRF scoring\n3. **Cross-Encoder reranking** (top-N candidates)\n4. **Maximal Marginal Relevance (MMR)** selection for diversity\n\n### 🔎 Retriever Tool (`@tool retriever`)\n\n- Returns top passages with minimal duplication\n- Used in the system prompt to fetch accurate facts about Krishna\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#2_cab54fdc",
"has_header": true,
"word_count": 89
}
},
{
"text": "[HEADER] # 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n## 🧠 Memory System\n\n### Long-Term Memory\n\n- **FAISS-based memory vector store** stored at `backend/data/memory_faiss`\n- Stores conversation summaries per thread ID\n\n### Memory Search Tool (`@tool memory_search`)\n\n- Retrieves relevant conversation snippets by semantic similarity\n- Supports **thread-scoped** search for contextual continuity\n\n### Memory Write Node\n\n- After each AI response, stores `[Q]: ... [A]: ...` summary\n- Autosaves after every `MEM_AUTOSAVE_EVERY` turns or on thread end\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#3_b0899bfc",
"has_header": true,
"word_count": 73
}
},
{
"text": "[HEADER] # 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n## 🧭 Orchestration Flow (LangGraph)\n\n```mermaid\ngraph TD\n A[START] --> B[agent node]\n B -->|tool call| C[tools node]\n B -->|no tool| D[memory_write]\n C --> B\n D --> E[END]\n```\n\n### **Nodes**:\n\n- **agent**: Calls main LLM with conversation window + system prompt\n- **tools**: Executes retriever or memory search tools\n- **memory_write**: Persists summaries to long-term memory\n\n### **Conditional Edges**:\n\n- From **agent** → `tools` if tool call detected\n- From **agent** → `memory_write` if no tool call\n\n---\n\n## 💬 System Prompt\n\nThe assistant:\n\n- Uses retriever and memory search tools to gather facts about Krishna\n- Avoids fabrication and requests clarification when needed\n- Responds humorously when off-topic but steers back to Krishna's expertise\n- Formats with Markdown, headings, and bullet points\n\nEmbedded **Krishna's Bio** provides static grounding context.\n\n---",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#4_c58f0c4c",
"has_header": true,
"word_count": 135
}
},
{
"text": "[HEADER] # 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant\n\n## 🌐 API & Streaming\n\n- **Backend**: FastAPI (`backend/api.py`)\n - `/chat` SSE endpoint streams tokens in real-time\n - Passes `thread_id` & `is_final` to LangGraph for stateful conversations\n- **Frontend**: React + Tailwind (custom chat UI)\n - Threaded conversation storage in browser `localStorage`\n - Real-time token rendering via `EventSource`\n - Features: new chat, clear chat, delete thread, suggestions\n\n---\n\n## 🧩 Design Improvements\n\n- **LangGraph StateGraph** ensures explicit control of message flow\n- **Thread-scoped memory** enables multi-session personalization\n- **Hybrid RRF + Cross-Encoder + MMR** retrieval pipeline improves relevance & diversity\n- **SSE streaming** for low-latency feedback\n- **Decoupled retrieval** and **memory** as separate tools for modularity",
"metadata": {
"source": "Chatbot_Architecture_Notes.md",
"header": "# 🤖 Chatbot Architecture Overview: Krishna's Personal AI Assistant",
"chunk_id": "Chatbot_Architecture_Notes.md_#5_50777b59",
"has_header": true,
"word_count": 108
}
},
{
"text": "[HEADER] # Education\n\n# Education\n\n## Virginia Tech | Master of Science, Computer Science\n\n**Dec 2024**\n\n- **GPA**: 3.9/4.0\n\n## Vel Tech University | Bachelor of Technology, Computer Science and Engineering\n\n- **GPA**: 8.24/10",
"metadata": {
"source": "education.md",
"header": "# Education",
"chunk_id": "education.md_#0_ac6a482e",
"has_header": true,
"word_count": 33
}
},
{
"text": "[HEADER] # Professional Experience\n\n## Software Engineer - AI Platform | Cloud Systems LLC | US Remote\n\n**Jul 2025 - Present**\n\n- **Agentic Workflow Automation**: Architected an agentic workflow using **LangGraph** and **ReAct** to automate SQL generation. This system automated **~65%** of ad-hoc data-auditing requests from internal stakeholders, reducing the average response time from **4 hours to under 2 minutes**.\n- **ETL Optimization**: Optimized data ingestion performance by rebuilding ETL pipelines with batched I/O, incremental refresh logic, and dependency pruning, cutting daily execution runtime by **25%**.\n- **Infrastructure & Reliability**: Improved production reliability by shipping the agent service on **Kubernetes** with autoscaling and rolling deploys, adding alerts and rollback steps for failed releases.\n- **Contract Testing**: Improved cross-service reliability by implementing **Pydantic** schema validation and contract tests, preventing multiple breaking changes from reaching production.",
"metadata": {
"source": "experience.md",
"header": "# Professional Experience",
"chunk_id": "experience.md_#1_33df1d29",
"has_header": true,
"word_count": 131
}
},
{
"text": "[HEADER] # Professional Experience\n\n## Machine Learning Engineer | Virginia Tech, Dept. of Plant Sciences | Blacksburg, VA\n\n**Aug 2024 - Jul 2025**\n\n- **Model Optimization**: Increased genomics sequence classification throughput by **32%** by applying **LoRA** and **soft prompting** methods. Packaged repeatable **PyTorch** pipelines that cut per-experiment training time by **4.5 hours**.\n- **HPC Orchestration**: Developed an ML orchestration layer for distributed GPU training on HPC clusters. Engineered checkpoint-resume logic that handled preemptive node shutdowns, optimizing resource utilization and reducing compute waste by **15%**.\n- **MLOps**: Reduced research environment setup time from hours to minutes by containerizing fine-tuned models with **Docker** and managing the experimental lifecycle (versions, hyperparameters, and weights) via **MLflow**.",
"metadata": {
"source": "experience.md",
"header": "# Professional Experience",
"chunk_id": "experience.md_#2_a1069896",
"has_header": true,
"word_count": 109
}
},
{
"text": "[HEADER] # Professional Experience\n\n## Software Engineer | UJR Technologies Pvt Ltd | Hyderabad, India\n\n**Jul 2021 - Dec 2022**\n\n- **API & SDK Development**: Designed and maintained standardized **REST APIs** and **Python-based SDKs** to streamline the ML development lifecycle, reducing cross-team integration defects by **40%**.\n- **Model Serving**: Engineered model-serving endpoints with automated input validation and deployment health checks, lowering prediction-related failures by **30%** for ML-driven features.\n- **CI/CD Pipeline**: Automated CI/CD pipelines via **GitHub Actions** with comprehensive test coverage and scripted rollback procedures, decreasing release failures by **20%** across production environments.",
"metadata": {
"source": "experience.md",
"header": "# Professional Experience",
"chunk_id": "experience.md_#3_f6eb60f5",
"has_header": true,
"word_count": 90
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\n# 🌟 Personal and Professional Goals\n\n## ✅ Short-Term Goals (0–6 months)\n\n1. **Deploy Multi-Agent Personal Chatbot**\n\n - Integrate RAG-based retrieval, tool calling, and Open Source LLMs\n - Use LangChain, FAISS, BM25, and Gradio UI\n\n2. **Publish Second Bioinformatics Paper**\n\n - Focus: TF Binding prediction using HyenaDNA and plant genomics data\n - Venue: Submitted to MLCB\n\n3. **Transition Toward Production Roles**\n\n - Shift from academic research to applied roles in data engineering or ML infrastructure\n - Focus on backend, pipeline, and deployment readiness\n\n4. **Accelerate Job Search**\n\n - Apply to 3+ targeted roles per week (platform/data engineering preferred)\n - Tailor applications for visa-friendly, high-impact companies\n\n5. **R Shiny App Enhancement**\n\n - Debug gene co-expression heatmap issues and add new annotation features\n\n6. **Learning & Certifications**\n - Deepen knowledge in Kubernetes for ML Ops\n - Follow NVIDIA's RAG Agent curriculum weekly\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#0_337b6890",
"has_header": true,
"word_count": 142
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\n## ⏳ Mid-Term Goals (6–12 months)\n\n1. **Launch Open-Source Project**\n\n - Create or contribute to ML/data tools (e.g., genomic toolkit, chatbot agent framework)\n\n2. **Scale Personal Bot Capabilities**\n\n - Add calendar integration, document-based Q&A, semantic memory\n\n3. **Advance CI/CD and Observability Skills**\n\n - Implement cloud-native monitoring and testing workflows\n\n4. **Secure Full-Time Role**\n - Land a production-facing role with a U.S. company offering sponsorship support\n\n---\n\n## 🚀 Long-Term Goals (1–3 years)\n\n1. **Become a Senior Data/ML Infrastructure Engineer**\n\n - Work on LLM orchestration, agent systems, scalable infrastructure\n\n2. **Continue Academic Contributions**\n\n - Publish in bioinformatics and AI (focus: genomics + transformers)\n\n3. **Launch a Research-Centered Product/Framework**\n - Build an open-source or startup framework connecting genomics, LLMs, and real-time ML pipelines\n\n---\n\n# 💬 Example Conversations",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#1_bc53463d",
"has_header": true,
"word_count": 128
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\n# 💬 Example Conversations\n\n## Q: _What interests you in data engineering?_\n\n**A:** I enjoy architecting scalable data systems that generate real-world insights. From optimizing ETL pipelines to deploying real-time frameworks like the genomic systems at Virginia Tech, I thrive at the intersection of automation and impact.\n\n---\n\n## Q: _Describe a pipeline you've built._\n\n**A:** One example is a real-time IoT pipeline I built at VT. It processed 10,000+ sensor readings using Kafka, Airflow, and Snowflake, feeding into GPT-4 for forecasting with 91% accuracy. This reduced energy costs by 15% and improved dashboard reporting by 30%.\n\n---\n\n## Q: _What was your most difficult debugging experience?_\n\n**A:** Debugging duplicate ingestion in a Kafka/Spark pipeline at UJR. I isolated misconfigurations in consumer groups, optimized Spark executors, and applied idempotent logic to reduce latency by 30%.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#2_05b5827c",
"has_header": true,
"word_count": 139
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\n## Q: _How do you handle data cleaning?_\n\n**A:** I ensure schema consistency, identify missing values and outliers, and use Airflow + dbt for scalable automation. For larger datasets, I optimize transformations using batch jobs or parallel compute.\n\n---\n\n## Q: _Describe a strong collaboration experience._\n\n**A:** While working on cross-domain NER at Virginia Tech, I collaborated with infrastructure engineers on EC2 deployment while handling model tuning. Together, we reduced latency by 30% and improved F1-scores by 8%.\n\n---\n\n## Q: _What tools do you use most often?_\n\n**A:** Python, Spark, Airflow, dbt, Kafka, and SageMaker are daily drivers. I also rely on Docker, CloudWatch, and Looker for observability and visualizations.\n\n---\n\n## Q: _What's a strength and weakness of yours?_\n\n**A:**\n\n- **Strength**: Turning complexity into clean, usable data flows.\n- **Weakness**: Over-polishing outputs, though I'm learning to better balance speed with quality.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#3_e7c4a2f9",
"has_header": true,
"word_count": 149
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\n## Q: _What's a strength and weakness of yours?_\n\n**A:**\n\n- **Strength**: Turning complexity into clean, usable data flows.\n- **Weakness**: Over-polishing outputs, though I'm learning to better balance speed with quality.\n\n---\n\n## Q: _What do you want to work on next?_\n\n**A:** I want to deepen my skills in production ML workflows – especially building intelligent agents and scalable pipelines that serve live products and cross-functional teams.\n\n## How did you automate preprocessing for 1M+ biological samples?\n\nA: Sure! The goal was to streamline raw sequence processing at scale, so I used Biopython for parsing genomic formats and dbt to standardize and transform the data in a modular way. Everything was orchestrated through Apache Airflow, which let us automate the entire workflow end-to-end – from ingestion to feature extraction. We parallelized parts of the process and optimized SQL logic, which led to a 40% improvement in throughput.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#4_ffdd8b09",
"has_header": true,
"word_count": 151
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\n## What kind of semantic search did you build using LangChain and Pinecone?\n\nA: We built a vector search pipeline tailored to genomic research papers and sequence annotations. I used LangChain to create embeddings and chain logic, and stored those in Pinecone for fast similarity-based retrieval. It supported both question-answering over domain-specific documents and similarity search, helping researchers find related sequences or studies efficiently.\n\n---\n\n## Can you describe the deployment process using Docker and SageMaker?\n\nA: Definitely. We started by containerizing our models using Docker – bundling dependencies and model weights – and then deployed them as SageMaker endpoints. It made model versioning and scaling super manageable. We monitored everything using CloudWatch for logs and metrics, and used MLflow for tracking experiments and deployments.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#5_a4b0fd49",
"has_header": true,
"word_count": 128
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\n## Why did you migrate from batch to real-time ETL? What problems did that solve?\n\nA: Our batch ETL jobs were lagging in freshness – not ideal for decision-making. So, we moved to a Kafka + Spark streaming setup, which helped us process data as it arrived. That shift reduced latency by around 30%, enabling near real-time dashboards and alerts for operational teams.\n\n---\n\n## How did you improve Snowflake performance with materialized views?\n\nA: We had complex analytical queries hitting large datasets. To optimize that, I designed materialized views that pre-aggregated common query patterns, like user summaries or event groupings. We also revised schema layouts to reduce joins. Altogether, query performance improved by roughly 40%.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#6_029a317d",
"has_header": true,
"word_count": 119
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\n## What kind of monitoring and alerting did you set up in production?\n\nA: We used CloudWatch extensively – custom metrics, alarms for failure thresholds, and real-time dashboards for service health. This helped us maintain 99.9% uptime by detecting and responding to issues early. I also integrated alerting into our CI/CD flow for rapid rollback if needed.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#7_03a65b27",
"has_header": true,
"word_count": 59
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\n## Tell me more about your IoT-based forecasting project – what did you build, and how is it useful?\n\nA: It was a real-time analytics pipeline simulating 10,000+ IoT sensor readings. I used Kafka for streaming, Airflow for orchestration, and S3 with lifecycle policies to manage cost – that alone reduced storage cost by 40%. We also trained time series models, including LLaMA 2, which outperformed ARIMA and provided more accurate forecasts. Everything was visualized through Looker dashboards, removing the need for manual reporting.",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#8_badb31b7",
"has_header": true,
"word_count": 85
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\nI stored raw and processed data in Amazon S3 buckets. Then I configured lifecycle policies to:\n\n- Automatically move older data to Glacier (cheaper storage)\n- Delete temporary/intermediate files after a certain period\n\nThis helped lower storage costs without compromising data access, especially since older raw data wasn't queried often.\n\n- Schema enforcement: I used tools like Kafka Schema Registry (via Avro) to define a fixed format for sensor data. This avoided issues with malformed or inconsistent data entering the system.\n- Checksum verification: I added simple checksum validation at ingestion to verify that each message hadn't been corrupted or tampered with. If the checksum didn't match, the message was flagged and dropped/logged.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#9_29c1f1fe",
"has_header": false,
"word_count": 114
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\n## IntelliMeet looks interesting – how did you ensure privacy and decentralization?\n\nA: We designed it with federated learning so user data stayed local while models trained collaboratively. For privacy, we implemented end-to-end encryption across all video and audio streams. On top of that, we used real-time latency tuning (sub-200ms) and Transformer-based NLP for summarizing meetings – it made collaboration both private and smart.\n\n---\n\n💡 Other Likely Questions:\n\n## Which tools or frameworks do you feel most comfortable with in production workflows?\n\nA: I'm most confident with Python and SQL, and regularly use tools like Airflow, Kafka, dbt, Docker, and AWS/GCP for production-grade workflows. I've also used Spark, Pinecone, and LangChain depending on the use case.\n\n---",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#10_087c9446",
"has_header": true,
"word_count": 120
}
},
{
"text": "[HEADER] # 🌟 Personal and Professional Goals\n\n## What's one project you're especially proud of, and why?\n\nA: I'd say the real-time IoT forecasting project. It brought together multiple moving parts – streaming, predictive modeling, storage optimization, and automation. It felt really satisfying to see a full-stack data pipeline run smoothly, end-to-end, and make a real operational impact.\n\n---\n\n## Have you had to learn any tools quickly? How did you approach that?\n\nA: Yes – quite a few! I had to pick up LangChain and Pinecone from scratch while building the semantic search pipeline, and even dove into R and Shiny for a gene co-expression app. I usually approach new tools by reverse-engineering examples, reading docs, and shipping small proofs-of-concept early to learn by doing.",
"metadata": {
"source": "goals_and_conversations.md",
"header": "# 🌟 Personal and Professional Goals",
"chunk_id": "goals_and_conversations.md_#11_0709a337",
"has_header": true,
"word_count": 121
}
},
{
"text": "[HEADER] # Projects\n\n# Projects\n\n## Autonomous Multi-Agent Web UI Automation System\n\n- **Overview**: Developed a multi-agent system using **LangGraph** and **Playwright** to navigate non-deterministic UI changes across 5 high-complexity SaaS platforms.\n- **Impact**: Increased task completion success rate from **68% to 94%** by implementing a two-stage verification loop with step-level assertions and exponential backoff for dynamic DOM states.\n- **Observability**: Integrated **LangSmith** traces for observability, reducing the mean-time-to-debug for broken selectors by **14 minutes per incident**.",
"metadata": {
"source": "projects.md",
"header": "# Projects",
"chunk_id": "projects.md_#0_89043f3a",
"has_header": true,
"word_count": 75
}
},
{
"text": "[HEADER] # Projects\n\n## Proxy TuNER: Advancing Cross-Domain Named Entity Recognition through Proxy Tuning\n\n- **Overview**: Improved cross-domain NER F1-score by **8%** by implementing a proxy-tuning approach for **LLaMA 2 models** (7B, 7B-Chat, 13B) using logit ensembling and gradient reversal.\n- **Optimization**: Optimized inference performance by **30%** and reduced training costs by **70%** through distributed execution and model path optimizations in **PyTorch**.\n\n## IntelliMeet: AI-Enabled Decentralized Video Conferencing App\n\n- **Overview**: Architected a secure, decentralized video platform using **WebRTC** and **federated learning** to maintain data privacy while sharing only aggregated model updates.\n- **Reliability**: Reduced call dropouts by **25%** by engineering network recovery logic and on-device **RetinaFace** attention detection for client-side quality adaptation.",
"metadata": {
"source": "projects.md",
"header": "# Projects",
"chunk_id": "projects.md_#1_48eb2f1e",
"has_header": true,
"word_count": 112
}
},
{
"text": "[HEADER] # Publications\n\n# Publications\n\n- **Predicting Circadian Transcription in mRNAs and lncRNAs**, IEEE BIBM 2024\n- **DNA Foundation Models for Cross-Species TF Binding Prediction**, NeurIPS ML in CompBio 2025\n- **Multi-omics atlas of the plant nuclear envelope**, Science Advances (under review) 2025, University of California, Berkeley",
"metadata": {
"source": "publications.md",
"header": "# Publications",
"chunk_id": "publications.md_#0_3ab998c3",
"has_header": true,
"word_count": 44
}
},
{
"text": "[HEADER] # Technical Skills\n\n# Technical Skills\n\n## Languages\n\n- Python, SQL, TypeScript, JavaScript, MongoDB\n\n## ML & AI Frameworks\n\n- PyTorch, Transformers, LangChain, LangGraph, LoRA, RAG, NLP, SKLearn, XGBoost\n\n## Data & Infrastructure\n\n- Docker, Kubernetes, Apache Airflow, MLflow, Redis, FAISS, AWS, GCP, Git\n\n## Tools & Observability\n\n- LangSmith, Grafana, CI/CD (GitHub Actions/Jenkins), Weights & Biases, Linux",
"metadata": {
"source": "skills.md",
"header": "# Technical Skills",
"chunk_id": "skills.md_#0_8e74dc40",
"has_header": true,
"word_count": 59
}
},
{
"text": "[HEADER] ## 🧘‍♂️ Hobbies & Passions\n\n## 🧘‍♂️ Hobbies & Passions\n\nHere's what keeps me energized and curious outside of work:\n\n- **🥾 Hiking & Outdoor Adventures** – Nothing clears my mind like a good hike.\n- **🎬 Marvel Fan for Life** – I've seen every Marvel movie, and I'd probably give my life for the MCU (Team Iron Man, always).\n- **🏏 Cricket Enthusiast** – Whether it's IPL or gully cricket, I'm all in.\n- **🚀 Space Exploration Buff** – Obsessed with rockets, Mars missions, and the future of interplanetary travel.\n- **🍳 Cooking Explorer** – I enjoy experimenting with recipes, especially fusion dishes.\n- **🕹️ Gaming & Reverse Engineering** – I love diving into game logic and breaking things down just to rebuild them better.\n- **🧑‍🤝‍🧑 Time with Friends** – Deep conversations, spontaneous trips, or chill evenings – friends keep me grounded.\n\n---",
"metadata": {
"source": "xPersonal_Interests_Cleaned.md",
"header": "## 🧘‍♂️ Hobbies & Passions",
"chunk_id": "xPersonal_Interests_Cleaned.md_#0_1dbed23b",
"has_header": true,
"word_count": 138
}
},
{
"text": "[HEADER] ## 🧘‍♂️ Hobbies & Passions\n\n## 🌏 Cultural Openness\n\n- **Origin**: I'm proudly from **India**, a land of festivals, diversity, and flavors.\n- **Festivals**: I enjoy not only Indian festivals like **Diwali**, **Holi**, and **Ganesh Chaturthi**, but also love embracing global celebrations like **Christmas**, **Halloween**, and **Thanksgiving**.\n- **Cultural Curiosity**: Whether it's learning about rituals, history, or cuisine, I enjoy exploring and respecting all cultural backgrounds.\n\n---\n\n## 🍽️ Favorite Foods\n\nIf you want to bond with me over food, here's what hits my soul:\n\n- **🥘 Mutton Biryani from Hyderabad** – The gold standard of comfort food.\n- **🍬 Indian Milk Sweets** – Especially Rasgulla and Kaju Katli.\n- **🍔 Classic Burger** – The messier, the better.\n- **🍛 Puri with Aloo Sabzi** – A perfect nostalgic breakfast.\n- **🍮 Gulab Jamun** – Always room for dessert.\n\n---",
"metadata": {
"source": "xPersonal_Interests_Cleaned.md",
"header": "## 🧘‍♂️ Hobbies & Passions",
"chunk_id": "xPersonal_Interests_Cleaned.md_#1_3fb21b0c",
"has_header": true,
"word_count": 136
}
},
{
"text": "[HEADER] ## 🧘‍♂️ Hobbies & Passions\n\n## 🎉 Fun Facts\n\n- I sometimes pause Marvel movies just to admire the visuals.\n- I've explored how video game stories are built and love experimenting with alternate paths.\n- I can tell if biryani is authentic based on the layering of the rice.\n- I once helped organize a cricket tournament on a week's notice and we pulled it off with 12 teams!\n- I enjoy solving puzzles, even if they're frustrating sometimes.\n\n---\n\nThis side of me helps fuel the creativity, discipline, and joy I bring into my projects. Let's connect over ideas _and_ biryani!",
"metadata": {
"source": "xPersonal_Interests_Cleaned.md",
"header": "## 🧘‍♂️ Hobbies & Passions",
"chunk_id": "xPersonal_Interests_Cleaned.md_#2_42616ef4",
"has_header": true,
"word_count": 99
}
}
]