diff --git a/MIGRATION_GUIDE.md b/MIGRATION_GUIDE.md deleted file mode 100644 index 298982aa85dfd6b513dabe3660f8e82718896bbb..0000000000000000000000000000000000000000 --- a/MIGRATION_GUIDE.md +++ /dev/null @@ -1,81 +0,0 @@ -# Repository Restructure Migration Guide - -## What Changed - -The repository has been reorganized for better structure and maintainability. - -## File Moves - -### RAG System -- `src/modal-rag.py` → `src/rag/modal-rag.py` -- `src/modal-rag-product-design.py` → `src/rag/modal-rag-product-design.py` - -### Web Application -- `web_app.py` → `src/web/web_app.py` -- `query_product_design.py` → `src/web/query_product_design.py` -- `templates/` → `src/web/templates/` -- `static/` → `src/web/static/` - -### Scripts -- Data processing scripts → `scripts/data/` -- Setup scripts → `scripts/setup/` -- Utility scripts → `scripts/tools/` - -### Documentation -- All `.md` files → `docs/guides/` -- Product design docs → `docs/product-design/` - -### Tests -- `test_*.py` → `tests/` - -## Updated Commands - -### Old Commands (No longer work) -```bash -python web_app.py -modal run src/modal-rag-product-design.py::query_product_design -``` - -### New Commands -```bash -# Web app -python src/web/web_app.py -# Or use helper script -./scripts/setup/start_web.sh - -# Modal RAG -modal run src/rag/modal-rag-product-design.py::query_product_design --question "your question" - -# Indexing -modal run src/rag/modal-rag-product-design.py::index_product_design -``` - -## Import Path Updates - -If you have custom scripts that import from these modules, update the imports: - -```python -# Old -from query_product_design import query_rag - -# New -import sys -sys.path.insert(0, 'src/web') -from query_product_design import query_rag -``` - -## Next Steps - -1. Update any custom scripts with new import paths -2. Update CI/CD pipelines if applicable -3. Update documentation references -4. Test all functionality - -## Rollback - -If you need to roll back, all files are still in git history. You can: -```bash -git log --oneline --all -- "old/path/to/file" -git checkout <commit-hash> -- "old/path/to/file" -``` - diff --git a/NEXT_STEPS.md b/NEXT_STEPS.md deleted file mode 100644 index e7ca127a656b1e5ec4b7b7342e157b541df6ee82..0000000000000000000000000000000000000000 --- a/NEXT_STEPS.md +++ /dev/null @@ -1,174 +0,0 @@ -# Next Steps - -## Current Status - -✅ **Completed:** -- Repository restructured and organized -- RAG system configured (Word, PDF, Excel only - no markdown) -- Web interface functional -- Nebius deployment guide created -- Documentation updated - -## Immediate Next Steps - -### 1. 
Test the Updated RAG System - -**Upload Product Design Documents:** -```bash -# Upload Word document (if you have it) -modal volume put mcp-hack-ins-products \ - docs/product-design/tokyo_auto_insurance_product_design.docx \ - docs/product-design/tokyo_auto_insurance_product_design.docx - -# Upload PDF (if you have one) -modal volume put mcp-hack-ins-products \ - docs/product-design/tokyo_auto_insurance_product_design.pdf \ - docs/product-design/tokyo_auto_insurance_product_design.pdf - -# Upload Excel (if you have one) -modal volume put mcp-hack-ins-products \ - docs/product-design/tokyo_auto_insurance_product_design.xlsx \ - docs/product-design/tokyo_auto_insurance_product_design.xlsx -``` - -**Re-index Documents:** -```bash -# Using CLI -python src/web/query_product_design.py --index - -# Or direct Modal command -modal run src/rag/modal-rag-product-design.py::index_product_design -``` - -**Test Queries:** -```bash -# Test via CLI -python src/web/query_product_design.py --query "What are the three product tiers?" - -# Or start web interface -python src/web/web_app.py -# Then open http://127.0.0.1:5000 in browser -``` - -### 2. Verify File Processing - -Check that the system correctly: -- ✅ Loads Word documents -- ✅ Loads PDF documents (if uploaded) -- ✅ Loads Excel files (if uploaded) -- ❌ Ignores markdown files -- ❌ Ignores other file types - -### 3. Production Readiness - -**Option A: Continue with Modal (Current Setup)** -- ✅ Already working -- ✅ No changes needed -- Just ensure documents are uploaded and indexed - -**Option B: Deploy to Nebius** -- Review: `docs/deployment/NEBIUS_DEPLOYMENT.md` -- Set up Nebius account -- Deploy RAG service and web app -- Migrate from Modal to Nebius - -## Recommended Path Forward - -### Short Term (This Week) -1. **Upload and index documents** - - Ensure Word/PDF/Excel files are in Modal volume - - Run indexing - - Test queries - -2. **Validate RAG quality** - - Ask various product questions - - Verify answer quality and accuracy - - Check source citations - -3. **Test web interface** - - Start web app - - Test from browser - - Verify all features work - -### Medium Term (Next 2 Weeks) -1. **Optimize RAG performance** - - Monitor query times - - Adjust chunk sizes if needed - - Fine-tune retrieval parameters - -2. **Add more documents** (if needed) - - Upload additional product design files - - Re-index as needed - -3. **User testing** - - Share with team/stakeholders - - Gather feedback - - Iterate on improvements - -### Long Term (Next Month) -1. **Deploy to production** - - Choose: Modal or Nebius - - Set up monitoring - - Configure auto-scaling (if needed) - -2. **Enhance features** - - Add authentication (if needed) - - Add query history - - Add export functionality - - Add analytics - -3. **Scale and optimize** - - Monitor costs - - Optimize for performance - - Add caching if needed - -## Quick Commands Reference - -```bash -# Index documents -python src/web/query_product_design.py --index - -# Query via CLI -python src/web/query_product_design.py --query "your question" - -# Start web interface -python src/web/web_app.py -# Or use helper script: -./scripts/setup/start_web.sh - -# Check Modal volume contents -modal volume ls mcp-hack-ins-products -``` - -## Decision Points - -1. **Deployment Platform:** - - [ ] Stay with Modal (current) - - [ ] Migrate to Nebius - - [ ] Use both (hybrid) - -2. **Document Management:** - - [ ] Keep documents in Modal volume - - [ ] Move to object storage (S3, etc.) - - [ ] Use version control - -3. 
**Access Control:** - - [ ] Public access (current) - - [ ] Add authentication - - [ ] Add role-based access - -## Questions to Consider - -- Do you have Word/PDF/Excel versions of your product design documents? -- Do you need to convert markdown files to Word/PDF format? -- Are you ready to deploy to production? -- Do you need authentication/access control? -- What's your target user base? - -## Getting Help - -- **Documentation:** See `docs/` directory -- **Troubleshooting:** See `docs/guides/TROUBLESHOOTING.md` -- **Deployment:** See `docs/deployment/NEBIUS_DEPLOYMENT.md` -- **Quick Start:** See `QUICK_START.md` - diff --git a/diagrams/1-indexing-flow.mmd b/diagrams/1-indexing-flow.mmd deleted file mode 100644 index a4be2fa43c1f2b07d94192d3ce3a394f5b8776f6..0000000000000000000000000000000000000000 --- a/diagrams/1-indexing-flow.mmd +++ /dev/null @@ -1,28 +0,0 @@ -sequenceDiagram - participant User - participant Modal - participant CreateVectorDB as create_vector_db() - participant PDFLoader - participant TextSplitter - participant Embeddings as HuggingFaceEmbeddings
(CUDA) - participant ChromaDB as Remote ChromaDB - - User->>Modal: modal run modal-rag.py::index - Modal->>CreateVectorDB: Execute function - - CreateVectorDB->>PDFLoader: Load PDFs from /insurance-data - PDFLoader-->>CreateVectorDB: Return documents - - CreateVectorDB->>TextSplitter: Split documents (chunk_size=1000) - TextSplitter-->>CreateVectorDB: Return chunks - - CreateVectorDB->>Embeddings: Initialize (device='cuda') - CreateVectorDB->>Embeddings: Generate embeddings for chunks - Embeddings-->>CreateVectorDB: Return embeddings - - CreateVectorDB->>ChromaDB: Connect to remote service - CreateVectorDB->>ChromaDB: Upsert chunks + embeddings - ChromaDB-->>CreateVectorDB: Confirm storage - - CreateVectorDB-->>Modal: Complete - Modal-->>User: Success message diff --git a/diagrams/1-indexing-flow.svg b/diagrams/1-indexing-flow.svg index 20fa3fbabe654964249f4157f926b6d41dc3da00..bccdedef49627bbe1543c9106a56f402626f3f73 100644 --- a/diagrams/1-indexing-flow.svg +++ b/diagrams/1-indexing-flow.svg @@ -1 +1,99 @@ -Remote ChromaDBHuggingFaceEmbeddings(CUDA)TextSplitterPDFLoadercreate_vector_db()ModalUserRemote ChromaDBHuggingFaceEmbeddings(CUDA)TextSplitterPDFLoadercreate_vector_db()ModalUsermodal run modal-rag.py::indexExecute functionLoad PDFs from /insurance-dataReturn documentsSplit documents (chunk_size=1000)Return chunksInitialize (device='cuda')Generate embeddings for chunksReturn embeddingsConnect to remote serviceUpsert chunks + embeddingsConfirm storageCompleteSuccess message \ No newline at end of file + + + + + + + + + + RAG Indexing Flow - Document Processing + Insurance Product Documents → Vector Database + + + + + 1. PDF DOCUMENTS + 📄 Insurance PDFs + MetLife, AIG, + Japan Post, Sonpo + + + + + + + + 2. TEXT EXTRACTION + PyPDF Loader + Page-by-page + → Documents + + + + + + + + 3. TEXT CHUNKING + RecursiveCharacterSplitter + Chunk: 1000 chars + → 3,766 chunks + + + + + + + + 4. EMBEDDINGS + 📊 bge-small-en-v1.5 + 384-dim vectors + GPU accelerated + + + + + + + + 5. VECTOR DB + 💾 ChromaDB + Collection: langchain + Persisted locally + + + + + + + + 6. READY ✅ + Fast similarity search + ~400ms retrieval + + + + Statistics + Total Docs: + 3,766 + Chunk Size: + 1,000 + Overlap: + 200 + Vector Dim: + 384 + + + + LangChain 0.3.7 • ChromaDB 0.5.20 • PyPDF 5.1.0 • sentence-transformers 3.3.0 + \ No newline at end of file diff --git a/diagrams/2-query-flow-medium.mmd b/diagrams/2-query-flow-medium.mmd deleted file mode 100644 index 0aa35692e8b4958e45453c761530e639a22ab907..0000000000000000000000000000000000000000 --- a/diagrams/2-query-flow-medium.mmd +++ /dev/null @@ -1,25 +0,0 @@ -sequenceDiagram - participant User - participant Modal - participant RAGModel - participant Embeddings - participant ChromaDB - participant LLM - - User->>Modal: modal run query --question "..." 
- - Note over Modal,RAGModel: Container Startup (if cold) - Modal->>RAGModel: Initialize - RAGModel->>Embeddings: Load embedding model (GPU) - RAGModel->>LLM: Load Mistral-7B (GPU) - - Note over Modal,LLM: Query Processing - Modal->>RAGModel: Process question - RAGModel->>Embeddings: Convert question to vector - RAGModel->>ChromaDB: Search similar documents - ChromaDB-->>RAGModel: Top 3 matching docs - - RAGModel->>LLM: Generate answer + context - LLM-->>RAGModel: Answer - - RAGModel-->>User: Display answer + sources diff --git a/diagrams/2-query-flow-medium.svg b/diagrams/2-query-flow-medium.svg index d024cefc9782066a86b69186b96f89ef2832b0fe..20a6f0a7accd1f41b1bf27263cd8736d78a74008 100644 --- a/diagrams/2-query-flow-medium.svg +++ b/diagrams/2-query-flow-medium.svg @@ -1 +1,106 @@ -LLMChromaDBEmbeddingsRAGModelModalUserLLMChromaDBEmbeddingsRAGModelModalUserContainer Startup (if cold)Query Processingmodal run query --question "..."InitializeLoad embedding model (GPU)Load Mistral-7B (GPU)Process questionConvert question to vectorSearch similar documentsTop 3 matching docsGenerate answer + contextAnswerDisplay answer + sources \ No newline at end of file + + + + + + + + + + RAG Query Flow - Medium Detail + Optimized Retrieval-Augmented Generation Pipeline + + + + + USER QUERY + 💬 Question + via API + + + + + + + EMBEDDING + bge-small-en-v1.5 + ~2ms + + + + + + + VECTOR SEARCH + ChromaDB (Local) + ~400ms ⚡ + + + + + + + CONTEXT + Top-3 documents + + metadata + + + + + + + PROMPT + Alpaca Template + Context + Question + + + + + + + vLLM ENGINE + Phi-3-mini (Fine-tuned) + AsyncLLMEngine + ~2-3s + + + + + + + RESPONSE + ✅ Answer + 📄 Sources + Total: <3s + + + + Latency + Embed: + 2ms + Search: + 400ms + Generate: + 2-3s + + <3s ⚡ + Modal A10G + + + + Technology Stack + vLLM 0.6.3 • ChromaDB 0.5.20 • LangChain 0.3.7 • sentence-transformers 3.3.0 • FastAPI + + + Endpoint: rag-vllm-optimized | Updated: 2025-11-30 + \ No newline at end of file diff --git a/diagrams/2-query-flow-simple.mmd b/diagrams/2-query-flow-simple.mmd deleted file mode 100644 index 19b7abc9fd590ab43c4df2d6fb3e18d6504762b6..0000000000000000000000000000000000000000 --- a/diagrams/2-query-flow-simple.mmd +++ /dev/null @@ -1,19 +0,0 @@ -sequenceDiagram - participant User - participant Modal - participant RAGModel - participant ChromaDB - participant LLM as Mistral-7B - - User->>Modal: Ask question - Modal->>RAGModel: Initialize (warm container) - - Note over RAGModel: Load models on GPU - - RAGModel->>ChromaDB: Search for relevant docs - ChromaDB-->>RAGModel: Return top 3 documents - - RAGModel->>LLM: Generate answer with context - LLM-->>RAGModel: Generated answer - - RAGModel-->>User: Answer + Sources diff --git a/diagrams/2-query-flow-simple.svg b/diagrams/2-query-flow-simple.svg index ec035acaa8d7ddb6b3a5b42daefc99dac2702eec..32060b85a9fca1b0dfa41df0f86e3809d5b78ae8 100644 --- a/diagrams/2-query-flow-simple.svg +++ b/diagrams/2-query-flow-simple.svg @@ -1 +1,105 @@ -Mistral-7BChromaDBRAGModelModalUserMistral-7BChromaDBRAGModelModalUserLoad models on GPUAsk questionInitialize (warm container)Search for relevant docsReturn top 3 documentsGenerate answer with contextGenerated answerAnswer + Sources \ No newline at end of file + + + + + + + + + + RAG Query Flow - vLLM Optimized + High-Performance Retrieval-Augmented Generation + + + + + 1. USER QUERY + 💬 + "What insurance + products available?" + + + + + + + + 2. EMBEDDING + 📊 bge-small-en-v1.5 + GPU: CUDA + ~2ms + + + + + + + + 3. 
VECTOR SEARCH + 💾 ChromaDB (Local) + 3,766 documents + ~400ms ⚡ + + + + + + + + 4. CONTEXT + Top 3 documents + + metadata + + + + + + + + 5. vLLM GENERATION + 🤖 Phi-3-mini (Fine-tuned) + AsyncLLMEngine + GPU Memory: 70% + ~2-3s ⚡ + + + + + + + + 6. RESPONSE + ✅ Answer + 📄 Sources + ⏱️ Metrics + Total: <3s + + + + Performance Breakdown + Embedding: + ~2ms + Vector Search: + ~400ms + LLM Generation: + ~2-3s + TOTAL: <3s ✨ + + + + Architecture: Modal A10G GPU + vLLM 0.6.3 • ChromaDB 0.5.20 • LangChain 0.3.7 • sentence-transformers 3.3.0 + + + Endpoint: rag-vllm-optimized | Updated: 2025-11-30 + \ No newline at end of file diff --git a/diagrams/2-query-flow.mmd b/diagrams/2-query-flow.mmd deleted file mode 100644 index 60b089f68b689e74aa8c99da0767839771e76b9a..0000000000000000000000000000000000000000 --- a/diagrams/2-query-flow.mmd +++ /dev/null @@ -1,39 +0,0 @@ -sequenceDiagram - participant User - participant Modal - participant QueryEntrypoint as query() - participant RAGModel - participant Embeddings as HuggingFaceEmbeddings
(CUDA) - participant ChromaRetriever as RemoteChromaRetriever - participant ChromaDB as Remote ChromaDB - participant LLM as Mistral-7B
(A10G GPU) - participant RAGChain as LangChain RAG - - User->>Modal: modal run modal-rag.py::query --question "..." - Modal->>QueryEntrypoint: Execute local entrypoint - QueryEntrypoint->>RAGModel: Instantiate RAGModel() - - Note over RAGModel: @modal.enter() lifecycle - RAGModel->>Embeddings: Load embedding model (CUDA) - RAGModel->>ChromaDB: Connect to remote service - RAGModel->>LLM: Load Mistral-7B (A10G GPU) - RAGModel->>RAGModel: Initialize RemoteChromaRetriever - - QueryEntrypoint->>RAGModel: query.remote(question) - - RAGModel->>ChromaRetriever: Create retriever instance - RAGModel->>RAGChain: Build RAG chain - - RAGChain->>ChromaRetriever: Retrieve relevant docs - ChromaRetriever->>Embeddings: embed_query(question) - Embeddings-->>ChromaRetriever: Query embedding - ChromaRetriever->>ChromaDB: query(embedding, k=3) - ChromaDB-->>ChromaRetriever: Top-k documents - ChromaRetriever-->>RAGChain: Return documents - - RAGChain->>LLM: Generate answer with context - LLM-->>RAGChain: Generated answer - RAGChain-->>RAGModel: Return result - - RAGModel-->>QueryEntrypoint: Return {answer, sources} - QueryEntrypoint-->>User: Display answer + sources diff --git a/diagrams/2-query-flow.svg b/diagrams/2-query-flow.svg index c7a2cc4c5d0012adad9bfb48f6ee070a4a0b03a2..ec1219e45886634ef4496dfc64c892d02a2840a0 100644 --- a/diagrams/2-query-flow.svg +++ b/diagrams/2-query-flow.svg @@ -1 +1,138 @@ -LangChain RAGMistral-7B(A10G GPU)Remote ChromaDBRemoteChromaRetrieverHuggingFaceEmbeddings(CUDA)RAGModelquery()ModalUserLangChain RAGMistral-7B(A10G GPU)Remote ChromaDBRemoteChromaRetrieverHuggingFaceEmbeddings(CUDA)RAGModelquery()ModalUser@modal.enter() lifecyclemodal run modal-rag.py::query --question "..."Execute local entrypointInstantiate RAGModel()Load embedding model (CUDA)Connect to remote serviceLoad Mistral-7B (A10G GPU)Initialize RemoteChromaRetrieverquery.remote(question)Create retriever instanceBuild RAG chainRetrieve relevant docsembed_query(question)Query embeddingquery(embedding, k=3)Top-k documentsReturn documentsGenerate answer with contextGenerated answerReturn resultReturn {answer, sources}Display answer + sources \ No newline at end of file + + + + + + + + + + RAG Query Flow - Detailed Architecture + vLLM-Optimized Retrieval-Augmented Generation System + + + + + USER REQUEST + 💬 Question + + + + + + + API ENDPOINT + FastAPI POST + + + + + + RAG Model Container (Modal A10G GPU) + + + + + EMBEDDING MODEL + bge-small-en-v1.5 + GPU: CUDA + ~2ms + + + + + + + VECTOR DB + ChromaDB (Local) + 3,766 docs + ~400ms + + + + + + + CONTEXT BUILDER + Top-3 documents + + metadata + + + + + + + PROMPT BUILDER + Alpaca Template + Context + Question + + + + + + + vLLM AsyncLLMEngine + + + Fine-tuned Model + Phi-3-mini-4k + merged_model/ + + + Generation + GPU: 70% + ~2-3s + + + + + + + RESPONSE FORMATTER + Answer + Sources + Metrics + JSON Response + + + + Performance Metrics + + Embedding Generation: + ~2ms + + Vector Search (Local): + ~400ms + + LLM Generation (vLLM): + ~2-3s + + + TOTAL LATENCY: <3s ⚡ + + + + Technology Stack + + Inference: + • vLLM 0.6.3 (AsyncLLMEngine) + + Retrieval: + • ChromaDB 0.5.20 (Local) + • sentence-transformers 3.3.0 + + Framework: + • LangChain 0.3.7 + + + Endpoint: rag-vllm-optimized | Infrastructure: Modal A10G GPU | Updated: 2025-11-30 + \ No newline at end of file diff --git a/diagrams/3-web-endpoint-flow.mmd b/diagrams/3-web-endpoint-flow.mmd deleted file mode 100644 index e6b7f942a72d721bfbfafb39a5efe97c43e32808..0000000000000000000000000000000000000000 --- 
a/diagrams/3-web-endpoint-flow.mmd +++ /dev/null @@ -1,26 +0,0 @@ -sequenceDiagram - participant User - participant Browser - participant Modal as Modal Platform - participant WebEndpoint as RAGModel.web_query - participant QueryMethod as RAGModel.query - participant RAGChain - participant ChromaDB - participant LLM - - User->>Browser: GET https://.../web_query?question=... - Browser->>Modal: HTTP GET request - Modal->>WebEndpoint: Route to @modal.fastapi_endpoint - - WebEndpoint->>QueryMethod: Call query.local(question) - - Note over QueryMethod,LLM: Same flow as Query diagram - QueryMethod->>RAGChain: Build chain - RAGChain->>ChromaDB: Retrieve docs - RAGChain->>LLM: Generate answer - LLM-->>QueryMethod: Return result - - QueryMethod-->>WebEndpoint: Return {answer, sources} - WebEndpoint-->>Modal: JSON response - Modal-->>Browser: HTTP 200 + JSON - Browser-->>User: Display result diff --git a/diagrams/3-web-endpoint-flow.svg b/diagrams/3-web-endpoint-flow.svg index d3894960789320bbcdbe60d23fc34588cec454d6..04fe74de0f39143d11c946b52887319df767fb8d 100644 --- a/diagrams/3-web-endpoint-flow.svg +++ b/diagrams/3-web-endpoint-flow.svg @@ -1 +1,94 @@ -LLMChromaDBRAGChainRAGModel.queryRAGModel.web_queryModal PlatformBrowserUserLLMChromaDBRAGChainRAGModel.queryRAGModel.web_queryModal PlatformBrowserUserSame flow as Query diagramGET https://.../web_query?question=...HTTP GET requestRoute to @modal.fastapi_endpointCall query.local(question)Build chainRetrieve docsGenerate answerReturn resultReturn {answer, sources}JSON responseHTTP 200 + JSONDisplay result \ No newline at end of file + + + + + + + + + + Web Endpoint Flow - FastAPI Integration + HTTP API for RAG and Fine-tuned Model Inference + + + + + CLIENT REQUEST + 🌐 HTTP POST + JSON payload + + + + + + + FASTAPI ENDPOINT + @modal.fastapi_endpoint + Async handler + + + + + + Modal Container (GPU) + + + + + RAG ENDPOINT + rag-vllm-optimized + ChromaDB + vLLM + <3s latency + + + + + FINE-TUNED API + phi3-inference-vllm + Merged model + <3s latency + + + + + PROCESSING LAYER + • Request validation • Model inference + • Response formatting • Error handling + + + + GPU Resources: A10G + Shared: Embeddings + vLLM Engine + + + + + RESPONSE + ✅ Answer + 📊 Metrics + 📄 Sources (RAG) + JSON format + + + + + + Available Endpoints + RAG: + rag-vllm-optimized-ragmodel-query + Fine-tuned: + phi3-inference-vllm-model-ask + + + Infrastructure: Modal • Framework: FastAPI • Updated: 2025-11-30 + \ No newline at end of file diff --git a/diagrams/4-container-lifecycle.mmd b/diagrams/4-container-lifecycle.mmd deleted file mode 100644 index f813c5d1dfcc69cc1977f73a5843b011f7cc9380..0000000000000000000000000000000000000000 --- a/diagrams/4-container-lifecycle.mmd +++ /dev/null @@ -1,31 +0,0 @@ -sequenceDiagram - participant Modal - participant Container - participant RAGModel - participant GPU as A10G GPU - participant Volume as Modal Volume - participant ChromaDB - - Modal->>Container: Start container (min_containers=1) - Container->>GPU: Allocate GPU - Container->>Volume: Mount /insurance-data - - Container->>RAGModel: Call @modal.enter() - - Note over RAGModel: Initialization phase - RAGModel->>RAGModel: Load HuggingFaceEmbeddings (CUDA) - RAGModel->>ChromaDB: Connect to remote service - RAGModel->>RAGModel: Load Mistral-7B (GPU) - RAGModel->>RAGModel: Create RemoteChromaRetriever class - - RAGModel-->>Container: Ready - Container-->>Modal: Container warm and ready - - Note over Modal,Container: Container stays warm (min_containers=1) - - loop Handle requests - 
Modal->>RAGModel: Invoke query() method - RAGModel-->>Modal: Return result - end - - Note over Modal,Container: Container persists until scaled down diff --git a/diagrams/4-container-lifecycle.svg b/diagrams/4-container-lifecycle.svg index aa3d4fe075eccfa7fedd8782874bacbd811b042d..5784d43874632c4824abc8d1e32cbe00153f5099 100644 --- a/diagrams/4-container-lifecycle.svg +++ b/diagrams/4-container-lifecycle.svg @@ -1 +1,118 @@ -ChromaDBModal VolumeA10G GPURAGModelContainerModalChromaDBModal VolumeA10G GPURAGModelContainerModalInitialization phaseContainer stays warm (min_containers=1)loop[Handle requests]Container persists until scaled downStart container (min_containers=1)Allocate GPUMount /insurance-dataCall @modal.enter()Load HuggingFaceEmbeddings (CUDA)Connect to remote serviceLoad Mistral-7B (GPU)Create RemoteChromaRetriever classReadyContainer warm and readyInvoke query() methodReturn result \ No newline at end of file + + + + + + + + + + Modal Container Lifecycle - vLLM Optimized + GPU Container Management for RAG and Fine-tuned Models + + + + + 1. COLD START + Container creation + Image pull + ~30-60s + + + + + + + 2. LOAD RESOURCES + 📚 Embeddings + 💾 ChromaDB + 🤖 vLLM Engine + ~20-40s + + + + + + + 3. READY ✅ + Accepting requests + GPU warmed up + <3s latency + + + + + + + 4. PROCESSING + Handling queries + GPU inference + Concurrent requests + Async execution + + + + + 5. IDLE TIMER + No requests + Scaledown window: + 300 seconds + (5 minutes) + + + + + + + 6. SHUTDOWN + Container stopped + Resources freed + Cost savings + + + + + + New request + + + + Container Configuration + + GPU: + A10G (24GB VRAM) + + Scaledown: + 300s idle timeout + + Memory: + 70% GPU utilization + + Concurrency: + Async (multiple requests) + + Warm Start: + <100ms (if cached) + + Cold Start: + ~50-100s total + + + + Benefits of Modal Container Management + ✅ Auto-scaling • 💰 Cost optimization • ⚡ Fast warm starts • 🔄 Automatic restarts • 📊 Built-in monitoring + + + Infrastructure: Modal • GPU: A10G • Updated: 2025-11-30 + \ No newline at end of file diff --git a/diagrams/finetuning.svg b/diagrams/finetuning.svg index df1f55bf62e5e6bd208514bd0cf1d147b495b554..4a5f22660ed1e52955555f9316fda240948a15ac 100644 --- a/diagrams/finetuning.svg +++ b/diagrams/finetuning.svg @@ -1,179 +1,114 @@ - - + - - - - Fine-tuning Pipeline: CSV → Trained Model - - - - 1. Raw Data Sources - 📁 Census CSVs (6,838 files) - 📁 Economy/Labor CSVs (50 files) - • Multi-row headers with metadata - • Codes (13103) instead of names - - - - 2. CSV Structure - Row 0: Title, Unnamed:1, Unnamed:2... - Row 1-7: Metadata, notes... - Row 8: Code, Name, Population... - Row 9: 13103, Minato-ku, 260071... - ⚠️ Real data starts at Row 8+ - - - - - - - 3. Smart Parser - 📝 prepare_finetune_data.py - ✓ Skip rows with "Unnamed" - ✓ Detect header row (Row 8) - ✓ Clean values (remove codes) - ✓ Filter valid columns - - - - - - - 4. Data Extraction - For each CSV file: - 1. Read file → find header row - 2. Extract data rows (9+) - 3. Sample 500 rows per file - 4. Generate QA pairs - Result: ~1,350 training samples - - - - - - - 5. QA Generation (Current) - ❌ Problem: Uses random columns - row_label = "13103" (code!) - column = "Members per household" - value = "2.56" - Q: "What is X for 13103?" - A: "The X for 13103 is 2.56." - - - - - - - 6. Better Approach - ✓ Always use name column - row_label = "Minato-ku, Tokyo" - column = "Members per household" - value = "2.56" - Q: "What is X for Minato-ku?" - A: "The X for Minato-ku is 2.56." 
- - - - needs fix - - - - 7. Training Data (JSONL Format) - { - "instruction": "What is the Members per household for 1231?", - "input": "Context: Japan Census data...", - "output": "The Members per household for 1231 is 3.56." - } - - - - - - - 8. Fine-tuning - 📝 finetune_modal.py - • Model: Phi-3-mini-4k-instruct - • GPU: H200 (90 mins) - • Method: LoRA + Unsloth - - - - - - - 9. Fine-tuned Model - 🎯 Saved to Modal Volume: - model-checkpoints/ - Ready for inference! - - - - - - - 10. Inference API - 📝 api_endpoint.py (GPU - A10G) - 📝 api_endpoint_cpu.py (CPU) - POST /ask → Get answers - - - - - - - 11. User Query - Q: "Population of Tokyo?" - A: "The population for - 13100 is 13,960,000." - - - - - - - 📊 Pipeline Summary - Input: 6,888 CSV files with complex headers → Output: Fine-tuned model that answers questions - ✓ Smart header detection (skip metadata rows) - ✓ QA pair generation (1,350 samples) - ⚠️ Current issue: Uses codes (13103) instead of names (Minato-ku) - - - - 📈 Current Metrics - • Total CSV files: 6,888 - • Training samples: 1,351 - • Validation samples: 151 - • Training time: ~90 minutes (H200) - - - - ⚠️ Issues & Solutions - Issue: Row labels use codes (13103) - Solution: Always use name column - Issue: Only 1,351 samples (too small) - Solution: Fix census file parsing + Fine-Tuning Pipeline - Phi-3-mini with vLLM + High-Performance Model Training & Deployment + + + + + 1. DATA PREPARATION + 📊 Japan Census CSV + 201,651 samples + QA Generation (Gemini) + Train/Val Split (80/20) + → train.jsonl / val.jsonl + + + + + + + + 2. FINE-TUNING + 🖥️ Modal H200 GPU + Phi-3-mini-4k-instruct + LoRA (r=16, α=16) + 4-bit Quantization + 10,000 steps → adapter + + + + + + + + 3. MODEL MERGING + 🔄 Modal A10G GPU + Merge LoRA + Base + Save as bfloat16 + → merged_model/ + model-checkpoints volume + + + + + + + + 4. DEPLOYMENT + + + + Option A: Standard API + Transformers + PEFT + + + + Option B: vLLM API ⚡ + <3s latency + + + + + + + Performance Metrics + + Training Time: ~2-3 hours + GPU Memory: ~40GB (H200) + Dataset Size: 201,651 samples + + Inference Latency: + Standard API: ~10s + vLLM API: <3s ✨ + + + + Technology Stack + Modal • PyTorch 2.4.0 • Transformers 4.44.2 • PEFT 0.12.0 • vLLM 0.6.3 • bitsandbytes 0.43.3 + LoRA Fine-Tuning • 4-bit Quantization • Async Inference Engine + + + + API Endpoints + + Standard: + phi3-inference-gpu + + vLLM (Optimized): + phi3-inference-vllm + + Evaluation: + eval-finetuned + + All on Modal A10G GPU - Files: prepare_finetune_data.py → finetune_modal.py → api_endpoint.py - Modal Volumes: census-data, economy-labor-data, finetune-dataset, model-checkpoints + Updated: 2025-11-30 | Architecture: vLLM-Optimized Pipeline diff --git a/docs/NEXT_STEPS.md b/docs/NEXT_STEPS.md new file mode 100644 index 0000000000000000000000000000000000000000..4a6e7f6d35ca77ce57de1cb3c4dc3d37d0f7c578 --- /dev/null +++ b/docs/NEXT_STEPS.md @@ -0,0 +1,181 @@ +# Next Steps & Roadmap + +## ✅ Current Status + +**Completed:** +- Fine-tuning pipeline with vLLM optimization +- RAG system with local ChromaDB +- High-performance inference (<3s latency) +- Model merging for production deployment +- Comprehensive documentation + +## 🎯 Immediate Next Steps + +### 1. Test Fine-Tuned Model Performance + +```bash +# Test the vLLM-optimized endpoint +curl -X POST https://mcp-hack--phi3-inference-vllm-model-ask.modal.run \ + -H "Content-Type: application/json" \ + -d '{"question": "What is the population of Tokyo?", "context": "Japan Census data"}' +``` + +### 2. 
Test RAG System + +```bash +# Test the RAG endpoint +curl -X POST https://mcp-hack--rag-vllm-optimized-ragmodel-query.modal.run \ + -H "Content-Type: application/json" \ + -d '{"question": "What insurance products are available?"}' +``` + +### 3. Monitor Performance + +- Check latency metrics in responses +- Verify <3s response times +- Monitor GPU utilization on Modal dashboard + +## 🚀 Short Term (This Week) + +### Fine-Tuning Improvements +- [ ] Run evaluation script to assess model quality +- [ ] Collect more training data if needed +- [ ] Experiment with different LoRA parameters +- [ ] Test on diverse queries + +### RAG Enhancements +- [ ] Add more insurance documents to volume +- [ ] Re-index with updated documents +- [ ] Test retrieval quality +- [ ] Optimize chunk sizes if needed + +### Documentation +- [ ] Add API usage examples +- [ ] Create deployment guide +- [ ] Document troubleshooting steps + +## 📊 Medium Term (Next 2 Weeks) + +### Model Optimization +1. **Fine-tuning iterations** + - Analyze evaluation results + - Adjust training parameters + - Re-train if needed + +2. **RAG improvements** + - Experiment with different embedding models + - Optimize retrieval parameters (top-k, similarity threshold) + - Add query rewriting + +3. **Performance monitoring** + - Set up logging + - Track latency trends + - Monitor costs + +### Feature Additions +- [ ] Add streaming responses +- [ ] Implement caching layer +- [ ] Add query history +- [ ] Create admin dashboard + +## 🎨 Long Term (Next Month) + +### Production Readiness +1. **Deployment** + - Set up CI/CD pipeline + - Configure monitoring and alerts + - Implement rate limiting + - Add authentication if needed + +2. **Scaling** + - Optimize container scaling + - Implement load balancing + - Add caching (Redis) + - Set up CDN for static assets + +3. **Advanced Features** + - Multi-modal support (images, tables) + - Batch processing + - A/B testing framework + - Analytics dashboard + +## 🔧 Technical Debt + +- [ ] Remove `bkp/` directory (old backup files) +- [ ] Clean up unused dependencies +- [ ] Add comprehensive tests +- [ ] Improve error handling +- [ ] Add input validation + +## 📈 Metrics to Track + +**Performance:** +- Inference latency (target: <3s) +- Retrieval accuracy +- GPU utilization +- Cost per query + +**Quality:** +- Model accuracy on evaluation set +- RAG relevance scores +- User satisfaction (if applicable) + +## 🤔 Decision Points + +1. **Model Selection:** + - [ ] Continue with Phi-3-mini + - [ ] Experiment with larger models + - [ ] Try different base models + +2. **Infrastructure:** + - [ ] Stay with Modal (current) + - [ ] Migrate to other platform + - [ ] Self-hosted deployment + +3. 
**Data Strategy:** + - [ ] Expand training dataset + - [ ] Add domain-specific data + - [ ] Implement data versioning + +## 📚 Quick Reference + +### Key Commands +```bash +# Fine-tuning +./venv/bin/modal run src/finetune/finetune_modal.py + +# Model merging +./venv/bin/modal run src/finetune/merge_model.py + +# Deploy vLLM endpoint (fine-tuned) +./venv/bin/modal deploy src/finetune/api_endpoint_vllm.py + +# Deploy RAG endpoint +./venv/bin/modal deploy src/rag/rag_vllm.py + +# Evaluation +./venv/bin/modal run src/finetune/eval_finetuned.py +``` + +### Documentation +- **Main Guide:** `docs/HOW_TO_RUN.md` +- **Architecture:** `diagrams/` folder +- **Testing:** `docs/TESTING.md` +- **Agent Design:** `docs/agentdesign.md` + +## 🎯 Success Criteria + +**Phase 1 (Current):** +- ✅ <3s inference latency +- ✅ vLLM optimization working +- ✅ RAG retrieval functional + +**Phase 2 (Next):** +- [ ] >90% accuracy on evaluation set +- [ ] <2s average latency +- [ ] Production deployment complete + +**Phase 3 (Future):** +- [ ] Multi-user support +- [ ] Advanced analytics +- [ ] Cost optimization (<$X per 1K queries) diff --git a/QUICK_START.md b/docs/QUICK_START.md similarity index 100% rename from QUICK_START.md rename to docs/QUICK_START.md diff --git a/docs/QUICK_START_API.md b/docs/QUICK_START_API.md new file mode 100644 index 0000000000000000000000000000000000000000..5f998bd8302fcee841815fb609eb1d2d513b9369 --- /dev/null +++ b/docs/QUICK_START_API.md @@ -0,0 +1,75 @@ +# Quick Start: RAG API + +Fast API endpoint for querying product design documents with <3 second response times. + +## Deploy the API + +```bash +# Deploy to Modal +modal deploy src/rag/rag_api.py + +# Get the API URL +modal app show insurance-rag-api +``` + +## Use the API + +### Python Client + +```python +from src.rag.api_client import RAGAPIClient + +# Initialize client +client = RAGAPIClient(base_url="https://your-api-url.modal.run") + +# Query +result = client.query("What are the three product tiers?") +print(result['answer']) +print(f"Response time: {result['total_time']:.2f}s") +``` + +### cURL + +```bash +curl -X POST https://your-api-url.modal.run/query \ + -H "Content-Type: application/json" \ + -d '{"question": "What are the three product tiers?"}' +``` + +### JavaScript + +```javascript +const response = await fetch('https://your-api-url.modal.run/query', { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ question: 'What are the three product tiers?' }) +}); + +const data = await response.json(); +console.log(data.answer); +``` + +## Test Performance + +```bash +# Test with default URL +python tests/test_api.py + +# Test with custom URL +python tests/test_api.py --url https://your-api-url.modal.run +``` + +## Performance Target + +- **Target**: <3 seconds per query +- **Typical**: 1.5-2.5 seconds +- **Optimizations**: Warm containers, reduced tokens, limited context + +## API Endpoints + +- `GET /health` - Health check +- `POST /query` - Query the RAG system +- `GET /` - API information + +See `docs/api/RAG_API.md` for full documentation. + diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000000000000000000000000000000000000..276d8409a3a305af2663b61840304d7f73f47ed5 --- /dev/null +++ b/docs/README.md @@ -0,0 +1,58 @@ +# Documentation Index + +This directory contains all project documentation. 
+ +## 📚 Main Guides + +### Getting Started +- **[HOW_TO_RUN.md](HOW_TO_RUN.md)** - Complete guide to running the fine-tuning pipeline +- **[QUICK_START.md](QUICK_START.md)** - Quick start guide for the project +- **[QUICK_START_API.md](QUICK_START_API.md)** - API quick start guide + +### Fine-Tuning +- **[finetune/](../finetune/)** - Fine-tuning documentation and guides + - Data preparation + - Dataset generation + - Model training + - Evaluation + +### RAG System +- **[README_RAG.md](README_RAG.md)** - RAG system overview +- **[guides/QUICK_START_RAG.md](guides/QUICK_START_RAG.md)** - RAG quick start +- **[guides/RAG_SETUP_COMPLETE.md](guides/RAG_SETUP_COMPLETE.md)** - Complete RAG setup guide +- **[api/RAG_API.md](api/RAG_API.md)** - RAG API documentation + +### Deployment +- **[deployment/](deployment/)** - Deployment guides + - **[README.md](deployment/README.md)** - Deployment overview + - **[NEBIUS_DEPLOYMENT.md](deployment/NEBIUS_DEPLOYMENT.md)** - Nebius deployment guide + +### Reference +- **[STRUCTURE.md](STRUCTURE.md)** - Project structure overview +- **[TESTING.md](TESTING.md)** - Testing guide +- **[MIGRATION_GUIDE.md](MIGRATION_GUIDE.md)** - Migration guide +- **[VLLM_MIGRATION.md](VLLM_MIGRATION.md)** - vLLM migration guide +- **[NEXT_STEPS.md](NEXT_STEPS.md)** - Next steps and roadmap + +### Agent Design +- **[agentdesign.md](agentdesign.md)** - AI agent design for automated development workflow + +### Product Design +- **[product-design/](product-design/)** - Product design guides and examples + - Product decision guide + - RAG setup for product design + - Example: Tokyo auto insurance product design + +## 🔧 Additional Resources + +### Data Sources +- **[guides/estat_api_guide.md](guides/estat_api_guide.md)** - e-Stat API guide +- **[guides/source_data.md](guides/source_data.md)** - Data source documentation +- **[guides/ft_process.md](guides/ft_process.md)** - Fine-tuning process details + +### Troubleshooting +- **[guides/TROUBLESHOOTING.md](guides/TROUBLESHOOTING.md)** - General troubleshooting +- **[guides/WEB_TROUBLESHOOTING.md](guides/WEB_TROUBLESHOOTING.md)** - Web interface troubleshooting + +### Web Interface +- **[guides/WEB_INTERFACE.md](guides/WEB_INTERFACE.md)** - Web interface documentation diff --git a/README_RAG.md b/docs/README_RAG.md similarity index 100% rename from README_RAG.md rename to docs/README_RAG.md diff --git a/STRUCTURE.md b/docs/STRUCTURE.md similarity index 100% rename from STRUCTURE.md rename to docs/STRUCTURE.md diff --git a/TESTING.md b/docs/TESTING.md similarity index 100% rename from TESTING.md rename to docs/TESTING.md diff --git a/VLLM_MIGRATION.md b/docs/VLLM_MIGRATION.md similarity index 100% rename from VLLM_MIGRATION.md rename to docs/VLLM_MIGRATION.md diff --git a/docs/api/RAG_API.md b/docs/api/RAG_API.md new file mode 100644 index 0000000000000000000000000000000000000000..313669e32c0c1cfa390dcff94db9b3bb678c34a8 --- /dev/null +++ b/docs/api/RAG_API.md @@ -0,0 +1,244 @@ +# RAG API Documentation + +Fast API endpoint for querying the product design RAG system with <3 second response times. 
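+
+As a first sanity check after deploying, the snippet below is a minimal, dependency-light smoke test (a sketch, not part of the shipped client: it assumes only the `requests` package, your own deployed URL, and the `/query` endpoint and response fields documented below):
+
+```python
+import requests
+
+API_URL = "https://your-api-url.modal.run"  # replace with your deployed URL
+
+# POST a question to the /query endpoint and check the latency target
+resp = requests.post(f"{API_URL}/query", json={"question": "What are the three product tiers?"})
+resp.raise_for_status()
+data = resp.json()
+print(data["answer"])
+print(f"Total time: {data['total_time']:.2f}s")  # target is <3s
+```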
+ +## Quick Start + +### Deploy the API + +```bash +# Deploy to Modal +modal deploy src/rag/rag_api.py + +# Get the URL +modal app list +``` + +### Use the API + +```python +from src.rag.api_client import RAGAPIClient + +client = RAGAPIClient(base_url="https://your-modal-url.modal.run") +result = client.query("What are the three product tiers?") +print(result['answer']) +``` + +## API Endpoints + +### Health Check + +```http +GET /health +``` + +**Response:** +```json +{ + "status": "healthy", + "service": "rag-api" +} +``` + +### Query + +```http +POST /query +Content-Type: application/json + +{ + "question": "What are the three product tiers?", + "top_k": 5, + "max_tokens": 1024 +} +``` + +**Response:** +```json +{ + "answer": "The three product tiers are...", + "retrieval_time": 0.45, + "generation_time": 1.23, + "total_time": 1.68, + "sources": [ + { + "content": "...", + "metadata": {...} + } + ], + "success": true +} +``` + +## Performance Optimization + +### Target: <3 Second Responses + +The API is optimized for fast responses: + +1. **Warm Containers**: `min_containers=1` keeps a container ready +2. **Optimized LLM**: Reduced max_tokens (1024 vs 1536) +3. **Limited Context**: Top 3 documents, 800 chars each +4. **Prefix Caching**: Enabled for faster generation +5. **Concurrent Requests**: Up to 10 concurrent requests + +### Response Time Breakdown + +- **Retrieval**: 0.3-0.8 seconds +- **Generation**: 1.0-2.0 seconds +- **Total**: 1.5-3.0 seconds (target: <3s) + +## Usage Examples + +### Python Client + +```python +from src.rag.api_client import RAGAPIClient + +# Initialize +client = RAGAPIClient(base_url="https://your-api-url.modal.run") + +# Health check +health = client.health_check() +print(health) + +# Query +result = client.query("What are the premium ranges?") +print(result['answer']) + +# Fast query (optimized for speed) +result = client.query_fast("What are the three tiers?") +print(result['answer']) +``` + +### cURL + +```bash +# Health check +curl https://your-api-url.modal.run/health + +# Query +curl -X POST https://your-api-url.modal.run/query \ + -H "Content-Type: application/json" \ + -d '{ + "question": "What are the three product tiers?", + "top_k": 5, + "max_tokens": 1024 + }' +``` + +### JavaScript/TypeScript + +```javascript +const response = await fetch('https://your-api-url.modal.run/query', { + method: 'POST', + headers: { + 'Content-Type': 'application/json', + }, + body: JSON.stringify({ + question: 'What are the three product tiers?', + top_k: 5, + max_tokens: 1024 + }) +}); + +const data = await response.json(); +console.log(data.answer); +``` + +## Configuration + +### Environment Variables + +- `MODAL_APP_NAME`: App name (default: "insurance-rag-api") +- `MODAL_VOLUME_NAME`: Volume name (default: "mcp-hack-ins-products") + +### API Parameters + +- `question` (required): The question to ask +- `top_k` (optional, default: 5): Number of documents to retrieve +- `max_tokens` (optional, default: 1024): Maximum response length + +## Performance Tips + +1. **Use Fast Query**: For speed-critical applications, use `query_fast()` method +2. **Reduce top_k**: Lower `top_k` (e.g., 3) for faster retrieval +3. **Reduce max_tokens**: Lower `max_tokens` (e.g., 512) for faster generation +4. **Cache Results**: Cache common queries client-side +5. 
**Batch Requests**: If possible, batch multiple queries + +## Error Handling + +```python +result = client.query("your question") + +if result.get("success"): + print(result['answer']) +else: + print(f"Error: {result.get('error', 'Unknown error')}") +``` + +## Monitoring + +### Response Times + +Monitor the `total_time` field in responses: +- < 2s: Excellent +- 2-3s: Good (target) +- > 3s: May need optimization + +### Health Monitoring + +```python +health = client.health_check() +if health.get("status") != "healthy": + # Handle unhealthy state + pass +``` + +## Deployment + +### Modal Deployment + +```bash +# Deploy +modal deploy src/rag/rag_api.py + +# Get URL +modal app show insurance-rag-api +``` + +### Local Testing + +```bash +# Run locally (for development) +modal serve src/rag/rag_api.py +``` + +## Rate Limiting + +The API supports up to 10 concurrent requests. For higher throughput: +- Deploy multiple instances +- Use load balancer +- Implement client-side rate limiting + +## Security + +- Add authentication if needed +- Use HTTPS in production +- Implement rate limiting +- Validate input questions + +## Troubleshooting + +### Slow Responses (>3s) +- Check if container is warm (`min_containers=1`) +- Reduce `max_tokens` +- Reduce `top_k` +- Check network latency + +### Errors +- Verify documents are indexed +- Check Modal app status +- Review error messages in response + diff --git a/docs/deployment/ADD_GUIDES_TO_RAG.md b/docs/deployment/ADD_GUIDES_TO_RAG.md deleted file mode 100644 index aa673a586e45ef5be3961775d5e174785d3a1902..0000000000000000000000000000000000000000 --- a/docs/deployment/ADD_GUIDES_TO_RAG.md +++ /dev/null @@ -1,146 +0,0 @@ -# RAG Indexing Configuration - -## Overview - -The RAG system indexes **only Word, PDF, and Excel files** containing product design information. **All markdown files are excluded** from indexing to keep the RAG focused on structured product documents. - -## Currently Indexed Files - -The system automatically indexes files that match these patterns: - -1. **Word Documents (.docx):** - - Files with `tokyo_auto_insurance` or `product_design` in the filename - - Example: `tokyo_auto_insurance_product_design.docx` - -2. **PDF Documents (.pdf):** - - Files with `tokyo_auto_insurance` or `product_design` in the filename - - Example: `tokyo_auto_insurance_product_design.pdf` - -3. **Excel Spreadsheets (.xlsx, .xls):** - - Files with `tokyo_auto_insurance` or `product_design` in the filename - - Example: `tokyo_auto_insurance_product_design.xlsx` - -## Excluded Files - -The following files are **NOT indexed**: - -- ❌ **All markdown files** (`.md`, `.markdown`) - completely excluded -- ❌ Guide files (e.g., `QUICK_START_RAG.md`, `PRODUCT_DECISION_GUIDE.md`) -- ❌ Setup guides (e.g., `setup_product_design_rag.md`) -- ❌ Troubleshooting guides -- ❌ Web interface guides -- ❌ Any other file types (`.txt`, `.csv`, `.json`, etc.) 
- -## Files That Will Be Indexed - -Based on the current repository structure: - -✅ **Will be indexed (if uploaded to Modal volume):** -- `tokyo_auto_insurance_product_design.docx` (Word document) -- `tokyo_auto_insurance_product_design.pdf` (PDF document) -- `tokyo_auto_insurance_product_design.xlsx` (Excel spreadsheet) -- `tokyo_auto_insurance_product_design.xls` (Excel 97-2003) - -❌ **Will NOT be indexed (all excluded):** -- `tokyo_auto_insurance_product_design.md` (markdown - excluded) -- `tokyo_auto_insurance_product_design_filled.md` (markdown - excluded) -- `QUICK_START_RAG.md` (markdown - excluded) -- `PRODUCT_DECISION_GUIDE.md` (markdown - excluded) -- `setup_product_design_rag.md` (markdown - excluded) -- `TROUBLESHOOTING.md` (markdown - excluded) -- `WEB_INTERFACE.md` (markdown - excluded) -- All other markdown and non-supported file types - -## How to Add More Product Design Files - -### Option 1: Use Supported File Formats -Convert your files to one of the supported formats: -- **Word**: `.docx` format -- **PDF**: `.pdf` format -- **Excel**: `.xlsx` or `.xls` format - -**Important:** -- The file must contain `tokyo_auto_insurance` **OR** `product_design` in the filename -- Markdown files (`.md`) are **not supported** and will be ignored - -### Option 2: Update the Loader -Edit `src/rag/modal-rag-product-design.py` and modify the pattern matching: - -```python -# Current pattern for PDF files (line ~81): -if 'tokyo_auto_insurance' in file_lower or 'product_design' in file_lower: - pdf_files.append(full_path) - -# To add more patterns, modify to: -if ('tokyo_auto_insurance' in file_lower or - 'product_design' in file_lower or - 'your_custom_pattern' in file_lower): - pdf_files.append(full_path) -``` - -**Note:** All markdown files are intentionally excluded. Only Word, PDF, and Excel files are processed. 
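-
-Putting the extension and filename rules together, the effective selection logic looks roughly like this (a minimal sketch for illustration; `should_index` is a hypothetical helper, not a function in the actual loader):
-
-```python
-SUPPORTED_EXTENSIONS = (".docx", ".pdf", ".xlsx", ".xls")
-NAME_PATTERNS = ("tokyo_auto_insurance", "product_design")
-
-def should_index(filename: str) -> bool:
-    """Return True only for supported formats whose name matches a known pattern."""
-    file_lower = filename.lower()
-    if not file_lower.endswith(SUPPORTED_EXTENSIONS):
-        return False  # markdown and all other file types are skipped
-    return any(pattern in file_lower for pattern in NAME_PATTERNS)
-```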
- -## Uploading to Modal Volume - -To index product design documents, upload **only Word, PDF, or Excel files** to the Modal volume: - -```bash -# Upload Word document -modal volume put mcp-hack-ins-products \ - docs/product-design/tokyo_auto_insurance_product_design.docx \ - docs/product-design/tokyo_auto_insurance_product_design.docx - -# Upload PDF document (if you have one) -modal volume put mcp-hack-ins-products \ - docs/product-design/tokyo_auto_insurance_product_design.pdf \ - docs/product-design/tokyo_auto_insurance_product_design.pdf - -# Upload Excel spreadsheet (if you have one) -modal volume put mcp-hack-ins-products \ - docs/product-design/tokyo_auto_insurance_product_design.xlsx \ - docs/product-design/tokyo_auto_insurance_product_design.xlsx -``` - -**Important Notes:** -- ❌ **Do NOT upload markdown files** (`.md`) - they will be ignored -- ✅ Only `.docx`, `.pdf`, `.xlsx`, and `.xls` files are processed -- ✅ Files must contain `tokyo_auto_insurance` or `product_design` in the filename - -## Re-indexing - -After uploading new files, re-index: - -```bash -# Using CLI -python src/web/query_product_design.py --index - -# Or direct Modal command -modal run src/rag/modal-rag-product-design.py::index_product_design -``` - -## Benefits of Current Approach - -By focusing only on Word, PDF, and Excel files: -- ✅ RAG answers are focused on structured product documents -- ✅ No confusion from markdown guide/instruction content -- ✅ Faster retrieval (smaller, more focused document set) -- ✅ More accurate product-related answers from official documents -- ✅ Better handling of tables and structured data (Excel, Word tables) -- ✅ Cleaner source citations -- ✅ Support for professional document formats - -## Example Queries - -With product design documents indexed, you can ask: - -``` -"What are the three product tiers and their premium ranges?" -"What is the Year 3 premium volume projection?" -"What are the FSA licensing requirements?" -"What coverage does the Standard tier include?" -"What is the target market size in Tokyo?" -"Who are the main competitors?" -``` - -The RAG system will retrieve relevant sections from the product design documents only, ensuring answers are focused on product information. - diff --git a/docs/guides/HOW_TO_RUN.md b/docs/guides/HOW_TO_RUN.md deleted file mode 100644 index 9ba7f6bf316ecb5f67f26ee6a94084aed8b8cc4f..0000000000000000000000000000000000000000 --- a/docs/guides/HOW_TO_RUN.md +++ /dev/null @@ -1,215 +0,0 @@ -# How to Run the Fine-Tuning Pipeline - -This guide walks you through the complete pipeline from data generation to model deployment. 
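-
-For orientation, each record in the generated dataset is a single instruction-style QA pair in JSONL form. The example below is illustrative (representative field values, not an actual dataset row):
-
-```json
-{"instruction": "What is the Members per household for Minato-ku, Tokyo?", "input": "Context: Japan Census data", "output": "The Members per household for Minato-ku, Tokyo is 2.56."}
-```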
- ---- - -## 📊 Dataset Generation Results - -### Final Statistics -- **Training Samples**: 201,651 -- **Validation Samples**: 22,407 -- **Total Dataset**: 224,058 high-quality QA pairs -- **Improvement**: 150x more data than previous approach - -### Batch Performance -| Batch | Files | Data Points | Status | -|-------|-------|-------------|--------| -| 1 | 1,000 | 100,611 | ✅ Excellent | -| 2 | 1,000 | 39,960 | ✅ Good | -| 3 | 1,000 | 0 | ⚠️ Complex files | -| 4 | 1,000 | 600 | ⚠️ Runner issue | -| 5 | 1,000 | 54,627 | ✅ Excellent | -| 6 | 1,000 | 5,400 | ✅ Good | -| 7 | 888 | 22,860 | ✅ Good | - ---- - -## 🚀 Step-by-Step Instructions - -### Step 1: Fine-Tune the Model - -Run the fine-tuning job on Modal with H200 GPU: - -```bash -cd /Users/veeru/agents/mcp-hack - -# Start fine-tuning in detached mode -./venv/bin/modal run --detach docs/finetune_modal.py -``` - -**What happens:** -- Loads 201,651 training samples from `finetune-dataset` volume -- Trains Phi-3-mini-4k-instruct with LoRA on H200 GPU -- Runs for ~90-120 minutes -- Saves model to `model-checkpoints` volume - -**Monitor progress:** -```bash -# View live logs -modal app logs mcp-hack::finetune-phi3-modal -``` - ---- - -### Step 2: Evaluate the Model - -After training completes, test the model: - -```bash -./venv/bin/modal run docs/eval_finetuned.py -``` - -This will run sample questions and show the model's answers. - ---- - -### Step 3: Deploy API Endpoint - -Deploy the inference API: - -**Option A: GPU Endpoint (A10G)** -```bash -./venv/bin/modal deploy docs/api_endpoint.py -``` - -**Option B: CPU Endpoint** -```bash -./venv/bin/modal deploy docs/api_endpoint_cpu.py -``` - -**Get the endpoint URL:** -```bash -modal app list -``` - ---- - -### Step 4: Test the API - -```bash -# Example API call -curl -X POST https://YOUR-MODAL-URL/ask \ - -H "Content-Type: application/json" \ - -d '{ - "question": "What is the population of Tokyo?", - "context": "Japan Census data" - }' -``` - ---- - -## 📁 Key Files - -### Data Processing -- `docs/prepare_finetune_data.py` - Generates dataset from CSV files -- `docs/clean_sample.py` - Local testing script for data cleaning - -### Model Training -- `docs/finetune_modal.py` - Fine-tuning script (H200 GPU) -- `docs/eval_finetuned.py` - Evaluation script - -### API Deployment -- `docs/api_endpoint.py` - GPU inference endpoint (A10G) -- `docs/api_endpoint_cpu.py` - CPU inference endpoint - -### Documentation -- `diagrams/finetuning.svg` - Visual pipeline diagram -- `finetune/04-evaluation.md` - Evaluation results - ---- - -## 🔧 Modal Volumes - -The pipeline uses these Modal volumes: - -| Volume | Purpose | Size | -|--------|---------|------| -| `census-data` | Raw census CSV files | 6,838 files | -| `economy-labor-data` | Raw economy CSV files | 50 files | -| `finetune-dataset` | Generated JSONL training data | 224K samples | -| `model-checkpoints` | Fine-tuned model weights | ~7GB | - ---- - -## 💡 Tips - -### If Training Fails -```bash -# Check logs for errors -modal app logs mcp-hack::finetune-phi3-modal - -# Restart training -./venv/bin/modal run --detach docs/finetune_modal.py -``` - -### If You Need to Regenerate Data -```bash -# Clear existing dataset -./venv/bin/modal run docs/clear_dataset.py - -# Regenerate with new logic -./venv/bin/modal run --detach docs/prepare_finetune_data.py -``` - -### View Volume Contents -```bash -# List files in a volume -modal volume ls finetune-dataset - -# Download a file -modal volume get finetune-dataset train.jsonl finetune/train.jsonl -``` - ---- - -## 📈 
Expected Timeline - -| Step | Duration | Notes | -|------|----------|-------| -| Data Generation | ✅ Complete | 224K samples ready | -| Fine-Tuning | ~90-120 min | H200 GPU | -| Evaluation | ~5 min | Quick tests | -| API Deployment | ~2 min | Instant after deploy | - --- - -## 🎯 Next Steps - -1. **Run fine-tuning** (see Step 1 above) -2. **Wait for completion** (~2 hours) -3. **Evaluate results** (see Step 2) -4. **Deploy API** (see Step 3) -5. **Test with real queries** (see Step 4) - --- - -## 📞 Troubleshooting - -**Issue**: "Volume not found" -```bash -# List all volumes -modal volume list -``` - -**Issue**: "Out of memory during training" -- Reduce `per_device_train_batch_size` in `finetune_modal.py` -- Current: 2 (already optimized for H200) - -**Issue**: "Model not loading in API" -- Ensure fine-tuning completed successfully -- Check `model-checkpoints` volume has files - --- - -## ✅ Success Criteria - -After completing all steps, you should have: -- ✅ Fine-tuned Phi-3-mini model -- ✅ Deployed API endpoint -- ✅ Model answering questions about Japanese census/economy data -- ✅ Improved accuracy over base model - --- - -**Ready to start?** Run the fine-tuning command from Step 1! diff --git a/docs/guides/SETUP_SUCCESS.md b/docs/guides/SETUP_SUCCESS.md deleted file mode 100644 index 5c732394ce99d7522a3e25c9b35e614a95a4d6f7..0000000000000000000000000000000000000000 --- a/docs/guides/SETUP_SUCCESS.md +++ /dev/null @@ -1,63 +0,0 @@ -# ✅ RAG Setup Successful! - -## Status: Working - -The product design RAG system is now fully operational! - -### What Was Fixed - -1. **File Detection**: Updated to find files in both root and `docs/` subdirectory -2. **GPU Fallback**: Added CPU fallback for embeddings (works without GPU) -3. **Word Document**: Worked around a python-docx issue with the `.docx` file by indexing the markdown copy, which contains all the same content -4. **Modal Command**: Auto-detects Modal in venv - -### Current Status - -✅ **Indexed**: 1 document (markdown), 56 chunks -✅ **Vector DB**: Created in ChromaDB collection `product_design` -✅ **Queries**: Working! Tested successfully - -### Test Results - -```bash -$ python3 query_product_design.py --query "What are the three product tiers?" -``` - -**Result**: ✅ Successfully retrieved and answered! - -## Usage - -### Query the Document - -```bash -# Single query -python3 query_product_design.py --query "What are the three product tiers?" - -# Interactive mode -python3 query_product_design.py --interactive -``` - -### Example Questions - -- "What are the three product tiers and their premium ranges?" -- "What is the Year 3 premium volume projection?" -- "What coverage does the Standard tier include?" -- "What are the FSA licensing requirements?" - -## Known Issues - -1. **Word Document**: The `.docx` file has a python-docx compatibility issue with Modal volumes, but the markdown file contains all the same content and works perfectly. - -2. **Answer Truncation**: Some answers may be truncated. This is normal - the system retrieves the most relevant chunks and generates concise answers. - -## Next Steps - -1. ✅ **Indexing**: Complete -2. ✅ **Query System**: Working -3. 🎯 **Ready to Use**: You can now query the product design document! 
- -Try it: -```bash -python3 query_product_design.py --interactive -``` - diff --git a/docs/guides/SUMMARY.md b/docs/guides/SUMMARY.md deleted file mode 100644 index e7664df886e8d21071e468501f2a6174a8ceddf6..0000000000000000000000000000000000000000 --- a/docs/guides/SUMMARY.md +++ /dev/null @@ -1,114 +0,0 @@ -# ✅ Complete Setup Summary - -## What Was Accomplished - -### 1. Product Design Document ✅ -- **Created**: Comprehensive 1,600-line product design document -- **Filled**: All sections with realistic fictional data for "TokyoDrive Insurance" -- **Formats**: - - Markdown: `docs/tokyo_auto_insurance_product_design_filled.md` - - Word: `docs/tokyo_auto_insurance_product_design.docx` -- **Content**: 12 comprehensive sections covering all aspects of product design - -### 2. RAG System Extension ✅ -- **Created**: `src/modal-rag-product-design.py` -- **Features**: - - Supports Markdown and Word documents - - Separate ChromaDB collection (doesn't interfere with existing RAG) - - GPU-accelerated with Phi-3 model - - Integrated with existing Modal infrastructure - -### 3. Query Interface ✅ -- **Created**: `query_product_design.py` - Simple CLI tool -- **Features**: - - Interactive mode for continuous queries - - Single query mode - - Index command - - Clean, formatted output - -### 4. Documentation ✅ -- `docs/QUICK_START_RAG.md` - Quick start guide -- `docs/setup_product_design_rag.md` - Detailed setup -- `docs/next_steps_rag_recommendation.md` - Decision guide -- `docs/RAG_SETUP_COMPLETE.md` - Complete setup info -- `README_RAG.md` - Quick reference - -## File Structure - -``` -mcp-hack/ -├── src/ -│ └── modal-rag-product-design.py # Extended RAG system -├── query_product_design.py # CLI query interface -├── docs/ -│ ├── tokyo_auto_insurance_product_design_filled.md -│ ├── tokyo_auto_insurance_product_design.docx -│ ├── QUICK_START_RAG.md -│ ├── setup_product_design_rag.md -│ ├── next_steps_rag_recommendation.md -│ ├── RAG_SETUP_COMPLETE.md -│ └── SUMMARY.md (this file) -└── README_RAG.md # Quick reference -``` - -## Next Steps to Use - -### Step 1: Index Documents (One-Time) -```bash -python query_product_design.py --index -``` -⏱️ Takes 2-5 minutes - -### Step 2: Query the Document -```bash -# Single query -python query_product_design.py --query "What are the three product tiers?" - -# Interactive mode -python query_product_design.py --interactive -``` - -## Example Use Cases - -### For Development -- Extract technical requirements -- Get API specifications -- Understand system architecture - -### For Sales/Marketing -- Get pricing information -- Understand product features -- Compare tiers - -### For Compliance -- Check regulatory requirements -- Get licensing info -- Understand data privacy rules - -### For Financial Planning -- Get projections -- Understand cost structure -- Check break-even analysis - -## Key Features - -✅ **Comprehensive Document**: 12 sections, 1,600 lines, fully filled with realistic data -✅ **RAG System**: Semantic search + LLM for intelligent Q&A -✅ **Easy Interface**: Simple CLI tool, no complex setup -✅ **Fast Queries**: 3-5 seconds after initial warm-up -✅ **Separate Collection**: Doesn't interfere with existing insurance products RAG - -## Status - -🎉 **Everything is ready!** - -1. ✅ Product design document created and filled -2. ✅ Documents uploaded to Modal volume -3. ✅ RAG system extended -4. ✅ Query interface created -5. 
-5. ✅ Documentation complete
-
-**Ready to index and query!**
-
-Run: `python query_product_design.py --index`
-
diff --git a/docs/guides/modal-rag-optimization.md b/docs/guides/modal-rag-optimization.md
deleted file mode 100644
index b69ed0875284b79f40a87ed51203d5ab9fe8c428..0000000000000000000000000000000000000000
--- a/docs/guides/modal-rag-optimization.md
+++ /dev/null
@@ -1,370 +0,0 @@
-# Modal RAG Performance Optimization Guide
-
-**Current Performance**: >1 minute per query
-**Target Performance**: <5 seconds per query
-
-## 🔍 Performance Bottleneck Analysis
-
-### Current Architecture Issues
-
-1. **Model Loading Time** (~30-45 seconds)
-   - Mistral-7B (13GB) loads on every cold start
-   - Embedding model loads separately
-   - No model caching between requests
-
-2. **LLM Inference Time** (~15-30 seconds)
-   - Mistral-7B is slow for inference
-   - Running on A10G GPU (good, but model is large)
-   - No inference optimization (quantization, etc.)
-
-3. **Network Latency** (~2-5 seconds)
-   - Remote ChromaDB calls
-   - Modal container communication overhead
-
----
-
-## 🚀 Optimization Strategies (Ranked by Impact)
-
-### 1. **Keep Containers Warm** ⭐⭐⭐⭐⭐
-**Impact**: Eliminates 30-45s cold start time
-
-**Current**:
-```python
-min_containers=1  # Already doing this ✅
-```
-
-**Why it helps**: Your container stays loaded with models in memory. First query after deployment is slow, but subsequent queries are fast.
-
-**Cost**: ~$0.50-1.00/hour for warm A10G container
-
----
-
-### 2. **Switch to Smaller/Faster LLM** ⭐⭐⭐⭐⭐
-**Impact**: Reduces inference from 15-30s to 2-5s
-
-**Options**:
-
-#### Option A: Mistral-7B-Instruct-v0.2 (Quantized)
-```python
-import torch
-from transformers import AutoModelForCausalLM, BitsAndBytesConfig
-
-quantization_config = BitsAndBytesConfig(
-    load_in_4bit=True,
-    bnb_4bit_compute_dtype=torch.float16,
-    bnb_4bit_use_double_quant=True,
-    bnb_4bit_quant_type="nf4"
-)
-
-self.model = AutoModelForCausalLM.from_pretrained(
-    LLM_MODEL,
-    quantization_config=quantization_config,
-    device_map="auto"
-)
-```
-- **Speed**: 3-5x faster (5-10s → 1-3s)
-- **Quality**: Minimal degradation
-- **Memory**: 13GB → 3.5GB
-
-#### Option B: Switch to Phi-3-mini (3.8B)
-```python
-LLM_MODEL = "microsoft/Phi-3-mini-4k-instruct"
-```
-- **Speed**: 5-10x faster than Mistral-7B
-- **Quality**: Good for RAG tasks
-- **Memory**: ~8GB → 4GB
-- **Inference**: 2-4 seconds
-
-#### Option C: Use TinyLlama-1.1B
-```python
-LLM_MODEL = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
-```
-- **Speed**: 10-20x faster
-- **Quality**: Lower, but acceptable for simple queries
-- **Memory**: ~2GB
-- **Inference**: <1 second
-
----
-
-### 3. **Use vLLM for Inference** ⭐⭐⭐⭐
-**Impact**: 2-5x faster inference
-
-```python
-# Install vLLM
-image = modal.Image.debian_slim(python_version="3.11").pip_install(
-    "vllm==0.6.0",
-    # ... other packages
-)
-
-# In RAGModel.enter()
-from vllm import LLM, SamplingParams
-
-self.llm_engine = LLM(
-    model=LLM_MODEL,
-    tensor_parallel_size=1,
-    gpu_memory_utilization=0.9,
-    max_model_len=2048  # Shorter context for speed
-)
-
-# In query method
-sampling_params = SamplingParams(
-    temperature=0.7,
-    max_tokens=256,
-    top_p=0.9
-)
-outputs = self.llm_engine.generate([prompt], sampling_params)
-```
-
-**Benefits**:
-- Continuous batching
-- PagedAttention (efficient memory)
-- Optimized CUDA kernels
-- 2-5x faster than HuggingFace pipeline
-
----
-
-### 4. **Optimize Embedding Generation** ⭐⭐⭐
-**Impact**: Reduces query embedding time from 1-2s to 0.2-0.5s
-
-#### Option A: Use Smaller Embedding Model
-```python
-EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
-# Both models are 384-dimensional; bge-small-en-v1.5 is already comparably
-# small and fast, so expect only a modest gain from this swap
-```
-
-#### Option B: Use ONNX Runtime
-```python
-from optimum.onnxruntime import ORTModelForFeatureExtraction
-
-self.embeddings = ORTModelForFeatureExtraction.from_pretrained(
-    EMBEDDING_MODEL,
-    export=True,
-    provider="CUDAExecutionProvider"
-)
-```
-- **Speed**: 2-3x faster
-- **Quality**: Identical
-
----
-
-### 5. **Reduce Context Window** ⭐⭐⭐
-**Impact**: Faster LLM processing
-
-```python
-# In query method
-sampling_params = SamplingParams(
-    max_tokens=128,  # Instead of 256 or 512
-    temperature=0.7
-)
-
-# Reduce retrieved documents
-top_k = 2  # Instead of 3
-```
-
-**Why**: Fewer tokens to process = faster inference
-
----
-
-### 6. **Cache ChromaDB Queries** ⭐⭐
-**Impact**: Saves 1-2s on repeated queries
-
-```python
-from functools import lru_cache
-
-def make_cached_retriever(retriever):
-    # Key the cache on the question text itself (lru_cache needs hashable args)
-    @lru_cache(maxsize=100)
-    def get_cached_docs(question: str):
-        return retriever.get_relevant_documents(question)
-    return get_cached_docs
-
-# In RAGModel.enter()
-self.get_cached_docs = make_cached_retriever(self.retriever)
-
-# In query method
-docs = self.get_cached_docs(question)
-```
-
----
-
-### 7. **Use Faster GPU** ⭐⭐
-**Impact**: 1.5-2x faster inference
-
-```python
-@app.cls(
-    gpu="A100",  # Instead of A10G
-    # or
-    gpu="H100",  # Even faster
-)
-```
-
-**Cost**: A100 is 2-3x more expensive than A10G
-
----
-
-### 8. **Parallel Processing** ⭐⭐
-**Impact**: Overlap embedding + retrieval
-
-```python
-import asyncio
-
-async def query_async(self, question: str):
-    # Kick off the query embedding while the rest of the pipeline is prepared
-    embedding_task = asyncio.create_task(
-        self.get_query_embedding(question)
-    )
-
-    # ... rest of async pipeline
-```
-
----
-
-## 🎯 Recommended Implementation Plan
-
-### Phase 1: Quick Wins (Get to <10s)
-1. ✅ **Keep containers warm** (already done)
-2. **Add 4-bit quantization** to Mistral-7B
-3. **Reduce max_tokens** to 128
-4. **Use top_k=2** instead of 3
-
-**Expected**: 60s → 8-12s (a consolidated sketch of these changes follows)
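-
-A minimal sketch of Phase 1 applied together, assuming the existing
-HuggingFace loading path in `modal-rag.py` (names like `retriever`, `question`,
-and the prompt shape are illustrative, not the exact code):
-
-```python
-import torch
-from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
-
-# Strategy 2, Option A: load the LLM in 4-bit
-bnb_config = BitsAndBytesConfig(
-    load_in_4bit=True,
-    bnb_4bit_compute_dtype=torch.float16,
-    bnb_4bit_use_double_quant=True,
-    bnb_4bit_quant_type="nf4",
-)
-model = AutoModelForCausalLM.from_pretrained(
-    LLM_MODEL, quantization_config=bnb_config, device_map="auto"
-)
-tokenizer = AutoTokenizer.from_pretrained(LLM_MODEL)
-
-# Strategy 5: fewer retrieved docs and shorter generations
-docs = retriever.get_relevant_documents(question)[:2]   # top_k=2
-context = "\n\n".join(doc.page_content for doc in docs)
-prompt = f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"
-inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
-outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
-answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
-```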
-
----
-
-### Phase 2: Major Speedup (Get to <5s)
-1. **Switch to vLLM** for inference
-2. **Use Phi-3-mini** instead of Mistral-7B
-3. **Optimize embeddings** with ONNX
-
-**Expected**: 8-12s → 3-5s
-
----
-
-### Phase 3: Ultra-Fast (Get to <2s)
-1. **Use TinyLlama** for simple queries
-2. **Implement query caching**
-3. **Upgrade to A100 GPU**
-
-**Expected**: 3-5s → 1-2s
-
----
-
-## 📊 Performance Comparison Table
-
-| Configuration | Cold Start | Warm Query | Cost/Hour | Quality |
-|--------------|------------|------------|-----------|---------|
-| **Current** (Mistral-7B, A10G) | 45s | 15-30s | $0.50 | ⭐⭐⭐⭐⭐ |
-| **Phase 1** (Quantized, warm) | 30s | 8-12s | $0.50 | ⭐⭐⭐⭐ |
-| **Phase 2** (vLLM + Phi-3) | 20s | 3-5s | $0.50 | ⭐⭐⭐⭐ |
-| **Phase 3** (TinyLlama, A100) | 10s | 1-2s | $1.50 | ⭐⭐⭐ |
-
----
-
-## 🔧 Code Changes for Phase 2 (Recommended)
-
-### 1. Update model configuration
-```python
-LLM_MODEL = "microsoft/Phi-3-mini-4k-instruct"
-EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"  # Keep same
-```
-
-### 2. Add vLLM to dependencies
-```python
-image = modal.Image.debian_slim(python_version="3.11").pip_install(
-    "vllm==0.6.0",
-    "langchain==0.3.7",
-    # ... rest
-)
-```
-
-### 3. Update RAGModel.enter()
-```python
-from vllm import LLM, SamplingParams
-
-self.llm_engine = LLM(
-    model=LLM_MODEL,
-    tensor_parallel_size=1,
-    gpu_memory_utilization=0.85,
-    max_model_len=2048
-)
-
-self.sampling_params = SamplingParams(
-    temperature=0.7,
-    max_tokens=128,
-    top_p=0.9
-)
-```
-
-### 4. Update query method
-```python
-# Build prompt
-prompt = f"""Use the following context to answer the question.
-
-Context: {context}
-
-Question: {question}
-
-Answer:"""
-
-# Generate with vLLM
-outputs = self.llm_engine.generate([prompt], self.sampling_params)
-answer = outputs[0].outputs[0].text
-```
-
----
-
-## 💰 Cost vs Performance Trade-offs
-
-| Approach | Speed Gain | Cost Change | Implementation |
-|----------|-----------|-------------|----------------|
-| Quantization | 3-5x | $0 | Easy |
-| vLLM | 2-5x | $0 | Medium |
-| Smaller model | 5-10x | $0 | Easy |
-| A100 GPU | 1.5-2x | +200% | Easy |
-| Caching | Variable | $0 | Medium |
-
----
-
-## 🎬 Next Steps
-
-1. **Measure current performance** with logging
-2. **Implement Phase 1** (quantization + reduce tokens)
-3. **Test and measure** improvement
-4. **Implement Phase 2** if needed (vLLM + Phi-3)
-5. **Monitor** and iterate
-
----
-
-## 📝 Performance Monitoring Code
-
-Add this to track performance:
-
-```python
-import time
-
-@modal.method()
-def query(self, question: str, top_k: int = 2):
-    start = time.time()
-
-    # Retriever setup (the embedding model itself is already loaded)
-    setup_start = time.time()
-    retriever = self.RemoteChromaRetriever(...)
-    setup_time = time.time() - setup_start
-
-    # Retrieval time (includes embedding the query)
-    retrieval_start = time.time()
-    docs = retriever.get_relevant_documents(question)
-    retrieval_time = time.time() - retrieval_start
-
-    # LLM time
-    llm_start = time.time()
-    result = chain.invoke({"question": question})
-    llm_time = time.time() - llm_start
-
-    total_time = time.time() - start
-
-    print(f"⏱️ Performance:")
-    print(f"  Setup: {setup_time:.2f}s")
-    print(f"  Retrieval: {retrieval_time:.2f}s")
-    print(f"  LLM: {llm_time:.2f}s")
-    print(f"  Total: {total_time:.2f}s")
-
-    return result
-```
-
-This will help you identify the exact bottleneck!
diff --git a/docs/guides/modal-rag-sequence.md b/docs/guides/modal-rag-sequence.md
deleted file mode 100644
index 9be43fb1f36d6bfb766cbf753e967d037e6d1ea6..0000000000000000000000000000000000000000
--- a/docs/guides/modal-rag-sequence.md
+++ /dev/null
@@ -1,168 +0,0 @@
-# Modal RAG System - Sequence Diagrams
-
-This document provides sequence diagrams for the Modal RAG (Retrieval Augmented Generation) application.
-
-## 1. Indexing Flow (create_vector_db)
-
-```mermaid
-sequenceDiagram
-    participant User
-    participant Modal
-    participant CreateVectorDB as create_vector_db()
-    participant PDFLoader
-    participant TextSplitter
-    participant Embeddings as HuggingFaceEmbeddings<br/>(CUDA)
-    participant ChromaDB as Remote ChromaDB
-
-    User->>Modal: modal run modal-rag.py::index
-    Modal->>CreateVectorDB: Execute function
-
-    CreateVectorDB->>PDFLoader: Load PDFs from /insurance-data
-    PDFLoader-->>CreateVectorDB: Return documents
-
-    CreateVectorDB->>TextSplitter: Split documents (chunk_size=1000)
-    TextSplitter-->>CreateVectorDB: Return chunks
-
-    CreateVectorDB->>Embeddings: Initialize (device='cuda')
-    CreateVectorDB->>Embeddings: Generate embeddings for chunks
-    Embeddings-->>CreateVectorDB: Return embeddings
-
-    CreateVectorDB->>ChromaDB: Connect to remote service
-    CreateVectorDB->>ChromaDB: Upsert chunks + embeddings
-    ChromaDB-->>CreateVectorDB: Confirm storage
-
-    CreateVectorDB-->>Modal: Complete
-    Modal-->>User: Success message
-```
-
-## 2. Query Flow (RAGModel.query)
-
-```mermaid
-sequenceDiagram
-    participant User
-    participant Modal
-    participant QueryEntrypoint as query()
-    participant RAGModel
-    participant Embeddings as HuggingFaceEmbeddings<br/>(CUDA)
-    participant ChromaRetriever as RemoteChromaRetriever
-    participant ChromaDB as Remote ChromaDB
-    participant LLM as Mistral-7B<br/>(A10G GPU)
-    participant RAGChain as LangChain RAG
-
-    User->>Modal: modal run modal-rag.py::query --question "..."
-    Modal->>QueryEntrypoint: Execute local entrypoint
-    QueryEntrypoint->>RAGModel: Instantiate RAGModel()
-
-    Note over RAGModel: @modal.enter() lifecycle
-    RAGModel->>Embeddings: Load embedding model (CUDA)
-    RAGModel->>ChromaDB: Connect to remote service
-    RAGModel->>LLM: Load Mistral-7B (A10G GPU)
-    RAGModel->>RAGModel: Initialize RemoteChromaRetriever
-
-    QueryEntrypoint->>RAGModel: query.remote(question)
-
-    RAGModel->>ChromaRetriever: Create retriever instance
-    RAGModel->>RAGChain: Build RAG chain
-
-    RAGChain->>ChromaRetriever: Retrieve relevant docs
-    ChromaRetriever->>Embeddings: embed_query(question)
-    Embeddings-->>ChromaRetriever: Query embedding
-    ChromaRetriever->>ChromaDB: query(embedding, k=3)
-    ChromaDB-->>ChromaRetriever: Top-k documents
-    ChromaRetriever-->>RAGChain: Return documents
-
-    RAGChain->>LLM: Generate answer with context
-    LLM-->>RAGChain: Generated answer
-    RAGChain-->>RAGModel: Return result
-
-    RAGModel-->>QueryEntrypoint: Return {answer, sources}
-    QueryEntrypoint-->>User: Display answer + sources
-```
-
-## 3. Web Endpoint Flow (RAGModel.web_query)
-
-```mermaid
-sequenceDiagram
-    participant User
-    participant Browser
-    participant Modal as Modal Platform
-    participant WebEndpoint as RAGModel.web_query
-    participant QueryMethod as RAGModel.query
-    participant RAGChain
-    participant ChromaDB
-    participant LLM
-
-    User->>Browser: GET https://.../web_query?question=...
-    Browser->>Modal: HTTP GET request
-    Modal->>WebEndpoint: Route to @modal.fastapi_endpoint
-
-    WebEndpoint->>QueryMethod: Call query.local(question)
-
-    Note over QueryMethod,LLM: Same flow as Query diagram
-    QueryMethod->>RAGChain: Build chain
-    RAGChain->>ChromaDB: Retrieve docs
-    RAGChain->>LLM: Generate answer
-    LLM-->>QueryMethod: Return result
-
-    QueryMethod-->>WebEndpoint: Return {answer, sources}
-    WebEndpoint-->>Modal: JSON response
-    Modal-->>Browser: HTTP 200 + JSON
-    Browser-->>User: Display result
-```
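-
-The `web_query` handler this diagram assumes can be sketched as follows; an
-illustrative reconstruction, not the exact code in `modal-rag.py` (note that
-`query.local()` runs inside the same warm container, so no extra cold start):
-
-```python
-@modal.fastapi_endpoint(method="GET")
-def web_query(self, question: str):
-    # Reuse the warm container's already-loaded models via a local call
-    return self.query.local(question)
-```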
-
-## 4. Container Lifecycle (RAGModel)
-
-```mermaid
-sequenceDiagram
-    participant Modal
-    participant Container
-    participant RAGModel
-    participant GPU as A10G GPU
-    participant Volume as Modal Volume
-    participant ChromaDB
-
-    Modal->>Container: Start container (min_containers=1)
-    Container->>GPU: Allocate GPU
-    Container->>Volume: Mount /insurance-data
-
-    Container->>RAGModel: Call @modal.enter()
-
-    Note over RAGModel: Initialization phase
-    RAGModel->>RAGModel: Load HuggingFaceEmbeddings (CUDA)
-    RAGModel->>ChromaDB: Connect to remote service
-    RAGModel->>RAGModel: Load Mistral-7B (GPU)
-    RAGModel->>RAGModel: Create RemoteChromaRetriever class
-
-    RAGModel-->>Container: Ready
-    Container-->>Modal: Container warm and ready
-
-    Note over Modal,Container: Container stays warm (min_containers=1)
-
-    loop Handle requests
-        Modal->>RAGModel: Invoke query() method
-        RAGModel-->>Modal: Return result
-    end
-
-    Note over Modal,Container: Container persists until scaled down
-```
-
-## Key Components
-
-### Modal Configuration
-- **App Name**: `insurance-rag`
-- **Volume**: `mcp-hack-ins-products` mounted at `/insurance-data`
-- **GPU**: A10G for RAGModel class
-- **Autoscaling**: `min_containers=1`, `max_containers=1` (always warm)
-
-### Models
-- **LLM**: `mistralai/Mistral-7B-Instruct-v0.3` (GPU, float16)
-- **Embeddings**: `BAAI/bge-small-en-v1.5` (GPU, CUDA)
-
-### Storage
-- **Vector DB**: Remote ChromaDB service (`chroma-server-v2`)
-- **Collection**: `insurance_products`
-- **Chunk Size**: 1000 characters with 200 overlap
-
-### Endpoints
-- **Local Entrypoints**: `list`, `index`, `query`
-- **Web Endpoint**: `RAGModel.web_query` (FastAPI GET endpoint)
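-
-Pulled together, the configuration above corresponds roughly to this Modal
-declaration (a sketch assembled from the bullets, not a verbatim excerpt of
-`modal-rag.py`):
-
-```python
-import modal
-
-app = modal.App("insurance-rag")
-vol = modal.Volume.from_name("mcp-hack-ins-products")
-
-@app.cls(
-    gpu="A10G",
-    volumes={"/insurance-data": vol},
-    min_containers=1,  # always keep one warm container (see diagram 4)
-    max_containers=1,
-)
-class RAGModel:
-    @modal.enter()
-    def enter(self):
-        # Load the BAAI/bge-small-en-v1.5 embeddings and Mistral-7B once per container
-        ...
-```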
diff --git a/docs/guides/next_steps_rag_recommendation.md b/docs/guides/next_steps_rag_recommendation.md
deleted file mode 100644
index 6451d29dba8ac30f742165bffcf74c72ef7b6a10..0000000000000000000000000000000000000000
--- a/docs/guides/next_steps_rag_recommendation.md
+++ /dev/null
@@ -1,77 +0,0 @@
-# Next Steps: RAG for Product Design Document
-
-## Should You Add RAG?
-
-**Recommendation: YES, but with specific use cases in mind**
-
-### Benefits of Adding RAG:
-
-1. **Requirements Extraction**: Quickly find specific requirements from the 1,600-line document
-2. **Stakeholder Q&A**: Answer questions like "What's the premium for a 28-year-old in Shibuya?"
-3. **Design Validation**: Query coverage details, pricing tiers, compliance requirements
-4. **Development Planning**: Extract technical requirements, API specs, integration needs
-5. **Competitive Analysis**: Compare your product features vs competitors mentioned in the doc
-
-### When RAG is NOT Needed:
-
-- If you just need to read/search the document manually
-- If the document is small enough to navigate easily
-- If you don't need to answer complex questions across multiple sections
-
-## Implementation Options
-
-### Option 1: Extend Existing Modal RAG (Recommended)
-- Your existing `modal-rag.py` already handles PDFs
-- Can easily add support for markdown/Word documents (see the sketch at the end of this document)
-- Leverages existing ChromaDB infrastructure
-- **Effort**: Low (30-60 minutes)
-
-### Option 2: Simple Document Search
-- Use grep/search tools for simple queries
-- **Effort**: None (already available)
-
-### Option 3: Full RAG with Fine-Tuning
-- Fine-tune model on insurance domain + your product spec
-- **Effort**: High (days/weeks)
-- **Benefit**: Best accuracy for insurance-specific queries
-
-## Recommended Next Steps
-
-1. **Add Product Design Doc to RAG** (30 min)
-   - Extend `modal-rag.py` to load markdown/Word docs
-   - Index the filled product design document
-   - Test with sample queries
-
-2. **Create Query Interface** (1-2 hours)
-   - Simple CLI or web interface
-   - Example queries:
-     - "What are the three product tiers and their premium ranges?"
-     - "What coverage does the Standard tier include?"
-     - "What are the Year 3 financial projections?"
-
-3. **Use Cases to Test**:
-   - Requirements extraction for development
-   - Pricing questions for sales team
-   - Compliance checklist generation
-   - Feature comparison queries
-
-## Quick Decision Matrix
-
-| Use Case | RAG Needed? | Alternative |
-|----------|-------------|-------------|
-| Find specific section | ❌ No | Use table of contents |
-| Answer "What's the premium for X?" | ✅ Yes | Manual search |
-| Extract all requirements | ✅ Yes | Manual extraction |
-| Compare product tiers | ✅ Yes | Manual comparison |
-| Generate compliance checklist | ✅ Yes | Manual review |
-| Simple fact lookup | ⚠️ Maybe | Grep/search |
-
-## Recommendation
-
-**Start with Option 1**: Extend your existing RAG to include the product design document. It's low effort, leverages existing infrastructure, and gives you the ability to query the spec as you develop the product.
-
-Would you like me to:
-1. Extend `modal-rag.py` to support the product design document?
-2. Create a simple query interface?
-3. Both?
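-
-A sketch of what the Option 1 loader extension could look like, assuming
-LangChain's community loaders are available in the image (`Docx2txtLoader`
-and `UnstructuredMarkdownLoader` are assumptions, not code already in
-`modal-rag.py`):
-
-```python
-from langchain_community.document_loaders import (
-    Docx2txtLoader,
-    UnstructuredMarkdownLoader,
-)
-
-def load_product_design_docs(data_dir: str = "/insurance-data"):
-    """Load the markdown/Word product design docs alongside the existing PDFs."""
-    docs = []
-    docs += UnstructuredMarkdownLoader(
-        f"{data_dir}/tokyo_auto_insurance_product_design_filled.md"
-    ).load()
-    docs += Docx2txtLoader(
-        f"{data_dir}/tokyo_auto_insurance_product_design.docx"
-    ).load()
-    return docs
-```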
diff --git a/scripts/__init__.py b/src/__init__.py
similarity index 100%
rename from scripts/__init__.py
rename to src/__init__.py
diff --git a/docs/clean_sample.py b/src/data/clean_sample.py
similarity index 100%
rename from docs/clean_sample.py
rename to src/data/clean_sample.py
diff --git a/scripts/data/cleanup_data.py b/src/data/cleanup_data.py
similarity index 100%
rename from scripts/data/cleanup_data.py
rename to src/data/cleanup_data.py
diff --git a/scripts/data/clear_census_volume.py b/src/data/clear_census_volume.py
similarity index 100%
rename from scripts/data/clear_census_volume.py
rename to src/data/clear_census_volume.py
diff --git a/scripts/data/convert_census_to_csv.py b/src/data/convert_census_to_csv.py
similarity index 100%
rename from scripts/data/convert_census_to_csv.py
rename to src/data/convert_census_to_csv.py
diff --git a/scripts/data/convert_economy_labor_to_csv.py b/src/data/convert_economy_labor_to_csv.py
similarity index 100%
rename from scripts/data/convert_economy_labor_to_csv.py
rename to src/data/convert_economy_labor_to_csv.py
diff --git a/scripts/data/convert_to_word.py b/src/data/convert_to_word.py
similarity index 100%
rename from scripts/data/convert_to_word.py
rename to src/data/convert_to_word.py
diff --git a/scripts/data/create_custom_qa.py b/src/data/create_custom_qa.py
similarity index 100%
rename from scripts/data/create_custom_qa.py
rename to src/data/create_custom_qa.py
diff --git a/docs/debug_parser.py b/src/data/debug_parser.py
similarity index 100%
rename from docs/debug_parser.py
rename to src/data/debug_parser.py
diff --git a/scripts/data/delete_census_csvs.py b/src/data/delete_census_csvs.py
similarity index 100%
rename from scripts/data/delete_census_csvs.py
rename to src/data/delete_census_csvs.py
diff --git a/scripts/data/download_census_api.py b/src/data/download_census_api.py
similarity index 100%
rename from scripts/data/download_census_api.py
rename to src/data/download_census_api.py
diff --git a/scripts/data/download_census_csv_modal.py b/src/data/download_census_csv_modal.py
similarity index 100%
rename from scripts/data/download_census_csv_modal.py
rename to src/data/download_census_csv_modal.py
diff --git a/scripts/data/download_census_data.py b/src/data/download_census_data.py
similarity index 100%
rename from scripts/data/download_census_data.py
rename to src/data/download_census_data.py
diff --git a/scripts/data/download_census_modal.py b/src/data/download_census_modal.py
similarity index 100%
rename from scripts/data/download_census_modal.py
rename to src/data/download_census_modal.py
diff --git a/scripts/data/download_economy_labor_modal.py b/src/data/download_economy_labor_modal.py
similarity index 100%
rename from scripts/data/download_economy_labor_modal.py
rename to src/data/download_economy_labor_modal.py
diff --git a/scripts/data/fix_csv_filenames.py b/src/data/fix_csv_filenames.py
similarity index 100%
rename from scripts/data/fix_csv_filenames.py
rename to src/data/fix_csv_filenames.py
diff --git a/scripts/data/prepare_economy_data.py b/src/data/prepare_economy_data.py
similarity index 100%
rename from scripts/data/prepare_economy_data.py
rename to src/data/prepare_economy_data.py
diff --git a/scripts/data/prepare_finetune_data.py b/src/data/prepare_finetune_data.py
similarity index 100%
rename from scripts/data/prepare_finetune_data.py
rename to src/data/prepare_finetune_data.py
diff --git a/scripts/data/remove_duplicate_csvs.py b/src/data/remove_duplicate_csvs.py
similarity index 100%
rename from scripts/data/remove_duplicate_csvs.py
rename to src/data/remove_duplicate_csvs.py
diff --git a/src/rag/api_client.py b/src/rag/api_client.py
new file mode 100644
index 0000000000000000000000000000000000000000..867066b22838f546668ae384a2a7f671236da161
--- /dev/null
+++ b/src/rag/api_client.py
@@ -0,0 +1,103 @@
+"""
+Client library for the RAG API
+Use this to call the API from Python code
+"""
+
+import requests
+from typing import Dict
+
+class RAGAPIClient:
+    """Client for the Product Design RAG API"""
+
+    def __init__(self, base_url: str = "http://localhost:8000"):
+        """
+        Initialize the API client
+
+        Args:
+            base_url: Base URL of the RAG API
+        """
+        self.base_url = base_url.rstrip('/')
+
+    def health_check(self) -> Dict:
+        """Check if the API is healthy"""
+        try:
+            response = requests.get(f"{self.base_url}/health", timeout=5)
+            response.raise_for_status()
+            return response.json()
+        except Exception as e:
+            return {"status": "unhealthy", "error": str(e)}
+
+    def query(
+        self,
+        question: str,
+        top_k: int = 5,
+        max_tokens: int = 1024,
+        timeout: int = 5
+    ) -> Dict:
+        """
+        Query the RAG system
+
+        Args:
+            question: The question to ask
+            top_k: Number of documents to retrieve
+            max_tokens: Maximum tokens in response
+            timeout: Request timeout in seconds
+
+        Returns:
+            Dictionary with answer, timing, and sources
+        """
+        try:
+            response = requests.post(
+                f"{self.base_url}/query",
+                json={
+                    "question": question,
+                    "top_k": top_k,
+                    "max_tokens": max_tokens
+                },
+                timeout=timeout
+            )
+            response.raise_for_status()
+            return response.json()
+        except requests.exceptions.Timeout:
+            return {
+                "success": False,
+                "error": f"Request timed out after {timeout} seconds"
+            }
+        except requests.exceptions.RequestException as e:
+            return {
+                "success": False,
+                "error": f"Request failed: {str(e)}"
+            }
+
+    def query_fast(self, question: str) -> Dict:
+        """
+        Fast query with optimized settings for <3 second responses
+
+        Args:
+            question: The question to ask
+
+        Returns:
+            Dictionary with answer, timing, and sources
+        """
+        return self.query(
+            question=question,
+            top_k=3,         # Fewer docs for speed
+            max_tokens=512,  # Shorter responses
+            timeout=5
+        )
+
+# Example usage
+if __name__ == "__main__":
+    # Initialize client
+    client = RAGAPIClient(base_url="http://localhost:8000")
+
+    # Health check
+    print("Health check:", client.health_check())
+
+    # Query
+    result = client.query("What are the three product tiers?")
+    print("\nQuery result:")
+    print(f"Answer: {result.get('answer', 'N/A')}")
+    print(f"Total time: {result.get('total_time', 0):.2f}s")
+    print(f"Success: {result.get('success', False)}")
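+
+# Note: when the API runs on Modal rather than locally, point the client at the
+# deployed URL instead of localhost (hypothetical URL shape shown):
+#   client = RAGAPIClient(base_url="https://<workspace>--insurance-rag-api-fastapi-app.modal.run")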
diff --git a/src/rag/rag_api.py b/src/rag/rag_api.py
new file mode 100644
index 0000000000000000000000000000000000000000..3710dee136f5237b47865a5184c8750d3b8ab559
--- /dev/null
+++ b/src/rag/rag_api.py
@@ -0,0 +1,290 @@
+"""
+Fast API endpoint for RAG system - optimized for <3 second responses
+"""
+
+import modal
+
+app = modal.App("insurance-rag-api")
+
+# Reference your specific volume
+vol = modal.Volume.from_name("mcp-hack-ins-products", create_if_missing=True)
+
+# Model configuration
+LLM_MODEL = "microsoft/Phi-3-mini-4k-instruct"
+EMBEDDING_MODEL = "BAAI/bge-small-en-v1.5"
+
+# Build image with dependencies
+image = (
+    modal.Image.debian_slim(python_version="3.11")
+    .pip_install(
+        # Core ML dependencies (compatible versions)
+        "torch>=2.0.0",
+        "transformers>=4.30.0",
+        "sentence-transformers>=2.2.0",
+        "huggingface_hub>=0.15.0",
+
+        # LangChain (compatible versions)
+        "langchain>=0.1.0",
+        "langchain-community>=0.0.13",
+
+        # Document processing
+        "pypdf>=4.0.0",
+        "python-docx>=1.1.0",
+        "openpyxl>=3.1.0",
+        "pandas>=2.0.0",
+        "xlrd>=2.0.0",
+
+        # Vector database
+        "chromadb>=0.4.0",
+
+        # Web framework
+        "fastapi>=0.100.0",
+        "uvicorn[standard]>=0.20.0",
+
+        # LLM inference (vLLM - latest stable)
+        "vllm>=0.4.0",
+
+        # Utilities
+        "cryptography>=41.0.0",
+    )
+)
+
+@app.cls(
+    image=image,
+    volumes={"/insurance-data": vol},
+    gpu="A10G",
+    timeout=30,            # Shorter timeout for API
+    max_containers=2,      # Allow scaling
+    min_containers=1,      # Keep warm for fast responses
+    scaledown_window=300,  # Keep warm for 5 minutes
+)
+class FastRAGService:
+    """Optimized RAG service for fast API responses"""
+
+    @modal.enter()
+    def enter(self):
+        from langchain_community.embeddings import HuggingFaceEmbeddings
+        from vllm import LLM, SamplingParams
+        from langchain.schema import Document
+
+        print("🚀 Initializing Fast RAG Service...")
+
+        # Initialize embeddings (faster model)
+        self.embeddings = HuggingFaceEmbeddings(
+            model_name=EMBEDDING_MODEL,
+            model_kwargs={'device': 'cuda'},
+            encode_kwargs={'normalize_embeddings': True}
+        )
+
+        # Connect to Chroma
+        self.chroma_service = modal.Cls.from_name("chroma-server-v2", "ChromaDB")()
+
+        # Custom retriever against the remote ChromaDB service
+        class RemoteChromaRetriever:
+            def __init__(self, chroma_service, embeddings, k=5):
+                self.chroma_service = chroma_service
+                self.embeddings = embeddings
+                self.k = k
+
+            def get_relevant_documents(self, query: str):
+                query_embedding = self.embeddings.embed_query(query)
+                results = self.chroma_service.query.remote(
+                    collection_name="product_design",
+                    query_embeddings=[query_embedding],
+                    n_results=self.k
+                )
+
+                docs = []
+                if results and 'documents' in results and len(results['documents']) > 0:
+                    for i, doc_text in enumerate(results['documents'][0]):
+                        metadata = results.get('metadatas', [[{}]])[0][i] if 'metadatas' in results else {}
+                        docs.append(Document(page_content=doc_text, metadata=metadata))
+
+                return docs
+
+        self.Retriever = RemoteChromaRetriever
+
+        # Load LLM with optimized settings for speed
+        print("Loading LLM (optimized for speed)...")
+        self.llm_engine = LLM(
+            model=LLM_MODEL,
+            dtype="float16",
+            gpu_memory_utilization=0.9,  # Higher utilization for speed
+            max_model_len=4096,
+            trust_remote_code=True,
+            enforce_eager=True,          # Skip CUDA graph capture for faster cold starts
+            enable_prefix_caching=True,  # Cache prefixes for faster generation
+        )
+
+        # Default sampling settings (query() builds a per-request copy so callers
+        # can override max_tokens)
+        self.default_sampling_params = SamplingParams(
+            temperature=0.7,
+            max_tokens=1024,  # Reduced from 1536 for faster responses
+            top_p=0.9,
+            stop=["\n\n\n", "Question:", "Context:", "<|end|>"]
+        )
+
+        print("✅ Fast RAG Service ready!")
+
+    @modal.method()
+    def query(self, question: str, top_k: int = 5, max_tokens: int = 1024):
+        """Fast query method optimized for <3 second responses"""
+        import time
+        start_time = time.time()
+
+        # Retrieve documents
+        retrieval_start = time.time()
+        retriever = self.Retriever(
+            chroma_service=self.chroma_service,
+            embeddings=self.embeddings,
+            k=top_k
+        )
+        docs = retriever.get_relevant_documents(question)
+        retrieval_time = time.time() - retrieval_start
+
+        if not docs:
+            return {
+                "answer": "No relevant information found in the product design document.",
+                "retrieval_time": retrieval_time,
+                "generation_time": 0,
+                "total_time": time.time() - start_time,
+                "sources": [],
+                "success": False
+            }
+
+        # Build context (limit size for speed)
+        context = "\n\n".join([doc.page_content[:800] for doc in docs[:3]])  # Top 3 docs, 800 chars each
+
+        # Create prompt (Phi-3 chat format)
+        prompt = f"""<|system|>
+You are a helpful AI assistant. Answer questions about the TokyoDrive Insurance product design document concisely and accurately.<|end|>
+<|user|>
+Context:
+{context}
+
+Question:
+{question}<|end|>
+<|assistant|>"""
+
+        # Generate with optimized params
+        from vllm import SamplingParams
+        sampling_params = SamplingParams(
+            temperature=0.7,
+            max_tokens=max_tokens,
+            top_p=0.9,
+            stop=["\n\n\n", "Question:", "Context:", "<|end|>"]
+        )
+
+        gen_start = time.time()
+        outputs = self.llm_engine.generate(prompts=[prompt], sampling_params=sampling_params)
+        answer = outputs[0].outputs[0].text.strip()
+        generation_time = time.time() - gen_start
+
+        # Prepare sources (limited for speed)
+        sources = []
+        for doc in docs[:3]:  # Limit to 3 sources
+            sources.append({
+                "content": doc.page_content[:300],
+                "metadata": doc.metadata
+            })
+
+        total_time = time.time() - start_time
+
+        return {
+            "answer": answer,
+            "retrieval_time": retrieval_time,
+            "generation_time": generation_time,
+            "total_time": total_time,
+            "sources": sources,
+            "success": True
+        }
+
+# Deploy as web endpoint
+@app.function(
+    image=image,
+    volumes={"/insurance-data": vol},
+    allow_concurrent_inputs=10,  # Handle multiple requests
+)
+@modal.asgi_app()
+def fastapi_app():
+    """Deploy FastAPI app - all imports inside to avoid local dependency issues"""
+    from fastapi import FastAPI, HTTPException
+    from fastapi.middleware.cors import CORSMiddleware
+    from pydantic import BaseModel
+
+    # Request/Response models
+    class QueryRequest(BaseModel):
+        question: str
+        top_k: int = 5
+        max_tokens: int = 1024  # Reduced for faster responses
+
+    class QueryResponse(BaseModel):
+        answer: str
+        retrieval_time: float
+        generation_time: float
+        total_time: float
+        sources: list
+        success: bool
+
+    # FastAPI app
+    web_app = FastAPI(title="Product Design RAG API", version="1.0.0")
+
+    # CORS
+    web_app.add_middleware(
+        CORSMiddleware,
+        allow_origins=["*"],
+        allow_credentials=True,
+        allow_methods=["*"],
+        allow_headers=["*"],
+    )
+
+    # Initialize RAG service
+    rag_service = FastRAGService()
+
+    @web_app.get("/health")
+    async def health():
+        """Health check endpoint"""
+        return {"status": "healthy", "service": "rag-api"}
+
+    @web_app.post("/query", response_model=QueryResponse)
+    async def query_rag(request: QueryRequest):
+        """
+        Query the RAG system - optimized for <3 second responses
+
+        Args:
+            question: The question to ask
+            top_k: Number of documents to retrieve (default: 5)
+            max_tokens: Maximum tokens in response (default: 1024)
+
+        Returns:
+            QueryResponse with answer, timing, and sources
+        """
+        try:
+            result = rag_service.query.remote(
+                question=request.question,
+                top_k=request.top_k,
+                max_tokens=request.max_tokens
+            )
+
+            if not result.get("success", True):
+                raise HTTPException(status_code=404, detail="No relevant information found")
+
+            return QueryResponse(**result)
+
+        except HTTPException:
+            # Re-raise as-is so the 404 above isn't rewrapped as a 500
+            raise
+        except Exception as e:
+            raise HTTPException(status_code=500, detail=f"Error processing query: {str(e)}")
+
+    @web_app.get("/")
+    async def root():
+        """API root endpoint"""
+        return {
+            "service": "Product Design RAG API",
+            "version": "1.0.0",
+            "endpoints": {
+                "health": "/health",
+                "query": "/query (POST)"
+            },
+            "target_response_time": "<3 seconds"
+        }
+
+    return web_app
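+
+# Deployment follows the standard Modal CLI flow, e.g.:
+#   modal deploy src/rag/rag_api.py
+# Modal prints the public URL for `fastapi_app`; use it as the base_url in
+# src/rag/api_client.py's RAGAPIClient.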
diff --git a/scripts/tools/api_endpoint.py b/src/tools/api_endpoint.py
similarity index 100%
rename from scripts/tools/api_endpoint.py
rename to src/tools/api_endpoint.py
diff --git a/scripts/tools/api_endpoint_cpu.py b/src/tools/api_endpoint_cpu.py
similarity index 100%
rename from scripts/tools/api_endpoint_cpu.py
rename to src/tools/api_endpoint_cpu.py
diff --git a/scripts/tools/ask_model.py b/src/tools/ask_model.py
similarity index 100%
rename from scripts/tools/ask_model.py
rename to src/tools/ask_model.py
diff --git a/scripts/tools/debug_list_csv.py b/src/tools/debug_list_csv.py
similarity index 100%
rename from scripts/tools/debug_list_csv.py
rename to src/tools/debug_list_csv.py
diff --git a/scripts/tools/eval_finetuned.py b/src/tools/eval_finetuned.py
similarity index 100%
rename from scripts/tools/eval_finetuned.py
rename to src/tools/eval_finetuned.py
diff --git a/scripts/tools/fill_product_design.py b/src/tools/fill_product_design.py
similarity index 100%
rename from scripts/tools/fill_product_design.py
rename to src/tools/fill_product_design.py
diff --git a/scripts/tools/finetune_modal.py b/src/tools/finetune_modal.py
similarity index 100%
rename from scripts/tools/finetune_modal.py
rename to src/tools/finetune_modal.py
diff --git a/scripts/tools/finetune_modal_simple.py b/src/tools/finetune_modal_simple.py
similarity index 100%
rename from scripts/tools/finetune_modal_simple.py
rename to src/tools/finetune_modal_simple.py
diff --git a/tests/test_api.py b/tests/test_api.py
new file mode 100755
index 0000000000000000000000000000000000000000..f98cc26b7546f94bc2e5a0fce8766be98e7b1dd5
--- /dev/null
+++ b/tests/test_api.py
@@ -0,0 +1,106 @@
+#!/usr/bin/env python3
+"""
+Test the RAG API for <3 second response times
+"""
+
+import sys
+import time
+from pathlib import Path
+
+# Add the repo root to sys.path so `src.rag` is importable
+sys.path.insert(0, str(Path(__file__).parent.parent))
+
+from src.rag.api_client import RAGAPIClient
+
+def test_api_performance(api_url: str = "http://localhost:8000"):
+    """Test API performance"""
+    print("="*70)
+    print("🧪 RAG API Performance Test")
+    print("="*70)
+
+    client = RAGAPIClient(base_url=api_url)
+
+    # Test 1: Health check
+    print("\n1. Health Check...")
+    health = client.health_check()
+    print(f"   Status: {health.get('status', 'unknown')}")
+
+    if health.get("status") != "healthy":
+        print("❌ API is not healthy. Make sure it's deployed and running.")
+        return
+
+    # Test 2: Performance test
+    print("\n2. Performance Test (<3s target)...")
+    test_questions = [
+        "What are the three product tiers?",
+        "What is the Year 3 premium volume?",
+        "What coverage does the Standard tier include?",
+    ]
+
+    results = []
+    for i, question in enumerate(test_questions, 1):
+        print(f"\n   Query {i}: {question[:50]}...")
+        start = time.time()
+        result = client.query(question)
+        elapsed = time.time() - start
+
+        if result.get("success"):
+            total_time = result.get("total_time", elapsed)
+            retrieval = result.get("retrieval_time", 0)
+            generation = result.get("generation_time", 0)
+
+            status = "✅" if total_time < 3.0 else "⚠️"
+            print(f"   {status} Total: {total_time:.2f}s (Retrieval: {retrieval:.2f}s, Generation: {generation:.2f}s)")
+
+            if total_time < 3.0:
+                print(f"   ✅ Meets <3s target!")
+            else:
+                print(f"   ⚠️ Exceeds 3s target by {total_time - 3.0:.2f}s")
+
+            results.append({
+                "question": question,
+                "total_time": total_time,
+                "retrieval_time": retrieval,
+                "generation_time": generation,
+                "success": True
+            })
+        else:
+            print(f"   ❌ Failed: {result.get('error', 'Unknown error')}")
+            results.append({"success": False})
+
+    # Summary
+    print("\n" + "="*70)
+    print("📊 Performance Summary")
+    print("="*70)
+
+    successful = [r for r in results if r.get("success")]
+    if successful:
+        avg_time = sum(r["total_time"] for r in successful) / len(successful)
+        fastest = min(r["total_time"] for r in successful)
+        slowest = max(r["total_time"] for r in successful)
+
+        print(f"Average response time: {avg_time:.2f}s")
+        print(f"Fastest: {fastest:.2f}s")
+        print(f"Slowest: {slowest:.2f}s")
+        print(f"Target: <3.0s")
+
+        if avg_time < 3.0:
+            print("\n🎉 API meets performance target!")
+        else:
+            print(f"\n⚠️ API exceeds target by {avg_time - 3.0:.2f}s on average")
+
+    print("\n" + "="*70)
+
+if __name__ == "__main__":
+    import argparse
+
+    parser = argparse.ArgumentParser(description="Test RAG API performance")
+    parser.add_argument(
+        "--url",
+        default="http://localhost:8000",
+        help="API URL (default: http://localhost:8000)"
+    )
+
+    args = parser.parse_args()
+    test_api_performance(args.url)
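+
+# Example run against a deployed instance (hypothetical URL):
+#   python tests/test_api.py --url https://<workspace>--insurance-rag-api-fastapi-app.modal.run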