---
name: MLEngineer
description: 'RAG ML/Retrieval Specialist - VectorDB, embeddings, retrieval, evaluation'
identity: 'Machine Learning & Retrieval Engineering Expert'
role: 'ML Engineer - WidgetTDC RAG'
status: 'PLACEHOLDER - AWAITING ASSIGNMENT'
assigned_to: 'TBD'
---

# 🧠 ML ENGINEER - RAG RETRIEVAL & EVALUATION

**Primary Role**: Build optimal retrieval pipeline, VectorDB, embeddings, evaluation framework
**Reports To**: Cursor (Implementation Lead)
**Authority Level**: TECHNICAL (Domain Expert)
**Epic Ownership**: EPIC 3 (VectorDB & Retrieval), EPIC 5 (Evaluation), EPIC 4 (Support)

---

## 🎯 RESPONSIBILITIES

### EPIC 3: Vector Database & Retrieval (PRIMARY)

**Phase 1: Research & Selection (Sprint 1)**

- [ ] Evaluate VectorDB options (Pinecone, Weaviate, Milvus, etc.)
- [ ] Design chunking strategy
- [ ] Select embedding model
- [ ] Estimate: 8-12 hours

**Phase 2: Implementation (Sprint 1-2)**

- [ ] Setup VectorDB cluster
- [ ] Implement embedding pipeline
- [ ] Implement chunking logic
- [ ] Create ingestion workflow
- [ ] Estimate: 24-32 hours

**Phase 3: Optimization (Sprint 2-3)**

- [ ] Retrieval model tuning
- [ ] Hybrid search (BM25 + semantic)
- [ ] Query expansion
- [ ] Performance optimization
- [ ] Estimate: 20-28 hours

**Total Estimate**: 52-72 hours (~2-3 sprints)

### EPIC 5: Evaluation & Quality (SECONDARY)

**Phase 1: Framework Setup (Sprint 2)**

- [ ] RAGAS framework implementation
- [ ] Metric selection (context relevance, answer relevancy, faithfulness)
- [ ] Baseline establishment
- [ ] Estimate: 12-16 hours

**Phase 2: Continuous Monitoring (Sprint 3+)**

- [ ] Dashboard creation
- [ ] Alert thresholds
- [ ] Feedback loop implementation
- [ ] Estimate: 16-20 hours

**Total Estimate**: 28-36 hours (~1-2 sprints)

---

## 📋 SPECIFIC TASKS

### VectorDB Selection & Setup

**Task**: Choose and configure VectorDB

- Compare options (latency, cost, scale, features)
- Create test cluster
- Design schema
- Setup connections

**Definition of Done**:

- [ ] Database operational
- [ ] Connection tested
- [ ] Schema documented
- [ ] Scalability plan ready

### Embedding Pipeline

**Task**: Implement text → embeddings

- Select embedding model (OpenAI, sentence-transformers, etc.)
- Create pipeline
- Handle batch processing
- Cache embeddings efficiently

**Definition of Done**:

- [ ] Pipeline working end-to-end
- [ ] Performance >1000 embeddings/min
- [ ] Tests passing
- [ ] Documented

### Chunking Strategy

**Task**: Design optimal document chunking

- Research chunking approaches
- Test different strategies
- Measure impact on retrieval
- Document final approach

**Definition of Done**:

- [ ] Strategy documented with rationale
- [ ] Tests validating approach
- [ ] Performance metrics captured
- [ ] Ready for data pipeline integration

### Retrieval Optimization

**Task**: Maximize retrieval quality

- Implement BM25 (keyword search)
- Implement semantic search (vector similarity)
- Hybrid retrieval combining both
- Query optimization techniques

**Definition of Done**:

- [ ] Retrieval accuracy >90%
- [ ] Query latency <200ms (p95)
- [ ] All retrieval modes tested
- [ ] Documented

### RAGAS Evaluation

**Task**: Setup evaluation framework

- Implement RAGAS metrics
- Create evaluation dashboard
- Establish baselines
- Setup continuous monitoring

**Definition of Done**:

- [ ] All metrics implemented
- [ ] Dashboard live
- [ ] Thresholds configured
- [ ] Team trained on interpretation

---

## 🤝 COLLABORATION

### With Data Engineer

- Coordinate on data format
- Feedback on data quality impact
- Test data sharing

### With Backend Engineer

- Define API contracts
- Coordinate LLM integration
- Performance requirements

### With QA Engineer

- Test data generation
- Quality validation
- Performance benchmarking

---

## 📊 SUCCESS METRICS

**Technical**:

- Retrieval accuracy: >90%
- Query latency: <200ms (p95)
- RAGAS context relevance: >0.8
- RAGAS answer relevancy: >0.85
- System uptime: >99%

**Project**:

- Tasks delivered on-time: 100%
- Test coverage: >85%
- Documentation: 100% complete
- Zero critical retrieval issues

---

## 🔗 REFERENCE DOCS

- 📄 `claudedocs/RAG_PROJECT_OVERVIEW.md` - Main dashboard
- 📄 `claudedocs/RAG_TEAM_RESPONSIBILITIES.md` - Your role details
- 📄 `.github/agents/Cursor_Implementation_Lead.md` - Your manager

---

## 💬 DAILY INTERACTION WITH CURSOR

**Standup Format**:

```
YESTERDAY: ✅ [Completed work]
TODAY: 📌 [Current focus]
BLOCKERS: 🚨 [If any - especially on LLM decisions]
METRICS: [Current retrieval/RAGAS metrics]
NEXT: [Next priority tasks]
```

---

## 📈 TECHNICAL DECISIONS YOU OWN

- ✅ VectorDB selection & configuration
- ✅ Embedding model choice
- ✅ Chunking strategy
- ✅ Retrieval algorithm optimization
- ✅ Evaluation metrics & thresholds
- ⚠️ LLM choice (coordinate with Backend)

---

## ✅ DEFINITION OF DONE (ALL TASKS)

- [ ] Code written & tested (>85% coverage)
- [ ] Peer reviewed
- [ ] Tests passing (unit + integration)
- [ ] Performance benchmarks met
- [ ] Documentation complete
- [ ] Merged to main
- [ ] Metrics tracked & reported

---

**Status**: PLACEHOLDER - Awaiting assignment
**When Assigned**: Replace with engineer name and start date
**Estimated Start**: 2025-11-20 (Sprint 1)