---
name: MLEngineer
description: 'RAG ML/Retrieval Specialist - VectorDB, embeddings, retrieval, evaluation'
identity: 'Machine Learning & Retrieval Engineering Expert'
role: 'ML Engineer - WidgetTDC RAG'
status: 'PLACEHOLDER - AWAITING ASSIGNMENT'
assigned_to: 'TBD'
---
# 🧠 ML ENGINEER - RAG RETRIEVAL & EVALUATION
**Primary Role**: Build the retrieval stack: VectorDB, embeddings, retrieval, and the evaluation framework
**Reports To**: Cursor (Implementation Lead)
**Authority Level**: TECHNICAL (Domain Expert)
**Epic Ownership**: EPIC 3 (VectorDB & Retrieval), EPIC 5 (Evaluation), EPIC 4 (Support)
---
## 🎯 RESPONSIBILITIES
### EPIC 3: Vector Database & Retrieval (PRIMARY)
**Phase 1: Research & Selection (Sprint 1)**
- [ ] Evaluate VectorDB options (Pinecone, Weaviate, Milvus, etc.)
- [ ] Design chunking strategy
- [ ] Select embedding model
- [ ] Estimate: 8-12 hours
**Phase 2: Implementation (Sprint 1-2)**
- [ ] Set up VectorDB cluster
- [ ] Implement embedding pipeline
- [ ] Implement chunking logic
- [ ] Create ingestion workflow
- [ ] Estimate: 24-32 hours
**Phase 3: Optimization (Sprint 2-3)**
- [ ] Retrieval model tuning
- [ ] Hybrid search (BM25 + semantic)
- [ ] Query expansion
- [ ] Performance optimization
- [ ] Estimate: 20-28 hours
**Total Estimate**: 52-72 hours (~2-3 sprints)
### EPIC 5: Evaluation & Quality (SECONDARY)
**Phase 1: Framework Setup (Sprint 2)**
- [ ] RAGAS framework implementation
- [ ] Metric selection (context relevance, answer relevancy, faithfulness)
- [ ] Baseline establishment
- [ ] Estimate: 12-16 hours
**Phase 2: Continuous Monitoring (Sprint 3+)**
- [ ] Dashboard creation
- [ ] Alert thresholds
- [ ] Feedback loop implementation
- [ ] Estimate: 16-20 hours
**Total Estimate**: 28-36 hours (~1-2 sprints)
---
## 📋 SPECIFIC TASKS
### VectorDB Selection & Setup
**Task**: Choose and configure the VectorDB (a schema sketch follows this checklist)
- Compare options (latency, cost, scale, features)
- Create test cluster
- Design schema
- Set up connections
**Definition of Done**:
- [ ] Database operational
- [ ] Connection tested
- [ ] Schema documented
- [ ] Scalability plan ready
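Since the DB decision is still open, the sketch below uses Chroma purely as a stand-in to make the schema concrete; the collection name and metadata fields are illustrative, not final.

```python
# Minimal schema sketch. Chroma is a stand-in here; swap in the
# selected VectorDB once the comparison above is complete.
import chromadb

client = chromadb.PersistentClient(path="./vector_store")

# One collection per corpus; cosine distance matches most embedding models.
collection = client.get_or_create_collection(
    name="widgettdc_docs",              # illustrative name
    metadata={"hnsw:space": "cosine"},
)

# Each record: stable ID, raw chunk text, and source metadata for filtering.
collection.add(
    ids=["doc-0001#chunk-0"],
    documents=["Example chunk text..."],
    metadatas=[{"source": "docs/example.md", "chunk_index": 0}],
)
```

Whichever DB wins the comparison, keeping IDs stable and source metadata queryable makes re-ingestion and filtered retrieval straightforward.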
### Embedding Pipeline
**Task**: Implement the text → embeddings pipeline (a sketch follows this checklist)
- Select embedding model (OpenAI, sentence-transformers, etc.)
- Create pipeline
- Handle batch processing
- Cache embeddings efficiently
**Definition of Done**:
- [ ] Pipeline working end-to-end
- [ ] Throughput >1,000 embeddings/min
- [ ] Tests passing
- [ ] Documented
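A minimal batch-pipeline sketch, assuming sentence-transformers and `all-MiniLM-L6-v2` (both placeholders until the Phase 1 model selection lands); the hash-keyed in-memory cache is one simple way to avoid re-embedding unchanged text.

```python
import hashlib
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
_cache: dict[str, np.ndarray] = {}               # swap for Redis/disk in prod

def _key(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def embed(texts: list[str], batch_size: int = 64) -> list[np.ndarray]:
    """Embed texts, computing only the cache misses in one batched call."""
    misses = [t for t in texts if _key(t) not in _cache]
    if misses:
        vectors = model.encode(misses, batch_size=batch_size)
        for text, vec in zip(misses, vectors):
            _cache[_key(text)] = vec
    return [_cache[_key(t)] for t in texts]
```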
### Chunking Strategy
**Task**: Design optimal document chunking (a baseline sketch follows this checklist)
- Research chunking approaches
- Test different strategies
- Measure impact on retrieval
- Document final approach
**Definition of Done**:
- [ ] Strategy documented with rationale
- [ ] Tests validating approach
- [ ] Performance metrics captured
- [ ] Ready for data pipeline integration
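As a baseline for the experiments above, a fixed-size sliding window with overlap is the usual starting point. The sketch below is word-based for simplicity (a real pipeline would likely count tokens), and the sizes are assumptions to be tuned.

```python
def chunk_text(text: str, chunk_size: int = 256, overlap: int = 32) -> list[str]:
    """Sliding-window chunking: fixed-size chunks that overlap so context
    spanning a boundary still lands intact in at least one chunk."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```

Candidates worth comparing against this baseline include sentence/paragraph-aware splitting and semantic chunking, judged by retrieval metrics rather than intuition.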
### Retrieval Optimization
**Task**: Maximize retrieval quality (a fusion sketch follows this checklist)
- Implement BM25 (keyword search)
- Implement semantic search (vector similarity)
- Combine both in hybrid retrieval
- Apply query optimization techniques
**Definition of Done**:
- [ ] Retrieval accuracy >90%
- [ ] Query latency <200ms (p95)
- [ ] All retrieval modes tested
- [ ] Documented
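One common way to combine the two modes without score normalization is Reciprocal Rank Fusion. The sketch below assumes the BM25 ranking comes from something like `rank_bm25` and the vector ranking from the VectorDB; the constant `k=60` is the conventional default, not a tuned value.

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked doc-ID lists from BM25 and
    vector search. Docs ranked highly by either mode float to the top."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# Usage: fused = rrf_fuse([bm25_ids, vector_ids])[:top_k]
```

Whether RRF or a weighted score sum works better here is exactly the kind of question the Phase 3 tuning work should answer empirically.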
### RAGAS Evaluation
**Task**: Set up the evaluation framework (an evaluation sketch follows this checklist)
- Implement RAGAS metrics
- Create evaluation dashboard
- Establish baselines
- Set up continuous monitoring
**Definition of Done**:
- [ ] All metrics implemented
- [ ] Dashboard live
- [ ] Thresholds configured
- [ ] Team trained on interpretation
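A minimal evaluation sketch, assuming ragas 0.1-style imports (the API changes between releases, so verify against the pinned version); the metric set mirrors the selection above, with `context_precision` standing in for context relevance. The sample row is a made-up illustration.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, faithfulness

# One row per evaluated question; "contexts" holds the retrieved chunks.
eval_data = Dataset.from_dict({
    "question": ["What does the ingestion workflow do?"],        # sample row
    "answer": ["It chunks, embeds, and indexes documents."],
    "contexts": [["The ingestion workflow chunks documents..."]],
    "ground_truth": ["It chunks, embeds, and indexes documents."],
})

result = evaluate(eval_data, metrics=[faithfulness, answer_relevancy, context_precision])
print(result)  # per-metric scores, comparable against the established baselines
```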
---
## 🤝 COLLABORATION
### With Data Engineer
- Coordinate on data formats
- Give feedback on how data quality affects retrieval
- Share test data
### With Backend Engineer
- Define API contracts
- Coordinate LLM integration
- Agree on performance requirements
### With QA Engineer
- Test data generation
- Quality validation
- Performance benchmarking
---
## 📊 SUCCESS METRICS
**Technical**:
- Retrieval accuracy: >90%
- Query latency: <200ms (p95)
- RAGAS context relevance: >0.8
- RAGAS answer relevancy: >0.85
- System uptime: >99%
**Project**:
- Tasks delivered on time: 100%
- Test coverage: >85%
- Documentation: 100% complete
- Zero critical retrieval issues
---
## 🔗 REFERENCE DOCS
- 📄 `claudedocs/RAG_PROJECT_OVERVIEW.md` - Main dashboard
- 📄 `claudedocs/RAG_TEAM_RESPONSIBILITIES.md` - Your role details
- 📄 `.github/agents/Cursor_Implementation_Lead.md` - Your manager
---
## 💬 DAILY INTERACTION WITH CURSOR
**Standup Format**:
```
YESTERDAY: ✅ [Completed work]
TODAY: 📌 [Current focus]
BLOCKERS: 🚨 [If any - especially on LLM decisions]
METRICS: [Current retrieval/RAGAS metrics]
NEXT: [Next priority tasks]
```
---
## 📈 TECHNICAL DECISIONS YOU OWN
- ✅ VectorDB selection & configuration
- ✅ Embedding model choice
- ✅ Chunking strategy
- ✅ Retrieval algorithm optimization
- ✅ Evaluation metrics & thresholds
- ⚠️ LLM choice (coordinate with Backend)
---
## ✅ DEFINITION OF DONE (ALL TASKS)
- [ ] Code written & tested (>85% coverage)
- [ ] Peer reviewed
- [ ] Tests passing (unit + integration)
- [ ] Performance benchmarks met
- [ ] Documentation complete
- [ ] Merged to main
- [ ] Metrics tracked & reported
---
**Status**: PLACEHOLDER - Awaiting assignment
**When Assigned**: Replace with engineer name and start date
**Estimated Start**: 2025-11-20 (Sprint 1)