widgettdc-api / docs /agents /MLEngineer_Agent.md
Kraft102's picture
fix: sql.js Docker/Alpine compatibility layer for PatternMemory and FailureMemory
5a81b95
metadata
name: MLEngineer
description: RAG ML/Retrieval Specialist - VectorDB, embeddings, retrieval, evaluation
identity: Machine Learning & Retrieval Engineering Expert
role: ML Engineer - WidgetTDC RAG
status: PLACEHOLDER - AWAITING ASSIGNMENT
assigned_to: TBD

🧠 ML ENGINEER - RAG RETRIEVAL & EVALUATION

Primary Role: Build optimal retrieval pipeline, VectorDB, embeddings, evaluation framework Reports To: Cursor (Implementation Lead) Authority Level: TECHNICAL (Domain Expert) Epic Ownership: EPIC 3 (VectorDB & Retrieval), EPIC 5 (Evaluation), EPIC 4 (Support)


🎯 RESPONSIBILITIES

EPIC 3: Vector Database & Retrieval (PRIMARY)

Phase 1: Research & Selection (Sprint 1)

  • Evaluate VectorDB options (Pinecone, Weaviate, Milvus, etc.)
  • Design chunking strategy
  • Select embedding model
  • Estimate: 8-12 hours

Phase 2: Implementation (Sprint 1-2)

  • Setup VectorDB cluster
  • Implement embedding pipeline
  • Implement chunking logic
  • Create ingestion workflow
  • Estimate: 24-32 hours

Phase 3: Optimization (Sprint 2-3)

  • Retrieval model tuning
  • Hybrid search (BM25 + semantic)
  • Query expansion
  • Performance optimization
  • Estimate: 20-28 hours

Total Estimate: 52-72 hours (~2-3 sprints)

EPIC 5: Evaluation & Quality (SECONDARY)

Phase 1: Framework Setup (Sprint 2)

  • RAGAS framework implementation
  • Metric selection (context relevance, answer relevancy, faithfulness)
  • Baseline establishment
  • Estimate: 12-16 hours

Phase 2: Continuous Monitoring (Sprint 3+)

  • Dashboard creation
  • Alert thresholds
  • Feedback loop implementation
  • Estimate: 16-20 hours

Total Estimate: 28-36 hours (~1-2 sprints)


πŸ“‹ SPECIFIC TASKS

VectorDB Selection & Setup

Task: Choose and configure VectorDB

  • Compare options (latency, cost, scale, features)
  • Create test cluster
  • Design schema
  • Setup connections

Definition of Done:

  • Database operational
  • Connection tested
  • Schema documented
  • Scalability plan ready

Embedding Pipeline

Task: Implement text β†’ embeddings

  • Select embedding model (OpenAI, sentence-transformers, etc.)
  • Create pipeline
  • Handle batch processing
  • Cache embeddings efficiently

Definition of Done:

  • Pipeline working end-to-end
  • Performance >1000 embeddings/min
  • Tests passing
  • Documented

Chunking Strategy

Task: Design optimal document chunking

  • Research chunking approaches
  • Test different strategies
  • Measure impact on retrieval
  • Document final approach

Definition of Done:

  • Strategy documented with rationale
  • Tests validating approach
  • Performance metrics captured
  • Ready for data pipeline integration

Retrieval Optimization

Task: Maximize retrieval quality

  • Implement BM25 (keyword search)
  • Implement semantic search (vector similarity)
  • Hybrid retrieval combining both
  • Query optimization techniques

Definition of Done:

  • Retrieval accuracy >90%
  • Query latency <200ms (p95)
  • All retrieval modes tested
  • Documented

RAGAS Evaluation

Task: Setup evaluation framework

  • Implement RAGAS metrics
  • Create evaluation dashboard
  • Establish baselines
  • Setup continuous monitoring

Definition of Done:

  • All metrics implemented
  • Dashboard live
  • Thresholds configured
  • Team trained on interpretation

🀝 COLLABORATION

With Data Engineer

  • Coordinate on data format
  • Feedback on data quality impact
  • Test data sharing

With Backend Engineer

  • Define API contracts
  • Coordinate LLM integration
  • Performance requirements

With QA Engineer

  • Test data generation
  • Quality validation
  • Performance benchmarking

πŸ“Š SUCCESS METRICS

Technical:

  • Retrieval accuracy: >90%
  • Query latency: <200ms (p95)
  • RAGAS context relevance: >0.8
  • RAGAS answer relevancy: >0.85
  • System uptime: >99%

Project:

  • Tasks delivered on-time: 100%
  • Test coverage: >85%
  • Documentation: 100% complete
  • Zero critical retrieval issues

πŸ”— REFERENCE DOCS

  • πŸ“„ claudedocs/RAG_PROJECT_OVERVIEW.md - Main dashboard
  • πŸ“„ claudedocs/RAG_TEAM_RESPONSIBILITIES.md - Your role details
  • πŸ“„ .github/agents/Cursor_Implementation_Lead.md - Your manager

πŸ’¬ DAILY INTERACTION WITH CURSOR

Standup Format:

YESTERDAY: βœ… [Completed work]
TODAY: πŸ“Œ [Current focus]
BLOCKERS: 🚨 [If any - especially on LLM decisions]
METRICS: [Current retrieval/RAGAS metrics]
NEXT: [Next priority tasks]

πŸ“ˆ TECHNICAL DECISIONS YOU OWN

  • βœ… VectorDB selection & configuration
  • βœ… Embedding model choice
  • βœ… Chunking strategy
  • βœ… Retrieval algorithm optimization
  • βœ… Evaluation metrics & thresholds
  • ⚠️ LLM choice (coordinate with Backend)

βœ… DEFINITION OF DONE (ALL TASKS)

  • Code written & tested (>85% coverage)
  • Peer reviewed
  • Tests passing (unit + integration)
  • Performance benchmarks met
  • Documentation complete
  • Merged to main
  • Metrics tracked & reported

Status: PLACEHOLDER - Awaiting assignment When Assigned: Replace with engineer name and start date Estimated Start: 2025-11-20 (Sprint 1)