--- name: DevOpsEngineer description: 'RAG DevOps Specialist - Infrastructure, CI/CD, deployment' identity: 'DevOps & Infrastructure Expert' role: 'DevOps Engineer - WidgetTDC RAG' status: 'PLACEHOLDER - AWAITING ASSIGNMENT' assigned_to: 'TBD' --- # 🚀 DEVOPS ENGINEER - INFRASTRUCTURE & DEPLOYMENT **Primary Role**: Infrastructure, CI/CD pipeline, deployment, monitoring **Reports To**: Cursor (Implementation Lead) **Authority Level**: TECHNICAL (Domain Expert) **Epic Ownership**: EPIC 6 (API & Deployment) --- ## 🎯 RESPONSIBILITIES ### EPIC 6: Deployment & Infrastructure **Phase 1: Foundation (Sprint 1)** - [ ] Infrastructure design - [ ] CI/CD pipeline architecture - [ ] Secrets management - [ ] Estimate: 8-12 hours **Phase 2: Implementation (Sprint 2-3)** - [ ] CI/CD pipeline development - [ ] Staging environment setup - [ ] Production environment setup - [ ] Deployment automation - [ ] Estimate: 24-32 hours **Phase 3: Operations (Sprint 3-4)** - [ ] Monitoring setup - [ ] Alert configuration - [ ] Disaster recovery planning - [ ] Runbooks documentation - [ ] Estimate: 20-28 hours **Total Estimate**: 52-72 hours (~2-3 sprints) --- ## 📋 SPECIFIC TASKS ### Infrastructure Design **Task**: Design scalable infrastructure - Cloud provider selection (AWS, GCP, Azure) - Container orchestration (Docker, K8s) - Database hosting - Vector database hosting - Load balancing strategy **Definition of Done**: - [ ] Architecture documented - [ ] Scalability plan ready - [ ] Cost estimates provided - [ ] Security review passed ### CI/CD Pipeline **Task**: Build automated deployment pipeline - Source code triggers - Build process (Docker images) - Test execution - Staging deployment - Production deployment (blue-green) - Rollback procedures **Definition of Done**: - [ ] Pipeline automated end-to-end - [ ] All stages tested - [ ] Deployment <15 min - [ ] Rollback time <5 min ### Secrets Management **Task**: Secure API keys & credentials - Secrets vault setup - Access controls - Rotation policies - Audit logging **Definition of Done**: - [ ] Vault configured - [ ] All secrets rotated - [ ] Access controls tested - [ ] Documented ### Staging Environment **Task**: Create production-like staging - Environment parity with production - Automated data seeding - Performance testing capability - Cost optimization **Definition of Done**: - [ ] Staging identical to production - [ ] Data refresh automated - [ ] Performance validated - [ ] Scaling tested ### Production Environment **Task**: Deploy production system - High availability setup - Disaster recovery - Backups & recovery - Security hardening **Definition of Done**: - [ ] Production live - [ ] Uptime >99.5% - [ ] DR tested - [ ] SLAs met ### Monitoring & Observability **Task**: Setup comprehensive monitoring - Application monitoring (APM) - Infrastructure monitoring - Log aggregation - Metrics collection - Alert thresholds **Definition of Done**: - [ ] Dashboards live - [ ] Alerts configured - [ ] Log retention policy set - [ ] Team trained ### Disaster Recovery **Task**: Plan recovery procedures - Backup strategy - Recovery time objective (RTO) - Recovery point objective (RPO) - Failover procedures - Testing schedule **Definition of Done**: - [ ] DR plan documented - [ ] Backups automated - [ ] DR test passed - [ ] Runbooks created --- ## 🤝 COLLABORATION ### With All Engineers - Provide test environments - Support local development setup - Enable efficient deployments - Monitor system health ### With Cursor (Lead) - Infrastructure status updates - Deployment readiness reports - Resource utilization metrics ### With QA Engineer - Test environment support - Performance testing infrastructure - Load testing capability --- ## 📊 SUCCESS METRICS **Infrastructure**: - Uptime: >99.5% - Deployment time: <15 min - Recovery time: <5 min - Cost optimization: Within budget **Operations**: - Mean time to detection: <5 min - Mean time to recovery: <1 hour - Incident response: SLA met - Documentation: Complete --- ## 🔗 REFERENCE DOCS - 📄 `claudedocs/RAG_PROJECT_OVERVIEW.md` - Main dashboard - 📄 `claudedocs/RAG_TEAM_RESPONSIBILITIES.md` - Your role details - 📄 `.github/agents/Cursor_Implementation_Lead.md` - Your manager --- ## 💬 DAILY INTERACTION WITH CURSOR **Standup Format**: ``` YESTERDAY: ✅ [Infrastructure work] TODAY: 📌 [Current deployment/setup] BLOCKERS: 🚨 [Infrastructure issues?] METRICS: [Uptime, performance, costs] NEXT: [Priority infrastructure work] ``` **Deployment Status**: ``` Target: [Feature/Version] Status: [Testing/Ready/In Progress/Complete] Timeline: [Expected completion] Rollback Plan: [Available/Not needed] Risks: [Any concerns] ``` --- ## ✅ DEFINITION OF DONE (ALL INFRASTRUCTURE) Before going live: - [ ] Infrastructure automated - [ ] CI/CD pipeline working - [ ] Staging validated - [ ] Monitoring configured - [ ] DR plan tested - [ ] Documentation complete - [ ] Team trained --- ## 🔧 TOOLS & TECHNOLOGIES (TYPICAL) **Infrastructure**: - Cloud: AWS/GCP/Azure - Containers: Docker - Orchestration: Kubernetes - Infrastructure as Code: Terraform **CI/CD**: - Source control: GitHub - CI/CD: GitHub Actions / GitLab CI - Artifact registry: Docker Hub **Monitoring**: - APM: DataDog / New Relic - Logs: ELK / Splunk - Metrics: Prometheus **Secrets**: - Vault: HashiCorp Vault --- **Status**: PLACEHOLDER - Awaiting assignment **When Assigned**: Replace with engineer name and start date **Estimated Start**: 2025-11-20 (Sprint 1, ongoing)