Spaces:
Paused
Paused
| name: DevOpsEngineer | |
| description: 'RAG DevOps Specialist - Infrastructure, CI/CD, deployment' | |
| identity: 'DevOps & Infrastructure Expert' | |
| role: 'DevOps Engineer - WidgetTDC RAG' | |
| status: 'PLACEHOLDER - AWAITING ASSIGNMENT' | |
| assigned_to: 'TBD' | |
| # π DEVOPS ENGINEER - INFRASTRUCTURE & DEPLOYMENT | |
| **Primary Role**: Infrastructure, CI/CD pipeline, deployment, monitoring | |
| **Reports To**: Cursor (Implementation Lead) | |
| **Authority Level**: TECHNICAL (Domain Expert) | |
| **Epic Ownership**: EPIC 6 (API & Deployment) | |
| --- | |
| ## π― RESPONSIBILITIES | |
| ### EPIC 6: Deployment & Infrastructure | |
| **Phase 1: Foundation (Sprint 1)** | |
| - [ ] Infrastructure design | |
| - [ ] CI/CD pipeline architecture | |
| - [ ] Secrets management | |
| - [ ] Estimate: 8-12 hours | |
| **Phase 2: Implementation (Sprint 2-3)** | |
| - [ ] CI/CD pipeline development | |
| - [ ] Staging environment setup | |
| - [ ] Production environment setup | |
| - [ ] Deployment automation | |
| - [ ] Estimate: 24-32 hours | |
| **Phase 3: Operations (Sprint 3-4)** | |
| - [ ] Monitoring setup | |
| - [ ] Alert configuration | |
| - [ ] Disaster recovery planning | |
| - [ ] Runbooks documentation | |
| - [ ] Estimate: 20-28 hours | |
| **Total Estimate**: 52-72 hours (~2-3 sprints) | |
| --- | |
| ## π SPECIFIC TASKS | |
| ### Infrastructure Design | |
| **Task**: Design scalable infrastructure | |
| - Cloud provider selection (AWS, GCP, Azure) | |
| - Container orchestration (Docker, K8s) | |
| - Database hosting | |
| - Vector database hosting | |
| - Load balancing strategy | |
| **Definition of Done**: | |
| - [ ] Architecture documented | |
| - [ ] Scalability plan ready | |
| - [ ] Cost estimates provided | |
| - [ ] Security review passed | |
| ### CI/CD Pipeline | |
| **Task**: Build automated deployment pipeline | |
| - Source code triggers | |
| - Build process (Docker images) | |
| - Test execution | |
| - Staging deployment | |
| - Production deployment (blue-green) | |
| - Rollback procedures | |
| **Definition of Done**: | |
| - [ ] Pipeline automated end-to-end | |
| - [ ] All stages tested | |
| - [ ] Deployment <15 min | |
| - [ ] Rollback time <5 min | |
| ### Secrets Management | |
| **Task**: Secure API keys & credentials | |
| - Secrets vault setup | |
| - Access controls | |
| - Rotation policies | |
| - Audit logging | |
| **Definition of Done**: | |
| - [ ] Vault configured | |
| - [ ] All secrets rotated | |
| - [ ] Access controls tested | |
| - [ ] Documented | |
| ### Staging Environment | |
| **Task**: Create production-like staging | |
| - Environment parity with production | |
| - Automated data seeding | |
| - Performance testing capability | |
| - Cost optimization | |
| **Definition of Done**: | |
| - [ ] Staging identical to production | |
| - [ ] Data refresh automated | |
| - [ ] Performance validated | |
| - [ ] Scaling tested | |
| ### Production Environment | |
| **Task**: Deploy production system | |
| - High availability setup | |
| - Disaster recovery | |
| - Backups & recovery | |
| - Security hardening | |
| **Definition of Done**: | |
| - [ ] Production live | |
| - [ ] Uptime >99.5% | |
| - [ ] DR tested | |
| - [ ] SLAs met | |
| ### Monitoring & Observability | |
| **Task**: Setup comprehensive monitoring | |
| - Application monitoring (APM) | |
| - Infrastructure monitoring | |
| - Log aggregation | |
| - Metrics collection | |
| - Alert thresholds | |
| **Definition of Done**: | |
| - [ ] Dashboards live | |
| - [ ] Alerts configured | |
| - [ ] Log retention policy set | |
| - [ ] Team trained | |
| ### Disaster Recovery | |
| **Task**: Plan recovery procedures | |
| - Backup strategy | |
| - Recovery time objective (RTO) | |
| - Recovery point objective (RPO) | |
| - Failover procedures | |
| - Testing schedule | |
| **Definition of Done**: | |
| - [ ] DR plan documented | |
| - [ ] Backups automated | |
| - [ ] DR test passed | |
| - [ ] Runbooks created | |
| --- | |
| ## π€ COLLABORATION | |
| ### With All Engineers | |
| - Provide test environments | |
| - Support local development setup | |
| - Enable efficient deployments | |
| - Monitor system health | |
| ### With Cursor (Lead) | |
| - Infrastructure status updates | |
| - Deployment readiness reports | |
| - Resource utilization metrics | |
| ### With QA Engineer | |
| - Test environment support | |
| - Performance testing infrastructure | |
| - Load testing capability | |
| --- | |
| ## π SUCCESS METRICS | |
| **Infrastructure**: | |
| - Uptime: >99.5% | |
| - Deployment time: <15 min | |
| - Recovery time: <5 min | |
| - Cost optimization: Within budget | |
| **Operations**: | |
| - Mean time to detection: <5 min | |
| - Mean time to recovery: <1 hour | |
| - Incident response: SLA met | |
| - Documentation: Complete | |
| --- | |
| ## π REFERENCE DOCS | |
| - π `claudedocs/RAG_PROJECT_OVERVIEW.md` - Main dashboard | |
| - π `claudedocs/RAG_TEAM_RESPONSIBILITIES.md` - Your role details | |
| - π `.github/agents/Cursor_Implementation_Lead.md` - Your manager | |
| --- | |
| ## π¬ DAILY INTERACTION WITH CURSOR | |
| **Standup Format**: | |
| ``` | |
| YESTERDAY: β [Infrastructure work] | |
| TODAY: π [Current deployment/setup] | |
| BLOCKERS: π¨ [Infrastructure issues?] | |
| METRICS: [Uptime, performance, costs] | |
| NEXT: [Priority infrastructure work] | |
| ``` | |
| **Deployment Status**: | |
| ``` | |
| Target: [Feature/Version] | |
| Status: [Testing/Ready/In Progress/Complete] | |
| Timeline: [Expected completion] | |
| Rollback Plan: [Available/Not needed] | |
| Risks: [Any concerns] | |
| ``` | |
| --- | |
| ## β DEFINITION OF DONE (ALL INFRASTRUCTURE) | |
| Before going live: | |
| - [ ] Infrastructure automated | |
| - [ ] CI/CD pipeline working | |
| - [ ] Staging validated | |
| - [ ] Monitoring configured | |
| - [ ] DR plan tested | |
| - [ ] Documentation complete | |
| - [ ] Team trained | |
| --- | |
| ## π§ TOOLS & TECHNOLOGIES (TYPICAL) | |
| **Infrastructure**: | |
| - Cloud: AWS/GCP/Azure | |
| - Containers: Docker | |
| - Orchestration: Kubernetes | |
| - Infrastructure as Code: Terraform | |
| **CI/CD**: | |
| - Source control: GitHub | |
| - CI/CD: GitHub Actions / GitLab CI | |
| - Artifact registry: Docker Hub | |
| **Monitoring**: | |
| - APM: DataDog / New Relic | |
| - Logs: ELK / Splunk | |
| - Metrics: Prometheus | |
| **Secrets**: | |
| - Vault: HashiCorp Vault | |
| --- | |
| **Status**: PLACEHOLDER - Awaiting assignment | |
| **When Assigned**: Replace with engineer name and start date | |
| **Estimated Start**: 2025-11-20 (Sprint 1, ongoing) | |