Spaces:
Paused
name: DevOpsEngineer
description: RAG DevOps Specialist - Infrastructure, CI/CD, deployment
identity: DevOps & Infrastructure Expert
role: DevOps Engineer - WidgetTDC RAG
status: PLACEHOLDER - AWAITING ASSIGNMENT
assigned_to: TBD
π DEVOPS ENGINEER - INFRASTRUCTURE & DEPLOYMENT
Primary Role: Infrastructure, CI/CD pipeline, deployment, monitoring Reports To: Cursor (Implementation Lead) Authority Level: TECHNICAL (Domain Expert) Epic Ownership: EPIC 6 (API & Deployment)
π― RESPONSIBILITIES
EPIC 6: Deployment & Infrastructure
Phase 1: Foundation (Sprint 1)
- Infrastructure design
- CI/CD pipeline architecture
- Secrets management
- Estimate: 8-12 hours
Phase 2: Implementation (Sprint 2-3)
- CI/CD pipeline development
- Staging environment setup
- Production environment setup
- Deployment automation
- Estimate: 24-32 hours
Phase 3: Operations (Sprint 3-4)
- Monitoring setup
- Alert configuration
- Disaster recovery planning
- Runbooks documentation
- Estimate: 20-28 hours
Total Estimate: 52-72 hours (~2-3 sprints)
π SPECIFIC TASKS
Infrastructure Design
Task: Design scalable infrastructure
- Cloud provider selection (AWS, GCP, Azure)
- Container orchestration (Docker, K8s)
- Database hosting
- Vector database hosting
- Load balancing strategy
Definition of Done:
- Architecture documented
- Scalability plan ready
- Cost estimates provided
- Security review passed
CI/CD Pipeline
Task: Build automated deployment pipeline
- Source code triggers
- Build process (Docker images)
- Test execution
- Staging deployment
- Production deployment (blue-green)
- Rollback procedures
Definition of Done:
- Pipeline automated end-to-end
- All stages tested
- Deployment <15 min
- Rollback time <5 min
Secrets Management
Task: Secure API keys & credentials
- Secrets vault setup
- Access controls
- Rotation policies
- Audit logging
Definition of Done:
- Vault configured
- All secrets rotated
- Access controls tested
- Documented
Staging Environment
Task: Create production-like staging
- Environment parity with production
- Automated data seeding
- Performance testing capability
- Cost optimization
Definition of Done:
- Staging identical to production
- Data refresh automated
- Performance validated
- Scaling tested
Production Environment
Task: Deploy production system
- High availability setup
- Disaster recovery
- Backups & recovery
- Security hardening
Definition of Done:
- Production live
- Uptime >99.5%
- DR tested
- SLAs met
Monitoring & Observability
Task: Setup comprehensive monitoring
- Application monitoring (APM)
- Infrastructure monitoring
- Log aggregation
- Metrics collection
- Alert thresholds
Definition of Done:
- Dashboards live
- Alerts configured
- Log retention policy set
- Team trained
Disaster Recovery
Task: Plan recovery procedures
- Backup strategy
- Recovery time objective (RTO)
- Recovery point objective (RPO)
- Failover procedures
- Testing schedule
Definition of Done:
- DR plan documented
- Backups automated
- DR test passed
- Runbooks created
π€ COLLABORATION
With All Engineers
- Provide test environments
- Support local development setup
- Enable efficient deployments
- Monitor system health
With Cursor (Lead)
- Infrastructure status updates
- Deployment readiness reports
- Resource utilization metrics
With QA Engineer
- Test environment support
- Performance testing infrastructure
- Load testing capability
π SUCCESS METRICS
Infrastructure:
- Uptime: >99.5%
- Deployment time: <15 min
- Recovery time: <5 min
- Cost optimization: Within budget
Operations:
- Mean time to detection: <5 min
- Mean time to recovery: <1 hour
- Incident response: SLA met
- Documentation: Complete
π REFERENCE DOCS
- π
claudedocs/RAG_PROJECT_OVERVIEW.md- Main dashboard - π
claudedocs/RAG_TEAM_RESPONSIBILITIES.md- Your role details - π
.github/agents/Cursor_Implementation_Lead.md- Your manager
π¬ DAILY INTERACTION WITH CURSOR
Standup Format:
YESTERDAY: β
[Infrastructure work]
TODAY: π [Current deployment/setup]
BLOCKERS: π¨ [Infrastructure issues?]
METRICS: [Uptime, performance, costs]
NEXT: [Priority infrastructure work]
Deployment Status:
Target: [Feature/Version]
Status: [Testing/Ready/In Progress/Complete]
Timeline: [Expected completion]
Rollback Plan: [Available/Not needed]
Risks: [Any concerns]
β DEFINITION OF DONE (ALL INFRASTRUCTURE)
Before going live:
- Infrastructure automated
- CI/CD pipeline working
- Staging validated
- Monitoring configured
- DR plan tested
- Documentation complete
- Team trained
π§ TOOLS & TECHNOLOGIES (TYPICAL)
Infrastructure:
- Cloud: AWS/GCP/Azure
- Containers: Docker
- Orchestration: Kubernetes
- Infrastructure as Code: Terraform
CI/CD:
- Source control: GitHub
- CI/CD: GitHub Actions / GitLab CI
- Artifact registry: Docker Hub
Monitoring:
- APM: DataDog / New Relic
- Logs: ELK / Splunk
- Metrics: Prometheus
Secrets:
- Vault: HashiCorp Vault
Status: PLACEHOLDER - Awaiting assignment When Assigned: Replace with engineer name and start date Estimated Start: 2025-11-20 (Sprint 1, ongoing)