# Configuration Drift Detection Tasks (Expert Tier)
#
# Each task provisions correct infrastructure via setup_commands, then the
# DriftEngine randomly applies a subset of possible_drifts. The agent must
# audit the environment, discover which resources drifted, and fix only those.

- task_id: 24
  description: >
    The following infrastructure should exist: S3 bucket 'config-store' with
    versioning enabled, a lifecycle rule named 'expire-old' that expires
    non-current object versions after 90 days, and server-side encryption
    using AES256. DynamoDB table 'sessions' with provisioned throughput of
    100 RCU and 100 WCU. Some resources may have drifted from the desired
    specification. Audit the current state and fix any configuration that
    does not match.
  desired_state_spec: >
    S3 bucket 'config-store': versioning=Enabled, lifecycle rule 'expire-old'
    expiring non-current versions after 90 days, SSE with AES256.
    DynamoDB table 'sessions': 100 RCU, 100 WCU.
  setup_commands:
    - aws s3api create-bucket --bucket config-store
    - >-
      aws s3api put-bucket-versioning --bucket config-store
      --versioning-configuration Status=Enabled
    - >-
      aws s3api put-bucket-lifecycle-configuration --bucket config-store
      --lifecycle-configuration '{"Rules":[{"ID":"expire-old","Status":"Enabled","NoncurrentVersionExpiration":{"NoncurrentDays":90},"Filter":{"Prefix":""}}]}'
    - >-
      aws s3api put-bucket-encryption --bucket config-store
      --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
    - >-
      aws dynamodb create-table --table-name sessions
      --attribute-definitions AttributeName=id,AttributeType=S
      --key-schema AttributeName=id,KeyType=HASH
      --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100
  possible_drifts:
    - command: >-
        aws s3api put-bucket-versioning --bucket config-store
        --versioning-configuration Status=Suspended
      description: Versioning suspended on 'config-store'
    - command: >-
        aws s3api delete-bucket-lifecycle --bucket config-store
      description: Lifecycle rule removed from 'config-store'
    - command: >-
        aws s3api delete-bucket-encryption --bucket config-store
      description: Encryption removed from 'config-store'
    - command: >-
        aws dynamodb update-table --table-name sessions
        --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=100
      description: DynamoDB RCU reduced to 5
    - command: >-
        aws dynamodb update-table --table-name sessions
        --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=5
      description: DynamoDB WCU reduced to 5
  success_criteria:
    services:
      - s3
      - dynamodb
    state_checks:
      - command: aws s3api get-bucket-versioning --bucket config-store
        output_contains: "Enabled"
      - command: aws s3api get-bucket-lifecycle-configuration --bucket config-store
        output_contains: "expire-old"
      - command: aws s3api get-bucket-encryption --bucket config-store
        output_contains: "AES256"
      - command: aws dynamodb describe-table --table-name sessions
        json_path: "$.Table.ProvisionedThroughput.ReadCapacityUnits"
        expected: 100
      - command: aws dynamodb describe-table --table-name sessions
        json_path: "$.Table.ProvisionedThroughput.WriteCapacityUnits"
        expected: 100

- task_id: 25
  description: >
    The following infrastructure should exist: SNS topic 'ops-alerts' with an
    SQS queue 'ops-inbox' subscribed to it. IAM role 'ops-automation' with the
    AmazonSNSFullAccess and AmazonSQSFullAccess policies attached. Lambda
    function 'alert-handler' using the 'ops-automation' role. Some resources
    may have drifted. Audit and fix.
  desired_state_spec: >
    SNS topic 'ops-alerts' with SQS subscription 'ops-inbox'.
    IAM role 'ops-automation' with AmazonSNSFullAccess and AmazonSQSFullAccess.
    Lambda 'alert-handler' using role 'ops-automation'.
  setup_commands:
    - aws sns create-topic --name ops-alerts
    - aws sqs create-queue --queue-name ops-inbox
    - >-
      aws sns subscribe
      --topic-arn arn:aws:sns:us-east-1:000000000000:ops-alerts
      --protocol sqs
      --notification-endpoint arn:aws:sqs:us-east-1:000000000000:ops-inbox
    - >-
      aws iam create-role --role-name ops-automation
      --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
    - >-
      aws iam attach-role-policy --role-name ops-automation
      --policy-arn arn:aws:iam::aws:policy/AmazonSNSFullAccess
    - >-
      aws iam attach-role-policy --role-name ops-automation
      --policy-arn arn:aws:iam::aws:policy/AmazonSQSFullAccess
    - >-
      aws lambda create-function --function-name alert-handler
      --runtime python3.12 --handler index.handler
      --role arn:aws:iam::000000000000:role/ops-automation
      --code S3Bucket=dummy,S3Key=dummy.zip
  possible_drifts:
    - command: >-
        aws iam detach-role-policy --role-name ops-automation
        --policy-arn arn:aws:iam::aws:policy/AmazonSNSFullAccess
      description: SNS policy detached from 'ops-automation'
    - command: >-
        aws iam detach-role-policy --role-name ops-automation
        --policy-arn arn:aws:iam::aws:policy/AmazonSQSFullAccess
      description: SQS policy detached from 'ops-automation'
    - command: aws lambda delete-function --function-name alert-handler
      description: Lambda 'alert-handler' deleted
  success_criteria:
    services:
      - sns
      - sqs
      - iam
      - lambda
    state_checks:
      - command: aws sns list-subscriptions-by-topic --topic-arn arn:aws:sns:us-east-1:000000000000:ops-alerts
        output_contains: "ops-inbox"
      - command: aws iam list-attached-role-policies --role-name ops-automation
        output_contains: "SNSFullAccess"
      - command: aws iam list-attached-role-policies --role-name ops-automation
        output_contains: "SQSFullAccess"
      - command: aws lambda get-function --function-name alert-handler
        output_contains: "alert-handler"

- task_id: 128
  description: >
    The following infrastructure should exist: IAM role
    'api-executor' with AmazonDynamoDBFullAccess and AWSLambdaBasicExecutionRole
    policies attached. Lambda function 'api-handler' with 256MB memory, 30s
    timeout, runtime python3.12, and environment variable APP_ENV=production.
    Some resources may have drifted. Audit the current state and fix any
    configuration that does not match.
  desired_state_spec: >
    IAM role 'api-executor': AmazonDynamoDBFullAccess and
    AWSLambdaBasicExecutionRole attached.
    Lambda 'api-handler': 256MB memory, 30s timeout, python3.12,
    env APP_ENV=production.
  setup_commands:
    - >-
      aws iam create-role --role-name api-executor
      --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
    - >-
      aws iam attach-role-policy --role-name api-executor
      --policy-arn arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
    # AWSLambdaBasicExecutionRole is published under the service-role/ path.
    - >-
      aws iam attach-role-policy --role-name api-executor
      --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
    - >-
      aws lambda create-function --function-name api-handler
      --runtime python3.12 --handler index.handler
      --role arn:aws:iam::000000000000:role/api-executor
      --code S3Bucket=dummy,S3Key=dummy.zip
      --memory-size 256 --timeout 30
      --environment '{"Variables":{"APP_ENV":"production"}}'
  possible_drifts:
    - command: >-
        aws iam detach-role-policy --role-name api-executor
        --policy-arn arn:aws:iam::aws:policy/AmazonDynamoDBFullAccess
      description: DynamoDB policy detached from 'api-executor'
    - command: >-
        aws lambda update-function-configuration --function-name api-handler
        --memory-size 128
      description: Lambda memory changed from 256MB to 128MB
    - command: >-
        aws lambda update-function-configuration --function-name api-handler
        --timeout 3
      description: Lambda timeout changed from 30s to 3s
    - command: >-
        aws lambda update-function-configuration --function-name api-handler
        --environment '{"Variables":{}}'
      description: Environment variables removed from 'api-handler'
    - command: >-
        aws lambda
        update-function-configuration --function-name api-handler
        --runtime python3.9
      description: Lambda runtime changed from python3.12 to python3.9
  success_criteria:
    services:
      - iam
      - lambda
    state_checks:
      - command: aws iam list-attached-role-policies --role-name api-executor
        output_contains: "DynamoDBFullAccess"
      - command: aws iam list-attached-role-policies --role-name api-executor
        output_contains: "LambdaBasicExecutionRole"
      - command: aws lambda get-function-configuration --function-name api-handler
        json_path: "$.MemorySize"
        expected: 256
      - command: aws lambda get-function-configuration --function-name api-handler
        json_path: "$.Timeout"
        expected: 30
      - command: aws lambda get-function-configuration --function-name api-handler
        json_path: "$.Runtime"
        expected: "python3.12"
      - command: aws lambda get-function-configuration --function-name api-handler
        output_contains: "APP_ENV"

- task_id: 129
  description: >
    The following infrastructure should exist: RDS instance 'app-db' with
    instance class db.t3.micro, engine mysql, multi-AZ enabled, and 7-day
    backup retention. Secrets Manager secret 'app-db/credentials' with
    description 'Database credentials for app-db'. Some resources may have
    drifted. Audit the current state and fix any configuration that does not
    match.
  desired_state_spec: >
    RDS 'app-db': db.t3.micro, mysql, multi-AZ enabled, 7-day backup retention.
    Secret 'app-db/credentials': description 'Database credentials for app-db'.
  setup_commands:
    - >-
      aws rds create-db-instance --db-instance-identifier app-db
      --db-instance-class db.t3.micro --engine mysql
      --master-username admin --master-user-password SecurePass123
      --multi-az --backup-retention-period 7
    - >-
      aws secretsmanager create-secret --name app-db/credentials
      --description 'Database credentials for app-db'
      --secret-string '{"username":"admin","password":"SecurePass123"}'
  possible_drifts:
    - command: >-
        aws rds modify-db-instance --db-instance-identifier app-db
        --no-multi-az --apply-immediately
      description: Multi-AZ disabled on 'app-db'
    - command: >-
        aws rds modify-db-instance --db-instance-identifier app-db
        --backup-retention-period 1 --apply-immediately
      description: Backup retention changed from 7 days to 1 day
    - command: >-
        aws rds modify-db-instance --db-instance-identifier app-db
        --db-instance-class db.t3.small --apply-immediately
      description: Instance class changed from db.t3.micro to db.t3.small
    - command: >-
        aws secretsmanager update-secret --secret-id app-db/credentials
        --description ''
      description: Description removed from secret 'app-db/credentials'
  success_criteria:
    services:
      - rds
      - secretsmanager
    state_checks:
      - command: aws rds describe-db-instances --db-instance-identifier app-db
        json_path: "$.DBInstances[0].MultiAZ"
        expected: true
      - command: aws rds describe-db-instances --db-instance-identifier app-db
        json_path: "$.DBInstances[0].BackupRetentionPeriod"
        expected: 7
      - command: aws rds describe-db-instances --db-instance-identifier app-db
        json_path: "$.DBInstances[0].DBInstanceClass"
        expected: "db.t3.micro"
      - command: aws secretsmanager describe-secret --secret-id app-db/credentials
        output_contains: "Database credentials for app-db"

- task_id: 131
  description: >
    The following infrastructure should exist: ECS cluster 'web-cluster', task
    definition 'web-task' (family web-task, container 'app' using nginx:latest
    on port 80), ECS service 'web-service' with desired count 3.
    IAM role 'ecs-task-role' with AmazonS3ReadOnlyAccess attached. Some
    resources may have drifted. Audit the current state and fix any
    configuration that does not match.
  desired_state_spec: >
    ECS cluster 'web-cluster', task definition 'web-task' (nginx:latest,
    port 80), service 'web-service' desired count 3.
    IAM role 'ecs-task-role': AmazonS3ReadOnlyAccess attached.
  setup_commands:
    - aws ecs create-cluster --cluster-name web-cluster
    - >-
      aws iam create-role --role-name ecs-task-role
      --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ecs-tasks.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
    - >-
      aws iam attach-role-policy --role-name ecs-task-role
      --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
    - >-
      aws ecs register-task-definition --family web-task
      --container-definitions '[{"name":"app","image":"nginx:latest","portMappings":[{"containerPort":80}],"memory":256}]'
      --task-role-arn arn:aws:iam::000000000000:role/ecs-task-role
    - >-
      aws ecs create-service --cluster web-cluster --service-name web-service
      --task-definition web-task --desired-count 3
  possible_drifts:
    - command: >-
        aws ecs update-service --cluster web-cluster --service web-service
        --desired-count 0
      description: Service desired count changed from 3 to 0
    - command: >-
        aws iam detach-role-policy --role-name ecs-task-role
        --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
      description: S3ReadOnlyAccess policy detached from 'ecs-task-role'
    - command: >-
        aws ecs update-service --cluster web-cluster --service web-service
        --task-definition web-task --desired-count 1
      description: Service desired count changed from 3 to 1
  success_criteria:
    services:
      - ecs
      - iam
    state_checks:
      - command: aws ecs describe-services --cluster web-cluster --services web-service
        json_path: "$.services[0].desiredCount"
        expected: 3
      - command: aws iam list-attached-role-policies --role-name ecs-task-role
        output_contains: "S3ReadOnlyAccess"
      - command: aws iam get-role
          --role-name ecs-task-role
        output_contains: "ecs-task-role"
      - command: aws ecs describe-clusters --clusters web-cluster
        output_contains: "web-cluster"

- task_id: 133
  description: >
    The following infrastructure should exist: SSM parameter '/app/db-host'
    (type String, value 'db.example.com'), SSM parameter '/app/db-port'
    (type String, value '5432'). Lambda function 'config-reader' with 128MB
    memory and 10s timeout. Some resources may have drifted. Audit the current
    state and fix any configuration that does not match.
  desired_state_spec: >
    SSM '/app/db-host': String, 'db.example.com'.
    SSM '/app/db-port': String, '5432'.
    Lambda 'config-reader': 128MB memory, 10s timeout.
  setup_commands:
    - >-
      aws ssm put-parameter --name /app/db-host --type String
      --value db.example.com
    - >-
      aws ssm put-parameter --name /app/db-port --type String --value 5432
    - >-
      aws iam create-role --role-name config-reader-role
      --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
    - >-
      aws lambda create-function --function-name config-reader
      --runtime python3.12 --handler index.handler
      --role arn:aws:iam::000000000000:role/config-reader-role
      --code S3Bucket=dummy,S3Key=dummy.zip
      --memory-size 128 --timeout 10
  possible_drifts:
    - command: >-
        aws ssm put-parameter --name /app/db-host --type String
        --value localhost --overwrite
      description: SSM '/app/db-host' value changed to 'localhost'
    - command: >-
        aws ssm put-parameter --name /app/db-port --type String
        --value 3306 --overwrite
      description: SSM '/app/db-port' value changed to '3306'
    - command: >-
        aws lambda update-function-configuration --function-name config-reader
        --memory-size 512
      description: Lambda memory changed from 128MB to 512MB
    - command: >-
        aws lambda update-function-configuration --function-name config-reader
        --timeout 60
      description: Lambda timeout changed from 10s to 60s
    - command: aws ssm delete-parameter --name /app/db-port
      description: SSM parameter '/app/db-port' deleted
  success_criteria:
    services:
      - ssm
      - lambda
    state_checks:
      - command: aws ssm get-parameter --name /app/db-host
        output_contains: "db.example.com"
      - command: aws ssm get-parameter --name /app/db-port
        output_contains: "5432"
      - command: aws lambda get-function-configuration --function-name config-reader
        json_path: "$.MemorySize"
        expected: 128
      - command: aws lambda get-function-configuration --function-name config-reader
        json_path: "$.Timeout"
        expected: 10

- task_id: 134
  description: >
    The following infrastructure should exist: EventBridge rule
    'nightly-cleanup' with schedule expression 'rate(1 day)' in enabled state,
    targeting Lambda function 'cleanup-handler'. Lambda 'cleanup-handler' with
    256MB memory and 300s timeout. Some resources may have drifted. Audit the
    current state and fix any configuration that does not match.
  desired_state_spec: >
    EventBridge rule 'nightly-cleanup': schedule 'rate(1 day)', ENABLED.
    Lambda 'cleanup-handler': 256MB memory, 300s timeout, target of rule.
  setup_commands:
    - >-
      aws iam create-role --role-name cleanup-handler-role
      --assume-role-policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"lambda.amazonaws.com"},"Action":"sts:AssumeRole"}]}'
    - >-
      aws lambda create-function --function-name cleanup-handler
      --runtime python3.12 --handler index.handler
      --role arn:aws:iam::000000000000:role/cleanup-handler-role
      --code S3Bucket=dummy,S3Key=dummy.zip
      --memory-size 256 --timeout 300
    - >-
      aws events put-rule --name nightly-cleanup
      --schedule-expression 'rate(1 day)' --state ENABLED
    - >-
      aws events put-targets --rule nightly-cleanup
      --targets '[{"Id":"cleanup-target","Arn":"arn:aws:lambda:us-east-1:000000000000:function:cleanup-handler"}]'
  possible_drifts:
    - command: aws events disable-rule --name nightly-cleanup
      description: EventBridge rule 'nightly-cleanup' disabled
    - command: >-
        aws events put-rule --name nightly-cleanup
        --schedule-expression 'rate(7 days)' --state ENABLED
      description: Schedule changed from 'rate(1 day)' to 'rate(7 days)'
    - command: >-
        aws events remove-targets --rule nightly-cleanup --ids cleanup-target
      description: Lambda target removed from rule 'nightly-cleanup'
    - command: >-
        aws lambda update-function-configuration --function-name cleanup-handler
        --timeout 30
      description: Lambda timeout changed from 300s to 30s
    - command: >-
        aws lambda update-function-configuration --function-name cleanup-handler
        --memory-size 128
      description: Lambda memory changed from 256MB to 128MB
  success_criteria:
    services:
      - events
      - lambda
    state_checks:
      - command: aws events describe-rule --name nightly-cleanup
        output_contains: "ENABLED"
      - command: aws events describe-rule --name nightly-cleanup
        output_contains: "rate(1 day)"
      - command: aws events list-targets-by-rule --rule nightly-cleanup
        output_contains: "cleanup-handler"
      - command: aws lambda get-function-configuration --function-name cleanup-handler
        json_path: "$.MemorySize"
        expected: 256
      - command: aws lambda
          get-function-configuration --function-name cleanup-handler
        json_path: "$.Timeout"
        expected: 300

- task_id: 135
  description: >
    The following infrastructure should exist: S3 bucket 'analytics-raw' with
    versioning enabled and AES256 server-side encryption. Firehose delivery
    stream 'clickstream-firehose' delivering to 'analytics-raw' with prefix
    'raw/' and buffer size of 5 MiB. Some resources may have drifted. Audit
    the current state and fix any configuration that does not match.
  desired_state_spec: >
    S3 'analytics-raw': versioning=Enabled, SSE with AES256.
    Firehose 'clickstream-firehose': destination analytics-raw, prefix 'raw/',
    buffer 5 MiB.
  setup_commands:
    - aws s3api create-bucket --bucket analytics-raw
    - >-
      aws s3api put-bucket-versioning --bucket analytics-raw
      --versioning-configuration Status=Enabled
    - >-
      aws s3api put-bucket-encryption --bucket analytics-raw
      --server-side-encryption-configuration '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
    - >-
      aws firehose create-delivery-stream
      --delivery-stream-name clickstream-firehose
      --s3-destination-configuration '{"RoleARN":"arn:aws:iam::000000000000:role/firehose-role","BucketARN":"arn:aws:s3:::analytics-raw","Prefix":"raw/","BufferingHints":{"SizeInMBs":5,"IntervalInSeconds":300}}'
  possible_drifts:
    - command: >-
        aws s3api put-bucket-versioning --bucket analytics-raw
        --versioning-configuration Status=Suspended
      description: Versioning suspended on 'analytics-raw'
    - command: aws s3api delete-bucket-encryption --bucket analytics-raw
      description: Encryption removed from 'analytics-raw'
  success_criteria:
    services:
      - firehose
      - s3
    state_checks:
      - command: aws s3api get-bucket-versioning --bucket analytics-raw
        output_contains: "Enabled"
      - command: aws s3api get-bucket-encryption --bucket analytics-raw
        output_contains: "AES256"
      - command: aws firehose describe-delivery-stream --delivery-stream-name clickstream-firehose
        output_contains: "raw/"
      - command: aws firehose describe-delivery-stream
          --delivery-stream-name clickstream-firehose
        output_contains: "analytics-raw"

- task_id: 139
  description: >
    The following infrastructure should exist: DynamoDB table 'users' with
    provisioned throughput of 50 RCU and 50 WCU. DynamoDB table 'transactions'
    with provisioned throughput of 100 RCU and 100 WCU, and a global secondary
    index 'date-index' on the 'date' attribute provisioned at 100 RCU /
    100 WCU. Some resources may have drifted from the desired specification.
    Audit the current state and fix any configuration that does not match.
  desired_state_spec: >
    DynamoDB 'users': 50 RCU, 50 WCU.
    DynamoDB 'transactions': 100 RCU, 100 WCU, GSI 'date-index' at
    100 RCU / 100 WCU.
  setup_commands:
    - >-
      aws dynamodb create-table --table-name users
      --attribute-definitions AttributeName=id,AttributeType=S
      --key-schema AttributeName=id,KeyType=HASH
      --provisioned-throughput ReadCapacityUnits=50,WriteCapacityUnits=50
    - >-
      aws dynamodb create-table --table-name transactions
      --attribute-definitions AttributeName=id,AttributeType=S AttributeName=date,AttributeType=S
      --key-schema AttributeName=id,KeyType=HASH
      --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100
      --global-secondary-indexes '[{"IndexName":"date-index","KeySchema":[{"AttributeName":"date","KeyType":"HASH"}],"Projection":{"ProjectionType":"ALL"},"ProvisionedThroughput":{"ReadCapacityUnits":100,"WriteCapacityUnits":100}}]'
  possible_drifts:
    - command: >-
        aws dynamodb update-table --table-name users
        --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=50
      description: Users table RCU reduced to 5
    - command: >-
        aws dynamodb update-table --table-name users
        --provisioned-throughput ReadCapacityUnits=50,WriteCapacityUnits=5
      description: Users table WCU reduced to 5
    - command: >-
        aws dynamodb update-table --table-name transactions
        --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=100
      description: Transactions table RCU reduced to 10
    - command: >-
        aws dynamodb update-table --table-name
        transactions
        --provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=10
      description: Transactions table WCU reduced to 10
    - command: >-
        aws dynamodb update-table --table-name transactions
        --global-secondary-index-updates '[{"Update":{"IndexName":"date-index","ProvisionedThroughput":{"ReadCapacityUnits":5,"WriteCapacityUnits":5}}}]'
      description: GSI 'date-index' throughput reduced to 5 RCU / 5 WCU
  success_criteria:
    services:
      - dynamodb
    state_checks:
      - command: aws dynamodb describe-table --table-name users
        json_path: "$.Table.ProvisionedThroughput.ReadCapacityUnits"
        expected: 50
      - command: aws dynamodb describe-table --table-name users
        json_path: "$.Table.ProvisionedThroughput.WriteCapacityUnits"
        expected: 50
      - command: aws dynamodb describe-table --table-name transactions
        json_path: "$.Table.ProvisionedThroughput.ReadCapacityUnits"
        expected: 100
      - command: aws dynamodb describe-table --table-name transactions
        json_path: "$.Table.ProvisionedThroughput.WriteCapacityUnits"
        expected: 100
      - command: aws dynamodb describe-table --table-name transactions
        json_path: "$.Table.GlobalSecondaryIndexes[0].ProvisionedThroughput.ReadCapacityUnits"
        expected: 100