div18 committed on
Commit
85041bd
·
1 Parent(s): b379876

feat(deploy): add AWS deployment scripts and configurations for AntiAtropos

- Add deploy/aws/deploy.sh to automate full AWS deployment on EKS
- Create eksctl cluster config with managed node groups and addons
- Add Kubernetes manifests for namespace, configmap, deployment, service,
secret, and optional ALB ingress
- Provide Prometheus agent Helm values for remote write to AMP workspace
- Include Grafana dashboard JSON placeholders for overview and live views
- Add IAM trust policy JSON for Grafana role setup
- Write comprehensive README.md with step-by-step AWS deployment guide,
instructions, and troubleshooting tips
- Enable IRSA setup for Prometheus and AntiAtropos service accounts
- Integrate building and pushing AntiAtropos Docker image to ECR
- Support ALB ingress with AWS Load Balancer Controller annotations
- Configure metrics scraping, liveness/readiness probes, and resource limits

feat(deploy/aws): add complete AWS deployment setup for AntiAtropos

- Add deploy script to create EKS cluster, AMP workspace, IAM roles, and deploy app
- Provide eksctl cluster config with managed node

deploy/aws/README.md ADDED
@@ -0,0 +1,365 @@
+ # AntiAtropos AWS Deployment Guide
+
+ Complete guide for deploying AntiAtropos on AWS with EKS, Amazon Managed Prometheus (AMP), and Amazon Managed Grafana (AMG).
+
+ ## Architecture
+
+ ```
+ AWS Region (us-east-1)
+ ├── EKS Cluster
+ │   ├── AntiAtropos FastAPI pod
+ │   ├── Prometheus Agent pod (remote-writes to AMP)
+ │   └── Sample workload pods (optional, for live mode)
+ ├── Amazon Managed Prometheus (AMP)
+ │   └── Workspace: antiatropos-metrics
+ ├── Amazon Managed Grafana (AMG)
+ │   └── Workspace: antiatropos-dashboards
+ ├── ALB (Application Load Balancer)
+ │   └── / → FastAPI, /grafana → AMG
+ └── ECR (Container Registry)
+     └── antiatropos:latest
+ ```
+
+ ---
+
+ ## Phase 0: Prerequisites
+
+ ```bash
+ # Install CLI tools (if not already installed); the commands below target Windows
+ # AWS CLI v2
+ curl "https://awscli.amazonaws.com/AWSCLIV2.msi" -o "AWSCLIV2.msi"
+ msiexec /i AWSCLIV2.msi
+
+ # eksctl (EKS management)
+ choco install eksctl # or: winget install --id=FluxCD.eksctl
+
+ # kubectl
+ choco install kubernetes-cli
+
+ # Helm
+ choco install kubernetes-helm
+
+ # Authenticate AWS
+ aws configure
+ # Enter: Access Key ID, Secret Access Key, Region (us-east-1), Output (json)
+ ```
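
Before moving on, it can help to confirm that every tool from Phase 0 is actually on `PATH`. The following is a minimal sketch, not part of the committed files; the helper name `missing_tools` is an assumption made for illustration. It lists all missing tools in one pass rather than stopping at the first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of each given command that is not on PATH.
missing_tools() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Tools this guide relies on (docker is needed in Phase 4).
missing="$(missing_tools aws eksctl kubectl helm docker)"
if [ -n "$missing" ]; then
  printf 'Missing tools:\n%s\n' "$missing"
else
  echo "All required tools found."
fi
```

Collecting the missing names first means a fresh machine gets one complete to-do list instead of a fix-and-rerun loop.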
+
+ ---
+
+ ## Phase 1: Create the EKS Cluster
+
+ ### Option A: eksctl (recommended, fastest)
+
+ Create the file `deploy/aws/eksctl-cluster.yaml`, then run:
+
+ ```bash
+ eksctl create cluster -f deploy/aws/eksctl-cluster.yaml
+ ```
+
+ ### Option B: AWS Console
+
+ 1. Go to EKS → Create Cluster
+ 2. Name: `antiatropos`, Kubernetes 1.30
+ 3. Cluster service role: Create new (let EKS create it)
+ 4. Networking: Default VPC, all AZs
+ 5. Add node group: `linux-nodes`, t3.medium, 2-4 nodes
+ 6. Create and wait ~15 minutes
+
+ ### Verify
+
+ ```bash
+ aws eks update-kubeconfig --name antiatropos --region us-east-1
+ kubectl get nodes
+ ```
+
+ ---
+
+ ## Phase 2: Set Up Amazon Managed Prometheus (AMP)
+
+ ### Create AMP Workspace
+
+ ```bash
+ aws amp create-workspace \
+   --alias antiatropos-metrics \
+   --region us-east-1
+
+ # Note the workspace ARN and ID from the output
+ aws amp list-workspaces --alias antiatropos-metrics --region us-east-1
+ ```
+
+ ### Install Prometheus Agent on EKS (remote-writes to AMP)
+
+ ```bash
+ # Add the Prometheus Community Helm repo
+ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+ helm repo update
+
+ # Install prometheus agent with AMP remote write
+ # Replace WORKSPACE_ID with your AMP workspace ID
+ helm install prometheus-agent prometheus-community/prometheus \
+   --namespace monitoring --create-namespace \
+   -f deploy/aws/prometheus-agent-values.yaml \
+   --set "server.remoteWrite[0].url=https://aps-workspaces.us-east-1.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/remote_write"
+ ```
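
The remote-write endpoint above is just the workspace base URL with `/api/v1/remote_write` appended. A small sketch of that composition follows; the function names and the workspace ID `ws-EXAMPLE` are placeholders invented for illustration, not values from this repository.

```shell
# Hypothetical helpers: build AMP endpoints from a region and a workspace ID.
amp_base_url() {
  # $1 = AWS region, $2 = AMP workspace ID
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s\n' "$1" "$2"
}

amp_remote_write_url() {
  printf '%s/api/v1/remote_write\n' "$(amp_base_url "$1" "$2")"
}

# Example with a placeholder workspace ID.
amp_remote_write_url us-east-1 ws-EXAMPLE
```

Deriving both URLs from one place avoids the region in the hostname drifting from the region the workspace was created in.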
+
+ ### Verify AMP is Receiving Data
+
+ ```bash
+ # Confirm the workspace exists and its status is ACTIVE
+ aws amp describe-workspace --workspace-id WORKSPACE_ID --region us-east-1
+
+ # Or use awscurl for instant queries
+ pip install awscurl
+ awscurl --service aps "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query?query=up" --region us-east-1
+ ```
+
+ ---
+
+ ## Phase 3: Set Up Amazon Managed Grafana (AMG)
+
+ ### Create AMG Workspace
+
+ ```bash
+ # First, create the IAM role for Grafana (allows it to read AMP)
+ aws iam create-role \
+   --role-name AntiAtroposGrafanaRole \
+   --assume-role-policy-document file://deploy/aws/grafana-trust-policy.json
+
+ aws iam attach-role-policy \
+   --role-name AntiAtroposGrafanaRole \
+   --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess
+
+ aws iam attach-role-policy \
+   --role-name AntiAtroposGrafanaRole \
+   --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
+
+ # Create the Grafana workspace
+ aws grafana create-workspace \
+   --workspace-name antiatropos-dashboards \
+   --account-access-type CURRENT_ACCOUNT \
+   --authentication-method AWS_SSO \
+   --permission-type SERVICE_MANAGED \
+   --data-sources PROMETHEUS \
+   --region us-east-1
+
+ # Note the workspace URL from the output
+ aws grafana list-workspaces --region us-east-1
+ ```
+
+ ### Add AMP as a Data Source in AMG
+
+ 1. Open the AMG workspace URL in your browser
+ 2. Sign in with AWS SSO
+ 3. Go to Configuration → Data Sources
+ 4. AMP workspaces should be discovered automatically if they are in the same account and region
+ 5. Select the `antiatropos-metrics` workspace
+
+ ### Import AntiAtropos Dashboards
+
+ ```bash
+ # Use the Grafana API to import dashboards
+ # Replace GRAFANA_URL and API_KEY
+ GRAFANA_URL="https://YOUR-WORKSPACE-ID.grafana-workspace.us-east-1.amazonaws.com"
+ API_KEY="YOUR-API-KEY"
+
+ # Import the overview dashboard
+ curl -X POST "$GRAFANA_URL/api/dashboards/db" \
+   -H "Authorization: Bearer $API_KEY" \
+   -H "Content-Type: application/json" \
+   -d @deploy/aws/grafana-dashboard-overview.json
+
+ # Import the live dashboard
+ curl -X POST "$GRAFANA_URL/api/dashboards/db" \
+   -H "Authorization: Bearer $API_KEY" \
+   -H "Content-Type: application/json" \
+   -d @deploy/aws/grafana-dashboard-live.json
+ ```
+
+ ---
+
+ ## Phase 4: Build and Push the Docker Image
+
+ ```bash
+ # Create ECR repository
+ aws ecr create-repository \
+   --repository-name antiatropos \
+   --region us-east-1
+
+ # Login to ECR
+ aws ecr get-login-password --region us-east-1 | \
+   docker login --username AWS --password-stdin \
+   $(aws sts get-caller-identity --query Account --output text).dkr.ecr.us-east-1.amazonaws.com
+
+ # Build and push
+ ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
+ ECR_URI=$ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/antiatropos
+
+ docker build -t antiatropos:latest .
+ docker tag antiatropos:latest $ECR_URI:latest
+ docker push $ECR_URI:latest
+ ```
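
The registry host and image URI used in Phase 4 follow a fixed pattern: account ID, region, repository, tag. As a hedged sketch, the composition can be written out as below; the function names and the account ID `123456789012` are illustrative assumptions, not taken from the commit.

```shell
# Hypothetical helpers: compose the ECR registry host and a full image URI.
ecr_registry() {
  # $1 = AWS account ID, $2 = AWS region
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

ecr_image_uri() {
  # $3 = repository name, $4 = image tag (defaults to "latest")
  printf '%s/%s:%s\n' "$(ecr_registry "$1" "$2")" "$3" "${4:-latest}"
}

# Example with a made-up account ID.
ecr_image_uri 123456789012 us-east-1 antiatropos
```

Passing a tag other than `latest` (for example a commit hash) as the fourth argument makes it easier to tell which build a pod is running.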
+
+ ---
+
+ ## Phase 5: Deploy AntiAtropos to EKS
+
+ ```bash
+ # Apply Kubernetes manifests (create the antiatropos-secrets Secret first, e.g. from k8s-secret.yaml; the Deployment references it)
+ kubectl apply -f deploy/aws/k8s-namespace.yaml
+ kubectl apply -f deploy/aws/k8s-configmap.yaml
+ kubectl apply -f deploy/aws/k8s-deployment.yaml
+ kubectl apply -f deploy/aws/k8s-service.yaml
+
+ # If using AWS Load Balancer Controller (recommended)
+ kubectl apply -f deploy/aws/k8s-ingress.yaml
+
+ # Check rollout
+ kubectl rollout status deployment/antiatropos -n antiatropos
+ kubectl get pods -n antiatropos
+ kubectl logs -f deployment/antiatropos -n antiatropos
+ ```
+
+ ### Environment Variables for Live Mode
+
+ The deployment manifest sets these to connect AntiAtropos to real infrastructure:
+
+ ```yaml
+ env:
+   - name: ANTIATROPOS_ENV_MODE
+     value: "live"
+   - name: PROMETHEUS_URL
+     value: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/WORKSPACE_ID"
+   - name: KUBECONFIG
+     value: "" # Empty = use in-cluster config
+   - name: ANTIATROPOS_WORKLOAD_MAP
+     value: '{"node-0":{"deployment":"payments","namespace":"prod-sre"},"node-1":{"deployment":"checkout","namespace":"prod-sre"}}'
+ ```
+
+ ---
+
+ ## Phase 6: Access Your Deployment
+
+ ### Get the ALB URL
+
+ ```bash
+ kubectl get ingress -n antiatropos
+ # Copy the ADDRESS column
+ ```
+
+ ### Endpoints
+
+ | Endpoint | URL |
+ |---|---|
+ | Landing Page | `http://ALB_ADDRESS/` |
+ | API Health | `http://ALB_ADDRESS/health` |
+ | Prometheus Metrics | `http://ALB_ADDRESS/metrics` |
+ | Grafana Dashboards | AMG workspace URL (separate) |
+
+ ### Port-Forward for Local Debugging
+
+ ```bash
+ # FastAPI
+ kubectl port-forward -n antiatropos deployment/antiatropos 8000:8000
+
+ # Direct pod metrics
+ curl http://localhost:8000/metrics
+ ```
+
+ ---
+
+ ## Phase 7: IRSA (IAM Roles for Service Accounts)
+
+ This lets the AntiAtropos pod authenticate with AMP without hardcoded credentials.
+
+ ```bash
+ # Create OIDC provider for the EKS cluster
+ eksctl utils associate-iam-oidc-provider \
+   --cluster antiatropos --region us-east-1 --approve
+
+ # Create IAM role for the AntiAtropos service account
+ eksctl create iamserviceaccount \
+   --cluster antiatropos \
+   --namespace antiatropos \
+   --name antiatropos-sa \
+   --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess \
+   --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
+   --approve \
+   --override-existing-serviceaccounts
+
+ # Redeploy to pick up the new service account
+ kubectl rollout restart deployment/antiatropos -n antiatropos
+ ```
+
+ ---
+
+ ## Cost Estimates
+
+ | Resource | Config | Monthly Cost (approx) |
+ |---|---|---|
+ | EKS Control Plane | 1 cluster | $73 |
+ | EKS Nodes | 2x t3.medium | $60 |
+ | AMP | <10GB ingest | ~$3-5 |
+ | AMG | 1 editor + viewers | Free tier or ~$9 |
+ | ALB | 1 load balancer | $16 |
+ | ECR | <1GB storage | <$1 |
+ | **Total** | | **~$150-165/month** |
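
As a quick check on the total, the rows of the table can be summed directly. This sketch makes two assumptions for the arithmetic: the AMG free tier counts as $0 at the low end, and ECR's "<$1" is rounded up to $1.

```shell
# Low end: AMP at $3, AMG on the free tier ($0).
low=$((73 + 60 + 3 + 0 + 16 + 1))
# High end: AMP at $5, AMG at $9.
high=$((73 + 60 + 5 + 9 + 16 + 1))
echo "Estimated total: \$${low}-\$${high} per month"
# prints: Estimated total: $153-$164 per month
```

The two fixed items (control plane and nodes) account for $133 of either figure, so node count and instance type dominate the bill.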
+
+ ### Cost-Saving Tips
+
+ - Use Spot capacity for node groups (`spot: true` in the eksctl config; typically 60-70% cheaper)
+ - Scale the node group to 0 when not training: `eksctl scale nodegroup --cluster antiatropos --name linux-nodes --nodes 0 --nodes-min 0`
+ - Use Fargate profiles for the AntiAtropos pod (pay-per-pod-second)
+ - Delete the cluster between training runs with `eksctl delete cluster`
+
+ ---
+
+ ## Teardown
+
+ ```bash
+ # Delete everything in reverse order
+ kubectl delete -f deploy/aws/k8s-ingress.yaml
+ kubectl delete -f deploy/aws/k8s-service.yaml
+ kubectl delete -f deploy/aws/k8s-deployment.yaml
+ kubectl delete -f deploy/aws/k8s-configmap.yaml
+ kubectl delete -f deploy/aws/k8s-namespace.yaml
+
+ aws grafana delete-workspace --workspace-id AMG_WORKSPACE_ID
+ aws amp delete-workspace --workspace-id AMP_WORKSPACE_ID
+ aws ecr delete-repository --repository-name antiatropos --force
+
+ eksctl delete cluster --name antiatropos --region us-east-1
+ ```
+
+ ---
+
+ ## Troubleshooting
+
+ ### Pods not starting
+ ```bash
+ kubectl describe pod -n antiatropos -l app=antiatropos
+ kubectl logs -n antiatropos -l app=antiatropos --previous
+ ```
+
+ ### AMP not receiving metrics
+ ```bash
+ # Check the prometheus agent logs
+ kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus
+
+ # Verify remote-write endpoint
+ aws amp describe-workspace --workspace-id WORKSPACE_ID
+ ```
+
+ ### Can't reach AMP from pod
+ ```bash
+ # Verify IRSA is attached
+ kubectl get pod -n antiatropos -o yaml | grep -A5 serviceAccount
+
+ # Check pod can reach AMP (this request is unsigned, so an HTTP 403 still confirms network reachability)
+ kubectl exec -n antiatropos deployment/antiatropos -- \
+   curl -s "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/query?query=up"
+ ```
+
+ ### Grafana dashboard shows no data
+ 1. Verify the data source URL in AMG points to the correct AMP workspace
+ 2. Check the time range (AMP retains metrics for a limited period; 150 days by default)
+ 3. Verify the PromQL queries in dashboards match your metric names
deploy/aws/deploy.sh ADDED
@@ -0,0 +1,208 @@
+ #!/usr/bin/env bash
+ # AntiAtropos AWS Quick Deploy Script
+ #
+ # Prerequisites: aws cli, eksctl, kubectl, helm, docker
+ #
+ # Usage:
+ #   chmod +x deploy/aws/deploy.sh
+ #   ./deploy/aws/deploy.sh
+ #
+ # This script creates all AWS resources and deploys AntiAtropos to EKS.
+ # Set these environment variables before running:
+ #   OPENAI_API_KEY - Your OpenAI API key (required)
+ #   AWS_REGION     - AWS region (default: us-east-1)
+ #   CLUSTER_NAME   - EKS cluster name (default: antiatropos)
+
+ set -euo pipefail
+
+ REGION="${AWS_REGION:-us-east-1}"
+ CLUSTER_NAME="${CLUSTER_NAME:-antiatropos}"
+ AWS_DIR="$(cd "$(dirname "$0")" && pwd)"
+
+ echo "=== AntiAtropos AWS Deployment ==="
+ echo "Region: $REGION"
+ echo "Cluster: $CLUSTER_NAME"
+ echo ""
+
+ # --- Check prerequisites ---
+ for cmd in aws eksctl kubectl helm docker; do
+   if ! command -v "$cmd" &>/dev/null; then
+     echo "ERROR: $cmd is not installed. Please install it first."
+     exit 1
+   fi
+ done
+
+ if [ -z "${OPENAI_API_KEY:-}" ]; then
+   echo "ERROR: OPENAI_API_KEY environment variable is not set."
+   exit 1
+ fi
+
+ # --- Phase 1: Create EKS Cluster ---
+ echo ""
+ echo ">>> Phase 1: Creating EKS cluster..."
+ if eksctl get cluster --name "$CLUSTER_NAME" --region "$REGION" &>/dev/null; then
+   echo "Cluster $CLUSTER_NAME already exists, skipping creation."
+ else
+   eksctl create cluster -f "$AWS_DIR/eksctl-cluster.yaml"
+   echo "Cluster created."
+ fi
+
+ aws eks update-kubeconfig --name "$CLUSTER_NAME" --region "$REGION"
+ echo "kubeconfig updated."
+
+ # --- Phase 2: Create AMP Workspace ---
+ echo ""
+ echo ">>> Phase 2: Creating Amazon Managed Prometheus workspace..."
+ AMP_WS_ID=$(aws amp list-workspaces --alias antiatropos-metrics --region "$REGION" --query 'workspaces[0].workspaceId' --output text 2>/dev/null || echo "")
+
+ if [ -z "$AMP_WS_ID" ] || [ "$AMP_WS_ID" = "None" ]; then
+   AMP_WS_ID=$(aws amp create-workspace \
+     --alias antiatropos-metrics \
+     --region "$REGION" \
+     --query 'workspaceId' \
+     --output text)
+   echo "AMP workspace created: $AMP_WS_ID"
+ else
+   echo "AMP workspace already exists: $AMP_WS_ID"
+ fi
+
+ AMP_URL="https://aps-workspaces.$REGION.amazonaws.com/workspaces/$AMP_WS_ID"
+ echo "AMP URL: $AMP_URL"
+
+ # --- Phase 3: Set up IAM Roles for Service Accounts (IRSA) ---
+ echo ""
+ echo ">>> Phase 3: Setting up IRSA..."
+ CLUSTER_OIDC=$(aws eks describe-cluster --name "$CLUSTER_NAME" --region "$REGION" --query 'cluster.identity.oidc.issuer' --output text | sed 's|https://||')
+ ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
+
+ # Prometheus service account
+ if kubectl get serviceaccount prometheus-sa -n monitoring &>/dev/null; then
+   echo "prometheus-sa already exists."
+ else
+   eksctl create iamserviceaccount \
+     --cluster "$CLUSTER_NAME" --region "$REGION" \
+     --namespace monitoring \
+     --name prometheus-sa \
+     --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
+     --approve \
+     --override-existing-serviceaccounts
+   echo "prometheus-sa created."
+ fi
+
+ # AntiAtropos service account
+ if kubectl get serviceaccount antiatropos-sa -n antiatropos &>/dev/null; then
+   echo "antiatropos-sa already exists."
+ else
+   eksctl create iamserviceaccount \
+     --cluster "$CLUSTER_NAME" --region "$REGION" \
+     --namespace antiatropos \
+     --name antiatropos-sa \
+     --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess \
+     --approve \
+     --override-existing-serviceaccounts
+   echo "antiatropos-sa created."
+ fi
+
+ # --- Phase 4: Install Prometheus Agent ---
+ echo ""
+ echo ">>> Phase 4: Installing Prometheus Agent (remote-writes to AMP)..."
+ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts 2>/dev/null || true
+ helm repo update
+
+ if helm status prometheus-agent -n monitoring &>/dev/null; then
+   echo "prometheus-agent already installed, upgrading..."
+   helm upgrade prometheus-agent prometheus-community/prometheus \
+     --namespace monitoring \
+     -f "$AWS_DIR/prometheus-agent-values.yaml" \
+     --set "server.remoteWrite[0].url=$AMP_URL/api/v1/remote_write"
+ else
+   helm install prometheus-agent prometheus-community/prometheus \
+     --namespace monitoring --create-namespace \
+     -f "$AWS_DIR/prometheus-agent-values.yaml" \
+     --set "server.remoteWrite[0].url=$AMP_URL/api/v1/remote_write"
+   echo "prometheus-agent installed."
+ fi
+
+ # --- Phase 5: Build and Push Docker Image ---
+ echo ""
+ echo ">>> Phase 5: Building and pushing Docker image to ECR..."
+ ECR_URI="$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/antiatropos"
+
+ if aws ecr describe-repositories --repository-name antiatropos --region "$REGION" &>/dev/null; then
+   echo "ECR repository already exists."
+ else
+   aws ecr create-repository --repository-name antiatropos --region "$REGION"
+   echo "ECR repository created."
+ fi
+
+ aws ecr get-login-password --region "$REGION" | \
+   docker login --username AWS --password-stdin \
+   "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com"
+
+ docker build -t antiatropos:latest "$AWS_DIR/../.."
+ docker tag antiatropos:latest "$ECR_URI:latest"
+ docker push "$ECR_URI:latest"
+ echo "Image pushed to $ECR_URI:latest"
+
+ # --- Phase 6: Deploy AntiAtropos ---
+ echo ""
+ echo ">>> Phase 6: Deploying AntiAtropos to EKS..."
+
+ # Apply namespace first
+ kubectl apply -f "$AWS_DIR/k8s-namespace.yaml"
+
+ # Create/update configmap with the real AMP URL
+ kubectl create configmap antiatropos-config \
+   --from-literal=ANTIATROPOS_ENV_MODE=live \
+   --from-literal=ANTIATROPOS_STRICT_REAL=false \
+   --from-literal="PROMETHEUS_URL=$AMP_URL" \
+   --from-literal=KUBECONFIG="" \
+   --from-literal=ANTIATROPOS_K8S_NAMESPACE=default \
+   --from-literal=ANTIATROPOS_DEPLOYMENT_PREFIX="" \
+   --from-literal=ANTIATROPOS_MIN_REPLICAS=1 \
+   --from-literal=ANTIATROPOS_MAX_REPLICAS=20 \
+   --from-literal=ANTIATROPOS_SCALE_STEP=3 \
+   --from-literal=ANTIATROPOS_WORKLOAD_MAP='{}' \
+   --from-literal=ANTIATROPOS_NODE_DEPLOYMENT_MAP='{}' \
+   --from-literal=ANTIATROPOS_PROM_TIMEOUT_S=5.0 \
+   --from-literal=ANTIATROPOS_METRIC_AGGREGATION=sum \
+   -n antiatropos \
+   --dry-run=client -o yaml | kubectl apply -f -
+
+ # Create secret with OpenAI API key
+ kubectl create secret generic antiatropos-secrets \
+   --from-literal=openai-api-key="$OPENAI_API_KEY" \
+   -n antiatropos \
+   --dry-run=client -o yaml | kubectl apply -f -
+
+ # Apply deployment with the correct ECR image
+ sed "s|ACCOUNT_ID|$ACCOUNT_ID|g; s|us-east-1|$REGION|g" "$AWS_DIR/k8s-deployment.yaml" | kubectl apply -f -
+ kubectl apply -f "$AWS_DIR/k8s-service.yaml"
+
+ # Wait for rollout
+ echo "Waiting for deployment to be ready..."
+ kubectl rollout status deployment/antiatropos -n antiatropos --timeout=300s
+
+ # --- Done ---
+ echo ""
+ echo "=========================================="
+ echo " AntiAtropos AWS Deployment Complete!"
+ echo "=========================================="
+ echo ""
+ echo "AMP Workspace ID: $AMP_WS_ID"
+ echo "AMP URL: $AMP_URL"
+ echo ""
+ echo "Next steps:"
+ echo "  1. Create an Amazon Managed Grafana workspace:"
+ echo "     aws grafana create-workspace --workspace-name antiatropos-dashboards \\"
+ echo "       --account-access-type CURRENT_ACCOUNT --authentication-method AWS_SSO \\"
+ echo "       --permission-type SERVICE_MANAGED --data-sources PROMETHEUS --region $REGION"
+ echo ""
+ echo "  2. Add AMP as a data source in AMG and import dashboards from:"
+ echo "     deploy/grafana/provisioning/dashboards/json/"
+ echo ""
+ echo "  3. Get the AntiAtropos service URL:"
+ echo "     kubectl get svc -n antiatropos antiatropos"
+ echo ""
+ echo "  4. Port-forward for local testing:"
+ echo "     kubectl port-forward -n antiatropos deployment/antiatropos 8000:8000"
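
The script's Phase 6 rewrites the deployment manifest with `sed` before applying it. To see what that substitution does in isolation, here is a small sketch that wraps the same `sed` expression in a function (the name `render_deployment` and the sample account ID are assumptions for illustration) and runs it on a single manifest line.

```shell
# Hypothetical wrapper around the script's sed expression.
# $1 = AWS account ID, $2 = target region; the manifest is read from stdin.
render_deployment() {
  sed "s|ACCOUNT_ID|$1|g; s|us-east-1|$2|g"
}

# Example: render one image line for a made-up account in eu-west-1.
printf '%s\n' 'image: ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/antiatropos:latest' |
  render_deployment 123456789012 eu-west-1
# prints: image: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/antiatropos:latest
```

Note that the second expression replaces every occurrence of `us-east-1` in the file, not only the one in the image URI, so any other literal region string in the manifest is rewritten too.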
deploy/aws/eksctl-cluster.yaml ADDED
@@ -0,0 +1,65 @@
+ apiVersion: eksctl.io/v1alpha5
+ kind: ClusterConfig
+
+ metadata:
+   name: antiatropos
+   region: us-east-1
+   version: "1.30"
+   tags:
+     Project: AntiAtropos
+     Environment: production
+
+ iam:
+   withOIDC: true
+
+ addons:
+   - name: vpc-cni
+     version: latest
+   - name: coredns
+     version: latest
+   - name: kube-proxy
+     version: latest
+   - name: aws-ebs-csi-driver
+     version: latest
+     wellKnownPolicies:
+       ebsCSIController: true
+
+ managedNodeGroups:
+   - name: linux-nodes
+     instanceType: t3.medium
+     desiredCapacity: 2
+     minSize: 1
+     maxSize: 4
+     volumeSize: 50
+     volumeType: gp3
+     labels:
+       role: worker
+     tags:
+       Project: AntiAtropos
+       NodeGroup: linux-nodes
+     iam:
+       withAddonPolicies:
+         ebs: true
+         cloudWatch: true
+         autoScaler: true
+
+   # Optional: Spot instance group for cost savings
+   # - name: spot-nodes
+   #   instanceType: t3.medium
+   #   desiredCapacity: 1
+   #   minSize: 0
+   #   maxSize: 3
+   #   volumeSize: 30
+   #   volumeType: gp3
+   #   spot: true
+   #   labels:
+   #     role: spot-worker
+   #   tags:
+   #     Project: AntiAtropos
+
+ cloudWatch:
+   clusterLogging:
+     enableTypes:
+       - api
+       - audit
+       - authenticator
deploy/aws/grafana-dashboard-live.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "dashboard": {
+     "__inputs": [],
+     "__requires": [],
+     "annotations": {
+       "list": []
+     },
+     "description": "AntiAtropos SRE Environment Live Dashboard - AWS Deployment. Import the full dashboard from deploy/grafana/provisioning/dashboards/json/antiatropos-live.json via the AMG UI (Dashboards > Import > Upload JSON file). After import, change the data source from the local Prometheus (UID: PBFA97CFB590B2093) to your AMP workspace data source.",
+     "editable": true,
+     "fiscalYearStartMonth": 0,
+     "graphTooltip": 1,
+     "id": null,
+     "links": [],
+     "panels": [
+       {
+         "title": "Import the full dashboard",
+         "type": "text",
+         "gridPos": {"h": 4, "w": 24, "x": 0, "y": 0},
+         "options": {
+           "mode": "markdown",
+           "content": "## Import Instructions\n\nThis is a placeholder panel. Import the full dashboard from:\n\n```\ndeploy/grafana/provisioning/dashboards/json/antiatropos-live.json\n```\n\nvia the AMG UI: **Dashboards > Import > Upload JSON file**\n\nAfter import, change the data source from the local Prometheus (UID: `PBFA97CFB590B2093`) to your AMP workspace data source."
+         }
+       }
+     ],
+     "schemaVersion": 39,
+     "tags": ["antiatropos", "sre", "aws", "live"],
+     "templating": {"list": []},
+     "time": {"from": "now-5m", "to": "now"},
+     "timepicker": {},
+     "timezone": "utc",
+     "title": "AntiAtropos Live (AWS)",
+     "uid": "antiatropos-live-aws",
+     "version": 0,
+     "refresh": "5s"
+   },
+   "overwrite": true
+ }
deploy/aws/grafana-dashboard-overview.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "dashboard": {
+     "__inputs": [],
+     "__requires": [],
+     "annotations": {
+       "list": []
+     },
+     "description": "AntiAtropos SRE Environment Overview - AWS Deployment. Import the full dashboard from deploy/grafana/provisioning/dashboards/json/antiatropos-overview.json via the AMG UI (Dashboards > Import > Upload JSON file). After import, change the data source from the local Prometheus (UID: PBFA97CFB590B2093) to your AMP workspace data source.",
+     "editable": true,
+     "fiscalYearStartMonth": 0,
+     "graphTooltip": 1,
+     "id": null,
+     "links": [],
+     "panels": [
+       {
+         "title": "Import the full dashboard",
+         "type": "text",
+         "gridPos": {"h": 4, "w": 24, "x": 0, "y": 0},
+         "options": {
+           "mode": "markdown",
+           "content": "## Import Instructions\n\nThis is a placeholder panel. Import the full dashboard from:\n\n```\ndeploy/grafana/provisioning/dashboards/json/antiatropos-overview.json\n```\n\nvia the AMG UI: **Dashboards > Import > Upload JSON file**\n\nAfter import, change the data source from the local Prometheus (UID: `PBFA97CFB590B2093`) to your AMP workspace data source."
+         }
+       }
+     ],
+     "schemaVersion": 39,
+     "tags": ["antiatropos", "sre", "aws"],
+     "templating": {"list": []},
+     "time": {"from": "now-1h", "to": "now"},
+     "timepicker": {},
+     "timezone": "utc",
+     "title": "AntiAtropos Overview (AWS)",
+     "uid": "antiatropos-overview-aws",
+     "version": 0
+   },
+   "overwrite": true
+ }
deploy/aws/grafana-trust-policy.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "Version": "2012-10-17",
+   "Statement": [
+     {
+       "Effect": "Allow",
+       "Principal": {
+         "Service": "grafana.amazonaws.com"
+       },
+       "Action": "sts:AssumeRole",
+       "Condition": {
+         "StringEquals": {
+           "aws:SourceAccount": "YOUR_ACCOUNT_ID"
+         }
+       }
+     }
+   ]
+ }
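
The trust policy ships with a `YOUR_ACCOUNT_ID` placeholder that has to be filled in before `aws iam create-role` will produce a usable role. One way to do that, sketched below with hypothetical helper names and a made-up account ID, is to substitute the value with `sed` and then check that no placeholder remains.

```shell
# Hypothetical helper: fill the YOUR_ACCOUNT_ID placeholder in text read from stdin.
fill_account_id() {
  # $1 = AWS account ID
  sed "s|YOUR_ACCOUNT_ID|$1|g"
}

# Hypothetical helper: succeed (exit 0) only if no placeholder is left on stdin.
no_placeholder_left() {
  ! grep -q 'YOUR_ACCOUNT_ID'
}

# Example on a one-line stand-in for the template.
printf '%s\n' '"YOUR_ACCOUNT_ID"' | fill_account_id 123456789012
# prints: "123456789012"
```

In practice the input would be `deploy/aws/grafana-trust-policy.json` and the account ID could come from `aws sts get-caller-identity --query Account --output text`, with the rendered result written to a separate file so the template stays unchanged.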
deploy/aws/k8s-configmap.yaml ADDED
@@ -0,0 +1,39 @@
+ apiVersion: v1
+ kind: ConfigMap
+ metadata:
+   name: antiatropos-config
+   namespace: antiatropos
+ data:
+   # Environment mode: "simulated" for mock, "live" for real K8s + Prometheus
+   ANTIATROPOS_ENV_MODE: "live"
+   ANTIATROPOS_STRICT_REAL: "false"
+
+   # Prometheus URL: replace WORKSPACE_ID with your AMP workspace ID
+   # The pod uses IRSA to authenticate; no API key needed
+   PROMETHEUS_URL: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/WORKSPACE_ID"
+
+   # Kubernetes executor config: empty KUBECONFIG = in-cluster config
+   KUBECONFIG: ""
+   ANTIATROPOS_K8S_NAMESPACE: "default"
+   ANTIATROPOS_DEPLOYMENT_PREFIX: ""
+   ANTIATROPOS_MIN_REPLICAS: "1"
+   ANTIATROPOS_MAX_REPLICAS: "20"
+   ANTIATROPOS_SCALE_STEP: "3"
+
+   # Workload mapping: maps simulator nodes to real K8s deployments
+   # Customize this to match your actual deployments
+   ANTIATROPOS_WORKLOAD_MAP: |
+     {
+       "node-0": {"deployment": "payments", "namespace": "prod-sre"},
+       "node-1": {"deployment": "checkout", "namespace": "prod-sre"},
+       "node-2": {"deployment": "catalog", "namespace": "prod-sre"},
+       "node-3": {"deployment": "cart", "namespace": "prod-sre"},
+       "node-4": {"deployment": "auth", "namespace": "prod-sre"}
+     }
+   ANTIATROPOS_NODE_DEPLOYMENT_MAP: "{}"
+
+   # Prometheus query timeout
+   ANTIATROPOS_PROM_TIMEOUT_S: "5.0"
+
+   # Metric aggregation strategy
+   ANTIATROPOS_METRIC_AGGREGATION: "sum"
deploy/aws/k8s-deployment.yaml ADDED
@@ -0,0 +1,82 @@
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: antiatropos
+   namespace: antiatropos
+   labels:
+     app: antiatropos
+ spec:
+   replicas: 1
+   selector:
+     matchLabels:
+       app: antiatropos
+   template:
+     metadata:
+       labels:
+         app: antiatropos
+       annotations:
+         prometheus.io/scrape: "true"
+         prometheus.io/port: "8000"
+         prometheus.io/path: "/metrics"
+     spec:
+       serviceAccountName: antiatropos-sa
+       containers:
+         - name: antiatropos
+           # Replace ACCOUNT_ID with your AWS account ID
+           image: ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/antiatropos:latest
+           ports:
+             - containerPort: 8000
+               name: http
+               protocol: TCP
+           envFrom:
+             - configMapRef:
+                 name: antiatropos-config
+           env:
+             # Secrets should come from AWS Secrets Manager or Kubernetes Secrets
+             - name: OPENAI_API_KEY
+               valueFrom:
+                 secretKeyRef:
+                   name: antiatropos-secrets
+                   key: openai-api-key
+             - name: ANTIATROPOS_TASK
+               value: "task-1"
+           resources:
+             requests:
+               cpu: 500m
+               memory: 512Mi
+             limits:
+               cpu: 2000m
+               memory: 2Gi
+           livenessProbe:
+             httpGet:
+               path: /health
+               port: 8000
+             initialDelaySeconds: 15
+             periodSeconds: 30
+             timeoutSeconds: 5
+             failureThreshold: 3
+           readinessProbe:
+             httpGet:
+               path: /health
+               port: 8000
+             initialDelaySeconds: 10
+             periodSeconds: 10
+             timeoutSeconds: 3
+             failureThreshold: 3
+       # Allow scheduling on spot instances too
+       tolerations:
+         - key: "spot"
+           operator: "Equal"
+           value: "true"
+           effect: "NoSchedule"
+       affinity:
+         nodeAffinity:
+           preferredDuringSchedulingIgnoredDuringExecution:
+             - weight: 50
+               preference:
+                 matchExpressions:
+                   - key: role
+                     operator: In
+                     values:
+                       - worker
+                       - spot-worker
deploy/aws/k8s-ingress.yaml ADDED
@@ -0,0 +1,34 @@
+ # ALB Ingress: requires AWS Load Balancer Controller installed on the cluster
+ # Install it first: https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html
+ #
+ # If you prefer a simpler setup, just use the k8s-service.yaml LoadBalancer
+ # and skip this Ingress entirely.
+
+ apiVersion: networking.k8s.io/v1
+ kind: Ingress
+ metadata:
+   name: antiatropos-ingress
+   namespace: antiatropos
+   annotations:
+     alb.ingress.kubernetes.io/scheme: internet-facing
+     alb.ingress.kubernetes.io/target-type: ip
+     alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
+     # Uncomment for HTTPS (requires ACM certificate):
+     # alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
+     # alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:ACCOUNT_ID:certificate/CERT_ID
+     alb.ingress.kubernetes.io/healthcheck-path: /health
+     alb.ingress.kubernetes.io/healthcheck-interval-seconds: "30"
+     alb.ingress.kubernetes.io/success-codes: "200"
+     alb.ingress.kubernetes.io/tags: Project=AntiAtropos
+ spec:
+   ingressClassName: alb
+   rules:
+     - http:
+         paths:
+           - path: /
+             pathType: Prefix
+             backend:
+               service:
+                 name: antiatropos
+                 port:
+                   number: 80
deploy/aws/k8s-namespace.yaml ADDED
@@ -0,0 +1,7 @@
+ apiVersion: v1
+ kind: Namespace
+ metadata:
+   name: antiatropos
+   labels:
+     app.kubernetes.io/name: antiatropos
+     app.kubernetes.io/part-of: antiatropos
deploy/aws/k8s-secret.yaml ADDED
@@ -0,0 +1,18 @@
+ # Create this secret before deploying AntiAtropos.
+ # Replace the placeholder value with your actual OpenAI API key (base64-encoded).
+ #
+ # To encode your key:
+ #   echo -n 'sk-your-key-here' | base64
+ #
+ # Then apply:
+ #   kubectl apply -f k8s-secret.yaml
+
+ apiVersion: v1
+ kind: Secret
+ metadata:
+   name: antiatropos-secrets
+   namespace: antiatropos
+ type: Opaque
+ data:
+   # base64-encoded OpenAI API key
+   openai-api-key: U0tZT1VSX09QRU5BSV9BUElfS0VZX0hFUkU=  # placeholder; replace before applying
deploy/aws/k8s-service.yaml ADDED
@@ -0,0 +1,21 @@
+ apiVersion: v1
+ kind: Service
+ metadata:
+   name: antiatropos
+   namespace: antiatropos
+   labels:
+     app: antiatropos
+   annotations:
+     # Required for AWS Load Balancer Controller to create an NLB
+     service.beta.kubernetes.io/aws-load-balancer-type: external
+     service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
+     service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
+ spec:
+   type: LoadBalancer
+   selector:
+     app: antiatropos
+   ports:
+     - name: http
+       port: 80
+       targetPort: 8000
+       protocol: TCP
deploy/aws/prometheus-agent-values.yaml ADDED
@@ -0,0 +1,86 @@
+ # Helm values for a Prometheus agent that remote-writes to Amazon Managed Service for Prometheus (AMP)
+ #
+ # Usage:
+ #   helm install prometheus-agent prometheus-community/prometheus \
+ #     --namespace monitoring --create-namespace \
+ #     -f prometheus-agent-values.yaml \
+ #     --set prometheus.prometheusSpec.remoteWrite[0].url="https://aps-workspaces.us-east-1.amazonaws.com/workspaces/WORKSPACE_ID/api/v1/remote_write"
+ # Note: the prometheus.prometheusSpec.* keys below follow the kube-prometheus-stack values layout; verify they match the chart you install.
+ # Prerequisite: create an IAM service account for the Prometheus pod
+ #   eksctl create iamserviceaccount \
+ #     --cluster antiatropos \
+ #     --namespace monitoring \
+ #     --name prometheus-sa \
+ #     --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
+ #     --approve
+
+ prometheus:
+   prometheusSpec:
+     # Run in agent mode (remote-write only, no local query API)
+     agentMode: true
+
+     # Remote write: override the URL via --set on the command line
+     remoteWrite:
+       - url: "https://aps-workspaces.us-east-1.amazonaws.com/workspaces/REPLACE_WORKSPACE_ID/api/v1/remote_write"
+         sigv4:
+           region: us-east-1
+
+     # Scrape the AntiAtropos FastAPI /metrics endpoint
+     additionalScrapeConfigs:
+       - job_name: antiatropos
+         metrics_path: /metrics
+         scrape_interval: 15s
+         kubernetes_sd_configs:
+           - role: pod
+             namespaces:
+               names:
+                 - antiatropos
+         relabel_configs:
+           - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
+             action: keep
+             regex: true
+           - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
+             action: replace
+             target_label: __metrics_path__
+             regex: (.+)
+           - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
+             action: replace
+             regex: ([^:]+)(?::\d+)?;(\d+)
+             replacement: $1:$2
+             target_label: __address__
+           - action: labelmap
+             regex: __meta_kubernetes_pod_label_(.+)
+           - source_labels: [__meta_kubernetes_namespace]
+             action: replace
+             target_label: namespace
+           - source_labels: [__meta_kubernetes_pod_name]
+             action: replace
+             target_label: pod
+
+     resources:
+       requests:
+         cpu: 100m
+         memory: 256Mi
+       limits:
+         cpu: 500m
+         memory: 512Mi
+
+     # Short retention since we're remote-writing everything to AMP
+     retention: 2h
+
+   # Use the IAM service account for AMP authentication
+   serviceAccount:
+     name: prometheus-sa
+     create: false
+
+ # Disable alertmanager (AMP handles alerting if needed)
+ alertmanager:
+   enabled: false
+
+ # Disable pushgateway
+ pushgateway:
+   enabled: false
+
+ # Disable server (we only need the agent)
+ server:
+   enabled: false