spec_version: 1
name: AntiAtropos
type: space
runtime: fastapi
app: server.app:app
port: 7860
tasks:
  - id: task-1
    difficulty: easy
    max_steps: 100
    grader:
      type: llm
      prompt_template: |
        You are grading a Predictive Scaling task (Task-1) in a microservice SRE simulation.
        Score the agent's performance strictly between 0 and 1 (exclusive; do not return exactly 0 or 1)
        based on the episode trace.

        Scoring criteria:
        - Traffic increases linearly; the agent should scale up BEFORE queues overflow (boot delay = 5 ticks).
        - Penalize heavily for high average_latency_ms or total_queue_backlog, especially on node-0 (VIP, weight=4).
        - Penalize over-provisioning (scaling up unnecessarily when load is low).
        - Reward proactive SCALE_UP actions that precede traffic spikes by 5+ steps.
        - Ignore sensor noise readings (0 or -1 values) when evaluating decisions.

        Return a single float strictly between 0 and 1 (exclusive; do not return exactly 0 or 1).
  - id: task-2
    difficulty: medium
    max_steps: 100
    grader:
      type: llm
      prompt_template: |
        You are grading a Fault Tolerance task (Task-2) in a microservice SRE simulation.
        Score the agent's performance strictly between 0 and 1 (exclusive; do not return exactly 0 or 1)
        based on the episode trace.

        Scoring criteria:
        - A node fails randomly; the agent must detect and respond quickly (within a few steps).
        - Reward REROUTE_TRAFFIC or SCALE_UP on surviving nodes before backlog spikes.
        - Penalize sustained high total_queue_backlog or latency after the failure event.
        - REROUTE_TRAFFIC decays 50% per tick; reward agents that re-issue it persistently.
        - Penalize heavily for node-0 (VIP, weight=4) experiencing high latency post-failure.
        - Ignore sensor noise readings (0 or -1 values) when evaluating decisions.

        Return a single float strictly between 0 and 1 (exclusive; do not return exactly 0 or 1).
  - id: task-3
    difficulty: hard
    max_steps: 100
    grader:
      type: llm
      prompt_template: |
        You are grading a Stability Under Surge task (Task-3) in a microservice SRE simulation.
        Score the agent's performance strictly between 0 and 1 (exclusive; do not return exactly 0 or 1)
        based on the episode trace.

        Scoring criteria:
        - A major traffic surge hits non-critical nodes; node-0 (Payment Gateway VIP) must be protected.
        - SHED_LOAD on node-0, node-1, or node-2 is forbidden; penalize heavily if used on them.
        - Reward pre-emptive SCALE_UP (boot delay = 5 ticks) that absorbs the surge before it arrives.
        - Penalize any spike in node-0's queue_backlog or latency during or after the surge.
        - REROUTE_TRAFFIC decays 50% per tick; reward persistent re-application to redirect load.
        - Ignore sensor noise readings (0 or -1 values) when evaluating decisions.

        Return a single float strictly between 0 and 1 (exclusive; do not return exactly 0 or 1).