AntiAtropos / openenv.yaml

Divyansh Agrawal

updated grader

fedca64 about 2 months ago

3.09 kB

	spec_version: 1
	name: AntiAtropos
	type: space
	runtime: fastapi
	app: server.app:app
	port: 7860

	tasks:
	  - id: task-1
	    difficulty: easy
	    max_steps: 100
	    grader:
	      type: llm
	      prompt_template: |
	        You are grading a Predictive Scaling task (Task-1) in a microservice SRE simulation.
	        Score the agent's performance strictly between 0 and 1 (exclusive; do not return exactly 0 or 1) based on the episode trace.

	        Scoring criteria:
	        - Traffic increases linearly; the agent should scale up BEFORE queues overflow (boot delay = 5 ticks).
	        - Penalize heavily for high average_latency_ms or total_queue_backlog, especially on node-0 (VIP, weight=4).
	        - Penalize over-provisioning (scaling up unnecessarily when load is low).
	        - Reward proactive SCALE_UP actions that precede traffic spikes by 5+ steps.
	        - Ignore sensor noise readings (0 or -1 values) when evaluating decisions.

	        Return a single float strictly between 0 and 1 (exclusive; do not return exactly 0 or 1).

	  - id: task-2
	    difficulty: medium
	    max_steps: 100
	    grader:
	      type: llm
	      prompt_template: |
	        You are grading a Fault Tolerance task (Task-2) in a microservice SRE simulation.
	        Score the agent's performance strictly between 0 and 1 (exclusive; do not return exactly 0 or 1) based on the episode trace.

	        Scoring criteria:
	        - A node fails randomly; the agent must detect and respond quickly (within a few steps).
	        - Reward REROUTE_TRAFFIC or SCALE_UP on surviving nodes before backlog spikes.
	        - Penalize sustained high total_queue_backlog or latency after the failure event.
	        - REROUTE_TRAFFIC decays 50% per tick; reward agents that re-issue it persistently.
	        - Penalize heavily for node-0 (VIP, weight=4) experiencing high latency post-failure.
	        - Ignore sensor noise readings (0 or -1 values) when evaluating decisions.

	        Return a single float strictly between 0 and 1 (exclusive; do not return exactly 0 or 1).

	  - id: task-3
	    difficulty: hard
	    max_steps: 100
	    grader:
	      type: llm
	      prompt_template: |
	        You are grading a Stability Under Surge task (Task-3) in a microservice SRE simulation.
	        Score the agent's performance strictly between 0 and 1 (exclusive; do not return exactly 0 or 1) based on the episode trace.

	        Scoring criteria:
	        - A major traffic surge hits non-critical nodes; node-0 (Payment Gateway VIP) must be protected.
	        - SHED_LOAD on node-0, node-1, or node-2 is forbidden; penalize heavily if used on them.
	        - Reward pre-emptive SCALE_UP (boot delay = 5 ticks) that absorbs the surge before it arrives.
	        - Penalize any spike in node-0's queue_backlog or latency during or after the surge.
	        - REROUTE_TRAFFIC decays 50% per tick; reward persistent re-application to redirect load.
	        - Ignore sensor noise readings (0 or -1 values) when evaluating decisions.

	        Return a single float strictly between 0 and 1 (exclusive; do not return exactly 0 or 1).
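
Each grader prompt asks for a single float strictly inside (0, 1). The file does not show how the reply is consumed, so the following is only a minimal sketch of one way a caller might validate it. The function name `parse_score` and the margin `EPS` are hypothetical and are not part of this repository.

```python
import math

# Hypothetical margin used to keep scores off the endpoints 0 and 1.
EPS = 1e-6


def parse_score(reply: str, eps: float = EPS) -> float:
    """Parse a grader reply that should be one bare float and clamp it into (0, 1)."""
    try:
        value = float(reply.strip())
    except ValueError as exc:
        # The prompt asks for a single float, so anything else is rejected.
        raise ValueError(f"grader reply is not a bare float: {reply!r}") from exc
    if not math.isfinite(value):
        # float() accepts "nan" and "inf"; neither is a usable score.
        raise ValueError(f"grader reply is not finite: {reply!r}")
    # Clamp so that an exact 0 or 1 still ends up strictly inside the interval.
    return min(max(value, eps), 1.0 - eps)
```

Rejecting replies that contain extra text, rather than searching them for a number, avoids misreading digits in a phrase such as "Task-1" as the score.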
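
The Task-2 and Task-3 prompts state that REROUTE_TRAFFIC decays 50% per tick. The arithmetic behind "re-issue it persistently" can be sketched as below, assuming the effect starts at full strength on the tick it is issued and is multiplied by 0.5 on every later tick. The simulator's actual mechanics are not shown in this file, and both function names are hypothetical.

```python
def reroute_strength(ticks_since_issue: int, decay: float = 0.5) -> float:
    """Fraction of the original reroute effect left after the given number of ticks."""
    if ticks_since_issue < 0:
        raise ValueError("ticks_since_issue must be non-negative")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    # Assumed model: each tick removes `decay` of whatever effect remains.
    return (1.0 - decay) ** ticks_since_issue


def ticks_until_below(threshold: float, decay: float = 0.5) -> int:
    """First tick count at which the remaining effect drops below `threshold`."""
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    ticks = 0
    while reroute_strength(ticks, decay) >= threshold:
        ticks += 1
    return ticks
```

Under this assumption a single reroute retains 12.5% of its effect after three ticks and falls below 10% on the fourth, which is why an agent that issues the action only once gets little lasting benefit.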