Update openenv.yaml
openenv.yaml CHANGED (+3 -168)
@@ -2,171 +2,6 @@ name: api-gateway-defender
 version: "1.0.0"
 description: >
   A simulated HTTP traffic monitoring environment where an AI agent acts as
-  a Site Reliability Engineer defending a web backend. The agent inspects
-
-
-
-  Models a real production incident domain: rate-limiting, WAF rule authoring,
-  and pattern-based traffic filtering — skills that are highly valued in DevOps,
-  SRE, and cybersecurity engineering.
-
-author: "API Gateway Defender Team"
-license: "Apache-2.0"
-
-tags:
-  - openenv
-  - cybersecurity
-  - web-security
-  - sre
-  - real-world
-  - devops
-  - rate-limiting
-  - waf
-
-tasks:
-  - id: easy
-    name: "Volumetric IP Flood Defense"
-    difficulty: easy
-    max_score: 1.0
-    description: >
-      A single IP address is flooding the /login endpoint with POST requests.
-      The agent must identify the malicious IP from traffic logs and block it
-      (or apply a rate limit). Tests pattern recognition under high-volume noise.
-    success_criteria: >
-      block_ip or add_rate_limit action targeting the flooding IP address,
-      achieving ≥0.95 detection rate with <10% false positive rate.
-
-  - id: medium
-    name: "Scraper Bot Detection"
-    difficulty: medium
-    max_score: 1.0
-    description: >
-      A scraper bot harvests the /api/data endpoint from 50 different IP addresses,
-      rotating them to evade IP-based blocks. All malicious requests share one
-      identical unusual User-Agent string. The agent must identify and block it.
-    success_criteria: >
-      block_user_agent action with the exact malicious User-Agent string,
-      achieving ≥0.95 detection rate with <10% false positive rate.
-
-  - id: hard
-    name: "SQL Injection Middleware Defense"
-    difficulty: hard
-    max_score: 1.0
-    description: >
-      An attacker probes the database via SQL injection. They rotate IP addresses
-      AND User-Agents on every request to evade simple rules. Every malicious
-      request contains a SQL injection payload in the query string. The agent
-      must write a regex-based middleware rule to detect and block all payloads.
-    success_criteria: >
-      write_custom_middleware action with a regex that matches 'UNION SELECT'
-      pattern (case-insensitive), achieving ≥0.95 detection rate with <10% FP rate.
-
-observation_space:
-  type: structured
-  description: "Snapshot of recent HTTP traffic and active gateway configuration."
-  fields:
-    - name: recent_requests
-      type: "list[dict]"
-      description: "Last 100 HTTP requests. Each has: ip, method, path, user_agent, query_string, status_code."
-    - name: active_rules
-      type: "list[str]"
-      description: "Human-readable list of firewall rules currently active."
-    - name: current_task
-      type: string
-      description: "Task ID: 'easy', 'medium', or 'hard'."
-    - name: task_description
-      type: string
-      description: "Natural language description of the attack to defend against."
-    - name: step_count
-      type: integer
-      description: "Number of rules submitted in the current episode."
-    - name: hint
-      type: string
-      description: "Statistical hint about suspicious patterns in the visible traffic window."
-
-action_space:
-  type: discrete_parameterized
-  description: "Submit one firewall rule to the gateway middleware."
-  fields:
-    - name: action_type
-      type: string
-      required: true
-      choices:
-        - block_ip
-        - add_rate_limit
-        - block_user_agent
-        - write_custom_middleware
-      description: "Which type of rule to apply."
-    - name: target_ip
-      type: string
-      required: false
-      description: "IP address. Required for block_ip and add_rate_limit."
-    - name: target_user_agent
-      type: string
-      required: false
-      description: "Exact User-Agent string. Required for block_user_agent."
-    - name: regex_pattern
-      type: string
-      required: false
-      description: "Python regex matched against '{path}?{query_string}'. Required for write_custom_middleware."
-    - name: max_requests
-      type: integer
-      required: false
-      default: 60
-      description: "Requests per minute cap. Used with add_rate_limit."
-
-reward:
-  range: [0.0, 1.0]
-  type: continuous
-  formula: >
-    detection_rate = malicious_blocked / total_malicious
-    false_positive_rate = legitimate_blocked / total_legitimate
-    if false_positive_rate > 0.10:
-      score = 0.0
-    else:
-      score = clamp(detection_rate - false_positive_rate * 5.0, 0.0, 1.0)
-  description: >
-    Rewards accurate detection of malicious traffic. Penalises false positives
-    (blocking legitimate users) with a 5x multiplier. Zeroed entirely if
-    false positive rate exceeds 10% — models real operational constraints
-    where blocking paying customers is unacceptable.
-
-episode:
-  max_steps: 5
-  termination_conditions:
-    - "score >= 0.95 (success)"
-    - "step_count >= 5 (step limit)"
-  reset_required: true
-
-evaluation:
-  grader_type: programmatic
-  deterministic: true
-  train_seed: 42
-  test_seed: 137
-  description: >
-    Rules are graded against a hidden test traffic set (seed 137) distinct from
-    the visible training sample (seed 42). This prevents agents from overfitting
-    to specific IPs/UAs in the observation window.
-
-api:
-  framework: FastAPI
-  port: 7860
-  endpoints:
-    - "POST /reset"
-    - "POST /step"
-    - "GET /state"
-    - "GET /tasks"
-    - "GET /grader"
-    - "POST /baseline"
-    - "GET /health"
-
-baseline:
-  agent_type: heuristic
-  scores:
-    easy: 1.0
-    medium: 1.0
-    hard: 1.0
-  note: >
-    Heuristic agent reads the visible traffic sample, identifies the attack
-    pattern statistically, and applies the optimal rule. Scores are fully
-    reproducible with fixed seeds.
+  a Site Reliability Engineer defending a web backend. The agent inspects
+  incoming HTTP requests and must configure middleware firewall rules to block
+  malicious traffic while preserving legitimate user requests.
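For reference, the scoring rule described in the removed `reward` block can be sketched in Python as follows. This is only an illustration of the formula as written in the YAML, not the environment's actual grader: the function name `score_rule` and the handling of empty traffic sets (zero denominators) are assumptions.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def score_rule(
    malicious_blocked: int,
    total_malicious: int,
    legitimate_blocked: int,
    total_legitimate: int,
) -> float:
    """Score one rule using the formula from the removed `reward` block.

    Treating an empty traffic set as a rate of 0.0 is an assumption;
    the YAML does not say what happens when a denominator is zero.
    """
    detection_rate = malicious_blocked / total_malicious if total_malicious else 0.0
    false_positive_rate = (
        legitimate_blocked / total_legitimate if total_legitimate else 0.0
    )
    # Hard cutoff: more than 10% of legitimate traffic blocked zeroes the score.
    if false_positive_rate > 0.10:
        return 0.0
    # Otherwise false positives are penalised with a 5x multiplier.
    return clamp(detection_rate - false_positive_rate * 5.0, 0.0, 1.0)
```

Under this sketch, a rule that blocks every malicious request and no legitimate ones scores 1.0, while blocking 2% of legitimate traffic costs roughly 0.1, which already puts the 0.95 success threshold out of reach.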
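The removed `hard` task asks for a case-insensitive regex matching the 'UNION SELECT' pattern, applied to '{path}?{query_string}'. A minimal sketch of such a rule is shown below. The tolerance for `+` and `%20` as separators, the letter-boundary lookarounds, the use of `re.search` rather than `re.match`, and the helper name `is_blocked` are all assumptions; the YAML does not say how the gateway applies the pattern or whether query strings arrive URL-encoded.

```python
import re

# Hypothetical pattern for the write_custom_middleware action.
# The inline (?i) flag keeps case-insensitivity inside the pattern string itself,
# since the action's regex_pattern field is a plain string.
PATTERN = r"(?i)(?<![a-z])union(?:\s|\+|%20)+select(?![a-z])"


def is_blocked(path: str, query_string: str) -> bool:
    """Return True if the pattern matches '{path}?{query_string}'.

    Using re.search (match anywhere) is an assumption about the gateway.
    """
    return re.search(PATTERN, f"{path}?{query_string}") is not None


# Action payload using the field names from the removed action_space block.
action = {
    "action_type": "write_custom_middleware",
    "regex_pattern": PATTERN,
}
```

The lookarounds keep the rule from firing on ordinary words such as "reunion selection", which matters because the scoring rule penalises false positives heavily.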