# Commit ca83593 by Nampally Tejasri: Initial OpenEnv submission deploy
env_id: iamsentinel-v1
name: IAMSentinel
version: "1.0.0"
description: >
  IAMSentinel simulates a cloud Identity & Access Management (IAM) control plane
  where an AI agent acts as a security analyst. The agent must detect misconfigurations,
  trace privilege escalation paths, and attribute live attacks, all tasks that real
  cloud security engineers perform daily. The environment generates realistic IAM graphs
  with users, roles, policies, trust relationships, and CloudTrail-style audit logs,
  and injects vulnerabilities deterministically for reproducible evaluation.
tags:
  - openenv
  - security
  - cloud
  - iam
  - cybersecurity
  - reinforcement-learning
author: IAMSentinel Team
license: MIT
tasks:
  - id: task1
    name: Misconfiguration Scanner
    difficulty: easy
    max_steps: 30
    description: >
      Identify all IAM security misconfigurations in the environment.
      Target finding types: wildcard_policy, mfa_disabled, stale_admin_role, exposed_trust_policy.
    grader:
      type: programmatic
      score_range: [0.0, 1.0]
      components:
        - name: recall
          weight: 0.70
          description: Fraction of real vulnerability types correctly identified
        - name: precision
          weight: 0.30
          description: Fraction of flagged findings that are genuine vulnerabilities
      bonus: 0.05 per correctly attributed finding (up to +0.20)
    expected_score_range: [0.50, 0.75]
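    # Worked example (hypothetical numbers). Assumes the grader takes the weighted
    # sum of the components above, adds the attribution bonus, and clips the result
    # to score_range (n_attributed is an illustrative name, not a confirmed field):
    #   score = min(1.0, 0.70 * recall + 0.30 * precision + min(0.20, 0.05 * n_attributed))
    # With recall 0.75 (3 of 4 real vulnerability types), precision 0.60 (3 of 5
    # flagged findings genuine), and 2 attributed findings:
    #   0.70 * 0.75 + 0.30 * 0.60 + 0.05 * 2 = 0.525 + 0.180 + 0.100 = 0.805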
  - id: task2
    name: Privilege Escalation Path Detection
    difficulty: medium
    max_steps: 40
    description: >
      Identify the complete privilege escalation chain from a low-privilege user to admin.
      The chain involves iam:PassRole → lambda:CreateFunction → iam:CreateUser.
      Use trace_escalation_path and flag_finding to document the full path.
    grader:
      type: programmatic
      score_range: [0.0, 1.0]
      components:
        - name: path_found
          weight: 0.40
          description: "Binary: whether the agent flagged a privilege_escalation_path finding"
        - name: path_completeness
          weight: 0.40
          description: Jaccard similarity between the agent's path ARNs and the ground-truth path
        - name: step_detail
          weight: 0.20
          description: Fraction of key escalation techniques identified (PassRole, CreateFunction, CreateUser)
    expected_score_range: [0.30, 0.55]
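    # Worked example (hypothetical numbers). Assumes the score is the weighted sum of
    # the three components and that path_completeness compares paths as sets of ARNs:
    #   Jaccard(A, B) = |A ∩ B| / |A ∪ B|
    # Placeholder ARNs: agent path {U, R, F} vs ground truth {U, R, F, A} gives 3 / 4 = 0.75.
    # If the agent names PassRole and CreateFunction but not CreateUser, step_detail = 2 / 3.
    #   score = 0.40 * 1.0 + 0.40 * 0.75 + 0.20 * (2 / 3) ≈ 0.400 + 0.300 + 0.133 = 0.833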
  - id: task3
    name: Live Attack Attribution
    difficulty: hard
    max_steps: 60
    description: >
      Investigate a CloudTrail audit log stream containing a live multi-step attack.
      A compromised low-privilege user escalates privileges via Lambda PassRole
      to create a backdoor admin user. Identify the compromised principal, the attack
      technique (MITRE ATT&CK), and the lateral movement path, and recommend containment
      actions. Finish with attribute_attack to submit your attribution.
    grader:
      type: programmatic
      score_range: [0.0, 1.0]
      components:
        - name: actor_correct
          weight: 0.30
          description: Correctly identified the compromised principal ARN
        - name: technique_score
          weight: 0.25
          description: MITRE ATT&CK technique recall + technique description quality
        - name: lateral_movement
          weight: 0.25
          description: Jaccard similarity between the agent's lateral movement path and the ground truth
        - name: containment
          weight: 0.20
          description: Quality of recommended containment actions
    expected_score_range: [0.15, 0.40]
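    # Hypothetical attribute_attack payload with one field per grader component.
    # The field names, placeholder ARNs, and technique IDs are illustrative assumptions,
    # not the environment's confirmed schema or ground truth. T1078.004 (Valid Accounts:
    # Cloud Accounts) and T1136.003 (Create Account: Cloud Account) are real ATT&CK IDs
    # that plausibly fit this scenario.
    #   action: attribute_attack
    #   compromised_principal: arn:aws:iam::<account_id>:user/<compromised-user>  # actor_correct
    #   techniques: [T1078.004, T1136.003]                                        # technique_score
    #   lateral_movement_path:                                                    # lateral_movement
    #     [<user ARN>, <Lambda execution role ARN>, <backdoor admin user ARN>]
    #   containment:                                                              # containment
    #     [disable the compromised user's access keys, delete the backdoor admin user]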
observation_space:
  type: structured
  description: >
    Pydantic Observation model containing task context, query results
    (principals, policies, audit events, escalation paths), accumulated findings,
    and episode state (step number, budget remaining, done flag).
  fields:
    - name: task_id
      type: str
    - name: task_description
      type: str
    - name: account_id
      type: str
    - name: step_number
      type: int
    - name: max_steps
      type: int
    - name: budget_remaining
      type: int
    - name: principals
      type: "list[PrincipalSummary] | null"
    - name: policies
      type: "list[PolicySummary] | null"
    - name: audit_events
      type: "list[AuditEvent] | null"
    - name: escalation_paths
      type: "list[EscalationPath] | null"
    - name: role_trust_policy
      type: "dict | null"
    - name: findings
      type: "list[Finding]"
    - name: hints
      type: "list[str]"
    - name: done
      type: bool
    - name: task_completed
      type: bool
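  # Illustrative observation right after reset for task1 (all values hypothetical).
  # Assumes query-result fields stay null until a matching query action runs and
  # that budget_remaining starts at max_steps:
  #   task_id: task1
  #   step_number: 0
  #   max_steps: 30
  #   budget_remaining: 30
  #   principals: null
  #   findings: []
  #   hints: []
  #   done: false
  #   task_completed: false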
action_space:
  type: structured
  description: >
    Discriminated union of typed action models. The agent selects one action per step.
  actions:
    - name: list_principals
      description: List all users and/or roles in the account
    - name: list_policies
      description: List all policies, optionally filtered by principal
    - name: get_policy
      description: Get detailed policy statements by ARN
    - name: get_principal
      description: Get detailed principal info by ARN
    - name: get_role_trust
      description: Get the trust policy for a role (who can assume it)
    - name: query_audit_log
      description: Filter and query CloudTrail-style audit log events
    - name: trace_escalation_path
      description: Traverse the IAM graph to find privilege escalation paths
    - name: flag_finding
      description: Record a security finding with type, severity, description, and evidence
    - name: remediate
      description: Apply a remediation action to a resource
    - name: attribute_attack
      description: Submit the full attack attribution (task3 only; ends the episode)
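  # Hypothetical flag_finding action. The discriminator key "action" and the field
  # names are assumptions inferred from the description above, not the confirmed schema:
  #   action: flag_finding
  #   finding_type: wildcard_policy
  #   severity: high
  #   description: Customer-managed policy allows "Action": "*" on "Resource": "*"
  #   evidence: [<policy ARN returned by list_policies>]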
reward_function:
  type: shaped
  range: [-1.0, 1.0]
  description: >
    Provides signal over the full trajectory: step-level rewards for productive
    actions and new correct findings, an episode-end reward derived from the grader score,
    and penalties for false positives, duplicate findings, and destructive remediations.
  components:
    - name: step_reward
      description: +0.01 for productive queries, +0.15 for new correct findings, −0.05 for false positives
    - name: final_score
      description: Episode-end grader score (0.0–1.0) converted to reward
    - name: irrelevant_action_penalty
      description: −0.02 per irrelevant or unknown action (total capped at −0.20)
    - name: destructive_action_penalty
      description: −0.10 per remediation targeting an innocent principal (total capped at −0.30)
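  # Worked example of the step-level terms (hypothetical counts). Assumes these terms
  # are summed; how final_score is converted to reward is not specified here:
  #   10 productive queries    10 * +0.01 = +0.10
  #   2 new correct findings    2 * +0.15 = +0.30
  #   1 false positive          1 * −0.05 = −0.05
  #   12 irrelevant actions    12 * −0.02 = −0.24, capped at −0.20
  #   step-level subtotal: 0.10 + 0.30 − 0.05 − 0.20 = +0.15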
environment_config:
  seeds_tested: [42, 123, 456, 789, 1337]
  complexity_levels: [easy, medium, hard]
  default_complexity: medium
  deterministic: true
  requires_internet: false
  requires_gpu: false
dependencies:
  python: ">=3.11"
  packages:
    - pydantic>=2.0
    - networkx>=3.0
    - openai>=1.0
baseline:
  model: gpt-4o-mini
  seed: 42
  complexity: medium
  scores:
    task1: 0.58
    task2: 0.42
    task3: 0.28
    average: 0.43
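# The reported average is consistent with the unweighted mean of the three task scores:
#   (0.58 + 0.42 + 0.28) / 3 = 1.28 / 3 ≈ 0.427, rounded to 0.43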