name: opensoc
version: "1.0.0"
description: >
  OpenSOC is a self-play OpenEnv environment for training Security Operations Center
  (SOC) triage agents. An attacker LLM crafts synthetic security incidents from a
  constrained schema; a defender LLM (the trainee) reads the resulting alert and log
  window and decides how to triage it. Ground-truth triage labels are computed by a
  deterministic schema-side verifier (never read from attacker text), so the
  environment is fully RLVR-compatible and resistant to self-play reward hacking.

author: opensoc
tags:
  - openenv
  - cybersecurity
  - soc
  - self-play
  - multi-agent
  - rlvr
  - threat-detection

tasks:
  - id: stage1_basic
    difficulty: easy
    description: >
      Single-event incidents drawn from a small set of unambiguous templates
      (one obviously benign, one obviously malicious per category). Used to
      bootstrap defender format learning.
    max_steps: 2
    reward_range: [-1.5, 1.1]

  - id: stage2_multi
    difficulty: medium
    description: >
      Multi-event incidents where the malicious signal is spread across a
      short log window. Tests temporal reasoning and rationale citation.
    max_steps: 2
    reward_range: [-1.5, 1.1]

  - id: stage3_mixed
    difficulty: hard
    description: >
      Incidents where benign events are interleaved with malicious ones, and
      some benign templates closely mimic malicious patterns. Tests
      false-positive suppression.
    max_steps: 2
    reward_range: [-1.5, 1.1]

  - id: stage4_adversarial
    difficulty: adversarial
    description: >
      Attacker-controlled distribution (when run in self-play) or
      held-out adversarial set (when run with a fixed dataset). Used as the
      eval benchmark for trained agents.
    max_steps: 2
    reward_range: [-1.5, 1.1]

observation_space:
  type: object
  fields:
    role:
      type: string
      enum: [attacker, defender]
      description: Which side is expected to act on this turn.
    alert:
      type: object
      description: SIEM-style alert summary visible to the defender.
      fields:
        alert_id: { type: string }
        category: { type: string }
        severity: { type: string, enum: [info, low, medium, high, critical] }
        summary: { type: string }
        host: { type: string }
        user: { type: string }
    log_window:
      type: array
      description: >
        Ordered list of log events surrounding the alert. Each event is a dict
        with log_id, timestamp, source, event_type, and a fields object.
    attacker_brief:
      type: object
      description: >
        Only populated on the attacker turn; tells the attacker the target
        ground-truth label slot it should produce an incident for.
      fields:
        target_label: { type: string, enum: [dismiss, monitor, quarantine_host, block_ip, escalate] }
        difficulty: { type: string, enum: [easy, medium, hard, adversarial] }
        category_hint: { type: string }
    step: { type: integer }
    max_steps: { type: integer }
    last_action_feedback: { type: string }
    done: { type: boolean }

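# Illustrative sketch (not part of the spec): what a defender-turn observation
# might look like under the observation_space schema above. Every value below
# is hypothetical, including the IDs, host, user, event fields, and the step
# indexing. attacker_brief is omitted because it is only populated on the
# attacker turn.
#
#   role: defender
#   alert:
#     alert_id: "alert-0001"
#     category: "authentication"
#     severity: high
#     summary: "Multiple failed logins followed by a success"
#     host: "host-01"
#     user: "alice"
#   log_window:
#     - log_id: "log-0001"
#       timestamp: "2024-01-01T00:00:00Z"
#       source: "auth"
#       event_type: "login_failure"
#       fields: { src_ip: "203.0.113.7" }
#   step: 1
#   max_steps: 2
#   last_action_feedback: ""
#   done: false
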
action_space:
  type: object
  description: >
    Exactly one of `craft_incident` (attacker turn) or `submit_triage`
    (defender turn) should be non-null per /step call. Tool names are
    deliberately non-reserved (no reset/step/state/close).
  fields:
    craft_incident:
      type: object
      description: >
        Attacker action. The attacker proposes incident parameters; the env
        validates them and computes the ground-truth label deterministically
        from the params (NOT from any text the attacker writes).
      fields:
        target_label: { type: string, enum: [dismiss, monitor, quarantine_host, block_ip, escalate] }
        category: { type: string }
        events:
          type: array
          description: List of structured events to materialize.
          items:
            type: object
            fields:
              event_type: { type: string }
              fields: { type: object }
        narrative:
          type: string
          description: Free-text scratchpad; ignored by the verifier.
    submit_triage:
      type: object
      description: Defender action; choose one triage action and cite a log id.
      fields:
        action: { type: string, enum: [dismiss, monitor, quarantine_host, block_ip, escalate] }
        cited_log_id: { type: string, description: "ID of the log event that drove the decision." }
        rationale: { type: string }

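# Illustrative sketch (not part of the spec): two action objects that would
# satisfy the "exactly one non-null" rule in action_space. All values are
# hypothetical, and the outer request envelope expected by POST /step is not
# shown because this file does not define it.
#
# Attacker turn:
#   craft_incident:
#     target_label: block_ip
#     category: "authentication"
#     events:
#       - event_type: "login_failure"
#         fields: { src_ip: "203.0.113.7" }
#     narrative: "Scratchpad text; ignored by the verifier."
#   submit_triage: null
#
# Defender turn:
#   craft_incident: null
#   submit_triage:
#     action: block_ip
#     cited_log_id: "log-0001"
#     rationale: "Repeated failures from one external address."
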
reward:
  type: float
  range: [-1.5, 1.1]
  description: >
    Per-turn reward.

    Defender (per submit_triage):
      +1.0 if action matches the env-computed ground-truth label
      -1.0 if action == 'dismiss' but ground truth is malicious (missed-malicious)
      -0.3 if action ∈ {block_ip, quarantine_host} and ground truth is benign (over-react)
      -0.05 if action == 'escalate' and ground truth would have been a cheaper action
      +0.1 bonus if cited_log_id matches the schema-flagged triggering event id

    Attacker (per craft_incident, scored after the defender turn):
      +1.0 iff defender misclassified AND incident passed the plausibility check
      -0.5 if the schema validator rejected the params
      +0.2 novelty bonus on rare feature combos within the rolling batch
      0.0 if implausible (gibberish penalty)

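# Worked example for the defender terms in the reward description (a sketch,
# assuming the listed terms are additive; the file does not state how they
# combine):
#   correct action, cited_log_id matches the trigger:  1.0 + 0.1 = 1.1
#   correct action, cited_log_id does not match:       1.0
#   'dismiss' on a malicious incident:                 -1.0 (before any citation bonus)
# Under that assumption, the first case equals the upper bound of `range`.
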
endpoints:
  reset: POST /reset
  step: POST /step
  state: GET /state
  grade: POST /grade
  tasks: GET /tasks
  health: GET /health
  demo: GET /demo

docker:
  port: 7860

baseline_scores:
  stage1_basic: 0.65
  stage2_multi: 0.45
  stage3_mixed: 0.30
  stage4_adversarial: 0.15