Aaron Brown committed on
Commit ·
018fa0c
1
Parent(s): cebc7ff
Fix mermaid rendering errors
README.md
CHANGED
|
@@ -73,8 +73,8 @@ sequenceDiagram
|
|
| 73 |
|
| 74 |
T->>E: reset()
|
| 75 |
E->>B: Manifest + mutation directive
|
| 76 |
-
B->>B: Generate structured JSON spec
|
| 77 |
-
B->>C: Render templates
|
| 78 |
C->>C: Restart affected services
|
| 79 |
E->>V: Validate range
|
| 80 |
V->>V: Phase A: LLM review
|
|
@@ -103,22 +103,22 @@ Every call to `reset()` triggers a **mutation** -- the Builder LLM swaps vulnera
|
|
| 103 |
|
| 104 |
```mermaid
|
| 105 |
flowchart LR
|
| 106 |
-
subgraph Episode 1
|
| 107 |
A1[SQLi in search form] --> F1[Flag in DB]
|
| 108 |
end
|
| 109 |
-
subgraph Episode 2
|
| 110 |
A2[Command injection<br/>in ping utility] --> F2[Flag on disk]
|
| 111 |
end
|
| 112 |
-
subgraph Episode 3
|
| 113 |
-
A3[SSRF
|
| 114 |
end
|
| 115 |
|
| 116 |
-
|
| 117 |
-
|
| 118 |
|
| 119 |
-
style
|
| 120 |
-
style
|
| 121 |
-
style
|
| 122 |
```
|
| 123 |
|
| 124 |
Agents must **generalize** across vulnerability classes, not memorize exploit chains.
|
|
@@ -154,28 +154,28 @@ All rewards are **verifiable** -- grounded in real container state, not LLM judg
|
|
| 154 |
|
| 155 |
```mermaid
|
| 156 |
flowchart TB
|
| 157 |
-
subgraph Red Rewards
|
| 158 |
RF[Flag Capture<br/>docker exec cat flag<br/>binary match]
|
| 159 |
RE[Efficiency<br/>gamma^steps]
|
| 160 |
RS[Stealth<br/>Did Blue detect?]
|
| 161 |
RH[Anti-hallucination<br/>-0.3 per fake flag]
|
| 162 |
end
|
| 163 |
|
| 164 |
-
subgraph Blue Rewards
|
| 165 |
-
BD[Detection<br/>TP rate vs Red
|
| 166 |
BP[Patch<br/>Golden path re-run fails]
|
| 167 |
BA[Availability<br/>Healthcheck fraction]
|
| 168 |
BF[False Positive<br/>-0.2 per NPC flagged]
|
| 169 |
end
|
| 170 |
|
| 171 |
-
subgraph Coupling
|
| 172 |
RS -.-|depends on| BD
|
| 173 |
BD -.-|depends on| RF
|
| 174 |
end
|
| 175 |
|
| 176 |
-
style
|
| 177 |
-
style
|
| 178 |
-
style
|
| 179 |
```
|
| 180 |
|
| 181 |
## Golden Path Validation
|
|
@@ -185,11 +185,11 @@ Every generated range passes a **7-check validation pipeline** before any agent
|
|
| 185 |
```mermaid
|
| 186 |
flowchart LR
|
| 187 |
S1[1. Services up<br/>nc -z ports] --> S2[2. Flags exist<br/>docker exec cat]
|
| 188 |
-
S2 --> S3[3. Network isolation<br/>external
|
| 189 |
S3 --> S4[4. Golden path<br/>execute exploit steps]
|
| 190 |
S4 --> S5[5. Difficulty<br/>steps within 20%]
|
| 191 |
S5 --> S6[6. No leaks<br/>grep description]
|
| 192 |
-
S6 --> S7[7. Inverse mutation<br/>revert vuln
|
| 193 |
|
| 194 |
S7 -->|All pass| PASS[VALID]
|
| 195 |
S7 -->|Any fail| FAIL[RETRY<br/>Builder gets error context]
|
|
@@ -207,25 +207,25 @@ Difficulty grows **horizontally** -- more hosts, more networks, more services. N
|
|
| 207 |
|
| 208 |
```mermaid
|
| 209 |
flowchart TD
|
| 210 |
-
subgraph Tier 1 - Basic
|
| 211 |
W1[web<br/>nginx + PHP] --> D1[db<br/>MySQL]
|
| 212 |
end
|
| 213 |
|
| 214 |
-
subgraph Tier 2 - Corporate
|
| 215 |
W2[web] --> D2[db]
|
| 216 |
W2 --> M2[mail<br/>SMTP]
|
| 217 |
FW2[firewall<br/>iptables] --> W2
|
| 218 |
end
|
| 219 |
|
| 220 |
-
subgraph Tier 3 - Enterprise
|
| 221 |
W3[web] --> D3[db]
|
| 222 |
-
W3 --> DC3[DC<br/>LDAP
|
| 223 |
FS3[files<br/>SMB] --> DC3
|
| 224 |
end
|
| 225 |
|
| 226 |
-
style
|
| 227 |
-
style
|
| 228 |
-
style
|
| 229 |
```
|
| 230 |
|
| 231 |
| Tier | Hosts | Networks | Services | Golden Steps |
|
|
@@ -240,37 +240,37 @@ flowchart TD
|
|
| 240 |
|
| 241 |
```mermaid
|
| 242 |
sequenceDiagram
|
| 243 |
-
participant Red as Red Agent
|
| 244 |
-
participant Env as Range
|
| 245 |
-
participant Blue as Blue Agent
|
| 246 |
|
| 247 |
-
Note over Red,Blue: Episode begins -
|
| 248 |
|
| 249 |
Red->>Env: nmap -sV web
|
| 250 |
Env-->>Red: 80/tcp open http nginx
|
| 251 |
Note right of Env: Action logged
|
| 252 |
|
| 253 |
Blue->>Env: tail_log access.log
|
| 254 |
-
Env-->>Blue:
|
| 255 |
-
Blue->>Env: submit_finding
|
| 256 |
Note left of Blue: True positive!
|
| 257 |
|
| 258 |
-
Red->>Env: curl
|
| 259 |
Env-->>Red: Database results + flag
|
| 260 |
Note right of Env: Action logged
|
| 261 |
|
| 262 |
-
Red->>Env: submit_flag FLAG
|
| 263 |
Env-->>Red: Correct! reward=1.0
|
| 264 |
|
| 265 |
-
Blue->>Env: grep_log
|
| 266 |
Env-->>Blue: SQLi pattern found
|
| 267 |
-
Blue->>Env: patch search.php
|
| 268 |
Env-->>Blue: Patch applied
|
| 269 |
|
| 270 |
Note over Env: Re-run golden path exploit
|
| 271 |
-
Note over Env: Exploit FAILS
|
| 272 |
|
| 273 |
-
Note over Red,Blue: Red stealth
|
| 274 |
```
|
| 275 |
|
| 276 |
## Project Structure
|
|
|
|
| 73 |
|
| 74 |
T->>E: reset()
|
| 75 |
E->>B: Manifest + mutation directive
|
| 76 |
+
B->>B: Generate structured JSON spec
|
| 77 |
+
B->>C: Render templates, hot-swap configs
|
| 78 |
C->>C: Restart affected services
|
| 79 |
E->>V: Validate range
|
| 80 |
V->>V: Phase A: LLM review
|
|
|
|
| 103 |
|
| 104 |
```mermaid
|
| 105 |
flowchart LR
|
| 106 |
+
subgraph ep1 [Episode 1]
|
| 107 |
A1[SQLi in search form] --> F1[Flag in DB]
|
| 108 |
end
|
| 109 |
+
subgraph ep2 [Episode 2]
|
| 110 |
A2[Command injection<br/>in ping utility] --> F2[Flag on disk]
|
| 111 |
end
|
| 112 |
+
subgraph ep3 [Episode 3]
|
| 113 |
+
A3[SSRF to internal SQLi] --> F3[Flag in internal DB]
|
| 114 |
end
|
| 115 |
|
| 116 |
+
ep1 -->|reset| ep2
|
| 117 |
+
ep2 -->|reset| ep3
|
| 118 |
|
| 119 |
+
style ep1 fill:#ff6b6b22,stroke:#ff6b6b
|
| 120 |
+
style ep2 fill:#ffd93d22,stroke:#ffd93d
|
| 121 |
+
style ep3 fill:#6bcb7722,stroke:#6bcb77
|
| 122 |
```
|
| 123 |
|
| 124 |
Agents must **generalize** across vulnerability classes, not memorize exploit chains.
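As a rough illustration of why memorizing one exploit chain does not transfer, the sketch below picks a different vulnerability for each simulated `reset()`. The `Mutation` class, the `CATALOG` list, and `pick_mutation` are hypothetical names introduced here; the catalog only mirrors the three episodes in the diagram, and the real Builder is an LLM that is not modeled.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    description: str    # what is planted in the range
    flag_location: str  # where the flag ends up


# Hypothetical catalog mirroring the three episodes in the diagram.
CATALOG = [
    Mutation("SQLi in search form", "DB"),
    Mutation("Command injection in ping utility", "disk"),
    Mutation("SSRF to internal SQLi", "internal DB"),
]


def pick_mutation(previous, rng):
    """Pick a mutation that differs from the previous episode's mutation."""
    candidates = [m for m in CATALOG if m != previous]
    return rng.choice(candidates)


rng = random.Random(0)
current = None
history = []
for _ in range(5):  # five simulated reset() calls
    current = pick_mutation(current, rng)
    history.append(current)
```

Consecutive entries in `history` never repeat, so an agent that replays the previous episode's exploit steps would be attacking a vulnerability that is no longer there.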
|
|
|
|
| 154 |
|
| 155 |
```mermaid
|
| 156 |
flowchart TB
|
| 157 |
+
subgraph red [Red Rewards]
|
| 158 |
RF[Flag Capture<br/>docker exec cat flag<br/>binary match]
|
| 159 |
RE[Efficiency<br/>gamma^steps]
|
| 160 |
RS[Stealth<br/>Did Blue detect?]
|
| 161 |
RH[Anti-hallucination<br/>-0.3 per fake flag]
|
| 162 |
end
|
| 163 |
|
| 164 |
+
subgraph blue [Blue Rewards]
|
| 165 |
+
BD[Detection<br/>TP rate vs Red log]
|
| 166 |
BP[Patch<br/>Golden path re-run fails]
|
| 167 |
BA[Availability<br/>Healthcheck fraction]
|
| 168 |
BF[False Positive<br/>-0.2 per NPC flagged]
|
| 169 |
end
|
| 170 |
|
| 171 |
+
subgraph coupling [Coupling]
|
| 172 |
RS -.-|depends on| BD
|
| 173 |
BD -.-|depends on| RF
|
| 174 |
end
|
| 175 |
|
| 176 |
+
style red fill:#ff6b6b11,stroke:#ff6b6b
|
| 177 |
+
style blue fill:#4a9eff11,stroke:#4a9eff
|
| 178 |
+
style coupling fill:#ffd93d11,stroke:#ffd93d,stroke-dasharray: 5 5
|
| 179 |
```
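The reward components in the diagram can be combined in many ways; the sketch below simply adds them up. The unweighted sum, the default `gamma=0.99`, and the choice to model stealth as `1 - detected_fraction` are all assumptions made for illustration. Only the per-item penalties (-0.3 per fake flag, -0.2 per NPC flagged) and the `gamma^steps` form come from the diagram.

```python
def red_reward(flag_captured, steps, detected_fraction, fake_flags, gamma=0.99):
    """Sum of Red's components (unweighted sum is an assumption)."""
    capture = 1.0 if flag_captured else 0.0  # binary flag match
    efficiency = gamma ** steps              # fewer steps, higher value
    stealth = 1.0 - detected_fraction        # assumption: share of actions Blue missed
    penalty = -0.3 * fake_flags              # anti-hallucination
    return capture + efficiency + stealth + penalty


def blue_reward(true_positives, red_actions, patch_blocks_exploit,
                healthy_checks, total_checks, npcs_flagged):
    """Sum of Blue's components (unweighted sum is an assumption)."""
    detection = true_positives / red_actions if red_actions else 0.0
    patch = 1.0 if patch_blocks_exploit else 0.0  # golden path re-run fails
    availability = healthy_checks / total_checks if total_checks else 0.0
    penalty = -0.2 * npcs_flagged                 # false positives
    return detection + patch + availability + penalty


red_best = red_reward(flag_captured=True, steps=0, detected_fraction=0.0, fake_flags=0)
blue_example = blue_reward(true_positives=2, red_actions=4, patch_blocks_exploit=True,
                           healthy_checks=3, total_checks=4, npcs_flagged=1)
```

Note how the coupling shows up in the inputs: Red's `detected_fraction` is derived from the same findings that feed Blue's `true_positives`.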
|
| 180 |
|
| 181 |
## Golden Path Validation
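The validation flow in this section (ordered checks, then either VALID or RETRY with error context handed back to the Builder) could be driven by a loop like the one below. This is a sketch under assumptions: `validate`, `build_until_valid`, the stub builder, and the stop-at-first-failure behavior are hypothetical, and the checks are stand-ins rather than real `nc` or `docker exec` calls.

```python
def validate(checks):
    """Run (name, fn) checks in order; fn returns an error string or None."""
    for name, fn in checks:
        error = fn()
        if error is not None:
            return False, f"{name}: {error}"  # stop at first failure (assumption)
    return True, None


def build_until_valid(build, make_checks, max_attempts=3):
    """Call build(error_context) until its output passes every check."""
    error_context = None
    for attempt in range(1, max_attempts + 1):
        spec = build(error_context)
        ok, error_context = validate(make_checks(spec))
        if ok:
            return spec, attempt
    raise RuntimeError(f"no valid range after {max_attempts} attempts: {error_context}")


# Stub builder: forgets the flag on the first try, fixes it once it sees an error.
def stub_build(error_context):
    return {"services_up": True, "flag_present": error_context is not None}


def stub_checks(spec):
    return [
        ("services up", lambda: None if spec["services_up"] else "port closed"),
        ("flags exist", lambda: None if spec["flag_present"] else "flag missing"),
    ]


spec, attempts = build_until_valid(stub_build, stub_checks)
```

In this stub run the first attempt fails the "flags exist" check, the error text is passed back to the builder, and the second attempt passes.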
|
|
|
|
| 185 |
```mermaid
|
| 186 |
flowchart LR
|
| 187 |
S1[1. Services up<br/>nc -z ports] --> S2[2. Flags exist<br/>docker exec cat]
|
| 188 |
+
S2 --> S3[3. Network isolation<br/>external blocked from internal]
|
| 189 |
S3 --> S4[4. Golden path<br/>execute exploit steps]
|
| 190 |
S4 --> S5[5. Difficulty<br/>steps within 20%]
|
| 191 |
S5 --> S6[6. No leaks<br/>grep description]
|
| 192 |
+
S6 --> S7[7. Inverse mutation<br/>revert vuln, step fails]
|
| 193 |
|
| 194 |
S7 -->|All pass| PASS[VALID]
|
| 195 |
S7 -->|Any fail| FAIL[RETRY<br/>Builder gets error context]
|
|
|
|
| 207 |
|
| 208 |
```mermaid
|
| 209 |
flowchart TD
|
| 210 |
+
subgraph t1 [Tier 1 - Basic]
|
| 211 |
W1[web<br/>nginx + PHP] --> D1[db<br/>MySQL]
|
| 212 |
end
|
| 213 |
|
| 214 |
+
subgraph t2 [Tier 2 - Corporate]
|
| 215 |
W2[web] --> D2[db]
|
| 216 |
W2 --> M2[mail<br/>SMTP]
|
| 217 |
FW2[firewall<br/>iptables] --> W2
|
| 218 |
end
|
| 219 |
|
| 220 |
+
subgraph t3 [Tier 3 - Enterprise]
|
| 221 |
W3[web] --> D3[db]
|
| 222 |
+
W3 --> DC3[DC<br/>LDAP + Kerberos]
|
| 223 |
FS3[files<br/>SMB] --> DC3
|
| 224 |
end
|
| 225 |
|
| 226 |
+
style t1 fill:#6bcb7722,stroke:#6bcb77
|
| 227 |
+
style t2 fill:#ffd93d22,stroke:#ffd93d
|
| 228 |
+
style t3 fill:#ff6b6b22,stroke:#ff6b6b
|
| 229 |
```
|
| 230 |
|
| 231 |
| Tier | Hosts | Networks | Services | Golden Steps |
|
|
|
|
| 240 |
|
| 241 |
```mermaid
|
| 242 |
sequenceDiagram
|
| 243 |
+
participant Red as Red Agent
|
| 244 |
+
participant Env as Range
|
| 245 |
+
participant Blue as Blue Agent
|
| 246 |
|
| 247 |
+
Note over Red,Blue: Episode begins - Builder mutated range
|
| 248 |
|
| 249 |
Red->>Env: nmap -sV web
|
| 250 |
Env-->>Red: 80/tcp open http nginx
|
| 251 |
Note right of Env: Action logged
|
| 252 |
|
| 253 |
Blue->>Env: tail_log access.log
|
| 254 |
+
Env-->>Blue: NPC traffic + Red scan mixed
|
| 255 |
+
Blue->>Env: submit_finding port scan detected
|
| 256 |
Note left of Blue: True positive!
|
| 257 |
|
| 258 |
+
Red->>Env: curl web/search?q= OR 1=1
|
| 259 |
Env-->>Red: Database results + flag
|
| 260 |
Note right of Env: Action logged
|
| 261 |
|
| 262 |
+
Red->>Env: submit_flag FLAG abc123
|
| 263 |
Env-->>Red: Correct! reward=1.0
|
| 264 |
|
| 265 |
+
Blue->>Env: grep_log UNION SELECT OR 1
|
| 266 |
Env-->>Blue: SQLi pattern found
|
| 267 |
+
Blue->>Env: patch search.php parameterize query
|
| 268 |
Env-->>Blue: Patch applied
|
| 269 |
|
| 270 |
Note over Env: Re-run golden path exploit
|
| 271 |
+
Note over Env: Exploit FAILS, patch valid
|
| 272 |
|
| 273 |
+
Note over Red,Blue: Red stealth LOW, Blue detection HIGH
|
| 274 |
```
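The "Action logged" notes and the "True positive!" verdict in the sequence above suggest that Blue's findings are scored against a log of what Red actually did. A minimal sketch of that idea follows; the `ActionLog` class, its method names, and matching findings by a simple action label are assumptions, not the project's API.

```python
from dataclasses import dataclass, field


@dataclass
class ActionLog:
    entries: list = field(default_factory=list)  # (actor, kind) tuples

    def record(self, actor, kind):
        """Log one action taken by 'red' or by an 'npc'."""
        self.entries.append((actor, kind))

    def score_finding(self, kind):
        """A finding is a true positive only if Red performed that kind of action."""
        if ("red", kind) in self.entries:
            return "true_positive"
        return "false_positive"


log = ActionLog()
log.record("red", "port_scan")  # Red: nmap -sV web
log.record("npc", "http_get")   # benign NPC traffic mixed into the same log
scan_verdict = log.score_finding("port_scan")
npc_verdict = log.score_finding("http_get")
```

Here `scan_verdict` is a true positive because Red's scan is in the log, while flagging the NPC request counts as a false positive.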
|
| 275 |
|
| 276 |
## Project Structure
|