flowchart LR
    A[TrafficGenerator] --> E[FirewallEnvironment]
    B[ThreatEngine] --> E
    E --> C[RewardEngine]
    E --> D[Graders]
    E --> F[FastAPI App]
    F --> G[Client / Agent]
    G --> F
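The flowchart is a dependency graph. Two producers feed the environment, two consumers read from it, and the agent reaches it only through the FastAPI app. Below is a minimal Python sketch of that wiring. It assumes the inbound components can be modeled as plain callables (for example a bound `TrafficGenerator().generate_benign_sessions`) injected through the constructor. The session shape, the `sessions`, `record`, `grade` and `api_grade` names, and the `block_rate` grader are all hypothetical. `RewardEngine` is left out, and a `json.dumps` call stands in for the FastAPI route so the example runs without FastAPI installed.

```python
from __future__ import annotations

import json
import random
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical session record: {"id": str, "malicious": bool}
Session = dict


@dataclass
class FirewallEnvironment:
    # Inbound edges (TrafficGenerator --> E, ThreatEngine --> E) are injected
    # as callables, so tests can swap in deterministic fakes.
    benign_source: Callable[[random.Random], list[Session]]
    attack_source: Callable[[random.Random], list[Session]]
    # Outbound edge (E --> Graders): each grader maps the episode log to a score.
    graders: dict[str, Callable[[list[dict]], float]] = field(default_factory=dict)
    log: list[dict] = field(default_factory=list)

    def sessions(self, seed: int) -> list[Session]:
        # One seeded RNG shared by both producers keeps an episode reproducible.
        rng = random.Random(seed)
        return self.benign_source(rng) + self.attack_source(rng)

    def record(self, session: Session, action: str) -> None:
        self.log.append({"malicious": session["malicious"], "action": action})

    def grade(self) -> dict[str, float]:
        # Graders only read the log; they never touch traffic or threat state.
        return {name: fn(self.log) for name, fn in self.graders.items()}


def block_rate(log: list[dict]) -> float:
    # Hypothetical grader: share of malicious sessions that were blocked.
    attacks = [entry for entry in log if entry["malicious"]]
    if not attacks:
        return 1.0  # nothing to catch in this episode
    return sum(entry["action"] == "block" for entry in attacks) / len(attacks)


def api_grade(env: FirewallEnvironment) -> str:
    # Stand-in for a FastAPI route (E --> F --> Client / Agent): the agent
    # only ever receives JSON, never the environment's Python objects.
    return json.dumps(env.grade(), sort_keys=True)
```

Injecting the sources means a grader can be checked against a hand-written episode (two fixed lambdas, say) instead of random traffic.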
Runtime Data Flow
sequenceDiagram
    participant Agent
    participant Env as FirewallEnvironment
    participant TG as TrafficGenerator
    participant TH as ThreatEngine
    participant RW as RewardEngine
    Agent->>Env: reset(task, seed)
    Env->>TG: generate_benign_sessions
    Env->>TH: maybe_spawn_attacker + generate_attack_sessions
    Env-->>Agent: state
    Agent->>Env: step(action_map) or step_single(action)
    Env->>RW: reward(action, is_malicious, budget_remaining, phase)
    Env-->>Agent: reward, done, info, next state
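The sequence above can be read as a Gym-style `reset`/`step` contract. Below is a minimal, self-contained Python sketch of it. It assumes each session carries a hidden ground-truth `malicious` flag and that actions are `"allow"` or `"block"`. The method names `generate_benign_sessions`, `maybe_spawn_attacker`, `generate_attack_sessions`, `reward`, `reset`, `step` and `step_single`, and the `reward` parameters, come from the diagram. Their bodies, the reward values, the block budget, the constant `phase` value, and the `StepResult` container (fields ordered as reward, done, info, next state) are hypothetical.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field

# Hypothetical action vocabulary; the real environment's actions may differ.
ACTIONS = ("allow", "block")


class TrafficGenerator:
    def generate_benign_sessions(self, rng: random.Random, n: int = 3) -> list[dict]:
        # rng is accepted for parity with the real component; this stub is deterministic.
        return [{"id": f"b{i}", "malicious": False} for i in range(n)]


class ThreatEngine:
    def maybe_spawn_attacker(self, rng: random.Random, p: float) -> bool:
        return rng.random() < p

    def generate_attack_sessions(self, rng: random.Random, n: int = 2) -> list[dict]:
        return [{"id": f"a{i}", "malicious": True} for i in range(n)]


class RewardEngine:
    def reward(self, action: str, is_malicious: bool,
               budget_remaining: int, phase: str) -> float:
        # Hypothetical rule: +1 for a correct call, -1 otherwise, and an extra
        # -0.5 for blocking once the budget is spent. `phase` is accepted to
        # match the diagram's signature but ignored in this sketch.
        r = 1.0 if (action == "block") == is_malicious else -1.0
        if action == "block" and budget_remaining <= 0:
            r -= 0.5
        return r


@dataclass
class StepResult:
    reward: float
    done: bool
    info: dict
    state: dict


@dataclass
class FirewallEnvironment:
    attack_prob: float = 0.5  # hypothetical chance that an attacker spawns
    block_budget: int = 2     # hypothetical number of "free" blocks per episode
    tg: TrafficGenerator = field(default_factory=TrafficGenerator)
    th: ThreatEngine = field(default_factory=ThreatEngine)
    rw: RewardEngine = field(default_factory=RewardEngine)
    task: str = ""
    pending: list[dict] = field(default_factory=list)
    budget_remaining: int = 0

    def reset(self, task: str, seed: int) -> dict:
        rng = random.Random(seed)  # one seeded RNG makes episodes reproducible
        self.task = task
        self.pending = self.tg.generate_benign_sessions(rng)
        if self.th.maybe_spawn_attacker(rng, self.attack_prob):
            self.pending += self.th.generate_attack_sessions(rng)
        self.budget_remaining = self.block_budget
        return self._state()

    def _decide(self, session: dict, action: str) -> float:
        # Shared by both step variants: validate, score, then charge the budget.
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        r = self.rw.reward(action, session["malicious"],
                           self.budget_remaining, phase="live")
        if action == "block":
            self.budget_remaining = max(0, self.budget_remaining - 1)
        return r

    def step_single(self, action: str) -> StepResult:
        # Decide the oldest pending session only.
        if not self.pending:
            raise RuntimeError("episode is done; call reset()")
        session = self.pending[0]
        r = self._decide(session, action)
        self.pending.pop(0)
        info = {"session": session["id"], "malicious": session["malicious"]}
        return StepResult(r, not self.pending, info, self._state())

    def step(self, action_map: dict[str, str]) -> StepResult:
        # Batch form: decide every pending session named in action_map and
        # sum the rewards; sessions not named stay pending.
        total, decided = 0.0, []
        for session in list(self.pending):
            if session["id"] in action_map:
                total += self._decide(session, action_map[session["id"]])
                self.pending.remove(session)
                decided.append(session["id"])
        return StepResult(total, not self.pending, {"decided": decided}, self._state())

    def _state(self) -> dict:
        # The agent sees session ids but never the ground-truth labels.
        return {"task": self.task,
                "pending": [s["id"] for s in self.pending],
                "budget_remaining": self.budget_remaining}
```

Both entry points go through one `_decide` helper, so `step(action_map)` and `step_single(action)` cannot drift apart in how they call the reward engine or charge the budget.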