# Architecture

## System Diagram

```mermaid
flowchart LR
    A[TrafficGenerator] --> E[FirewallEnvironment]
    B[ThreatEngine] --> E
    E --> C[RewardEngine]
    E --> D[Graders]
    E --> F[FastAPI App]
    F --> G[Client / Agent]
    G --> F
```

## Runtime Data Flow

```mermaid
sequenceDiagram
    participant Agent
    participant Env as FirewallEnvironment
    participant TG as TrafficGenerator
    participant TH as ThreatEngine
    participant RW as RewardEngine
    Agent->>Env: reset(task, seed)
    Env->>TG: generate_benign_sessions
    Env->>TH: maybe_spawn_attacker + generate_attack_sessions
    Env-->>Agent: state
    Agent->>Env: step(action_map) or step_single(action)
    Env->>RW: reward(action, is_malicious, budget_remaining, phase)
    Env-->>Agent: reward, done, info, next state
```

## Core Components

| Component | Responsibility | Key Outputs |
|---|---|---|
| `firewall_environment.py` | Episode orchestration, budget tracking, session lifecycle, metrics | `state()`, `step()`, `step_single()`, tool APIs |
| `traffic_generator.py` | Benign + malicious metadata generation, normalization, scenario shaping | 22-dim normalized observation vectors |
| `threat_engine.py` | Multi-attacker orchestration, adaptation, lifecycle and outcomes | Attack sessions, attacker status map |
| `reward_engine.py` | Multi-objective reward calculation and action-cost accounting | scalar reward + component breakdown |
| `graders.py` | Deterministic task scoring and pass/fail gating | score in `[0,1]`, pass constraints |
| `baseline/evaluate.py` | Policy benchmarking across tasks | JSON report for random/heuristic/block/allow |

## Environment Modes

- **Multi-session mode**: `step(action_map)` handles a variable batch of sessions per tick.
- **Single-session mode**: `step_single(action)` exposes one decision at a time with `Discrete(6)` semantics.
- **Inspect workflow**: inspect is first-stage evidence collection; follow-up action resolves the session.
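
The two modes and the inspect workflow above can be illustrated with a toy sketch. This is not the project's `FirewallEnvironment`: `ToyFirewallEnv`, `Session`, the `Action` enum and its numeric values, and the string outcomes are all hypothetical stand-ins. Only three of the six `Discrete(6)` actions are modeled, since the text names only inspect, block, and allow. Budget tracking and rewards are left out.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Optional


class Action(IntEnum):
    # Hypothetical encoding: the real Discrete(6) action ids are not
    # given in this document, and only three actions are modeled here.
    ALLOW = 0
    BLOCK = 1
    INSPECT = 2


@dataclass
class Session:
    session_id: str
    is_malicious: bool
    inspected: bool = False
    resolved: Optional[Action] = None


@dataclass
class ToyFirewallEnv:
    sessions: Dict[str, Session] = field(default_factory=dict)

    def step(self, action_map: Dict[str, Action]) -> Dict[str, str]:
        """Multi-session mode: apply one action per session id in the batch."""
        return {
            sid: self._apply(self.sessions[sid], action)
            for sid, action in action_map.items()
        }

    def step_single(self, action: Action) -> Optional[str]:
        """Single-session mode: act on the first unresolved session, if any."""
        for session in self.sessions.values():
            if session.resolved is None:
                return self._apply(session, action)
        return None

    def _apply(self, session: Session, action: Action) -> str:
        if action == Action.INSPECT:
            # First stage: gather evidence but leave the session open.
            session.inspected = True
            return "evidence_collected"
        # Second stage (or direct decision): resolve the session.
        session.resolved = action
        blocked = action == Action.BLOCK
        return "correct" if blocked == session.is_malicious else "incorrect"


env = ToyFirewallEnv({
    "s1": Session("s1", is_malicious=True),
    "s2": Session("s2", is_malicious=False),
})
first = env.step({"s1": Action.INSPECT, "s2": Action.ALLOW})
second = env.step_single(Action.BLOCK)   # follow-up action resolves s1
third = env.step_single(Action.ALLOW)    # nothing left to resolve
```

In this sketch the batch call leaves `s1` open after the inspect, so the next `step_single` call picks it up and resolves it. How the real environment queues inspected sessions, charges inspection cost against the budget, or orders single-session decisions is not specified here and may differ.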