# Architecture
## System Diagram
```mermaid
flowchart LR
A[TrafficGenerator] --> E[FirewallEnvironment]
B[ThreatEngine] --> E
E --> C[RewardEngine]
E --> D[Graders]
E --> F[FastAPI App]
F --> G[Client / Agent]
G --> F
```
## Runtime Data Flow
```mermaid
sequenceDiagram
participant Agent
participant Env as FirewallEnvironment
participant TG as TrafficGenerator
participant TH as ThreatEngine
participant RW as RewardEngine
Agent->>Env: reset(task, seed)
Env->>TG: generate_benign_sessions
Env->>TH: maybe_spawn_attacker + generate_attack_sessions
Env-->>Agent: state
Agent->>Env: step(action_map) or step_single(action)
Env->>RW: reward(action, is_malicious, budget_remaining, phase)
Env-->>Agent: reward, done, info, next state
```
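The sequence above can be sketched as a driver loop. This is a minimal illustration, not the project's code: `StubEnv` is a hypothetical stand-in for `FirewallEnvironment`, the task name `"easy"` and the toy reward are invented, and both the return order `(reward, done, info, state)` and the shape of `action_map` (session id mapped to an action name) are assumptions read off the diagram.

```python
import random


class StubEnv:
    """Hypothetical stand-in for FirewallEnvironment (not the real class)."""

    def __init__(self, max_ticks=3):
        self.max_ticks = max_ticks

    def reset(self, task, seed):
        # A seeded RNG makes the episode reproducible for a given seed.
        self.rng = random.Random(seed)
        self.tick = 0
        return self._state()

    def _state(self):
        # Draw 1-3 sessions for this tick; ground truth stays inside the env
        # and only the session ids are exposed to the agent.
        n = self.rng.randint(1, 3)
        self._truth = {f"s{self.tick}-{i}": self.rng.random() < 0.3 for i in range(n)}
        return {"sessions": list(self._truth)}

    def step(self, action_map):
        # Toy reward: +1 when "block" matches a malicious session or
        # a non-block action matches a benign one, -1 otherwise.
        reward = sum(
            1.0 if (action == "block") == self._truth[sid] else -1.0
            for sid, action in action_map.items()
        )
        self.tick += 1
        done = self.tick >= self.max_ticks
        return reward, done, {"tick": self.tick}, self._state()


def run_episode(env, policy, task="easy", seed=0):
    """Reset once, then step with one action per visible session until done."""
    state = env.reset(task, seed)
    total, done = 0.0, False
    while not done:
        action_map = {sid: policy(sid) for sid in state["sessions"]}
        reward, done, info, state = env.step(action_map)
        total += reward
    return total, info["tick"]


total, ticks = run_episode(StubEnv(), policy=lambda sid: "allow")
print(total, ticks)
```

Because the batch size changes every tick, the policy is applied per session and the resulting `action_map` is rebuilt on each iteration.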
## Core Components
| Component | Responsibility | Key Outputs |
|---|---|---|
| `firewall_environment.py` | Episode orchestration, budget tracking, session lifecycle, metrics | `state()`, `step()`, `step_single()`, tool APIs |
| `traffic_generator.py` | Benign + malicious metadata generation, normalization, scenario shaping | 22-dim normalized observation vectors |
| `threat_engine.py` | Multi-attacker orchestration, adaptation, lifecycle and outcomes | Attack sessions, attacker status map |
| `reward_engine.py` | Multi-objective reward calculation and action-cost accounting | Scalar reward + component breakdown |
| `graders.py` | Deterministic task scoring and pass/fail gating | Score in `[0, 1]`, pass constraints |
| `baseline/evaluate.py` | Policy benchmarking across tasks | JSON report for the random, heuristic, block, and allow policies |
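To make "scalar reward + component breakdown" concrete, here is a hedged sketch of what a multi-objective reward with action costs could look like. The argument list mirrors the `reward(action, is_malicious, budget_remaining, phase)` call in the sequence diagram, but every component name, weight, and cost below is hypothetical, and `phase` is accepted without being used because its semantics are not described here.

```python
# Hypothetical per-action costs; the real values live in reward_engine.py.
ACTION_COST = {"allow": 0.0, "block": 0.1, "inspect": 0.3}


def reward(action, is_malicious, budget_remaining, phase=None):
    """Return (scalar, breakdown). `phase` is accepted but unused in this sketch."""
    if action == "block":
        security = 1.0 if is_malicious else 0.0
        availability = 0.0 if is_malicious else -1.0  # false positive
    elif action == "allow":
        security = -1.0 if is_malicious else 0.0  # missed attack
        availability = 0.0 if is_malicious else 0.5
    else:
        security = availability = 0.0  # inspect defers the decision

    cost = -ACTION_COST[action]
    # Assumed behaviour: spending past the remaining budget is penalised.
    overspend = -1.0 if budget_remaining + cost < 0 else 0.0

    breakdown = {
        "security": security,
        "availability": availability,
        "cost": cost,
        "overspend": overspend,
    }
    # The scalar is the plain sum of the components.
    return sum(breakdown.values()), breakdown


scalar, parts = reward("block", is_malicious=True, budget_remaining=1.0)
print(scalar, parts)
```

Returning the breakdown alongside the scalar lets a caller log which objective drove a given reward without recomputing it.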
## Environment Modes
- **Multi-session mode**: `step(action_map)` handles a variable batch of sessions per tick.
- **Single-session mode**: `step_single(action)` exposes one decision at a time over a `Discrete(6)` action space.
- **Inspect workflow**: inspect is a first-stage evidence-collection action; a follow-up action then resolves the session.
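The single-session mode and the two-stage inspect workflow can be sketched as follows. `SingleSessionStub` is a hypothetical stand-in, not `FirewallEnvironment`; the action indices, the `(done, info)` return shape, and the idea that inspection reveals ground truth directly are all simplifying assumptions. Only three of the six `Discrete(6)` actions are named, since the others are not listed in this document.

```python
from enum import IntEnum


class Action(IntEnum):
    # Assumed indices; the remaining three Discrete(6) slots are left unnamed.
    ALLOW = 0
    BLOCK = 1
    INSPECT = 2


N_ACTIONS = 6  # size of the Discrete(6) action space


class SingleSessionStub:
    """Hypothetical one-decision-at-a-time environment."""

    def __init__(self, sessions):
        self.queue = list(sessions)  # pending (session_id, is_malicious) pairs
        self.inspected = set()
        self.resolved = {}

    def step_single(self, action):
        if not 0 <= int(action) < N_ACTIONS:
            raise ValueError(f"action must be in [0, {N_ACTIONS})")
        sid, is_malicious = self.queue[0]
        info = {}
        if action == Action.INSPECT:
            # First stage: collect evidence, keep the session pending.
            self.inspected.add(sid)
            info["evidence"] = {"is_malicious": is_malicious}
        else:
            # Follow-up (or direct) action resolves the session.
            self.resolved[sid] = int(action)
            self.queue.pop(0)
        done = not self.queue
        return done, info


env = SingleSessionStub([("s1", True), ("s2", False)])
done, info = env.step_single(Action.INSPECT)  # evidence only, s1 still pending
verdict = Action.BLOCK if info["evidence"]["is_malicious"] else Action.ALLOW
done, _ = env.step_single(verdict)            # resolves s1
done, _ = env.step_single(Action.ALLOW)       # resolves s2 without inspecting
print(env.resolved, done)
```

The key property is that inspecting does not advance the queue: the same session is presented again until a resolving action is taken.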