# SENTINEL Visual System
This file is the source of truth for diagrams. Every diagram used in the README, UI, blog, or slides should be derived from it.
## Diagram Inventory
| Diagram | Purpose | Status |
| --- | --- | --- |
| System stack | show the code architecture | ready |
| Episode lifecycle | explain `reset()` to terminal reward | ready |
| Trust and reward flow | show how state turns into learning signal | ready |
| Reward engine v2 | show process-aware reward components | ready |
| Before / after | show why SENTINEL matters | ready |
| Theme fit | map the project to the hackathon | ready |
| Training loop | show OpenEnv -> TRL / Unsloth pipeline | ready |
---
## 1. System Stack
```mermaid
flowchart TD
A["HTTP client / UI / inference.py"] --> B["app.py<br/>FastAPI on port 7860"]
B --> C["SentinelEnv<br/>environment.py"]
B --> D["_sessions<br/>session_id -> SentinelEnv"]
C --> E["TaskGraph<br/>task_graph.py"]
C --> F["TrustLedger<br/>trust_ledger.py"]
C --> G["SpecialistPool<br/>specialists.py"]
C --> H["RewardEngine<br/>graders.py"]
C --> I["Scenario dataset<br/>scenarios.py"]
C --> J["Typed models<br/>models.py"]
B --> K["openenv.yaml"]
B --> L["static/index.html"]
```
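
The `_sessions` node maps each `session_id` to its own `SentinelEnv`, so concurrent clients do not share episode state. A minimal sketch of that registry follows. The stub `SentinelEnv` and the helper names `create_session` and `get_session` are hypothetical and only illustrate the mapping; the real `app.py` may be organized differently.

```python
import uuid


class SentinelEnv:
    """Stand-in for the real class in environment.py (assumed, simplified)."""

    def __init__(self) -> None:
        self.steps = 0


# session_id -> SentinelEnv, mirroring the `_sessions` node in the diagram.
_sessions: dict = {}


def create_session() -> str:
    """Create an isolated environment and return its id (hypothetical helper)."""
    session_id = uuid.uuid4().hex
    _sessions[session_id] = SentinelEnv()
    return session_id


def get_session(session_id: str) -> SentinelEnv:
    """Return the environment for a session, failing loudly on unknown ids."""
    if session_id not in _sessions:
        raise KeyError(f"unknown session_id: {session_id}")
    return _sessions[session_id]
```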
---
## 2. Episode Lifecycle
```mermaid
flowchart TD
A["reset(task_type, seed)"] --> B["sample scenario"]
B --> C["reshuffle hidden specialist profiles"]
C --> D["set trust priors to 0.50"]
D --> E["build task graph"]
E --> F["return first observation"]
F --> G["orchestrator chooses action"]
G --> H["delegate / verify / self solve / skip"]
H --> I["specialist or self execution"]
I --> J["record outcome in TaskGraph"]
J --> K["update TrustLedger"]
K --> L["compute step reward"]
L --> M{"done?"}
M -- "no" --> N["return next observation"]
N --> G
M -- "yes" --> O["compute terminal reward"]
O --> P["return done=True with final info"]
```
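
The lifecycle above can be read as an ordinary reset/step loop. The sketch below uses a toy environment, not the real `SentinelEnv`: the `step()` return shape `(observation, reward, done, info)`, the reward values, and the profile placeholders are assumptions chosen only to make the control flow concrete.

```python
import random

ACTIONS = ("delegate", "verify", "self_solve", "skip")


class ToyEnv:
    """Toy stand-in for SentinelEnv; real signatures and rewards may differ."""

    def __init__(self, n_subtasks: int = 3) -> None:
        self.n_subtasks = n_subtasks

    def reset(self, task_type: str, seed: int) -> dict:
        rng = random.Random(seed)                  # seeded scenario sampling
        self.profiles = rng.sample(range(5), 5)    # reshuffled hidden profiles (placeholder)
        self.trust = {f"S{i}": 0.50 for i in range(5)}  # trust priors set to 0.50
        self.cursor = 0                            # position in the task graph
        return self._observation()

    def _observation(self) -> dict:
        # Hidden profiles are deliberately not exposed to the orchestrator.
        return {"subtask": self.cursor, "trust": dict(self.trust)}

    def step(self, action: str):
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        step_reward = 0.0 if action == "skip" else 0.1   # placeholder step reward
        self.cursor += 1
        done = self.cursor >= self.n_subtasks
        if done:
            # The terminal reward is added once, on the final step.
            return None, step_reward + 1.0, True, {"completed": self.cursor}
        return self._observation(), step_reward, False, {}


def run_episode(env, policy, task_type: str = "demo", seed: int = 0) -> float:
    """Drive one episode from reset() until done and return the summed reward."""
    obs = env.reset(task_type, seed)
    total, done = 0.0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
    return total
```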
---
## 3. Trust and Reward Flow
```mermaid
flowchart LR
A["Observation<br/>subtask, stakes, trust snapshot"] --> B["Action choice"]
B --> C["Specialist result<br/>outcome, confidence, adversarial flag, step_cost"]
C --> D["TaskGraph update"]
C --> E["TrustLedger Bayesian update"]
D --> F["completion, detections, poisonings"]
E --> G["calibration state"]
F --> H["RewardEngine"]
G --> H
H --> I["step reward"]
H --> J["terminal reward"]
```
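
The diagram says only that `TrustLedger` performs a Bayesian update. One common way to do that is a Beta-Bernoulli model, sketched below as an assumption rather than a description of `trust_ledger.py`. A `Beta(1, 1)` prior gives the 0.50 starting trust used at reset, and each observed outcome shifts the posterior mean. The class name `BetaTrust` and the `weight` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaTrust:
    """Beta-Bernoulli trust estimate for one specialist (illustrative only)."""

    alpha: float = 1.0  # pseudo-count of good outcomes
    beta: float = 1.0   # pseudo-count of bad outcomes

    @property
    def mean(self) -> float:
        """Posterior mean, used here as the trust score in [0, 1]."""
        return self.alpha / (self.alpha + self.beta)

    def update(self, success: bool, weight: float = 1.0) -> None:
        """Add one weighted observation to the matching pseudo-count."""
        if success:
            self.alpha += weight
        else:
            self.beta += weight
```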
---
## 4. Reward Engine V2
```mermaid
flowchart LR
A["Specialist result<br/>outcome, confidence, metadata"] --> B["Step reward"]
C["TaskGraph<br/>completion, detections, poisonings"] --> D["Terminal reward"]
E["TrustLedger<br/>calibration, fingerprints"] --> D
B --> B1["task accuracy"]
B --> B2["stakes awareness"]
B --> B3["efficiency"]
B --> B4["confidence alignment"]
B --> B5["verification quality"]
B --> B6["domain routing"]
D --> D1["completion rate"]
D --> D2["detection rate"]
D --> D3["trust calibration"]
D --> D4["episode efficiency"]
B --> R["reward-report endpoint"]
D --> R
R --> T["component trace for judges"]
```
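
To make the component trace concrete, the sketch below combines per-component scores into one reward and keeps each component's contribution for reporting. The component names come from the diagram; the equal weights, the weighted-sum form, and the function name `combine` are assumptions, since the actual weighting in `graders.py` is not specified here.

```python
STEP_COMPONENTS = (
    "task_accuracy", "stakes_awareness", "efficiency",
    "confidence_alignment", "verification_quality", "domain_routing",
)
TERMINAL_COMPONENTS = (
    "completion_rate", "detection_rate", "trust_calibration", "episode_efficiency",
)


def combine(scores: dict, names: tuple, weights=None):
    """Return (total, trace); trace maps each component to its weighted share."""
    if weights is None:
        # Equal weights are a placeholder, not the project's real weighting.
        weights = {name: 1.0 / len(names) for name in names}
    missing = [name for name in names if name not in scores]
    if missing:
        raise ValueError(f"missing components: {missing}")
    trace = {name: weights[name] * scores[name] for name in names}
    return sum(trace.values()), trace
```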
---
## 5. Before / After
```mermaid
flowchart LR
subgraph BEFORE["Before SENTINEL"]
A1["Uniform trust"] --> A2["Blind delegation"]
A2 --> A3["Poison accepted at high stakes"]
A3 --> A4["Downstream subtasks inherit bad state"]
A4 --> A5["Mission drifts or fails"]
end
subgraph AFTER["After SENTINEL"]
B1["Behavior updates trust"] --> B2["Low-trust high-stakes node detected"]
B2 --> B3["Verify instead of delegate"]
B3 --> B4["Poison blocked before cascade"]
B4 --> B5["Mission completes cleanly"]
end
```
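
The contrast between the two subgraphs comes down to one decision rule: verify when trust is low and stakes are high, otherwise delegate. A minimal sketch follows, assuming trust and stakes are both scaled to [0, 1]. The threshold values and function names are hypothetical and are not taken from the codebase.

```python
def blind_policy(trust: float, stakes: float) -> str:
    """'Before' behavior: always delegate, regardless of trust or stakes."""
    return "delegate"


def trust_aware_policy(trust: float, stakes: float,
                       trust_floor: float = 0.4,
                       stakes_threshold: float = 0.7) -> str:
    """'After' behavior: verify low-trust specialists on high-stakes nodes."""
    if stakes >= stakes_threshold and trust < trust_floor:
        return "verify"
    return "delegate"
```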
---
## 6. Theme Fit
```mermaid
flowchart TD
S["SENTINEL"] --> T1["Theme 1<br/>multi-agent interaction"]
S --> T2["Theme 2<br/>long-horizon planning"]
S --> T4["Theme 4<br/>self-improvement"]
S --> T5["Theme 5<br/>wild card"]
T1 --> B1["orchestrator + five specialists<br/>partial observability<br/>adversarial dynamics"]
T2 --> B2["task graph<br/>step budget pressure<br/>delayed terminal reward"]
T4 --> B3["profile reshuffle<br/>auto-curriculum<br/>no memorization"]
T5 --> B4["real production weakness<br/>blind trust in agent pipelines"]
```
---
## 7. Training Loop
```mermaid
flowchart LR
A["Prompt / observation"] --> B["Model rollout"]
B --> C["Action text or structured action"]
C --> D["SENTINEL environment"]
D --> E["Reward + next observation"]
E --> F["TRL / GRPO trainer"]
F --> G["updated policy"]
G --> B
H["training/evaluate.py"] --> I["random / heuristic / oracle-lite"]
I --> J["evaluation_results.json"]
I --> K["baseline_comparison.png"]
```
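
In the trainer step, GRPO scores each rollout relative to the other rollouts sampled for the same prompt, rather than against a learned value function. The sketch below shows that normalization in plain Python for illustration only. TRL performs the equivalent step internally, and details such as the epsilon value or the choice of population versus sample standard deviation may differ there.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps: float = 1e-8):
    """Normalize one prompt's rollout rewards to zero mean and roughly unit scale."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    # eps avoids division by zero when every rollout earned the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```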
---
## Use Rules
1. In slide decks, do not invent component names that do not exist in the code.
2. Use `SentinelEnv`, `TrustLedger`, `SpecialistPool`, `TaskGraph`, `RewardEngine` consistently.
3. Use real baseline numbers in public before/after materials.
4. Export polished PNG versions from these Mermaid sources later, but keep this file as the editable source of truth.