# SENTINEL Visual System

This file is the diagram source of truth. Every diagram used in the README, UI, blog, or slides should be derived from it.

## Diagram Inventory

| Diagram | Purpose | Status |
| --- | --- | --- |
| System stack | show the code architecture | ready |
| Episode lifecycle | explain `reset()` to terminal reward | ready |
| Trust and reward flow | show how state turns into learning signal | ready |
| Reward engine v2 | show process-aware reward components | ready |
| Before / after | show why SENTINEL matters | ready |
| Theme fit | map the project to the hackathon | ready |
| Training loop | show OpenEnv -> TRL / Unsloth pipeline | ready |

---

## 1. System Stack

```mermaid
flowchart TD
  A["HTTP client / UI / inference.py"] --> B["app.py<br/>FastAPI on port 7860"]
  B --> C["SentinelEnv<br/>environment.py"]
  B --> D["_sessions<br/>session_id -> SentinelEnv"]
  C --> E["TaskGraph<br/>task_graph.py"]
  C --> F["TrustLedger<br/>trust_ledger.py"]
  C --> G["SpecialistPool<br/>specialists.py"]
  C --> H["RewardEngine<br/>graders.py"]
  C --> I["Scenario dataset<br/>scenarios.py"]
  C --> J["Typed models<br/>models.py"]
  B --> K["openenv.yaml"]
  B --> L["static/index.html"]
```

---

## 2. Episode Lifecycle

```mermaid
flowchart TD
  A["reset(task_type, seed)"] --> B["sample scenario"]
  B --> C["reshuffle hidden specialist profiles"]
  C --> D["set trust priors to 0.50"]
  D --> E["build task graph"]
  E --> F["return first observation"]

  F --> G["orchestrator chooses action"]
  G --> H["delegate / verify / self solve / skip"]
  H --> I["specialist or self execution"]
  I --> J["record outcome in TaskGraph"]
  J --> K["update TrustLedger"]
  K --> L["compute step reward"]
  L --> M{"done?"}
  M -- "no" --> N["return next observation"]
  N --> G
  M -- "yes" --> O["compute terminal reward"]
  O --> P["return done=True with final info"]
```
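The lifecycle above can be sketched as a plain reset/step loop. `ToyEnv` is a hypothetical stand-in, not the real `SentinelEnv`: the observation fields, the reward values, and the four-subtask episode length are assumptions. Only the 0.50 trust prior, the five specialists, and the four action names come from the diagrams in this file.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Tuple

ACTIONS = ("delegate", "verify", "self_solve", "skip")


@dataclass
class ToyEnv:
    """Hypothetical stand-in for SentinelEnv; the real class may differ."""

    n_subtasks: int = 4      # assumed episode length
    n_specialists: int = 5
    cursor: int = 0
    trust: Dict[str, float] = field(default_factory=dict)
    rng: random.Random = field(default_factory=random.Random)

    def reset(self, task_type: str, seed: int) -> dict:
        # task_type is accepted for shape only; this toy ignores it.
        self.rng = random.Random(seed)
        self.cursor = 0
        # Every specialist starts at the neutral 0.50 trust prior.
        self.trust = {f"specialist_{i}": 0.50 for i in range(self.n_specialists)}
        return self._observation()

    def step(self, action: str) -> Tuple[dict, float, bool, dict]:
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.cursor += 1
        done = self.cursor >= self.n_subtasks
        # Placeholder rewards: small step reward, larger terminal reward.
        reward = 1.0 if done else 0.1
        return self._observation(), reward, done, {"steps": self.cursor}

    def _observation(self) -> dict:
        return {"subtask": self.cursor, "trust": dict(self.trust)}


def run_episode(env: ToyEnv, seed: int) -> Tuple[float, int]:
    """Run one episode with a random orchestrator; return (total reward, steps)."""
    env.reset(task_type="demo", seed=seed)
    total, steps, done = 0.0, 0, False
    while not done:
        action = env.rng.choice(ACTIONS)
        _obs, reward, done, _info = env.step(action)
        total += reward
        steps += 1
    return total, steps
```

Calling `run_episode(ToyEnv(), seed=0)` walks the loop from `reset()` until `done` is true, which is the same path the diagram traces from node A to node P.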

---

## 3. Trust And Reward Flow

```mermaid
flowchart LR
  A["Observation<br/>subtask, stakes, trust snapshot"] --> B["Action choice"]
  B --> C["Specialist result<br/>outcome, confidence, adversarial flag, step_cost"]
  C --> D["TaskGraph update"]
  C --> E["TrustLedger Bayesian update"]
  D --> F["completion, detections, poisonings"]
  E --> G["calibration state"]
  F --> H["RewardEngine"]
  G --> H
  H --> I["step reward"]
  H --> J["terminal reward"]
```
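The diagram names a Bayesian update in `TrustLedger` but does not specify the model. One common choice, assumed here purely for illustration, is a Beta-Bernoulli update, where a Beta(1, 1) prior gives the 0.50 starting trust used at `reset()`. `BetaTrust` is a hypothetical class, not the project's implementation.

```python
from dataclasses import dataclass


@dataclass
class BetaTrust:
    """Beta-Bernoulli trust estimate (assumed model, not the real TrustLedger)."""

    alpha: float = 1.0  # pseudo-count of good outcomes
    beta: float = 1.0   # pseudo-count of bad outcomes

    @property
    def mean(self) -> float:
        """Posterior mean, used as the trust score."""
        return self.alpha / (self.alpha + self.beta)

    def update(self, success: bool, weight: float = 1.0) -> None:
        """Fold one observed specialist outcome into the posterior."""
        if success:
            self.alpha += weight
        else:
            self.beta += weight


trust = BetaTrust()
trust.update(success=True)    # one good outcome raises trust
trust.update(success=False)   # two bad outcomes pull it back down
trust.update(success=False)
```

With this model a specialist that behaves well early but turns adversarial loses trust gradually, because each bad outcome adds to `beta` rather than resetting the estimate.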

---

## 4. Reward Engine V2

```mermaid
flowchart LR
  A["Specialist result<br/>outcome, confidence, metadata"] --> B["Step reward"]
  C["TaskGraph<br/>completion, detections, poisonings"] --> D["Terminal reward"]
  E["TrustLedger<br/>calibration, fingerprints"] --> D

  B --> B1["task accuracy"]
  B --> B2["stakes awareness"]
  B --> B3["efficiency"]
  B --> B4["confidence alignment"]
  B --> B5["verification quality"]
  B --> B6["domain routing"]

  D --> D1["completion rate"]
  D --> D2["detection rate"]
  D --> D3["trust calibration"]
  D --> D4["episode efficiency"]

  B --> R["reward-report endpoint"]
  D --> R
  R --> T["component trace for judges"]
```
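A sketch of how the six step-reward components could be combined while preserving a per-component trace, as the `reward-report` node suggests. The equal weighting and the `step_reward` function are assumptions; the actual weights and formula in `graders.py` are not stated in this file.

```python
from typing import Dict, Optional, Tuple

# Component names taken from the diagram above.
STEP_COMPONENTS = (
    "task_accuracy",
    "stakes_awareness",
    "efficiency",
    "confidence_alignment",
    "verification_quality",
    "domain_routing",
)


def step_reward(
    scores: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> Tuple[float, Dict[str, float]]:
    """Return (total, trace), where trace holds each weighted contribution."""
    if weights is None:
        # Assumed equal weighting; the real engine may weight differently.
        weights = {name: 1.0 / len(STEP_COMPONENTS) for name in STEP_COMPONENTS}
    missing = [name for name in STEP_COMPONENTS if name not in scores]
    if missing:
        raise KeyError(f"missing component scores: {missing}")
    trace = {name: weights[name] * scores[name] for name in STEP_COMPONENTS}
    return sum(trace.values()), trace
```

Returning the trace alongside the total keeps the reward auditable: a reader can see which component drove a given step's score.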

---

## 5. Before / After

```mermaid
flowchart LR
  subgraph BEFORE["Before SENTINEL"]
    A1["Uniform trust"] --> A2["Blind delegation"]
    A2 --> A3["Poison accepted at high stakes"]
    A3 --> A4["Downstream subtasks inherit bad state"]
    A4 --> A5["Mission drifts or fails"]
  end

  subgraph AFTER["After SENTINEL"]
    B1["Behavior updates trust"] --> B2["Low-trust high-stakes node detected"]
    B2 --> B3["Verify instead of delegate"]
    B3 --> B4["Poison blocked before cascade"]
    B4 --> B5["Mission completes cleanly"]
  end
```

---

## 6. Theme Fit

```mermaid
flowchart TD
  S["SENTINEL"] --> T1["Theme 1<br/>multi-agent interaction"]
  S --> T2["Theme 2<br/>long-horizon planning"]
  S --> T4["Theme 4<br/>self-improvement"]
  S --> T5["Theme 5<br/>wild card"]

  T1 --> B1["orchestrator + five specialists<br/>partial observability<br/>adversarial dynamics"]
  T2 --> B2["task graph<br/>step budget pressure<br/>delayed terminal reward"]
  T4 --> B3["profile reshuffle<br/>auto-curriculum<br/>no memorization"]
  T5 --> B4["real production weakness<br/>blind trust in agent pipelines"]
```

---

## 7. Training Loop

```mermaid
flowchart LR
  A["Prompt / observation"] --> B["Model rollout"]
  B --> C["Action text or structured action"]
  C --> D["SENTINEL environment"]
  D --> E["Reward + next observation"]
  E --> F["TRL / GRPO trainer"]
  F --> G["updated policy"]
  G --> B

  H["training/evaluate.py"] --> I["random / heuristic / oracle-lite"]
  I --> J["evaluation_results.json"]
  I --> K["baseline_comparison.png"]
```
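The lower branch of the diagram compares baseline policies and writes the results to JSON. The sketch below shows that shape under heavy assumptions: `toy_episode` uses a placeholder scoring rule rather than SENTINEL's reward, the 0.4 and 0.7 thresholds are invented for illustration, and none of the function names are taken from `training/evaluate.py`.

```python
import json
import random
from statistics import mean
from typing import Callable, Dict, Iterable

Policy = Callable[[float, float, random.Random], str]


def random_policy(trust: float, stakes: float, rng: random.Random) -> str:
    return rng.choice(["delegate", "verify"])


def heuristic_policy(trust: float, stakes: float, rng: random.Random) -> str:
    # Verify only when trust is low and stakes are high (assumed thresholds).
    return "verify" if trust < 0.4 and stakes >= 0.7 else "delegate"


def toy_episode(policy: Policy, seed: int, steps: int = 20) -> float:
    """Placeholder scoring, not SENTINEL's reward engine."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        trust, stakes = rng.random(), rng.random()
        risky = trust < 0.4 and stakes >= 0.7
        action = policy(trust, stakes, rng)
        if risky:
            total += 1.0 if action == "verify" else -1.0
        else:
            total += 1.0 if action == "delegate" else 0.5
    return total / steps


def evaluate(policies: Dict[str, Policy], seeds: Iterable[int]) -> Dict[str, float]:
    """Mean toy score per policy over the given seeds."""
    seeds = list(seeds)
    return {name: mean(toy_episode(p, s) for s in seeds) for name, p in policies.items()}


results = evaluate({"random": random_policy, "heuristic": heuristic_policy}, range(10))
report = json.dumps(results, indent=2)  # same shape as a results file on disk
```

Fixing the seeds makes the comparison repeatable, which matters when the numbers are reused in public before/after material.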

---

## Use Rules

1. In slide decks, do not invent component names that do not exist in the code.
2. Use `SentinelEnv`, `TrustLedger`, `SpecialistPool`, `TaskGraph`, `RewardEngine` consistently.
3. Use real baseline numbers in public before/after materials.
4. Export polished PNG versions from these Mermaid sources when needed, but keep this file as the editable source of truth.