XcodeAddy committed
Commit 2c0b609 · 1 Parent(s): cfbcd01

Add SENTINEL rollout and presentation spine

README.md CHANGED
@@ -14,6 +14,14 @@ Self-Evolving Network for Training Intelligent Agents Under Adversarial Long-Hor
14
 
15
  SENTINEL is an OpenEnv-compatible RL environment for one core skill: training an orchestrator to decide who to trust, when to verify, how to recover, and how to finish long multi-agent work when specialist agents are unreliable or adversarial.
16
 
17
+ ## Rollout Source Of Truth
18
+
19
+ The phased execution plan and presentation assets now live in-repo:
20
+
21
+ - [Rollout](docs/ROLL_OUT.md)
22
+ - [Narrative Lock](docs/presentation/NARRATIVE_LOCK.md)
23
+ - [Visual System](docs/diagrams/VISUAL_SYSTEM.md)
24
+
25
  ## Why It Matters
26
 
27
  Modern agent systems fail in the same pattern:
docs/ROLL_OUT.md ADDED
@@ -0,0 +1,231 @@
1
+ # SENTINEL Rollout
2
+
3
+ This file is the execution spine for the project. The rule is simple:
4
+
5
+ 1. Finish one phase.
6
+ 2. Verify it.
7
+ 3. Only then move to the next phase.
8
+
9
+ SENTINEL wins if the repo, Space, README, UI, and pitch all tell the same story:
10
+
11
+ > Train an orchestrator to decide who to trust, when to verify, and how to recover in long multi-agent tasks when specialists are unreliable or adversarial.
12
+
13
+ ## Current Status
14
+
15
+ | Area | Status | Notes |
16
+ | --- | --- | --- |
17
+ | Environment core | Strong | `reset()`, `step()`, `state()`, rewards, task graph, specialists, trust ledger |
18
+ | OpenEnv / deploy | Strong | Space live, Docker passing, validation passing |
19
+ | UI clarity | Improving | Trust Mission Control is live, but still needs full judge-demo mode |
20
+ | Presentation assets | Partial | Story exists, but diagrams and finale pack need stronger structure |
21
+ | Training evidence | Partial | Baselines are real; final onsite GRPO curve still missing |
22
+ | Submission completeness | Partial | Mini-blog/video and finale pack still needed |
23
+
24
+ ## What We Borrow From MiroFish
25
+
26
+ We borrow **presentation discipline**, not product scope.
27
+
28
+ Use these MiroFish-style strengths:
29
+
30
+ - one sharp promise at the top
31
+ - visible workflow
32
+ - screenshot and diagram density
33
+ - live demo-first presentation
34
+ - clean quick-start and deployment instructions
35
+
36
+ Do **not** copy these patterns into SENTINEL:
37
+
38
+ - giant "predict anything" scope
39
+ - too many use cases
40
+ - vague platform framing
41
+ - vision language that is larger than the actual judged artifact
42
+
43
+ ## Phase Rules
44
+
45
+ - Phase 1 must lock the narrative.
46
+ - Phase 2 must lock the diagram system.
47
+ - Phase 3 must make the UI explain the backend and the story.
48
+ - Phase 4 must make learning evidence obvious.
49
+ - Phase 5 must make the submission complete and reproducible.
50
+ - Phase 6 must make the final pitch unforgettable.
51
+
52
+ Do not skip a verification gate just because the feature "looks done."
53
+
54
+ ---
55
+
56
+ ## Phase 1 - Narrative Lock
57
+
58
+ **Goal**
59
+ Create one judge-safe project story and use it everywhere.
60
+
61
+ **Outputs**
62
+ - [Narrative Lock](./presentation/NARRATIVE_LOCK.md)
63
+ - final one-line thesis
64
+ - final hook
65
+ - final problem framing
66
+ - final before/after claim
67
+ - final "what not to say" guardrails
68
+
69
+ **Done means**
70
+ - README, UI, demo script, and pitch all use the same project sentence
71
+ - no outdated numbers or mismatched claims remain in primary docs
72
+ - the problem statement is clearly software-first, RL-first, and OpenEnv-first
73
+
74
+ **Verification**
75
+ - README top section matches the narrative lock
76
+ - UI top section uses the same thesis
77
+ - team can explain SENTINEL in 20 seconds and 2 minutes without changing the core message
78
+
79
+ **Status**
80
+ `In progress`
81
+
82
+ ---
83
+
84
+ ## Phase 2 - Visual System Pack
85
+
86
+ **Goal**
87
+ Turn scattered diagrams into one visual language.
88
+
89
+ **Outputs**
90
+ - [Visual System](./diagrams/VISUAL_SYSTEM.md)
91
+ - architecture diagram
92
+ - episode lifecycle diagram
93
+ - trust / reward dataflow diagram
94
+ - before / after failure chain
95
+ - theme fit diagram
96
+ - training loop diagram
97
+
98
+ **Done means**
99
+ - every diagram uses the same naming and system boundaries
100
+ - no diagram contradicts the actual code
101
+ - diagrams can be embedded in README, blog, pitch, and UI
102
+
103
+ **Verification**
104
+ - `app.py`, `environment.py`, `specialists.py`, `trust_ledger.py`, `graders.py`, `task_graph.py`, and `inference.py` are all represented correctly
105
+ - before/after flow uses real baseline numbers, not aspirational placeholders
106
+
107
+ **Status**
108
+ `In progress`
109
+
110
+ ---
111
+
112
+ ## Phase 3 - Productized Demo UI
113
+
114
+ **Goal**
115
+ Make the frontend explain the backend to judges and first-time users.
116
+
117
+ **Outputs**
118
+ - `Overview` mode
119
+ - `Playground` mode
120
+ - `Judge Demo` mode
121
+ - raw request/response visibility
122
+ - guided walkthrough of one episode
123
+ - profile swap demo path
124
+
125
+ **Done means**
126
+ - a first-time viewer can answer:
127
+ - what is SENTINEL?
128
+ - what does the agent observe?
129
+ - what action did the UI send?
130
+ - what did the backend return?
131
+ - why does trust change?
132
+ - why is this hard?
133
+
134
+ **Verification**
135
+ - local `/`, `/reset`, `/step`, `/state`, and `/assets/baseline_comparison.png` all behave correctly (a smoke-test sketch follows this list)
136
+ - live Space reflects the same experience
137
+ - no section feels like internal tooling only
138
+
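A minimal smoke test for this gate, written as a short client script. It assumes the FastAPI app from the stack diagram is running locally on port 7860 and that `/reset` returns a `session_id`; the request and response field names are illustrative assumptions, not the repo's exact schemas.

```python
# Quick check that the five routes above respond locally.
# Payload and response field names are assumptions, not the actual API contract.
import requests

BASE = "http://localhost:7860"

assert requests.get(f"{BASE}/").ok
assert requests.get(f"{BASE}/assets/baseline_comparison.png").ok

reset = requests.post(f"{BASE}/reset", json={"seed": 0}).json()
session_id = reset["session_id"]  # assumed response field

step = requests.post(
    f"{BASE}/step",
    json={"session_id": session_id, "action": {"type": "verify", "target": "specialist_0"}},
)
assert step.ok

assert requests.get(f"{BASE}/state", params={"session_id": session_id}).ok
print("all five routes respond")
```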
139
+ **Status**
140
+ `Pending`
141
+
142
+ ---
143
+
144
+ ## Phase 4 - Learning Evidence
145
+
146
+ **Goal**
147
+ Make reward improvement impossible to miss.
148
+
149
+ **Outputs**
150
+ - random vs heuristic vs oracle-lite comparison
151
+ - visible completion, detection, calibration, efficiency metrics
152
+ - onsite GRPO / Unsloth reward curve
153
+ - trained vs untrained comparison block
154
+
155
+ **Done means**
156
+ - judges can see measurable improvement in one screen and one README section
157
+ - there is a visible path from baseline -> better policy -> trained model
158
+
159
+ **Verification**
160
+ - `training/evaluate.py` outputs are committed and linked
161
+ - onsite curve is committed once available
162
+ - numbers shown in UI and README match evaluation artifacts (a consistency-check sketch follows this list)
163
+
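A small consistency check for this gate. The baseline names come from the outputs list above; the file name matches the training-loop diagram, but the location and JSON layout (`policy -> mean_reward`) are assumptions about what `training/evaluate.py` writes, so adjust to the real artifact.

```python
# Print the committed baseline numbers so they can be compared against README/UI claims.
# File path and JSON structure are assumptions about training/evaluate.py's output.
import json

with open("evaluation_results.json") as f:  # assumed path for the committed artifact
    results = json.load(f)

for policy in ("random", "heuristic", "oracle_lite"):
    print(f"{policy:>12}: mean reward = {results[policy]['mean_reward']:.3f}")
```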
164
+ **Status**
165
+ `Pending`
166
+
167
+ ---
168
+
169
+ ## Phase 5 - Submission Pack
170
+
171
+ **Goal**
172
+ Make the project submission-complete.
173
+
174
+ **Outputs**
175
+ - final README with all links
176
+ - HF Space link
177
+ - Colab / training notebook link
178
+ - blog or video link
179
+ - screenshots and diagram links
180
+ - reproduction commands
181
+
182
+ **Done means**
183
+ - a judge can clone, run, inspect, and understand the project without asking for missing context
184
+
185
+ **Verification**
186
+ - README links are live
187
+ - Space is live
188
+ - `openenv validate . --json` passes
189
+ - Docker build passes
190
+
191
+ **Status**
192
+ `Pending`
193
+
194
+ ---
195
+
196
+ ## Phase 6 - Finale Pack
197
+
198
+ **Goal**
199
+ Package the repo for the room, not just for the validator.
200
+
201
+ **Outputs**
202
+ - 3-minute script
203
+ - 5 likely judge questions + answers
204
+ - backup screenshots
205
+ - fallback demo sequence
206
+ - one-click "killer moment" path
207
+
208
+ **Done means**
209
+ - the pitch works even if the live environment is slow
210
+ - the trained-vs-baseline story is memorable
211
+ - the profile swap moment is rehearsed
212
+
213
+ **Verification**
214
+ - demo path can be run without improvising architecture details
215
+ - every claim can be grounded in repo assets
216
+
217
+ **Status**
218
+ `Pending`
219
+
220
+ ---
221
+
222
+ ## Execution Order
223
+
224
+ ```text
225
+ Phase 1 -> Phase 2 -> Phase 3 -> Phase 4 -> Phase 5 -> Phase 6
226
+ ```
227
+
228
+ ## Next Immediate Build Target
229
+
230
+ Phase 1 and Phase 2 are the current active work.
231
+ Once both are fully stable in-repo, Phase 3 starts on top of them.
docs/diagrams/VISUAL_SYSTEM.md ADDED
@@ -0,0 +1,142 @@
1
+ # SENTINEL Visual System
2
+
3
+ This file is the diagram source of truth. Every diagram used in README, UI, blog, or slides should be derived from here.
4
+
5
+ ## Diagram Inventory
6
+
7
+ | Diagram | Purpose | Status |
8
+ | --- | --- | --- |
9
+ | System stack | show the code architecture | ready |
10
+ | Episode lifecycle | explain `reset()` to terminal reward | ready |
11
+ | Trust and reward flow | show how state turns into learning signal | ready |
12
+ | Before / after | show why SENTINEL matters | ready |
13
+ | Theme fit | map the project to the hackathon | ready |
14
+ | Training loop | show OpenEnv -> TRL / Unsloth pipeline | ready |
15
+
16
+ ---
17
+
18
+ ## 1. System Stack
19
+
20
+ ```mermaid
21
+ flowchart TD
22
+ A["HTTP client / UI / inference.py"] --> B["app.py<br/>FastAPI on port 7860"]
23
+ B --> C["SentinelEnv<br/>environment.py"]
24
+ B --> D["_sessions<br/>session_id -> SentinelEnv"]
25
+ C --> E["TaskGraph<br/>task_graph.py"]
26
+ C --> F["TrustLedger<br/>trust_ledger.py"]
27
+ C --> G["SpecialistPool<br/>specialists.py"]
28
+ C --> H["RewardEngine<br/>graders.py"]
29
+ C --> I["Scenario dataset<br/>scenarios.py"]
30
+ C --> J["Typed models<br/>models.py"]
31
+ B --> K["openenv.yaml"]
32
+ B --> L["static/index.html"]
33
+ ```
34
+
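For orientation, a minimal sketch of the session-registry pattern shown in the diagram: `app.py` keeps a `_sessions` map from `session_id` to `SentinelEnv` behind FastAPI routes. Everything beyond the component names drawn above (request fields, route parameters, the `step()` return shape) is an assumption, not the repo's exact code.

```python
# Minimal sketch of the app.py layer from the stack diagram (illustrative only).
import uuid
from typing import Optional

from fastapi import FastAPI

from environment import SentinelEnv  # SentinelEnv lives in environment.py per the diagram

app = FastAPI()
_sessions: dict[str, SentinelEnv] = {}  # session_id -> SentinelEnv, as drawn above


@app.post("/reset")
def reset(task_type: Optional[str] = None, seed: Optional[int] = None):
    session_id = str(uuid.uuid4())
    env = SentinelEnv()
    _sessions[session_id] = env
    observation = env.reset(task_type=task_type, seed=seed)  # assumed signature
    return {"session_id": session_id, "observation": observation}


@app.post("/step")
def step(session_id: str, action: dict):
    env = _sessions[session_id]
    observation, reward, done, info = env.step(action)  # assumed 4-tuple return
    return {"observation": observation, "reward": reward, "done": done, "info": info}


@app.get("/state")
def state(session_id: str):
    return _sessions[session_id].state()
```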
35
+ ---
36
+
37
+ ## 2. Episode Lifecycle
38
+
39
+ ```mermaid
40
+ flowchart TD
41
+ A["reset(task_type, seed)"] --> B["sample scenario"]
42
+ B --> C["reshuffle hidden specialist profiles"]
43
+ C --> D["set trust priors to 0.50"]
44
+ D --> E["build task graph"]
45
+ E --> F["return first observation"]
46
+
47
+ F --> G["orchestrator chooses action"]
48
+ G --> H["delegate / verify / self solve / skip"]
49
+ H --> I["specialist or self execution"]
50
+ I --> J["record outcome in TaskGraph"]
51
+ J --> K["update TrustLedger"]
52
+ K --> L["compute step reward"]
53
+ L --> M{"done?"}
54
+ M -- "no" --> N["return next observation"]
55
+ N --> G
56
+ M -- "yes" --> O["compute terminal reward"]
57
+ O --> P["return done=True with final info"]
58
+ ```
59
+
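The same lifecycle as a short in-process loop. `reset()`, `step()`, and `state()` are the methods named in the rollout status table; the action dictionary, the trust-snapshot key, and the placeholder policy are assumptions for illustration.

```python
# One episode against SentinelEnv, mirroring the lifecycle diagram above.
# Action schema and state keys are assumptions, not the repo's exact contract.
from environment import SentinelEnv


def choose_action(observation: dict, trust: dict) -> dict:
    """Placeholder policy: delegate to the most-trusted specialist,
    but verify when stakes are high and trust is still low."""
    best = max(trust, key=trust.get)
    if observation.get("stakes") == "high" and trust[best] < 0.6:
        return {"type": "verify", "target": best}
    return {"type": "delegate", "target": best}


env = SentinelEnv()
observation = env.reset(task_type="default", seed=0)  # assumed reset arguments

done = False
total_reward = 0.0
info: dict = {}
while not done:
    trust = env.state()["trust"]  # assumed key for the trust snapshot
    action = choose_action(observation, trust)
    observation, reward, done, info = env.step(action)  # assumed 4-tuple return
    total_reward += reward

print("episode return:", total_reward)
print("terminal info:", info)
```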
60
+ ---
61
+
62
+ ## 3. Trust And Reward Flow
63
+
64
+ ```mermaid
65
+ flowchart LR
66
+ A["Observation<br/>subtask, stakes, trust snapshot"] --> B["Action choice"]
67
+ B --> C["Specialist result<br/>outcome, confidence, adversarial flag, step_cost"]
68
+ C --> D["TaskGraph update"]
69
+ C --> E["TrustLedger Bayesian update"]
70
+ D --> F["completion, detections, poisonings"]
71
+ E --> G["calibration state"]
72
+ F --> H["RewardEngine"]
73
+ G --> H
74
+ H --> I["step reward"]
75
+ H --> J["terminal reward"]
76
+ ```
77
+
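To ground the two right-hand arrows, here is a minimal sketch of a Beta-Bernoulli trust update and a weighted terminal reward. The docs only state that the TrustLedger performs a Bayesian update from a 0.50 prior and that the RewardEngine scores completion, detection, calibration, and efficiency; the parameterization and weights below are assumptions.

```python
# Illustrative Beta-Bernoulli trust update and reward combination.
# The actual TrustLedger / RewardEngine math lives in trust_ledger.py and graders.py.
from dataclasses import dataclass


@dataclass
class TrustEstimate:
    """alpha counts good observed outcomes, beta counts bad ones; prior mean is 0.50."""
    alpha: float = 1.0
    beta: float = 1.0

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, outcome_good: bool) -> None:
        if outcome_good:
            self.alpha += 1.0
        else:
            self.beta += 1.0


def terminal_reward(completion: float, detection: float,
                    calibration: float, efficiency: float,
                    weights=(0.4, 0.3, 0.2, 0.1)) -> float:
    """Weighted sum of the four components named in the docs; weights are illustrative."""
    w_c, w_d, w_k, w_e = weights
    return w_c * completion + w_d * detection + w_k * calibration + w_e * efficiency


trust = TrustEstimate()
print(round(trust.mean, 2))  # 0.5 prior
trust.update(False)
trust.update(False)
print(round(trust.mean, 2))  # drops to 0.25 after two bad outcomes
```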
78
+ ---
79
+
80
+ ## 4. Before / After
81
+
82
+ ```mermaid
83
+ flowchart LR
84
+ subgraph BEFORE["Before SENTINEL"]
85
+ A1["Uniform trust"] --> A2["Blind delegation"]
86
+ A2 --> A3["Poison accepted at high stakes"]
87
+ A3 --> A4["Downstream subtasks inherit bad state"]
88
+ A4 --> A5["Mission drifts or fails"]
89
+ end
90
+
91
+ subgraph AFTER["After SENTINEL"]
92
+ B1["Behavior updates trust"] --> B2["Low-trust high-stakes node detected"]
93
+ B2 --> B3["Verify instead of delegate"]
94
+ B3 --> B4["Poison blocked before cascade"]
95
+ B4 --> B5["Mission completes cleanly"]
96
+ end
97
+ ```
98
+
99
+ ---
100
+
101
+ ## 5. Theme Fit
102
+
103
+ ```mermaid
104
+ flowchart TD
105
+ S["SENTINEL"] --> T1["Theme 1<br/>multi-agent interaction"]
106
+ S --> T2["Theme 2<br/>long-horizon planning"]
107
+ S --> T4["Theme 4<br/>self-improvement"]
108
+ S --> T5["Theme 5<br/>wild card"]
109
+
110
+ T1 --> B1["orchestrator + five specialists<br/>partial observability<br/>adversarial dynamics"]
111
+ T2 --> B2["task graph<br/>step budget pressure<br/>delayed terminal reward"]
112
+ T4 --> B3["profile reshuffle<br/>auto-curriculum<br/>no memorization"]
113
+ T5 --> B4["real production weakness<br/>blind trust in agent pipelines"]
114
+ ```
115
+
116
+ ---
117
+
118
+ ## 6. Training Loop
119
+
120
+ ```mermaid
121
+ flowchart LR
122
+ A["Prompt / observation"] --> B["Model rollout"]
123
+ B --> C["Action text or structured action"]
124
+ C --> D["SENTINEL environment"]
125
+ D --> E["Reward + next observation"]
126
+ E --> F["TRL / GRPO trainer"]
127
+ F --> G["updated policy"]
128
+ G --> B
129
+
130
+ H["training/evaluate.py"] --> I["random / heuristic / oracle-lite"]
131
+ I --> J["evaluation_results.json"]
132
+ I --> K["baseline_comparison.png"]
133
+ ```
134
+
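A sketch of how the rollout loop above can feed rewards to a GRPO-style trainer: each sampled completion is parsed into an action, stepped through the live environment, and the returned reward becomes that completion's score. The HTTP payload fields and the scoring-function contract are assumptions; wiring this into the TRL / Unsloth trainer is left to the training notebook.

```python
# Score model completions with the live SENTINEL environment (GRPO-style group scoring).
# Endpoint payloads and response fields are assumptions, not a documented API.
import requests

BASE_URL = "http://localhost:7860"  # app.py default port from the stack diagram


def environment_reward(session_id: str, completion_text: str) -> float:
    """Step the environment once with the model's proposed action and return the reward.
    Parsing free text into a structured action is stubbed out here."""
    action = {"raw_text": completion_text}  # assumed payload shape
    resp = requests.post(
        f"{BASE_URL}/step",
        json={"session_id": session_id, "action": action},
        timeout=30,
    )
    resp.raise_for_status()
    return float(resp.json()["reward"])  # assumed response field


def score_group(session_ids: list[str], completions: list[str]) -> list[float]:
    """One reward per sampled completion; a GRPO trainer consumes these as group scores."""
    return [environment_reward(sid, text) for sid, text in zip(session_ids, completions)]
```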
135
+ ---
136
+
137
+ ## Use Rules
138
+
139
+ 1. Do not invent new component names in slide decks that do not exist in code.
140
+ 2. Use `SentinelEnv`, `TrustLedger`, `SpecialistPool`, `TaskGraph`, `RewardEngine` consistently.
141
+ 3. Use real baseline numbers in public before/after materials.
142
+ 4. Export polished PNG versions from these mermaid sources later, but keep this file as the editable truth.
docs/presentation/NARRATIVE_LOCK.md ADDED
@@ -0,0 +1,126 @@
1
+ # SENTINEL Narrative Lock
2
+
3
+ This file defines the one story the whole project must tell.
4
+
5
+ ## One Sentence
6
+
7
+ SENTINEL is an OpenEnv RL environment that trains an orchestrator to decide who to trust, when to verify, and how to recover in long multi-agent tasks when specialist agents are unreliable or adversarial.
8
+
9
+ ## 20-Second Version
10
+
11
+ Multi-agent systems break because they trust sub-agents too easily. SENTINEL turns that failure into a trainable environment: the orchestrator must learn trust calibration from behavioral evidence alone, under long-horizon pressure and adversarial specialists.
12
+
13
+ ## 2-Minute Version
14
+
15
+ Every multi-agent framework today has the same hidden weakness: one specialist can be confidently wrong, and the orchestrator will often delegate blindly, accept the result, and let the failure cascade downstream. SENTINEL is an OpenEnv RL environment built to train exactly against that weakness.
16
+
17
+ The orchestrator never sees specialist internals. It only sees behavior: outcomes, stakes, history, and trust scores. Five public specialist slots are visible, but the hidden profiles reshuffle every episode, so the agent cannot memorize identities. It must learn the skill of trust calibration.
18
+
19
+ The environment rewards mission completion, adversarial detection, calibration quality, and efficiency. That makes the project more than a simulation. It is a training environment with measurable improvement: random routing, trust-aware heuristic routing, and eventually trained routing.
20
+
21
+ ## Problem Statement
22
+
23
+ Train an orchestrator to complete long multi-agent tasks under partial observability by learning:
24
+
25
+ - which specialist to trust
26
+ - when a risky result should be verified
27
+ - when to self-solve instead of delegating
28
+ - how to recover before poisoned state cascades through the mission
29
+
30
+ ## What We Are Building
31
+
32
+ We are building:
33
+
34
+ - a deployable OpenEnv environment
35
+ - a reward design for trust calibration
36
+ - a live judge-demo UI
37
+ - a training and evaluation pipeline
38
+ - a final before/after demo showing learned behavioral change
39
+
40
+ We are **not** building:
41
+
42
+ - a general chatbot
43
+ - a coding assistant product
44
+ - a replay of incident triage
45
+ - a giant multi-domain prediction platform
46
+ - a vague multi-agent "framework"
47
+
48
+ ## Why Judges Should Care
49
+
50
+ This is not a toy coordination task. It targets a real production weakness in modern agent systems:
51
+
52
+ > sub-agents are often assumed trustworthy until a human catches the damage.
53
+
54
+ SENTINEL makes that weakness trainable.
55
+
56
+ ## Before / After Claim
57
+
58
+ **Before SENTINEL**
59
+ - trust is static or heuristic
60
+ - bad high-confidence outputs slip through
61
+ - failures cascade across downstream steps
62
+ - the orchestrator cannot explain why the mission drifted
63
+
64
+ **After SENTINEL**
65
+ - trust changes from observed behavior
66
+ - high-stakes, low-trust outputs are verified
67
+ - adversarial attempts are caught before cascade
68
+ - the orchestrator learns skill, not memorized role identity
69
+
70
+ ## Non-Negotiable Claims
71
+
72
+ These claims must stay consistent in README, UI, demo, and blog:
73
+
74
+ 1. SENTINEL is about **trust calibration**
75
+ 2. the orchestrator is the **trainable policy**
76
+ 3. specialists are **scripted on purpose** for stable reward
77
+ 4. the reshuffle mechanic proves **skill over memorization**
78
+ 5. the reward combines **completion, detection, calibration, efficiency**
79
+
80
+ ## What Not To Say
81
+
82
+ Do not describe SENTINEL as:
83
+
84
+ - "predict anything"
85
+ - "a full digital twin of the world"
86
+ - "an all-in-one multi-agent platform"
87
+ - "a software assistant for every use case"
88
+ - "a space, quantum, or general science simulator"
89
+
90
+ Those make the project sound bigger but less judgeable.
91
+
92
+ ## Judge-Facing Angle By Criterion
93
+
94
+ ### Environment Innovation
95
+ The novelty is not just "multi-agent."
96
+ The novelty is **training trust calibration under shuffled adversarial identity**.
97
+
98
+ ### Storytelling
99
+ The story is simple:
100
+ - blind trust fails
101
+ - behavioral evidence updates trust
102
+ - verification blocks poison
103
+ - profile swap proves generalization
104
+
105
+ ### Improvement In Rewards
106
+ The visual proof is:
107
+ - random
108
+ - heuristic
109
+ - oracle-lite
110
+ - trained model onsite
111
+
112
+ ### Reward / Training Pipeline
113
+ The important line is:
114
+
115
+ > the reward does not praise vibes; it scores completion, detection, calibration, and efficiency.
116
+
117
+ ## 3-Minute Pitch Spine
118
+
119
+ ### Minute 1
120
+ Problem: multi-agent systems trust sub-agents too easily.
121
+
122
+ ### Minute 2
123
+ Environment: orchestrator, five specialist slots, hidden shuffled profiles, trust ledger, task graph, reward engine.
124
+
125
+ ### Minute 3
126
+ Evidence: baseline gap, live trust changes, profile swap moment.