XcodeAddy committed
Commit 31e8c01 · 1 Parent(s): 7f19e87

πŸ“ Final README: add all submission links, training evidence, blog reference, hackathon alignment

Files changed (1)
  1. README.md +238 -182
README.md CHANGED
@@ -8,34 +8,43 @@ pinned: false
  license: mit
  ---
 
- # SENTINEL
-
- Self-Evolving Network for Training Intelligent Agents Under Adversarial Long-Horizon Tasks.
-
- SENTINEL is an OpenEnv-compatible RL environment for one core skill: training an orchestrator to decide who to trust, when to verify, how to recover, and how to finish long multi-agent work when specialist agents are unreliable or adversarial.
-
- ## Rollout Source Of Truth
-
- The phased execution plan and presentation assets now live in-repo:
-
- - [Rollout](docs/ROLL_OUT.md)
- - [Narrative Lock](docs/presentation/NARRATIVE_LOCK.md)
- - [Visual System](docs/diagrams/VISUAL_SYSTEM.md)
-
- ## Why It Matters
-
- Modern agent systems fail in the same pattern:
-
  1. A long task is decomposed into many steps.
  2. The orchestrator delegates to sub-agents or tools.
- 3. One specialist returns a confident but wrong result.
- 4. The system trusts it, builds on it, and drifts into failure.
-
- SENTINEL turns that failure mode into a trainable environment. The model only sees behavior: returned outcomes, confidence, stakes, history, and trust scores. It never sees hidden specialist identities.
-
- ## Real-World Bridge
-
- SENTINEL is not a normal chatbot that answers one prompt. It is the training ground for the hidden control loop inside a long-running agent.
-
  Example user mission:
 
@@ -46,70 +55,105 @@ fix the risky parts, and prepare it for deployment.
 
  What SENTINEL abstracts:
 
- 1. The user mission becomes a scenario with a task graph.
- 2. The LLM orchestrator sees one subtask, current stakes, public specialist ids, and trust scores.
  3. The model emits one control action: `delegate`, `verify`, `solve_independently`, or `skip`.
- 4. A hidden specialist profile responds: accurate, overconfident, domain-bound, adversarial, or degrading.
  5. The reward engine scores the action and the trust ledger updates.
- 6. GRPO/TRL uses that reward to train better orchestration behavior.
-
- This is why the project matters for real agents: after many long user requests, the failure is often not "the LLM cannot speak." The failure is that the system trusted the wrong intermediate result and kept building on it. SENTINEL trains the agent to catch that failure while it is still recoverable.
-
- Judge-readable endpoints:
-
- ```bash
- curl http://localhost:7860/problem
- curl "http://localhost:7860/mission?task_type=task3"
- ```
-
- ## Environment Shape
-
- - API: `reset()`, `step(action)`, `state()`
- - Runtime: FastAPI on port `7860`
- - Tasks: `task1`, `task2`, `task3`
- - Specialists: 5 scripted FSM agents with shuffled hidden profiles
- - Rewards: per-step reward plus terminal score, normalized to `0.0-1.0`
- - Dataset: 120 abstract multi-agent scenarios
- - Session store: single-process memory with TTL/LRU cleanup
- - Optional adaptive curriculum: pass `adaptive=true` on `/reset` for Theme 4 demos
- - Live trust stream: `/stream?session_id=...` feeds the `/trust-dashboard` bars
-
- Deployment contract: run one server worker for the submitted Space. Active `SentinelEnv` objects live in process memory, so multi-worker deployments need sticky sessions or a shared store such as Redis. The Dockerfile intentionally starts uvicorn with `--workers 1`.
-
- ## Live Submission Targets
-
- - GitHub: `https://github.com/ADITYAGABA1322/sentinel-env`
- - Hugging Face Space repo/settings: `https://huggingface.co/spaces/XcodeAddy/sentinel-env`
- - Hugging Face live app: `https://xcodeaddy-sentinel-env.hf.space`
- - OpenEnv base URL: `https://xcodeaddy-sentinel-env.hf.space`
-
- Local note: run uvicorn with `--host 0.0.0.0`, but open the app in a browser at `http://127.0.0.1:7860/` or `http://localhost:7860/`. `0.0.0.0` is a bind address, not the page URL to demo.
-
- ## Specialist Behaviors
 
  | Public Slot | Hidden Behavior |
  | --- | --- |
- | S0-S4 | Public ids are shuffled every episode |
 
  Hidden profiles:
 
- - `AccurateSlow`: 90 percent accurate, costs 2 steps.
- - `OverconfidentFast`: quick and confident, wrong 40 percent of the time.
- - `DomainBound`: strong on analysis/verification, weak elsewhere.
- - `Adversarial`: benign below stakes `0.70`, poisons at stakes `>=0.70`.
- - `Degrading`: strong early, gets worse with reuse.
-
- ## Tasks
 
  | Task | Difficulty | Goal |
  | --- | --- | --- |
- | `task1` | Easy | Learn basic delegation over a 10-subtask chain. |
- | `task2` | Medium | Complete a 15-subtask DAG and build calibrated trust. |
- | `task3` | Hard | Complete a 20-subtask adversarial mission under high-stakes poison attempts. |
-
- ## Reward Model
 
  Rewards are deterministic and boundary-exclusive per step: `(0.01, 0.99)`.
 
@@ -118,20 +162,18 @@ Shared threshold/cost constants live in `sentinel_config.py`:
 
  - `ADVERSARIAL_TRIGGER_STAKES = 0.70`: adversarial specialist starts poisoning.
  - `ADVERSARIAL_AWARENESS_STAKES = 0.70`: graders reward verification/evasion.
  - `CRITICAL_POISON_STAKES = 0.85`: unchecked poison ends the episode.
- - `VERIFY_EXTRA_STEP_COST = 1`: verify cost is specialist step cost plus one.
 
  Task 3 terminal score:
 
  ```text
- 0.35 * completion_rate
- + 0.30 * adversarial_detection_rate
- + 0.25 * trust_calibration
- + 0.10 * efficiency
  ```
 
- The episode `score` exposed in `info` and inference logs is the mean reward over emitted grading events, normalized to `0.0-1.0`. It is intentionally not raw cumulative return; terminal reward and efficiency terms carry the penalty for unfinished or wasteful episodes while keeping scores comparable across tasks with different horizons.
-
- Reward Engine v2 adds process-aware signals on top of outcome scoring:
 
  - `confidence_alignment`: penalizes confident wrong outputs.
  - `domain_routing`: rewards domain-bound behavior only when it is actually in-domain.
@@ -139,30 +181,31 @@ Reward Engine v2 adds process-aware signals on top of outcome scoring:
 
  The active step formulas are exposed at `/grader`, and each active episode exposes a full component trace at `/reward-report?session_id=<id>`.
 
- ## WOW Factor Features
-
- SENTINEL now includes three judge-facing upgrades:
 
  1. **Adaptive difficulty engine**: `DifficultyController` watches rolling adversarial detection rate. Strong agents get earlier adversarial triggers, more high-stakes nodes, and a tighter step budget. Struggling agents get easier episodes. Enable it with:
 
- ```bash
- curl -X POST http://localhost:7860/reset \
-   -H "Content-Type: application/json" \
-   -d '{"task_type":"task3","seed":42,"adaptive":true}'
- ```
-
- 2. **Behavioral fingerprints**: every observation includes `behavioral_fingerprints` for S0-S4:
-
-    - `confidence_accuracy_gap`
-    - `domain_hit_rate`
-    - `stakes_volatility`
-    - low/high stakes accuracy
-
- These are public behavioral signals only. They do not leak the hidden specialist identity.
 
  3. **Live trust stream**: `/stream?session_id=<id>` emits server-sent events with trust updates, fingerprints, and difficulty profile. Open `/trust-dashboard?session_id=<id>` during a demo to watch the trust bars update live.
 
- ## API
 
  ```bash
  curl http://localhost:7860/health
@@ -177,11 +220,11 @@ curl "http://localhost:7860/reward-report?session_id=<session_id>"
  curl http://localhost:7860/difficulty
  ```
 
- The root route `/` serves the live SENTINEL dashboard on Hugging Face Spaces.
  Use `/api` for the JSON route index.
  Use `/assets/baseline_comparison.png` for the committed baseline chart used in the dashboard.
 
- Live stream demo:
 
  ```bash
  # Terminal 1
@@ -196,7 +239,31 @@ curl -s -X POST http://localhost:7860/reset \
  open "http://localhost:7860/trust-dashboard?session_id=<session_id>"
  ```
 
- ## Backend Walkthrough
 
  For terminal-first debugging and pitch clarity, run:
 
@@ -214,94 +281,75 @@ This prints the full backend story:
 
  The key scenario to understand is `task3, seed=42`: public slot `S0` is secretly adversarial. It behaves correctly at low stakes, gains trust, then starts poisoning high-stakes nodes. SENTINEL exists to train the orchestrator to catch that shift.
 
- Adaptive evaluation:
 
  ```bash
  python training/evaluate.py --episodes 100 --task task3 --adaptive --reset-difficulty \
    --plot outputs/task3_adaptive_comparison.png
  ```
 
- ## Live Dashboard
 
  The Space opens directly into **SENTINEL Trust Mission Control**, a judge-demo dashboard:
 
- - live task progress and score
- - S0-S4 network theater with trust state per public slot
- - manual `delegate`, `verify`, `solve_independently`, and `skip` controls
- - heuristic auto-policy and one-click recommended move
  - API playground showing raw request and response payloads
- - profile reshuffle demo via seed swap
- - before-and-after story lane for judge presentation
- - hackathon readiness panel for what is done vs still pending
- - risk gate for high-stakes subtasks
- - flight recorder of step rewards and decisions
- - code-flow map from `reset()` to reward
- - hackathon theme coverage map
- - adversarial detection and poisoning counters
- - baseline proof table and chart for random, heuristic, and oracle-lite policies
-
- Current status as of April 22, 2026:
-
- | Requirement | Status |
- | --- | --- |
- | Hugging Face Space | Live |
- | Docker build | Passing |
- | OpenEnv validation | Passing |
- | Baseline chart | Committed |
- | Live trust UI | Deployed |
- | Mini-blog/video | Still required before finale |
- | Onsite GRPO curve | Still required during finale |
-
- Start an episode:
-
- ```bash
- curl -X POST http://localhost:7860/reset \
-   -H "Content-Type: application/json" \
-   -d '{"task_type":"task3","seed":42}'
- ```
-
- Step:
-
- ```bash
- curl -X POST "http://localhost:7860/step?session_id=<SESSION_ID>" \
-   -H "Content-Type: application/json" \
-   -d '{
-     "session_id":"<SESSION_ID>",
-     "task_type":"task3",
-     "action_type":"delegate",
-     "specialist_id":"S2",
-     "reasoning":"S2 has the best observed trust score"
-   }'
- ```
-
- ## Project Structure
 
  ```text
  sentinel-env/
- |-- app.py
- |-- environment.py
- |-- models.py
- |-- graders.py
- |-- specialists.py
- |-- trust_ledger.py
- |-- task_graph.py
- |-- comms_bus.py
- |-- scenarios.py
- |-- inference.py
- |-- openenv.yaml
- |-- Dockerfile
- |-- requirements.txt
- |-- training/
- |   |-- train.py
- |   |-- evaluate.py
- |   `-- colab_notebook.ipynb
- `-- tests/
-     |-- test_environment.py
-     |-- test_graders.py
-     `-- test_specialists.py
  ```
 
- ## Local Setup
 
  ```bash
  python3 -m venv .venv
@@ -311,7 +359,7 @@ pip install -r requirements.txt
  pip install pytest
  ```
 
- Run checks:
 
  ```bash
  python -m py_compile app.py server/app.py environment.py models.py graders.py specialists.py trust_ledger.py task_graph.py scenarios.py inference.py comms_bus.py mission_context.py sentinel_config.py training/evaluate.py training/train.py scripts/backend_walkthrough.py
@@ -322,27 +370,29 @@ python training/train.py --dry-run --episodes 5
  python scripts/backend_walkthrough.py --task task3 --seed 42 --policy heuristic --compare --max-rows 14
  ```
 
- Run the server:
 
  ```bash
  uvicorn app:app --host 0.0.0.0 --port 7860
  ```
 
- Validate with OpenEnv:
 
  ```bash
  pip install openenv-core==0.2.3
  openenv validate . --json
  ```
 
- Docker:
 
  ```bash
  docker build -t sentinel-env .
  docker run -p 7860:7860 sentinel-env
  ```
 
- ## Baselines
 
  `inference.py` runs 30 deterministic heuristic episodes and emits only strict hackathon logs:
 
@@ -357,22 +407,13 @@ docker run -p 7860:7860 sentinel-env
  - `random`
  - `heuristic`
  - `oracle_lite`
 
  The evaluator writes `outputs/evaluation_results.json` and `outputs/baseline_comparison.png`.
 
- ![Baseline Comparison](outputs/baseline_comparison.png)
-
- Latest local comparison, 20 episodes per task and policy:
-
- | Policy | Overall | Task 1 | Task 2 | Task 3 |
- | --- | ---: | ---: | ---: | ---: |
- | Random | 0.6954 | 0.7702 | 0.6505 | 0.6655 |
- | Heuristic trust-weighted | 0.7960 | 0.8690 | 0.7677 | 0.7513 |
- | Oracle-lite upper bound | 0.8553 | 0.9180 | 0.7801 | 0.8678 |
-
- The demo story is the score gap: the reward function distinguishes blind delegation from trust-aware routing, and the oracle-lite upper bound shows room for onsite RL training.
 
- ## Hugging Face Deployment
 
  ```bash
  huggingface-cli login
@@ -392,22 +433,37 @@ curl -X POST https://xcodeaddy-sentinel-env.hf.space/reset \
  openenv validate . --json
  ```
 
- ## Mini-Blog Draft
-
- Title: `SENTINEL: Training AI to Trust Wisely in Multi-Agent Systems`
-
- SENTINEL is an OpenEnv RL environment for one failure mode: multi-agent systems delegate blindly. One orchestrator must complete long tasks by routing work across five specialist agents whose reliability profiles are hidden and reshuffled every episode. The orchestrator only sees behavior, confidence, stakes, and history, so it must learn skepticism, verification, recovery, and calibrated trust.
-
- The specialists are deterministic FSMs on purpose: they give stable reward signals while the orchestrator remains the trainable target. Under Reward Engine v2, random routing scores `0.6954`, trust-weighted routing scores `0.7960`, and oracle-lite reaches `0.8553`, showing the environment has a meaningful learning signal before onsite GRPO training.
-
- ## Hackathon Alignment
-
- - Theme 1: multi-agent interaction, partial observability, adversarial specialist, trust calibration.
- - Theme 2: long-horizon task graphs with delayed terminal reward and failure recovery.
- - Theme 3.1: professional agent orchestration workflow with API-style actions.
- - Theme 4: profile shuffle creates a self-resetting curriculum.
- - Theme 5: targets a real AI systems failure: blind trust inside agent pipelines.
-
- Winning demo line:
-
- > Agents fail because they trust blindly. SENTINEL trains skepticism, recovery, and oversight.
  license: mit
  ---
 
+ # 🛡️ SENTINEL: Self-Evolving Network for Training Intelligent Agents Under Adversarial Long-Horizon Tasks
+
+ > Agents fail because they trust blindly. SENTINEL trains skepticism, recovery, and oversight.
+
+ ---
+
+ ## 📌 Quick Links
+
+ | Resource | Link |
+ | --- | --- |
+ | 🌐 **Live HF Space** | [https://xcodeaddy-sentinel-env.hf.space](https://xcodeaddy-sentinel-env.hf.space) |
+ | 📂 **HF Space Repo** | [https://huggingface.co/spaces/XcodeAddy/sentinel-env](https://huggingface.co/spaces/XcodeAddy/sentinel-env) |
+ | 🐙 **GitHub Repo** | [https://github.com/ADITYAGABA1322/sentinel-env](https://github.com/ADITYAGABA1322/sentinel-env) |
+ | 📓 **Training Notebook (Colab)** | [training/colab_notebook.ipynb](training/colab_notebook.ipynb) |
+ | 📝 **Mini-Blog on Hugging Face** | [https://huggingface.co/blog/XcodeAddy/sentinel-training-ai-to-trust-wisely](https://huggingface.co/blog/XcodeAddy/sentinel-training-ai-to-trust-wisely) |
+ | 🖥️ **OpenEnv Base URL** | [https://xcodeaddy-sentinel-env.hf.space](https://xcodeaddy-sentinel-env.hf.space) |
 
+ ---
 
+ ## 🧠 What Is SENTINEL?
 
+ SENTINEL is an **OpenEnv-compatible RL environment** designed to train one core skill: teaching an orchestrator agent to decide **who to trust, when to verify, how to recover, and how to finish** long multi-agent work when specialist agents are unreliable or adversarial.
+
+ Modern agent systems fail in a predictable pattern:
 
  1. A long task is decomposed into many steps.
  2. The orchestrator delegates to sub-agents or tools.
+ 3. One specialist returns a **confident but wrong** result.
+ 4. The system trusts it, builds on it, and **drifts into failure**.
 
+ SENTINEL turns that failure mode into a **trainable environment**. The model only sees behavior: returned outcomes, confidence, stakes, history, and trust scores. It **never** sees hidden specialist identities.
 
+ ---
 
+ ## 🌍 Real-World Bridge
+
+ SENTINEL is not a normal chatbot that answers one prompt. It is the training ground for the **hidden control loop** inside a long-running agent.
 
  Example user mission:
 
 
  What SENTINEL abstracts:
 
+ 1. The user mission becomes a scenario with a **task graph**.
+ 2. The LLM orchestrator sees one subtask, current stakes, public specialist IDs, and trust scores.
  3. The model emits one control action: `delegate`, `verify`, `solve_independently`, or `skip`.
+ 4. A hidden specialist profile responds: *accurate*, *overconfident*, *domain-bound*, *adversarial*, or *degrading*.
  5. The reward engine scores the action and the trust ledger updates.
+ 6. **GRPO/TRL** uses that reward to train better orchestration behavior.
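Concretely, one cycle of that loop maps onto the HTTP API documented later in this README. A minimal client sketch; the JSON response field names (`session_id`, `done`) are assumptions for illustration, and `/api` serves the authoritative route index:

```python
# Minimal orchestration loop against the documented /reset and /step routes.
# Response field names ("session_id", "done") are assumptions; check /api.
import requests

BASE = "http://localhost:7860"

reset = requests.post(BASE + "/reset", json={"task_type": "task3", "seed": 42}).json()
session_id = reset["session_id"]  # assumed field name

for _ in range(64):  # hard cap so the sketch always terminates
    step = requests.post(
        BASE + "/step",
        params={"session_id": session_id},
        json={
            "session_id": session_id,
            "task_type": "task3",
            "action_type": "verify",  # delegate | verify | solve_independently | skip
            "specialist_id": "S2",
            "reasoning": "verify before building on a high-stakes result",
        },
    ).json()
    if step.get("done", False):  # assumed field name
        break
```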
 
+ ---
 
+ ## 🎯 Training Evidence
 
+ ### Training Notebook
 
+ The full training pipeline is available as a **reproducible Colab notebook**: [`training/colab_notebook.ipynb`](training/colab_notebook.ipynb).
+
+ It produces every artifact the repo expects:
+
+ - `outputs/eval_pre.json`: pre-training baselines
+ - `training/sentinel_qwen15_grpo/`: LoRA adapter + `trainer_state.json`
+ - `outputs/trained_policy_replay.jsonl`: UI replay table
+ - `outputs/eval_post.json`: post-training evaluation
+ - `outputs/reward_report_task3_seed42.json`: per-step reward report
+ - `outputs/charts/*.png`: 12 publication-quality charts
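A quick way to sanity-check those artifacts is to diff the two evaluation files. The sketch below assumes each file maps policy to per-task mean scores (e.g. `{"heuristic": {"task3": 0.74}}`); adjust the keys to whatever the notebook actually writes:

```python
# Sketch: report per-task deltas between eval_pre.json and eval_post.json.
# The {policy: {task: score}} layout is an assumption, not a documented schema.
import json

with open("outputs/eval_pre.json") as f:
    pre = json.load(f)
with open("outputs/eval_post.json") as f:
    post = json.load(f)

for task in ("task1", "task2", "task3"):
    before = pre.get("heuristic", {}).get(task)
    after = post.get("trained", {}).get(task)
    if before is not None and after is not None:
        print(f"{task}: {before:.4f} -> {after:.4f} ({after - before:+.4f})")
```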
 
 
 
+ ### Loss & Reward Plots
+
+ All charts are generated from real training runs via `training/plots.py`:
+
+ | Chart | Description |
+ | --- | --- |
+ | `outputs/charts/grpo_reward_curve.png` | GRPO reward over training steps |
+ | `outputs/charts/baseline_grouped_bars.png` | Random vs Heuristic vs Oracle-lite vs Trained |
+ | `outputs/charts/trust_evolution.png` | Trust trajectory per specialist |
+ | `outputs/charts/detection_vs_poisoning.png` | Adversarial detection vs poison events |
+ | `outputs/charts/ablation.png` | Reward component ablation |
+ | `outputs/charts/task_radar.png` | Multi-dimension task performance |
+ | `outputs/charts/failure_fishbone_map.png` | Failure mode analysis |
+
+ ### Baseline Comparison
+
+ ![Baseline Comparison](outputs/baseline_comparison.png)
+
+ Latest local comparison, 30 episodes per task and policy:
+
+ | Policy | Overall | Task 1 | Task 2 | Task 3 |
+ | --- | ---: | ---: | ---: | ---: |
+ | Random | 0.6904 | 0.7635 | 0.6472 | 0.6606 |
+ | Heuristic trust-weighted | 0.7817 | 0.8504 | 0.7497 | 0.7449 |
+ | Oracle-lite upper bound | 0.8405 | 0.9011 | 0.7638 | 0.8567 |
+ | **Trained (GRPO)** | **0.7880** | **0.8504** | **0.7497** | **0.7637** |
+
+ The demo story is the **score gap**: the reward function distinguishes blind delegation from trust-aware routing, and the oracle-lite upper bound shows room for further RL training.
+
+ ---
 
+ ## 🔧 Environment Shape
 
+ | Property | Value |
+ | --- | --- |
+ | API | `reset()`, `step(action)`, `state()` |
+ | Runtime | FastAPI on port `7860` |
+ | Tasks | `task1`, `task2`, `task3` |
+ | Specialists | 5 scripted FSM agents with shuffled hidden profiles |
+ | Rewards | Per-step reward + terminal score, normalized to `0.0–1.0` |
+ | Dataset | 120 abstract multi-agent scenarios |
+ | Session store | Single-process memory with TTL/LRU cleanup |
+ | Adaptive curriculum | Pass `adaptive=true` on `/reset` for Theme 4 demos |
+ | Live trust stream | `/stream?session_id=...` feeds the `/trust-dashboard` bars |
+
+ Deployment contract: run one server worker for the submitted Space. Active `SentinelEnv` objects live in process memory, so multi-worker deployments need sticky sessions or a shared store such as Redis. The Dockerfile intentionally starts uvicorn with `--workers 1`.
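To make that contract concrete, a minimal sketch of a TTL/LRU session store of the kind described above (class name and limits are illustrative assumptions, not the repo's actual implementation):

```python
# Illustrative TTL/LRU session store; names and limits are assumptions,
# not the actual implementation in this repo.
import time
from collections import OrderedDict

class SessionStore:
    def __init__(self, max_sessions=256, ttl_seconds=1800):
        self.max_sessions = max_sessions
        self.ttl_seconds = ttl_seconds
        self._items = OrderedDict()  # session_id -> (env, last_used)

    def put(self, session_id, env):
        self._evict()
        self._items[session_id] = (env, time.monotonic())
        self._items.move_to_end(session_id)

    def get(self, session_id):
        entry = self._items.get(session_id)
        if entry is None:
            return None
        env, _ = entry
        self._items[session_id] = (env, time.monotonic())  # refresh TTL
        self._items.move_to_end(session_id)                # refresh LRU order
        return env

    def _evict(self):
        now = time.monotonic()
        # Drop expired sessions first, then trim LRU overflow.
        expired = [s for s, (_, t) in self._items.items() if now - t > self.ttl_seconds]
        for sid in expired:
            del self._items[sid]
        while len(self._items) >= self.max_sessions:
            self._items.popitem(last=False)
```

Because this dict lives in a single process, a second uvicorn worker would see an empty store, which is exactly why the Dockerfile pins `--workers 1`.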
 
+ ---
+
+ ## 🕵️ Specialist Behaviors
 
  | Public Slot | Hidden Behavior |
  | --- | --- |
+ | S0–S4 | Public IDs are **shuffled every episode** |
 
  Hidden profiles (a sketch of one appears after the list):
 
+ - **`AccurateSlow`**: 90% accurate, costs 2 steps.
+ - **`OverconfidentFast`**: quick and confident, wrong 40% of the time.
+ - **`DomainBound`**: strong on analysis/verification, weak elsewhere.
+ - **`Adversarial`**: benign below stakes `0.70`, poisons at stakes `≥ 0.70`.
+ - **`Degrading`**: strong early, gets worse with reuse.
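A minimal sketch of how a scripted profile such as `Adversarial` can be expressed as a deterministic FSM keyed on stakes. This is illustrative only; the real profiles live in `specialists.py` and also model step cost, confidence calibration, and domain fit:

```python
# Illustrative deterministic specialist profile, not the repo's specialists.py.
from dataclasses import dataclass

ADVERSARIAL_TRIGGER_STAKES = 0.70  # documented in sentinel_config.py

@dataclass
class AdversarialSpecialist:
    """Benign below the trigger stakes, poisons at or above it."""
    uses: int = 0

    def respond(self, subtask: str, stakes: float) -> dict:
        self.uses += 1
        poisoned = stakes >= ADVERSARIAL_TRIGGER_STAKES
        return {
            "output": f"{'POISONED' if poisoned else 'correct'} result for {subtask}",
            "correct": not poisoned,
            "confidence": 0.95,  # always confident, which is what makes it dangerous
        }
```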
 
+ ---
+
+ ## 📋 Tasks
 
  | Task | Difficulty | Goal |
  | --- | --- | --- |
+ | `task1` | Easy | Learn basic delegation over a 10-subtask chain |
+ | `task2` | Medium | Complete a 15-subtask DAG and build calibrated trust |
+ | `task3` | Hard | Complete a 20-subtask adversarial mission under high-stakes poison attempts |
+
+ ---
 
+ ## 💰 Reward Model
 
  Rewards are deterministic and boundary-exclusive per step: `(0.01, 0.99)`.
 
  Shared threshold/cost constants live in `sentinel_config.py`:
 
  - `ADVERSARIAL_TRIGGER_STAKES = 0.70`: adversarial specialist starts poisoning.
  - `ADVERSARIAL_AWARENESS_STAKES = 0.70`: graders reward verification/evasion.
  - `CRITICAL_POISON_STAKES = 0.85`: unchecked poison ends the episode.
+ - `VERIFY_EXTRA_STEP_COST = 1`: verify cost = specialist step cost + 1.
 
  Task 3 terminal score:
 
  ```text
+ 0.35 × completion_rate
+ + 0.30 × adversarial_detection_rate
+ + 0.25 × trust_calibration
+ + 0.10 × efficiency
  ```
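The same weighting as a small helper, transcribed directly from the formula above (the function name is illustrative):

```python
# Direct transcription of the documented task3 terminal score.
# Inputs are assumed to already be rates in [0.0, 1.0].
def task3_terminal_score(completion_rate: float,
                         adversarial_detection_rate: float,
                         trust_calibration: float,
                         efficiency: float) -> float:
    return (0.35 * completion_rate
            + 0.30 * adversarial_detection_rate
            + 0.25 * trust_calibration
            + 0.10 * efficiency)

# Example: full completion, strong detection, decent calibration, modest efficiency.
print(task3_terminal_score(1.0, 0.9, 0.8, 0.6))  # 0.88
```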
 
+ **Reward Engine v2** adds process-aware signals on top of outcome scoring:
 
  - `confidence_alignment`: penalizes confident wrong outputs.
  - `domain_routing`: rewards domain-bound behavior only when it is actually in-domain.
 
  The active step formulas are exposed at `/grader`, and each active episode exposes a full component trace at `/reward-report?session_id=<id>`.
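For intuition, a hedged sketch of what a `confidence_alignment` term can look like; the authoritative step formulas are whatever `/grader` reports, not this code:

```python
# Illustrative confidence_alignment signal, not the formula served by /grader.
# High confidence on a wrong output is punished hardest; calibrated
# uncertainty on a wrong output is punished least.
def confidence_alignment(confidence: float, correct: bool) -> float:
    if correct:
        return confidence          # reward confident correct outputs
    return 1.0 - confidence        # confident wrong outputs score near 0.0

print(confidence_alignment(0.95, correct=False))  # 0.05 -> heavy penalty
print(confidence_alignment(0.40, correct=False))  # 0.60 -> mild penalty
```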
 
+ ---
+
+ ## ✨ WOW Factor Features
 
  1. **Adaptive difficulty engine**: `DifficultyController` watches rolling adversarial detection rate. Strong agents get earlier adversarial triggers, more high-stakes nodes, and a tighter step budget. Struggling agents get easier episodes. Enable it with:
 
+    ```bash
+    curl -X POST http://localhost:7860/reset \
+      -H "Content-Type: application/json" \
+      -d '{"task_type":"task3","seed":42,"adaptive":true}'
+    ```
 
+ 2. **Behavioral fingerprints**: every observation includes `behavioral_fingerprints` for S0–S4 (a sketch of these signals follows the list):
+    - `confidence_accuracy_gap`
+    - `domain_hit_rate`
+    - `stakes_volatility`
+    - low/high stakes accuracy
 
+    These are public behavioral signals only. They do **not** leak the hidden specialist identity.
 
  3. **Live trust stream**: `/stream?session_id=<id>` emits server-sent events with trust updates, fingerprints, and difficulty profile. Open `/trust-dashboard?session_id=<id>` during a demo to watch the trust bars update live. A minimal stream-consumer sketch also follows the list.
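A sketch of how signals like `confidence_accuracy_gap` and `stakes_volatility` can be derived from public behavior alone; the environment computes these server-side, and the per-interaction record layout here is an assumption:

```python
# Illustrative fingerprint computation from public observations only.
# The environment computes these server-side; the record layout is assumed.
from statistics import mean, pstdev

def confidence_accuracy_gap(history: list[dict]) -> float:
    """Mean stated confidence minus observed accuracy for one specialist."""
    if not history:
        return 0.0
    avg_confidence = mean(h["confidence"] for h in history)
    accuracy = mean(1.0 if h["correct"] else 0.0 for h in history)
    return avg_confidence - accuracy  # large gap = overconfident specialist

def stakes_volatility(history: list[dict]) -> float:
    """Spread of the stakes at which the specialist was used."""
    stakes = [h["stakes"] for h in history]
    return pstdev(stakes) if len(stakes) > 1 else 0.0
```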
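And for the live trust stream, a minimal server-sent-events consumer using `httpx` (any SSE-capable client works; the event payload shape is whatever `/stream` emits):

```python
# Minimal server-sent-events consumer for /stream; payload fields are
# whatever the server emits (trust updates, fingerprints, difficulty).
import httpx

session_id = "<session_id>"  # from a prior POST /reset
url = f"http://localhost:7860/stream?session_id={session_id}"

with httpx.Client(timeout=None) as client:          # no timeout: long-lived stream
    with client.stream("GET", url) as response:
        for line in response.iter_lines():
            if line.startswith("data:"):
                print(line[len("data:"):].strip())  # one JSON event per line
```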
 
+ ---
+
+ ## 🌐 API
 
  ```bash
  curl http://localhost:7860/health
 
  curl http://localhost:7860/difficulty
  ```
 
+ The root route `/` serves the live **SENTINEL dashboard** on Hugging Face Spaces.
  Use `/api` for the JSON route index.
  Use `/assets/baseline_comparison.png` for the committed baseline chart used in the dashboard.
 
+ ### Live Stream Demo
 
  ```bash
  # Terminal 1
 
  open "http://localhost:7860/trust-dashboard?session_id=<session_id>"
  ```
 
+ ### Start an Episode
+
+ ```bash
+ curl -X POST http://localhost:7860/reset \
+   -H "Content-Type: application/json" \
+   -d '{"task_type":"task3","seed":42}'
+ ```
+
+ ### Step
+
+ ```bash
+ curl -X POST "http://localhost:7860/step?session_id=<SESSION_ID>" \
+   -H "Content-Type: application/json" \
+   -d '{
+     "session_id":"<SESSION_ID>",
+     "task_type":"task3",
+     "action_type":"delegate",
+     "specialist_id":"S2",
+     "reasoning":"S2 has the best observed trust score"
+   }'
+ ```
+
+ ---
+
+ ## 🧪 Backend Walkthrough
 
  For terminal-first debugging and pitch clarity, run:
 
 
  The key scenario to understand is `task3, seed=42`: public slot `S0` is secretly adversarial. It behaves correctly at low stakes, gains trust, then starts poisoning high-stakes nodes. SENTINEL exists to train the orchestrator to catch that shift.
 
+ ### Adaptive Evaluation
 
  ```bash
  python training/evaluate.py --episodes 100 --task task3 --adaptive --reset-difficulty \
    --plot outputs/task3_adaptive_comparison.png
  ```
 
+ ---
+
+ ## 🖥️ Live Dashboard
 
  The Space opens directly into **SENTINEL Trust Mission Control**, a judge-demo dashboard:
 
+ - Live task progress and score
+ - S0–S4 network theater with trust state per public slot
+ - Manual `delegate`, `verify`, `solve_independently`, and `skip` controls
+ - Heuristic auto-policy and one-click recommended move
  - API playground showing raw request and response payloads
+ - Profile reshuffle demo via seed swap
+ - Before-and-after story lane for judge presentation
+ - Hackathon readiness panel for what is done vs still pending
+ - Risk gate for high-stakes subtasks
+ - Flight recorder of step rewards and decisions
+ - Code-flow map from `reset()` to reward
+ - Hackathon theme coverage map
+ - Adversarial detection and poisoning counters
+ - Baseline proof table and chart for random, heuristic, and oracle-lite policies
 
+ ---
 
+ ## 📂 Project Structure
 
  ```text
  sentinel-env/
+ ├── app.py                    # FastAPI server
+ ├── environment.py            # Core SentinelEnv class
+ ├── models.py                 # Data models
+ ├── graders.py                # Reward Engine v2
+ ├── specialists.py            # FSM specialist profiles
+ ├── trust_ledger.py           # Trust scoring
+ ├── task_graph.py             # Task graph builder
+ ├── comms_bus.py              # Communication bus
+ ├── scenarios.py              # 120 scenarios
+ ├── inference.py              # Heuristic inference baseline
+ ├── openenv.yaml              # OpenEnv manifest
+ ├── Dockerfile                # Docker build
+ ├── requirements.txt          # Runtime dependencies
+ ├── training/
+ │   ├── train.py              # GRPO training script
+ │   ├── evaluate.py           # Baseline evaluator
+ │   ├── plots.py              # 12-chart generator
+ │   ├── replay.py             # Policy replay recorder
+ │   └── colab_notebook.ipynb  # ✅ Reproducible training notebook
+ ├── outputs/
+ │   ├── charts/               # 12 training/evaluation charts
+ │   ├── eval_pre.json         # Pre-training baselines
+ │   ├── eval_post.json        # Post-training evaluation
+ │   └── baseline_comparison.png
+ ├── scripts/
+ │   └── backend_walkthrough.py
+ └── tests/
+     ├── test_environment.py
+     ├── test_graders.py
+     └── test_specialists.py
  ```
 
+ ---
+
+ ## ⚡ Local Setup
 
  ```bash
  python3 -m venv .venv
 
  pip install -r requirements.txt
  pip install pytest
  ```
 
+ ### Run Checks
 
  ```bash
  python -m py_compile app.py server/app.py environment.py models.py graders.py specialists.py trust_ledger.py task_graph.py scenarios.py inference.py comms_bus.py mission_context.py sentinel_config.py training/evaluate.py training/train.py scripts/backend_walkthrough.py
 
  python training/train.py --dry-run --episodes 5
  python scripts/backend_walkthrough.py --task task3 --seed 42 --policy heuristic --compare --max-rows 14
  ```
 
+ ### Run the Server
 
  ```bash
  uvicorn app:app --host 0.0.0.0 --port 7860
  ```
 
+ ### Validate with OpenEnv
 
  ```bash
  pip install openenv-core==0.2.3
  openenv validate . --json
  ```
 
+ ### Docker
 
  ```bash
  docker build -t sentinel-env .
  docker run -p 7860:7860 sentinel-env
  ```
 
+ ---
+
+ ## 📊 Baselines
 
  `inference.py` runs 30 deterministic heuristic episodes and emits only strict hackathon logs:
 
 
  - `random`
  - `heuristic`
  - `oracle_lite`
+ - `trained`
 
  The evaluator writes `outputs/evaluation_results.json` and `outputs/baseline_comparison.png`.
 
+ ---
 
+ ## 🚀 Hugging Face Deployment
 
  ```bash
  huggingface-cli login
 
  openenv validate . --json
  ```
 
+ ---
 
+ ## 🏆 Hackathon Alignment
 
+ | Theme | Coverage |
+ | --- | --- |
+ | Theme 1 | Multi-agent interaction, partial observability, adversarial specialist, trust calibration |
+ | Theme 2 | Long-horizon task graphs with delayed terminal reward and failure recovery |
+ | Theme 3.1 | Professional agent orchestration workflow with API-style actions |
+ | Theme 4 | Profile shuffle creates a self-resetting curriculum |
+ | Theme 5 | Targets a real AI systems failure: blind trust inside agent pipelines |
 
+ ---
 
+ ## 📝 Mini-Blog
 
+ A detailed mini-blog explaining what SENTINEL does and what we trained is published on Hugging Face:
 
+ 👉 **[SENTINEL: Training AI to Trust Wisely in Multi-Agent Systems](https://huggingface.co/blog/XcodeAddy/sentinel-training-ai-to-trust-wisely)**
 
+ ---
+
+ ## 📚 Additional References
 
+ - [Rollout Plan](docs/ROLL_OUT.md)
+ - [Narrative Lock](docs/presentation/NARRATIVE_LOCK.md)
+ - [Visual System](docs/diagrams/VISUAL_SYSTEM.md)
+ - [Training Runbook](docs/TRAINING_RUNBOOK.md)
 
+ ---
 
+ ## 📜 License
 
+ MIT