README.md (changed)

Commit: Final README: add all submission links, training evidence, blog reference, hackathon alignment
license: mit
---

# SENTINEL: Self-Evolving Network for Training Intelligent Agents Under Adversarial Long-Horizon Tasks

> Agents fail because they trust blindly. SENTINEL trains skepticism, recovery, and oversight.

---

## Quick Links

| Resource | Link |
| --- | --- |
| **Live HF Space** | [https://xcodeaddy-sentinel-env.hf.space](https://xcodeaddy-sentinel-env.hf.space) |
| **HF Space Repo** | [https://huggingface.co/spaces/XcodeAddy/sentinel-env](https://huggingface.co/spaces/XcodeAddy/sentinel-env) |
| **GitHub Repo** | [https://github.com/ADITYAGABA1322/sentinel-env](https://github.com/ADITYAGABA1322/sentinel-env) |
| **Training Notebook (Colab)** | [training/colab_notebook.ipynb](training/colab_notebook.ipynb) |
| **Mini-Blog on Hugging Face** | [https://huggingface.co/blog/XcodeAddy/sentinel-training-ai-to-trust-wisely](https://huggingface.co/blog/XcodeAddy/sentinel-training-ai-to-trust-wisely) |
| **OpenEnv Base URL** | [https://xcodeaddy-sentinel-env.hf.space](https://xcodeaddy-sentinel-env.hf.space) |

---
## What Is SENTINEL?

SENTINEL is an **OpenEnv-compatible RL environment** designed to train one core skill: teaching an orchestrator agent to decide **who to trust, when to verify, how to recover, and how to finish** long multi-agent work when specialist agents are unreliable or adversarial.

Modern agent systems fail in a predictable pattern:

1. A long task is decomposed into many steps.
2. The orchestrator delegates to sub-agents or tools.
3. One specialist returns a **confident but wrong** result.
4. The system trusts it, builds on it, and **drifts into failure**.

SENTINEL turns that failure mode into a **trainable environment**. The model only sees behavior: returned outcomes, confidence, stakes, history, and trust scores. It **never** sees hidden specialist identities.

---
## Real-World Bridge

SENTINEL is not a normal chatbot that answers one prompt. It is the training ground for the **hidden control loop** inside a long-running agent.

Example user mission: "… fix the risky parts, and prepare it for deployment."

What SENTINEL abstracts:

1. The user mission becomes a scenario with a **task graph**.
2. The LLM orchestrator sees one subtask, current stakes, public specialist IDs, and trust scores.
3. The model emits one control action: `delegate`, `verify`, `solve_independently`, or `skip`.
4. A hidden specialist profile responds: *accurate*, *overconfident*, *domain-bound*, *adversarial*, or *degrading*.
5. The reward engine scores the action and the trust ledger updates.
6. **GRPO/TRL** uses that reward to train better orchestration behavior.

---
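Step 5's trust-ledger update can be sketched as a simple exponential moving average. The EMA form and the `alpha` value are illustrative assumptions for this sketch, not the actual rule in `trust_ledger.py`:

```python
def update_trust(prev_trust: float, was_correct: bool, alpha: float = 0.3) -> float:
    """One trust update after observing a specialist outcome.

    Moves trust toward 1.0 on a correct result and toward 0.0 on a wrong
    one. `alpha` controls how fast new evidence overrides history.
    (Sketch only: the repo's trust_ledger.py formula may differ.)
    """
    target = 1.0 if was_correct else 0.0
    return (1 - alpha) * prev_trust + alpha * target
```

Under this sketch, a specialist that suddenly turns bad loses trust geometrically, which is exactly the signal the orchestrator needs before high-stakes nodes.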
## Training Evidence

### Training Notebook

The full training pipeline is available as a **reproducible Colab notebook**: [`training/colab_notebook.ipynb`](training/colab_notebook.ipynb).

It produces every artifact the repo expects:

- `outputs/eval_pre.json`: pre-training baselines
- `training/sentinel_qwen15_grpo/`: LoRA adapter + `trainer_state.json`
- `outputs/trained_policy_replay.jsonl`: UI replay table
- `outputs/eval_post.json`: post-training evaluation
- `outputs/reward_report_task3_seed42.json`: per-step reward report
- `outputs/charts/*.png`: 12 publication-quality charts
### Loss & Reward Plots

All generated from real training runs via `training/plots.py`:

| Chart | Description |
| --- | --- |
| `outputs/charts/grpo_reward_curve.png` | GRPO reward over training steps |
| `outputs/charts/baseline_grouped_bars.png` | Random vs Heuristic vs Oracle-lite vs Trained |
| `outputs/charts/trust_evolution.png` | Trust trajectory per specialist |
| `outputs/charts/detection_vs_poisoning.png` | Adversarial detection vs poison events |
| `outputs/charts/ablation.png` | Reward component ablation |
| `outputs/charts/task_radar.png` | Multi-dimension task performance |
| `outputs/charts/failure_fishbone_map.png` | Failure mode analysis |

### Baseline Comparison

![Baseline Comparison](outputs/charts/baseline_grouped_bars.png)

Latest local comparison, 30 episodes per task and policy:

| Policy | Overall | Task 1 | Task 2 | Task 3 |
| --- | ---: | ---: | ---: | ---: |
| Random | 0.6904 | 0.7635 | 0.6472 | 0.6606 |
| Heuristic trust-weighted | 0.7817 | 0.8504 | 0.7497 | 0.7449 |
| Oracle-lite upper bound | 0.8405 | 0.9011 | 0.7638 | 0.8567 |
| **Trained (GRPO)** | **0.7880** | **0.8504** | **0.7497** | **0.7637** |

The demo story is the **score gap**: the reward function distinguishes blind delegation from trust-aware routing, and the oracle-lite upper bound shows room for further RL training.

---
## Environment Shape

| Property | Value |
| --- | --- |
| API | `reset()`, `step(action)`, `state()` |
| Runtime | FastAPI on port `7860` |
| Tasks | `task1`, `task2`, `task3` |
| Specialists | 5 scripted FSM agents with shuffled hidden profiles |
| Rewards | Per-step reward + terminal score, normalized to `0.0–1.0` |
| Dataset | 120 abstract multi-agent scenarios |
| Session store | Single-process memory with TTL/LRU cleanup |
| Adaptive curriculum | Pass `adaptive=true` on `/reset` for Theme 4 demos |
| Live trust stream | `/stream?session_id=...` feeds the `/trust-dashboard` bars |

Deployment contract: run one server worker for the submitted Space. Active `SentinelEnv` objects live in process memory, so multi-worker deployments need sticky sessions or a shared store such as Redis. The Dockerfile intentionally starts uvicorn with `--workers 1`.

---
## Specialist Behaviors

| Public Slot | Hidden Behavior |
| --- | --- |
| S0–S4 | Public IDs are **shuffled every episode** |

Hidden profiles:

- **`AccurateSlow`**: 90% accurate, costs 2 steps.
- **`OverconfidentFast`**: Quick and confident, wrong 40% of the time.
- **`DomainBound`**: Strong on analysis/verification, weak elsewhere.
- **`Adversarial`**: Benign below stakes `0.70`, poisons at stakes `≥ 0.70`.
- **`Degrading`**: Strong early, gets worse with reuse.

---
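The `Adversarial` profile's stakes-triggered flip can be written directly from the trigger constant quoted in the README; the returned labels are illustrative stand-ins for real specialist outputs:

```python
ADVERSARIAL_TRIGGER_STAKES = 0.70  # value from sentinel_config.py per the README

def adversarial_output(stakes: float) -> str:
    """Benign below the trigger (to build trust), poisoned at or above it."""
    return "poisoned" if stakes >= ADVERSARIAL_TRIGGER_STAKES else "correct"
```

This is why the profile is dangerous: its low-stakes behavior is indistinguishable from `AccurateSlow` until the mission reaches the nodes that matter.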
## Tasks

| Task | Difficulty | Goal |
| --- | --- | --- |
| `task1` | Easy | Learn basic delegation over a 10-subtask chain |
| `task2` | Medium | Complete a 15-subtask DAG and build calibrated trust |
| `task3` | Hard | Complete a 20-subtask adversarial mission under high-stakes poison attempts |

---
## Reward Model

Rewards are deterministic and boundary-exclusive per step: `(0.01, 0.99)`.

Shared threshold/cost constants live in `sentinel_config.py`:

- `ADVERSARIAL_TRIGGER_STAKES = 0.70`: adversarial specialist starts poisoning.
- `ADVERSARIAL_AWARENESS_STAKES = 0.70`: graders reward verification/evasion.
- `CRITICAL_POISON_STAKES = 0.85`: unchecked poison ends the episode.
- `VERIFY_EXTRA_STEP_COST = 1`: verify cost = specialist step cost + 1.

Task 3 terminal score:

```text
  0.35 × completion_rate
+ 0.30 × adversarial_detection_rate
+ 0.25 × trust_calibration
+ 0.10 × efficiency
```

**Reward Engine v2** adds process-aware signals on top of outcome scoring:

- `confidence_alignment`: penalizes confident wrong outputs.
- `domain_routing`: rewards domain-bound behavior only when it is actually in-domain.

The active step formulas are exposed at `/grader`, and each active episode exposes a full component trace at `/reward-report?session_id=<id>`.

---
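As code, the terminal score above is a plain weighted sum; the weights are from the README, and the four inputs are assumed to be normalized to `[0, 1]`:

```python
def task3_terminal_score(completion_rate: float,
                         adversarial_detection_rate: float,
                         trust_calibration: float,
                         efficiency: float) -> float:
    """Task 3 terminal score with the weights quoted in the README.
    All four metrics are assumed normalized to [0, 1]."""
    return (0.35 * completion_rate
            + 0.30 * adversarial_detection_rate
            + 0.25 * trust_calibration
            + 0.10 * efficiency)
```

A run that completes every subtask but never detects the adversary forfeits up to 0.30 of the final score, which is why blind delegation loses to trust-aware routing.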
## WOW Factor Features

1. **Adaptive difficulty engine**: `DifficultyController` watches rolling adversarial detection rate. Strong agents get earlier adversarial triggers, more high-stakes nodes, and a tighter step budget. Struggling agents get easier episodes. Enable it with:

   ```bash
   curl -X POST http://localhost:7860/reset \
     -H "Content-Type: application/json" \
     -d '{"task_type":"task3","seed":42,"adaptive":true}'
   ```

2. **Behavioral fingerprints**: every observation includes `behavioral_fingerprints` for S0–S4:
   - `confidence_accuracy_gap`
   - `domain_hit_rate`
   - `stakes_volatility`
   - low/high stakes accuracy

   These are public behavioral signals only. They do **not** leak the hidden specialist identity.

3. **Live trust stream**: `/stream?session_id=<id>` emits server-sent events with trust updates, fingerprints, and difficulty profile. Open `/trust-dashboard?session_id=<id>` during a demo to watch the trust bars update live.

---
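One plausible reading of the `confidence_accuracy_gap` fingerprint is mean reported confidence minus observed accuracy over a slot's public history; the repo's exact formula may differ:

```python
def confidence_accuracy_gap(history):
    """history: list of (reported_confidence, was_correct) pairs observed
    for one public slot. A large positive gap flags an overconfident
    specialist. (Sketch: the repo's fingerprint math may differ.)"""
    if not history:
        return 0.0
    mean_conf = sum(conf for conf, _ in history) / len(history)
    accuracy = sum(1 for _, ok in history if ok) / len(history)
    return mean_conf - accuracy
```

Because it uses only public observations, this kind of signal stays within the rule that fingerprints must not leak hidden identities.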
## API

```bash
curl http://localhost:7860/health
curl "http://localhost:7860/reward-report?session_id=<session_id>"
curl http://localhost:7860/difficulty
```

The root route `/` serves the live **SENTINEL dashboard** on Hugging Face Spaces.
Use `/api` for the JSON route index.
Use `/assets/baseline_comparison.png` for the committed baseline chart used in the dashboard.

### Live Stream Demo

```bash
# Terminal 1
curl -s -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_type":"task3","seed":42}'

open "http://localhost:7860/trust-dashboard?session_id=<session_id>"
```
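A script-side consumer of `/stream` only needs a minimal server-sent-events parser. This stdlib sketch collects the `data:` payloads and assumes nothing about their fields:

```python
def parse_sse(lines):
    """Collect data payloads from SSE-framed text lines.

    Events are separated by blank lines; comment lines (starting with
    ':') and other fields are ignored in this minimal sketch.
    """
    events, buf = [], []
    for line in lines:
        if line.startswith("data:"):
            buf.append(line[len("data:"):].strip())
        elif line.strip() == "" and buf:
            events.append("\n".join(buf))
            buf = []
    if buf:
        events.append("\n".join(buf))
    return events
```

Feed it the decoded lines of the `/stream` response and JSON-decode each returned payload as needed.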
### Start an Episode

```bash
curl -X POST http://localhost:7860/reset \
  -H "Content-Type: application/json" \
  -d '{"task_type":"task3","seed":42}'
```

### Step

```bash
curl -X POST "http://localhost:7860/step?session_id=<SESSION_ID>" \
  -H "Content-Type: application/json" \
  -d '{
    "session_id":"<SESSION_ID>",
    "task_type":"task3",
    "action_type":"delegate",
    "specialist_id":"S2",
    "reasoning":"S2 has the best observed trust score"
  }'
```

---
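From Python, the `/step` body in the curl example can be assembled with a small helper. The field set mirrors the example above; the helper itself is ours, not part of the repo:

```python
import json

def step_body(session_id, action_type, specialist_id=None,
              reasoning="", task_type="task3"):
    """Serialize a /step request body matching the curl example above."""
    body = {
        "session_id": session_id,
        "task_type": task_type,
        "action_type": action_type,
        "reasoning": reasoning,
    }
    if specialist_id is not None:
        body["specialist_id"] = specialist_id  # needed when delegating
    return json.dumps(body)
```

POST the returned string with a `Content-Type: application/json` header, exactly as the curl examples do.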
## Backend Walkthrough

For terminal-first debugging and pitch clarity, run `scripts/backend_walkthrough.py` (the full invocation is listed under Run Checks). It prints the full backend story.

The key scenario to understand is `task3, seed=42`: public slot `S0` is secretly adversarial. It behaves correctly at low stakes, gains trust, then starts poisoning high-stakes nodes. SENTINEL exists to train the orchestrator to catch that shift.

### Adaptive Evaluation

```bash
python training/evaluate.py --episodes 100 --task task3 --adaptive --reset-difficulty \
  --plot outputs/task3_adaptive_comparison.png
```

---
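The adaptive loop can be pictured as a tiny controller over the rolling detection rate. The window size, thresholds, and step sizes below are invented for illustration; `DifficultyController`'s actual values live in the repo:

```python
class ToyDifficultyController:
    """Illustrative stand-in for DifficultyController: raise difficulty
    when adversaries are caught reliably, lower it when they are missed."""

    def __init__(self, window: int = 20):
        self.window = window
        self.detections = []   # 1 = adversary caught, 0 = missed
        self.level = 0.5       # 0.0 = easiest, 1.0 = hardest

    def record(self, detected: bool) -> float:
        self.detections = (self.detections + [1 if detected else 0])[-self.window:]
        rate = sum(self.detections) / len(self.detections)
        if rate > 0.8:         # strong agent: earlier triggers, tighter budget
            self.level = min(1.0, self.level + 0.1)
        elif rate < 0.4:       # struggling agent: easier episodes
            self.level = max(0.0, self.level - 0.1)
        return self.level
```

A higher level would translate into earlier adversarial triggers, more high-stakes nodes, and a tighter step budget, as described above.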
## Live Dashboard

The Space opens directly into **SENTINEL Trust Mission Control**, a judge-demo dashboard:

- Live task progress and score
- S0–S4 network theater with trust state per public slot
- Manual `delegate`, `verify`, `solve_independently`, and `skip` controls
- Heuristic auto-policy and one-click recommended move
- API playground showing raw request and response payloads
- Profile reshuffle demo via seed swap
- Before-and-after story lane for judge presentation
- Hackathon readiness panel for what is done vs still pending
- Risk gate for high-stakes subtasks
- Flight recorder of step rewards and decisions
- Code-flow map from `reset()` to reward
- Hackathon theme coverage map
- Adversarial detection and poisoning counters
- Baseline proof table and chart for random, heuristic, and oracle-lite policies

---
## Project Structure

```text
sentinel-env/
├── app.py                  # FastAPI server
├── environment.py          # Core SentinelEnv class
├── models.py               # Data models
├── graders.py              # Reward Engine v2
├── specialists.py          # FSM specialist profiles
├── trust_ledger.py         # Trust scoring
├── task_graph.py           # Task graph builder
├── comms_bus.py            # Communication bus
├── scenarios.py            # 120 scenarios
├── inference.py            # Heuristic inference baseline
├── openenv.yaml            # OpenEnv manifest
├── Dockerfile              # Docker build
├── requirements.txt        # Runtime dependencies
├── training/
│   ├── train.py            # GRPO training script
│   ├── evaluate.py         # Baseline evaluator
│   ├── plots.py            # 12 chart generator
│   ├── replay.py           # Policy replay recorder
│   └── colab_notebook.ipynb  # ✅ Reproducible training notebook
├── outputs/
│   ├── charts/             # 12 training/evaluation charts
│   ├── eval_pre.json       # Pre-training baselines
│   ├── eval_post.json      # Post-training evaluation
│   └── baseline_comparison.png
├── scripts/
│   └── backend_walkthrough.py
└── tests/
    ├── test_environment.py
    ├── test_graders.py
    └── test_specialists.py
```

---
## Local Setup

```bash
python3 -m venv .venv
pip install -r requirements.txt
pip install pytest
```

### Run Checks

```bash
python -m py_compile app.py server/app.py environment.py models.py graders.py specialists.py trust_ledger.py task_graph.py scenarios.py inference.py comms_bus.py mission_context.py sentinel_config.py training/evaluate.py training/train.py scripts/backend_walkthrough.py
python training/train.py --dry-run --episodes 5
python scripts/backend_walkthrough.py --task task3 --seed 42 --policy heuristic --compare --max-rows 14
```

### Run the Server

```bash
uvicorn app:app --host 0.0.0.0 --port 7860
```

### Validate with OpenEnv

```bash
pip install openenv-core==0.2.3
openenv validate . --json
```

### Docker

```bash
docker build -t sentinel-env .
docker run -p 7860:7860 sentinel-env
```

---
## Baselines

`inference.py` runs 30 deterministic heuristic episodes and emits only strict hackathon logs.

Baseline policies:

- `random`
- `heuristic`
- `oracle_lite`
- `trained`

The evaluator writes `outputs/evaluation_results.json` and `outputs/baseline_comparison.png`.

---
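The gap between `random` and `heuristic` comes from trust-weighted routing. A toy version of that policy looks like this; the thresholds are illustrative, not the repo's tuned constants:

```python
def trust_weighted_action(trust_scores, stakes,
                          high_stakes=0.70, min_trust=0.5):
    """Delegate to the most-trusted public slot, but verify instead when
    stakes are high or even the best trust score is still shaky.
    (Toy policy: thresholds are illustrative, not the repo's heuristic.)"""
    best = max(trust_scores, key=trust_scores.get)
    if stakes >= high_stakes or trust_scores[best] < min_trust:
        return "verify", best
    return "delegate", best
```

Even this crude rule beats random delegation because it spends the extra verify step exactly where poisoning is rewarded most.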
## Hugging Face Deployment

```bash
huggingface-cli login
curl -X POST https://xcodeaddy-sentinel-env.hf.space/reset \
  -H "Content-Type: application/json" \
  -d '{"task_type":"task3","seed":42}'
openenv validate . --json
```

---
## Hackathon Alignment

| Theme | Coverage |
| --- | --- |
| Theme 1 | Multi-agent interaction, partial observability, adversarial specialist, trust calibration |
| Theme 2 | Long-horizon task graphs with delayed terminal reward and failure recovery |
| Theme 3.1 | Professional agent orchestration workflow with API-style actions |
| Theme 4 | Profile shuffle creates a self-resetting curriculum |
| Theme 5 | Targets a real AI systems failure: blind trust inside agent pipelines |

---

## Mini-Blog

A detailed mini-blog explaining what SENTINEL does and what we trained is published on Hugging Face:

**[SENTINEL: Training AI to Trust Wisely in Multi-Agent Systems](https://huggingface.co/blog/XcodeAddy/sentinel-training-ai-to-trust-wisely)**

---

## Additional References

- [Rollout Plan](docs/ROLL_OUT.md)
- [Narrative Lock](docs/presentation/NARRATIVE_LOCK.md)
- [Visual System](docs/diagrams/VISUAL_SYSTEM.md)
- [Training Runbook](docs/TRAINING_RUNBOOK.md)

---

## License

MIT