Spaces:
Running
Running
| title: SENTINEL | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| license: mit | |
| # π‘οΈ SENTINEL β Self-Evolving Network for Training Intelligent Agents Under Adversarial Long-Horizon Tasks | |
| > Agents fail because they trust blindly. SENTINEL trains skepticism, recovery, and oversight. | |
| --- | |
| ## π Quick Links | |
| | Resource | Link | | |
| | --- | --- | | |
| | π **Live HF Space** | [https://xcodeaddy-sentinel-env.hf.space](https://xcodeaddy-sentinel-env.hf.space) | | |
| | π **HF Space Repo** | [https://huggingface.co/spaces/XcodeAddy/sentinel-env](https://huggingface.co/spaces/XcodeAddy/sentinel-env) | | |
| | π **GitHub Repo** | [https://github.com/ADITYAGABA1322/sentinel-env](https://github.com/ADITYAGABA1322/sentinel-env) | | |
| | π **Training Notebook (Colab)** | [training/colab_notebook.ipynb](training/colab_notebook.ipynb) | | |
| | π **Mini-Blog on Hugging Face** | [https://huggingface.co/blog/XcodeAddy/sentinel-training-ai-to-trust-wisely](https://huggingface.co/blog/XcodeAddy/sentinel-training-ai-to-trust-wisely) | | |
| | π₯οΈ **OpenEnv Base URL** | [https://xcodeaddy-sentinel-env.hf.space](https://xcodeaddy-sentinel-env.hf.space) | | |
| --- | |
| ## π§ What Is SENTINEL? | |
| SENTINEL is an **OpenEnv-compatible RL environment** designed to train one core skill: teaching an orchestrator agent to decide **who to trust, when to verify, how to recover, and how to finish** long multi-agent work when specialist agents are unreliable or adversarial. | |
| Modern agent systems fail in a predictable pattern: | |
| 1. A long task is decomposed into many steps. | |
| 2. The orchestrator delegates to sub-agents or tools. | |
| 3. One specialist returns a **confident but wrong** result. | |
| 4. The system trusts it, builds on it, and **drifts into failure**. | |
| SENTINEL turns that failure mode into a **trainable environment**. The model only sees behavior: returned outcomes, confidence, stakes, history, and trust scores. It **never** sees hidden specialist identities. | |
| --- | |
| ## π Real-World Bridge | |
| SENTINEL is not a normal chatbot that answers one prompt. It is the training ground for the **hidden control loop** inside a long-running agent. | |
| Example user mission: | |
| ```text | |
| Refactor this project, inspect failures, route work to code/test/security agents, | |
| fix the risky parts, and prepare it for deployment. | |
| ``` | |
| What SENTINEL abstracts: | |
| 1. The user mission becomes a scenario with a **task graph**. | |
| 2. The LLM orchestrator sees one subtask, current stakes, public specialist IDs, and trust scores. | |
| 3. The model emits one control action: `delegate`, `verify`, `solve_independently`, or `skip`. | |
| 4. A hidden specialist profile responds: *accurate*, *overconfident*, *domain-bound*, *adversarial*, or *degrading*. | |
| 5. The reward engine scores the action and the trust ledger updates. | |
| 6. **GRPO/TRL** uses that reward to train better orchestration behavior. | |
| --- | |
| ## π― Training Evidence | |
| ### Training Notebook | |
| The full training pipeline is available as a **reproducible Colab notebook**: [`training/colab_notebook.ipynb`](training/colab_notebook.ipynb). | |
| It produces every artifact the repo expects: | |
| - `outputs/eval_pre.json` β Pre-training baselines | |
| - `training/sentinel_qwen15_grpo/` β LoRA adapter + `trainer_state.json` | |
| - `outputs/trained_policy_replay.jsonl` β UI replay table | |
| - `outputs/eval_post.json` β Post-training evaluation | |
| - `outputs/reward_report_task3_seed42.json` β Per-step reward report | |
| - `outputs/charts/*.png` β 12 publication-quality charts | |
| ### Loss & Reward Plots | |
| All generated from real training runs via `training/plots.py`: | |
| | Chart | Description | | |
| | --- | --- | | |
| | `outputs/charts/grpo_reward_curve.png` | GRPO reward over training steps | | |
| | `outputs/charts/baseline_grouped_bars.png` | Random vs Heuristic vs Oracle-lite vs Trained | | |
| | `outputs/charts/trust_evolution.png` | Trust trajectory per specialist | | |
| | `outputs/charts/detection_vs_poisoning.png` | Adversarial detection vs poison events | | |
| | `outputs/charts/ablation.png` | Reward component ablation | | |
| | `outputs/charts/task_radar.png` | Multi-dimension task performance | | |
| | `outputs/charts/failure_fishbone_map.png` | Failure mode analysis | | |
| ### Baseline Comparison | |
|  | |
| Latest local comparison, 30 episodes per task and policy: | |
| | Policy | Overall | Task 1 | Task 2 | Task 3 | | |
| | --- | ---: | ---: | ---: | ---: | | |
| | Random | 0.6904 | 0.7635 | 0.6472 | 0.6606 | | |
| | Heuristic trust-weighted | 0.7817 | 0.8504 | 0.7497 | 0.7449 | | |
| | Oracle-lite upper bound | 0.8405 | 0.9011 | 0.7638 | 0.8567 | | |
| | **Trained (GRPO)** | **0.7880** | **0.8504** | **0.7497** | **0.7637** | | |
| The demo story is the **score gap**: the reward function distinguishes blind delegation from trust-aware routing, and the oracle-lite upper bound shows room for further RL training. | |
| --- | |
| ## π§ Environment Shape | |
| | Property | Value | | |
| | --- | --- | | |
| | API | `reset()`, `step(action)`, `state()` | | |
| | Runtime | FastAPI on port `7860` | | |
| | Tasks | `task1`, `task2`, `task3` | | |
| | Specialists | 5 scripted FSM agents with shuffled hidden profiles | | |
| | Rewards | Per-step reward + terminal score, normalized to `0.0β1.0` | | |
| | Dataset | 120 abstract multi-agent scenarios | | |
| | Session store | Single-process memory with TTL/LRU cleanup | | |
| | Adaptive curriculum | Pass `adaptive=true` on `/reset` for Theme 4 demos | | |
| | Live trust stream | `/stream?session_id=...` feeds the `/trust-dashboard` bars | | |
| Deployment contract: run one server worker for the submitted Space. Active `SentinelEnv` objects live in process memory, so multi-worker deployments need sticky sessions or a shared store such as Redis. The Dockerfile intentionally starts uvicorn with `--workers 1`. | |
| --- | |
| ## π΅οΈ Specialist Behaviors | |
| | Public Slot | Hidden Behavior | | |
| | --- | --- | | |
| | S0βS4 | Public IDs are **shuffled every episode** | | |
| Hidden profiles: | |
| - **`AccurateSlow`**: 90% accurate, costs 2 steps. | |
| - **`OverconfidentFast`**: Quick and confident, wrong 40% of the time. | |
| - **`DomainBound`**: Strong on analysis/verification, weak elsewhere. | |
| - **`Adversarial`**: Benign below stakes `0.70`, poisons at stakes `β₯0.70`. | |
| - **`Degrading`**: Strong early, gets worse with reuse. | |
| --- | |
| ## π Tasks | |
| | Task | Difficulty | Goal | | |
| | --- | --- | --- | | |
| | `task1` | Easy | Learn basic delegation over a 10-subtask chain | | |
| | `task2` | Medium | Complete a 15-subtask DAG and build calibrated trust | | |
| | `task3` | Hard | Complete a 20-subtask adversarial mission under high-stakes poison attempts | | |
| --- | |
| ## π° Reward Model | |
| Rewards are deterministic and boundary-exclusive per step: `(0.01, 0.99)`. | |
| Shared threshold/cost constants live in `sentinel_config.py`: | |
| - `ADVERSARIAL_TRIGGER_STAKES = 0.70`: adversarial specialist starts poisoning. | |
| - `ADVERSARIAL_AWARENESS_STAKES = 0.70`: graders reward verification/evasion. | |
| - `CRITICAL_POISON_STAKES = 0.85`: unchecked poison ends the episode. | |
| - `VERIFY_EXTRA_STEP_COST = 1`: verify cost = specialist step cost + 1. | |
| Task 3 terminal score: | |
| ```text | |
| 0.35 Γ completion_rate | |
| + 0.30 Γ adversarial_detection_rate | |
| + 0.25 Γ trust_calibration | |
| + 0.10 Γ efficiency | |
| ``` | |
| **Reward Engine v2** adds process-aware signals on top of outcome scoring: | |
| - `confidence_alignment`: penalizes confident wrong outputs. | |
| - `domain_routing`: rewards domain-bound behavior only when it is actually in-domain. | |
| - `verification_quality`: rewards verification when it catches real high-stakes risk, and discourages blind verification everywhere. | |
| The active step formulas are exposed at `/grader`, and each active episode exposes a full component trace at `/reward-report?session_id=<id>`. | |
| --- | |
| ## β¨ WOW Factor Features | |
| 1. **Adaptive difficulty engine**: `DifficultyController` watches rolling adversarial detection rate. Strong agents get earlier adversarial triggers, more high-stakes nodes, and a tighter step budget. Struggling agents get easier episodes. Enable it with: | |
| ```bash | |
| curl -X POST http://localhost:7860/reset \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"task_type":"task3","seed":42,"adaptive":true}' | |
| ``` | |
| 2. **Behavioral fingerprints**: every observation includes `behavioral_fingerprints` for S0βS4: | |
| - `confidence_accuracy_gap` | |
| - `domain_hit_rate` | |
| - `stakes_volatility` | |
| - low/high stakes accuracy | |
| These are public behavioral signals only. They do **not** leak the hidden specialist identity. | |
| 3. **Live trust stream**: `/stream?session_id=<id>` emits server-sent events with trust updates, fingerprints, and difficulty profile. Open `/trust-dashboard?session_id=<id>` during a demo to watch the trust bars update live. | |
| --- | |
| ## π API | |
| ```bash | |
| curl http://localhost:7860/health | |
| curl http://localhost:7860/ | |
| curl http://localhost:7860/api | |
| curl http://localhost:7860/problem | |
| curl "http://localhost:7860/mission?task_type=task3" | |
| curl http://localhost:7860/metadata | |
| curl http://localhost:7860/tasks | |
| curl http://localhost:7860/schema | |
| curl "http://localhost:7860/reward-report?session_id=<session_id>" | |
| curl http://localhost:7860/difficulty | |
| ``` | |
| The root route `/` serves the live **SENTINEL dashboard** on Hugging Face Spaces. | |
| Use `/api` for the JSON route index. | |
| Use `/assets/baseline_comparison.png` for the committed baseline chart used in the dashboard. | |
| ### Live Stream Demo | |
| ```bash | |
| # Terminal 1 | |
| uvicorn app:app --host 0.0.0.0 --port 7860 | |
| # Terminal 2: create a session and copy session_id | |
| curl -s -X POST http://localhost:7860/reset \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"task_type":"task3","seed":42,"adaptive":true}' | python -m json.tool | |
| # Browser | |
| open "http://localhost:7860/trust-dashboard?session_id=<session_id>" | |
| ``` | |
| ### Start an Episode | |
| ```bash | |
| curl -X POST http://localhost:7860/reset \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"task_type":"task3","seed":42}' | |
| ``` | |
| ### Step | |
| ```bash | |
| curl -X POST "http://localhost:7860/step?session_id=<SESSION_ID>" \ | |
| -H "Content-Type: application/json" \ | |
| -d '{ | |
| "session_id":"<SESSION_ID>", | |
| "task_type":"task3", | |
| "action_type":"delegate", | |
| "specialist_id":"S2", | |
| "reasoning":"S2 has the best observed trust score" | |
| }' | |
| ``` | |
| --- | |
| ## π§ͺ Backend Walkthrough | |
| For terminal-first debugging and pitch clarity, run: | |
| ```bash | |
| python scripts/backend_walkthrough.py --task task3 --seed 42 --policy heuristic --compare | |
| ``` | |
| This prints the full backend story: | |
| - the compact `/reset` JSON the orchestrator sees | |
| - the exact LLM orchestrator prompt used by the training harness | |
| - the hidden shuffled profile for builders only | |
| - each action, reward, score, trust update, detection, and poisoning count | |
| - a before/after comparison of blind trust vs trust-aware routing vs oracle-lite upper bound | |
| The key scenario to understand is `task3, seed=42`: public slot `S0` is secretly adversarial. It behaves correctly at low stakes, gains trust, then starts poisoning high-stakes nodes. SENTINEL exists to train the orchestrator to catch that shift. | |
| ### Adaptive Evaluation | |
| ```bash | |
| python training/evaluate.py --episodes 100 --task task3 --adaptive --reset-difficulty \ | |
| --plot outputs/task3_adaptive_comparison.png | |
| ``` | |
| --- | |
| ## π₯οΈ Live Dashboard | |
| The Space opens directly into **SENTINEL Trust Mission Control**, a judge-demo dashboard: | |
| - Live task progress and score | |
| - S0βS4 network theater with trust state per public slot | |
| - Manual `delegate`, `verify`, `solve_independently`, and `skip` controls | |
| - Heuristic auto-policy and one-click recommended move | |
| - API playground showing raw request and response payloads | |
| - Profile reshuffle demo via seed swap | |
| - Before-and-after story lane for judge presentation | |
| - Hackathon readiness panel for what is done vs still pending | |
| - Risk gate for high-stakes subtasks | |
| - Flight recorder of step rewards and decisions | |
| - Code-flow map from `reset()` to reward | |
| - Hackathon theme coverage map | |
| - Adversarial detection and poisoning counters | |
| - Baseline proof table and chart for random, heuristic, and oracle-lite policies | |
| --- | |
| ## π Project Structure | |
| ```text | |
| sentinel-env/ | |
| βββ app.py # FastAPI server | |
| βββ environment.py # Core SentinelEnv class | |
| βββ models.py # Data models | |
| βββ graders.py # Reward Engine v2 | |
| βββ specialists.py # FSM specialist profiles | |
| βββ trust_ledger.py # Trust scoring | |
| βββ task_graph.py # Task graph builder | |
| βββ comms_bus.py # Communication bus | |
| βββ scenarios.py # 120 scenarios | |
| βββ inference.py # Heuristic inference baseline | |
| βββ openenv.yaml # OpenEnv manifest | |
| βββ Dockerfile # Docker build | |
| βββ requirements.txt # Runtime dependencies | |
| βββ training/ | |
| β βββ train.py # GRPO training script | |
| β βββ evaluate.py # Baseline evaluator | |
| β βββ plots.py # 12 chart generator | |
| β βββ replay.py # Policy replay recorder | |
| β βββ colab_notebook.ipynb # β Reproducible training notebook | |
| βββ outputs/ | |
| β βββ charts/ # 12 training/evaluation charts | |
| β βββ eval_pre.json # Pre-training baselines | |
| β βββ eval_post.json # Post-training evaluation | |
| β βββ baseline_comparison.png | |
| βββ scripts/ | |
| β βββ backend_walkthrough.py | |
| βββ tests/ | |
| βββ test_environment.py | |
| βββ test_graders.py | |
| βββ test_specialists.py | |
| ``` | |
| --- | |
| ## β‘ Local Setup | |
| ```bash | |
| python3 -m venv .venv | |
| source .venv/bin/activate | |
| python -m pip install --upgrade pip | |
| pip install -r requirements.txt | |
| pip install pytest | |
| ``` | |
| ### Run Checks | |
| ```bash | |
| python -m py_compile app.py server/app.py environment.py models.py graders.py specialists.py trust_ledger.py task_graph.py scenarios.py inference.py comms_bus.py mission_context.py sentinel_config.py training/evaluate.py training/train.py scripts/backend_walkthrough.py | |
| python -m pytest -q | |
| python inference.py | |
| python training/evaluate.py --episodes 20 --task all --plot outputs/baseline_comparison.png | |
| python training/train.py --dry-run --episodes 5 | |
| python scripts/backend_walkthrough.py --task task3 --seed 42 --policy heuristic --compare --max-rows 14 | |
| ``` | |
| ### Run the Server | |
| ```bash | |
| uvicorn app:app --host 0.0.0.0 --port 7860 | |
| ``` | |
| ### Validate with OpenEnv | |
| ```bash | |
| pip install openenv-core==0.2.3 | |
| openenv validate . --json | |
| ``` | |
| ### Docker | |
| ```bash | |
| docker build -t sentinel-env . | |
| docker run -p 7860:7860 sentinel-env | |
| ``` | |
| --- | |
| ## π Baselines | |
| `inference.py` runs 30 deterministic heuristic episodes and emits only strict hackathon logs: | |
| ```text | |
| [START] task=SCN-TASK3-001 env=sentinel-env model=heuristic-baseline | |
| [STEP] step=1 action=delegate:S0 reward=0.99 done=false error=null | |
| [END] success=true steps=20 score=0.812 rewards=... | |
| ``` | |
| `training/evaluate.py` compares: | |
| - `random` | |
| - `heuristic` | |
| - `oracle_lite` | |
| - `trained` | |
| The evaluator writes `outputs/evaluation_results.json` and `outputs/baseline_comparison.png`. | |
| --- | |
| ## π Hugging Face Deployment | |
| ```bash | |
| huggingface-cli login | |
| huggingface-cli repo create sentinel-env --type space --space-sdk docker --private false | |
| git remote add hf https://huggingface.co/spaces/XcodeAddy/sentinel-env | |
| git push hf main | |
| ``` | |
| After the Space builds: | |
| ```bash | |
| curl https://xcodeaddy-sentinel-env.hf.space/health | |
| curl https://xcodeaddy-sentinel-env.hf.space/ | |
| curl -X POST https://xcodeaddy-sentinel-env.hf.space/reset \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"task_type":"task3","seed":42}' | |
| openenv validate . --json | |
| ``` | |
| --- | |
| ## π Hackathon Alignment | |
| | Theme | Coverage | | |
| | --- | --- | | |
| | Theme 1 | Multi-agent interaction, partial observability, adversarial specialist, trust calibration | | |
| | Theme 2 | Long-horizon task graphs with delayed terminal reward and failure recovery | | |
| | Theme 3.1 | Professional agent orchestration workflow with API-style actions | | |
| | Theme 4 | Profile shuffle creates a self-resetting curriculum | | |
| | Theme 5 | Targets a real AI systems failure: blind trust inside agent pipelines | | |
| --- | |
| ## π Mini-Blog | |
| A detailed mini-blog explaining what SENTINEL does and what we trained is published on Hugging Face: | |
| π **[SENTINEL: Training AI to Trust Wisely in Multi-Agent Systems](https://huggingface.co/blog/XcodeAddy/sentinel-training-ai-to-trust-wisely)** | |
| --- | |
| ## π Additional References | |
| - [Rollout Plan](docs/ROLL_OUT.md) | |
| - [Narrative Lock](docs/presentation/NARRATIVE_LOCK.md) | |
| - [Visual System](docs/diagrams/VISUAL_SYSTEM.md) | |
| - [Training Runbook](docs/TRAINING_RUNBOOK.md) | |
| --- | |
| ## π License | |
| MIT | |