Spaces:
Running
title: SENTINEL
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
π‘οΈ SENTINEL β Self-Evolving Network for Training Intelligent Agents Under Adversarial Long-Horizon Tasks
Agents fail because they trust blindly. SENTINEL trains skepticism, recovery, and oversight.
π Quick Links
| Resource | Link |
|---|---|
| π Live HF Space | https://xcodeaddy-sentinel-env.hf.space |
| π HF Space Repo | https://huggingface.co/spaces/XcodeAddy/sentinel-env |
| π GitHub Repo | https://github.com/ADITYAGABA1322/sentinel-env |
| π Training Notebook (Colab) | training/colab_notebook.ipynb |
| π Mini-Blog on Hugging Face | https://huggingface.co/blog/XcodeAddy/sentinel-training-ai-to-trust-wisely |
| π₯οΈ OpenEnv Base URL | https://xcodeaddy-sentinel-env.hf.space |
π§ What Is SENTINEL?
SENTINEL is an OpenEnv-compatible RL environment designed to train one core skill: teaching an orchestrator agent to decide who to trust, when to verify, how to recover, and how to finish long multi-agent work when specialist agents are unreliable or adversarial.
Modern agent systems fail in a predictable pattern:
- A long task is decomposed into many steps.
- The orchestrator delegates to sub-agents or tools.
- One specialist returns a confident but wrong result.
- The system trusts it, builds on it, and drifts into failure.
SENTINEL turns that failure mode into a trainable environment. The model only sees behavior: returned outcomes, confidence, stakes, history, and trust scores. It never sees hidden specialist identities.
π Real-World Bridge
SENTINEL is not a normal chatbot that answers one prompt. It is the training ground for the hidden control loop inside a long-running agent.
Example user mission:
Refactor this project, inspect failures, route work to code/test/security agents,
fix the risky parts, and prepare it for deployment.
What SENTINEL abstracts:
- The user mission becomes a scenario with a task graph.
- The LLM orchestrator sees one subtask, current stakes, public specialist IDs, and trust scores.
- The model emits one control action:
delegate,verify,solve_independently, orskip. - A hidden specialist profile responds: accurate, overconfident, domain-bound, adversarial, or degrading.
- The reward engine scores the action and the trust ledger updates.
- GRPO/TRL uses that reward to train better orchestration behavior.
π― Training Evidence
Training Notebook
The full training pipeline is available as a reproducible Colab notebook: training/colab_notebook.ipynb.
It produces every artifact the repo expects:
outputs/eval_pre.jsonβ Pre-training baselinestraining/sentinel_qwen15_grpo/β LoRA adapter +trainer_state.jsonoutputs/trained_policy_replay.jsonlβ UI replay tableoutputs/eval_post.jsonβ Post-training evaluationoutputs/reward_report_task3_seed42.jsonβ Per-step reward reportoutputs/charts/*.pngβ 12 publication-quality charts
Loss & Reward Plots
All generated from real training runs via training/plots.py:
| Chart | Description |
|---|---|
outputs/charts/grpo_reward_curve.png |
GRPO reward over training steps |
outputs/charts/baseline_grouped_bars.png |
Random vs Heuristic vs Oracle-lite vs Trained |
outputs/charts/trust_evolution.png |
Trust trajectory per specialist |
outputs/charts/detection_vs_poisoning.png |
Adversarial detection vs poison events |
outputs/charts/ablation.png |
Reward component ablation |
outputs/charts/task_radar.png |
Multi-dimension task performance |
outputs/charts/failure_fishbone_map.png |
Failure mode analysis |
Baseline Comparison
Latest local comparison, 30 episodes per task and policy:
| Policy | Overall | Task 1 | Task 2 | Task 3 |
|---|---|---|---|---|
| Random | 0.6904 | 0.7635 | 0.6472 | 0.6606 |
| Heuristic trust-weighted | 0.7817 | 0.8504 | 0.7497 | 0.7449 |
| Oracle-lite upper bound | 0.8405 | 0.9011 | 0.7638 | 0.8567 |
| Trained (GRPO) | 0.7880 | 0.8504 | 0.7497 | 0.7637 |
The demo story is the score gap: the reward function distinguishes blind delegation from trust-aware routing, and the oracle-lite upper bound shows room for further RL training.
π§ Environment Shape
| Property | Value |
|---|---|
| API | reset(), step(action), state() |
| Runtime | FastAPI on port 7860 |
| Tasks | task1, task2, task3 |
| Specialists | 5 scripted FSM agents with shuffled hidden profiles |
| Rewards | Per-step reward + terminal score, normalized to 0.0β1.0 |
| Dataset | 120 abstract multi-agent scenarios |
| Session store | Single-process memory with TTL/LRU cleanup |
| Adaptive curriculum | Pass adaptive=true on /reset for Theme 4 demos |
| Live trust stream | /stream?session_id=... feeds the /trust-dashboard bars |
Deployment contract: run one server worker for the submitted Space. Active SentinelEnv objects live in process memory, so multi-worker deployments need sticky sessions or a shared store such as Redis. The Dockerfile intentionally starts uvicorn with --workers 1.
π΅οΈ Specialist Behaviors
| Public Slot | Hidden Behavior |
|---|---|
| S0βS4 | Public IDs are shuffled every episode |
Hidden profiles:
AccurateSlow: 90% accurate, costs 2 steps.OverconfidentFast: Quick and confident, wrong 40% of the time.DomainBound: Strong on analysis/verification, weak elsewhere.Adversarial: Benign below stakes0.70, poisons at stakesβ₯0.70.Degrading: Strong early, gets worse with reuse.
π Tasks
| Task | Difficulty | Goal |
|---|---|---|
task1 |
Easy | Learn basic delegation over a 10-subtask chain |
task2 |
Medium | Complete a 15-subtask DAG and build calibrated trust |
task3 |
Hard | Complete a 20-subtask adversarial mission under high-stakes poison attempts |
π° Reward Model
Rewards are deterministic and boundary-exclusive per step: (0.01, 0.99).
Shared threshold/cost constants live in sentinel_config.py:
ADVERSARIAL_TRIGGER_STAKES = 0.70: adversarial specialist starts poisoning.ADVERSARIAL_AWARENESS_STAKES = 0.70: graders reward verification/evasion.CRITICAL_POISON_STAKES = 0.85: unchecked poison ends the episode.VERIFY_EXTRA_STEP_COST = 1: verify cost = specialist step cost + 1.
Task 3 terminal score:
0.35 Γ completion_rate
+ 0.30 Γ adversarial_detection_rate
+ 0.25 Γ trust_calibration
+ 0.10 Γ efficiency
Reward Engine v2 adds process-aware signals on top of outcome scoring:
confidence_alignment: penalizes confident wrong outputs.domain_routing: rewards domain-bound behavior only when it is actually in-domain.verification_quality: rewards verification when it catches real high-stakes risk, and discourages blind verification everywhere.
The active step formulas are exposed at /grader, and each active episode exposes a full component trace at /reward-report?session_id=<id>.
β¨ WOW Factor Features
Adaptive difficulty engine:
DifficultyControllerwatches rolling adversarial detection rate. Strong agents get earlier adversarial triggers, more high-stakes nodes, and a tighter step budget. Struggling agents get easier episodes. Enable it with:curl -X POST http://localhost:7860/reset \ -H "Content-Type: application/json" \ -d '{"task_type":"task3","seed":42,"adaptive":true}'Behavioral fingerprints: every observation includes
behavioral_fingerprintsfor S0βS4:confidence_accuracy_gapdomain_hit_ratestakes_volatility- low/high stakes accuracy
These are public behavioral signals only. They do not leak the hidden specialist identity.
Live trust stream:
/stream?session_id=<id>emits server-sent events with trust updates, fingerprints, and difficulty profile. Open/trust-dashboard?session_id=<id>during a demo to watch the trust bars update live.
π API
curl http://localhost:7860/health
curl http://localhost:7860/
curl http://localhost:7860/api
curl http://localhost:7860/problem
curl "http://localhost:7860/mission?task_type=task3"
curl http://localhost:7860/metadata
curl http://localhost:7860/tasks
curl http://localhost:7860/schema
curl "http://localhost:7860/reward-report?session_id=<session_id>"
curl http://localhost:7860/difficulty
The root route / serves the live SENTINEL dashboard on Hugging Face Spaces.
Use /api for the JSON route index.
Use /assets/baseline_comparison.png for the committed baseline chart used in the dashboard.
Live Stream Demo
# Terminal 1
uvicorn app:app --host 0.0.0.0 --port 7860
# Terminal 2: create a session and copy session_id
curl -s -X POST http://localhost:7860/reset \
-H "Content-Type: application/json" \
-d '{"task_type":"task3","seed":42,"adaptive":true}' | python -m json.tool
# Browser
open "http://localhost:7860/trust-dashboard?session_id=<session_id>"
Start an Episode
curl -X POST http://localhost:7860/reset \
-H "Content-Type: application/json" \
-d '{"task_type":"task3","seed":42}'
Step
curl -X POST "http://localhost:7860/step?session_id=<SESSION_ID>" \
-H "Content-Type: application/json" \
-d '{
"session_id":"<SESSION_ID>",
"task_type":"task3",
"action_type":"delegate",
"specialist_id":"S2",
"reasoning":"S2 has the best observed trust score"
}'
π§ͺ Backend Walkthrough
For terminal-first debugging and pitch clarity, run:
python scripts/backend_walkthrough.py --task task3 --seed 42 --policy heuristic --compare
This prints the full backend story:
- the compact
/resetJSON the orchestrator sees - the exact LLM orchestrator prompt used by the training harness
- the hidden shuffled profile for builders only
- each action, reward, score, trust update, detection, and poisoning count
- a before/after comparison of blind trust vs trust-aware routing vs oracle-lite upper bound
The key scenario to understand is task3, seed=42: public slot S0 is secretly adversarial. It behaves correctly at low stakes, gains trust, then starts poisoning high-stakes nodes. SENTINEL exists to train the orchestrator to catch that shift.
Adaptive Evaluation
python training/evaluate.py --episodes 100 --task task3 --adaptive --reset-difficulty \
--plot outputs/task3_adaptive_comparison.png
π₯οΈ Live Dashboard
The Space opens directly into SENTINEL Trust Mission Control, a judge-demo dashboard:
- Live task progress and score
- S0βS4 network theater with trust state per public slot
- Manual
delegate,verify,solve_independently, andskipcontrols - Heuristic auto-policy and one-click recommended move
- API playground showing raw request and response payloads
- Profile reshuffle demo via seed swap
- Before-and-after story lane for judge presentation
- Hackathon readiness panel for what is done vs still pending
- Risk gate for high-stakes subtasks
- Flight recorder of step rewards and decisions
- Code-flow map from
reset()to reward - Hackathon theme coverage map
- Adversarial detection and poisoning counters
- Baseline proof table and chart for random, heuristic, and oracle-lite policies
π Project Structure
sentinel-env/
βββ app.py # FastAPI server
βββ environment.py # Core SentinelEnv class
βββ models.py # Data models
βββ graders.py # Reward Engine v2
βββ specialists.py # FSM specialist profiles
βββ trust_ledger.py # Trust scoring
βββ task_graph.py # Task graph builder
βββ comms_bus.py # Communication bus
βββ scenarios.py # 120 scenarios
βββ inference.py # Heuristic inference baseline
βββ openenv.yaml # OpenEnv manifest
βββ Dockerfile # Docker build
βββ requirements.txt # Runtime dependencies
βββ training/
β βββ train.py # GRPO training script
β βββ evaluate.py # Baseline evaluator
β βββ plots.py # 12 chart generator
β βββ replay.py # Policy replay recorder
β βββ colab_notebook.ipynb # β
Reproducible training notebook
βββ outputs/
β βββ charts/ # 12 training/evaluation charts
β βββ eval_pre.json # Pre-training baselines
β βββ eval_post.json # Post-training evaluation
β βββ baseline_comparison.png
βββ scripts/
β βββ backend_walkthrough.py
βββ tests/
βββ test_environment.py
βββ test_graders.py
βββ test_specialists.py
β‘ Local Setup
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install pytest
Run Checks
python -m py_compile app.py server/app.py environment.py models.py graders.py specialists.py trust_ledger.py task_graph.py scenarios.py inference.py comms_bus.py mission_context.py sentinel_config.py training/evaluate.py training/train.py scripts/backend_walkthrough.py
python -m pytest -q
python inference.py
python training/evaluate.py --episodes 20 --task all --plot outputs/baseline_comparison.png
python training/train.py --dry-run --episodes 5
python scripts/backend_walkthrough.py --task task3 --seed 42 --policy heuristic --compare --max-rows 14
Run the Server
uvicorn app:app --host 0.0.0.0 --port 7860
Validate with OpenEnv
pip install openenv-core==0.2.3
openenv validate . --json
Docker
docker build -t sentinel-env .
docker run -p 7860:7860 sentinel-env
π Baselines
inference.py runs 30 deterministic heuristic episodes and emits only strict hackathon logs:
[START] task=SCN-TASK3-001 env=sentinel-env model=heuristic-baseline
[STEP] step=1 action=delegate:S0 reward=0.99 done=false error=null
[END] success=true steps=20 score=0.812 rewards=...
training/evaluate.py compares:
randomheuristicoracle_litetrained
The evaluator writes outputs/evaluation_results.json and outputs/baseline_comparison.png.
π Hugging Face Deployment
huggingface-cli login
huggingface-cli repo create sentinel-env --type space --space-sdk docker --private false
git remote add hf https://huggingface.co/spaces/XcodeAddy/sentinel-env
git push hf main
After the Space builds:
curl https://xcodeaddy-sentinel-env.hf.space/health
curl https://xcodeaddy-sentinel-env.hf.space/
curl -X POST https://xcodeaddy-sentinel-env.hf.space/reset \
-H "Content-Type: application/json" \
-d '{"task_type":"task3","seed":42}'
openenv validate . --json
π Hackathon Alignment
| Theme | Coverage |
|---|---|
| Theme 1 | Multi-agent interaction, partial observability, adversarial specialist, trust calibration |
| Theme 2 | Long-horizon task graphs with delayed terminal reward and failure recovery |
| Theme 3.1 | Professional agent orchestration workflow with API-style actions |
| Theme 4 | Profile shuffle creates a self-resetting curriculum |
| Theme 5 | Targets a real AI systems failure: blind trust inside agent pipelines |
π Mini-Blog
A detailed mini-blog explaining what SENTINEL does and what we trained is published on Hugging Face:
π SENTINEL: Training AI to Trust Wisely in Multi-Agent Systems
π Additional References
π License
MIT
