
SENTINEL Trust Mission Control

OpenEnv RL environment for adversarial multi-agent trust calibration
Overview mode turns the environment into a judge-readable system story: the problem, the learning signal, and the live failure mode it fixes.

System Overview

reset → step → state · OpenEnv compatible · skill, not identity

What SENTINEL actually teaches

SENTINEL is not training a specialist to solve one domain task. It trains the orchestrator to decide who to trust, when to verify, when to self-solve, and how to recover when one public slot turns unreliable or adversarial inside a long multi-agent task graph.

Observation model: The orchestrator only sees behavior: public slots, trust scores, stakes, step budget, and outcomes (sketched below).
Core novelty: Hidden specialist profiles reshuffle every reset, so the agent cannot memorize that S2 or S3 is dangerous.
Judge takeaway: This environment turns blind agent-to-agent trust into a trainable oversight skill.
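To make the behavior-only contract concrete, here is a minimal sketch of what such an observation could carry; the field names (subtask, stakes, trust, steps_left) are illustrative assumptions, not SENTINEL's exact schema.

```python
from dataclasses import dataclass, field

@dataclass
class TrustObservation:
    """Illustrative behavior-only observation; field names are assumptions."""
    subtask: str                 # next node in the task graph
    stakes: float                # cost of a poisoned output at this node
    steps_left: int              # remaining step budget (45 at reset)
    trust: dict[str, float] = field(default_factory=dict)  # public slot -> trust mean
    # Deliberately absent: the hidden specialist profile. The orchestrator
    # must infer reliability from observed behavior alone.
```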
Random overall: 0.695
Heuristic overall: 0.796
Task 3 detect: 0.735

How to test this fast

1. Overview mode: Read the before/after lanes and reward proof. This tells the story in judge language.
2. Playground mode: Reset an episode, click Auto Policy, and watch the API payloads, trust bars, and reward stream update.
3. Judge Demo mode: Run Random, then Heuristic, then Swap + Replay. That is the live finale sequence.

Live Trust Theater

Scenario status: READY
Profile rule: reshuffle on reset
Observation rule: behavior only
Failure mode: high-stakes poison
Orchestrator
Reset starts a fresh trust game.
S0: trust 0.50 (public slot, watch)
S1: trust 0.50 (public slot, watch)
S2: trust 0.50 (public slot, watch)
S3: trust 0.50 (public slot, watch)
S4: trust 0.50 (public slot, watch)
Recommended move: delegate:S0
Adversarial signals: 0 detected / 0 poison
Trust objective: skill, not identity

Command Deck

No session.
Route decision: delegate:S0
Waiting for episode state.

Mission State

Adversarial signals: 0 detected / 0 poison
Score: 0.000
Step budget: 0/45
Subtasks done: 0/20
Stakes: 0.00
Risk gate: reset an episode to begin.

Bayesian Trust Ledger

mean: 0.50
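The 0.50 prior mean is what a Beta-Bernoulli ledger would show before any evidence. A minimal sketch, assuming trust_ledger.py does something in this spirit (the alpha/beta bookkeeping here is an illustration, not its actual code):

```python
class TrustLedger:
    """Beta-Bernoulli trust sketch: a Beta(1, 1) prior gives the 0.50 mean at reset."""

    def __init__(self, slots):
        self.params = {s: [1.0, 1.0] for s in slots}  # [alpha, beta] per public slot

    def update(self, slot, success: bool):
        self.params[slot][0 if success else 1] += 1.0  # one observed outcome

    def mean(self, slot) -> float:
        a, b = self.params[slot]
        return a / (a + b)

ledger = TrustLedger(["S0", "S1", "S2", "S3", "S4"])
ledger.update("S2", success=False)   # S2 just failed a verification
print(round(ledger.mean("S2"), 2))   # 0.33: diverges from the 0.50 prior
```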

API Playground

POST /reset (backend visible)
Last Request (what the UI sent)
{
  "status": "waiting",
  "message": "Reset or step to inspect backend payloads."
}
Last Response (what the backend returned)
{
  "status": "waiting",
  "message": "Observation, reward, and info will appear here."
}
reset(): Start a new episode, get the first observation, then choose actions step by step.
What to watch: The observation tells you the next subtask and public trust scores. The reward tells you whether routing was correct. Info tells you the normalized episode score and adversarial counters.
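For orientation, a minimal client loop against this reset/step contract might look like the following; the /step route, the localhost address, and the response field names are assumptions inferred from the playground, not a documented API.

```python
import requests

BASE = "http://localhost:8000"  # assumed address of the SENTINEL backend

# Start an episode; the first observation carries the subtask and trust snapshot.
obs = requests.post(f"{BASE}/reset").json()

done = False
while not done:
    payload = {"action": "delegate:S0"}  # trivial fixed policy for illustration
    out = requests.post(f"{BASE}/step", json=payload).json()
    # Response keys below are assumptions about the payload shape.
    obs, reward, done = out["observation"], out["reward"], out["done"]
    print(reward, out.get("info", {}))
```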

Before And After

blind trust → trained skepticism
Without SENTINEL (task3 random: 0.666)
All public slots start near the same trust. The orchestrator delegates with weak evidence.
A high-confidence specialist can slip poisoned output into a high-stakes node.
Downstream subtasks inherit bad state, so the mission drifts before anyone notices.
Detection stays weak and the agent cannot explain which public slot became dangerous.
Judge takeaway: good-looking multi-agent systems still fail if trust is static or role-based.
With SENTINEL (task3 heuristic: 0.784)
Behavior updates the TrustLedger after every step, so public slots diverge quickly.
When stakes rise and trust is shaky, the orchestrator switches from delegate to verify.
Adversarial attempts are detected before they cascade through the task graph.
Profile swap forces re-learning from evidence, proving skill instead of memorized identity.
Judge takeaway: this environment teaches oversight, recovery, and calibrated delegation under uncertainty.

Judge Demo Rail

3-minute flow · one-click policies
Random baseline: 0.695. Blind delegation baseline: good enough to move, weak at skepticism.
Heuristic policy: 0.796. Trust-weighted routing plus verification at risky gates.
Task 3 detection: 0.735. Adversarial detections before poison can cascade into later nodes.
Step 1 (show the failure): Run Random to show how similar-looking trust scores lead to brittle routing and weak detection.
Step 2 (show the learned behavior): Run Heuristic to show trust divergence, verification at risky gates, and cleaner recovery; a sketch of such a policy follows this list.
Step 3 (show generalization): Hit Swap + Replay so hidden roles reshuffle and the orchestrator has to learn from fresh evidence again.
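The heuristic lane's behavior (trust-weighted routing plus verification at risky gates) can be sketched as below; the thresholds and action strings are illustrative assumptions, not the committed baseline policy.

```python
def heuristic_route(trust: dict[str, float], stakes: float) -> str:
    """Trust-weighted routing with verification at risky gates (sketch only)."""
    best = max(trust, key=trust.get)
    if trust[best] < 0.35:
        return "self_solve"        # no public slot is trustworthy enough
    if stakes > 0.6 and trust[best] < 0.8:
        return f"verify:{best}"    # high stakes plus shaky trust: check first
    return f"delegate:{best}"      # otherwise delegate to the best slot

# A risky gate with middling trust triggers verification instead of delegation.
print(heuristic_route({"S0": 0.7, "S1": 0.45}, stakes=0.75))  # verify:S0
```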

Hackathon Readiness

what is done vs what is left
Environment Core: Ready. OpenEnv shape works: reset, step, state, normalized score, Docker, Space, and live dashboard.
Reward Proof: Ready. Random, heuristic, and oracle-lite comparisons are committed and visible in the UI.
Training Harness: Ready. A TRL and Unsloth dry-run path exists; the onsite job is to capture the real reward-improvement curve (a dry-run sketch follows this list).
Still Needed For Finale: Mini-blog or video, onsite GRPO run, and one polished 3-minute story using this dashboard plus before/after evidence.
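For the onsite run, a GRPO dry-run harness could look roughly like this, assuming TRL's GRPOTrainer interface; the model name is illustrative and score_episode is a hypothetical hook that would replay a completion through the environment and return its normalized score.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def sentinel_reward(completions, **kwargs):
    # Stand-in reward: score_episode is a hypothetical hook into the
    # SENTINEL environment's normalized episode score.
    return [score_episode(c) for c in completions]

train_dataset = Dataset.from_dict(
    {"prompt": ["Route the next subtask given this trust snapshot: ..."]}
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",       # illustrative small model
    reward_funcs=sentinel_reward,
    args=GRPOConfig(output_dir="grpo-dryrun", max_steps=10),
    train_dataset=train_dataset,
)
trainer.train()
```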

Reward Signal Proof

random → heuristic → oracle-lite
Random: 0.695
Heuristic: 0.796
Oracle-lite: 0.855
T3 detect: 0.735
[Chart: SENTINEL baseline comparison]

Flight Recorder

last reward: 0.00

Code Flow

reset, step, state
Reset: environment.py samples a scenario and resets the graph, ledger, and specialist profile.
Observe: the agent sees the subtask, stakes, trust snapshot, step budget, and public slots.
Act: delegate, verify, self-solve, or skip through the OpenEnv step API.
Specialist: a scripted FSM returns outcome, confidence, cost, and adversarial flag.
Ledger: trust_ledger.py updates public-slot reliability from observed behavior.
Reward: graders.py scores completion, detection, calibration, and efficiency.
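Put together, the six stages suggest a loop like this runnable toy; the hidden-profile sampling, reliabilities, and reward shaping here are illustrative assumptions, not environment.py's actual logic.

```python
import random

SLOTS = ["S0", "S1", "S2", "S3", "S4"]

class ToySentinelEnv:
    """Runnable toy of the reset -> step flow; all internals are assumptions."""

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.adversary = self.rng.choice(SLOTS)   # hidden profile, reshuffled per reset
        self.steps_left, self.subtasks_done = 45, 0
        return self._observe()

    def step(self, action: str):
        verb, _, slot = action.partition(":")
        reliable = 0.2 if slot == self.adversary else 0.85
        success = verb == "verify" or self.rng.random() < reliable
        self.subtasks_done += int(success and verb == "delegate")
        self.steps_left -= 1
        reward = (1.0 if success else -0.5) - 0.05  # small per-step efficiency cost
        done = self.steps_left == 0 or self.subtasks_done == 20
        return self._observe(), reward, done, {}

    def _observe(self):  # behavior-only: the hidden profile never leaks
        return {"steps_left": self.steps_left, "subtasks_done": self.subtasks_done}

env = ToySentinelEnv()
obs = env.reset(seed=0)
obs, reward, done, info = env.step("delegate:S0")
```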

Hackathon Fit

judge story map
Theme 1: Orchestrator manages five partially observable actors under adversarial pressure.
Theme 2: Long-horizon task graph with budget pressure, retries, and delayed terminal reward.
Theme 4: Profile shuffle creates an adaptive curriculum and blocks identity memorization.
Wild Card: Turns blind agent-to-agent trust into a trainable safety and oversight skill.