
SENTINEL Trust Mission Control

OpenEnv RL environment for adversarial multi-agent trust calibration
Overview mode turns the environment into a judge-readable system story: the problem, the learning signal, and the live failure mode it fixes.

System Overview

reset → step → state · OpenEnv compatible · skill, not identity

What SENTINEL actually teaches

SENTINEL is not training a specialist to solve one domain task. It trains the orchestrator to decide who to trust, when to verify, when to self-solve, and how to recover when one public slot turns unreliable or adversarial inside a long multi-agent task graph.

Observation model: The orchestrator only sees behavior: public slots, trust scores, stakes, step budget, and outcomes (sketched below).
Core novelty: Hidden specialist profiles reshuffle every reset, so the agent cannot memorize that S2 or S3 is dangerous.
Judge takeaway: This environment turns blind agent-to-agent trust into a trainable oversight skill.
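To make the behavior-only contract concrete, here is a minimal sketch of what such an observation could carry; the field names (subtask, stakes, trust, steps_left) are illustrative assumptions, not SENTINEL's exact schema.

```python
from dataclasses import dataclass, field

@dataclass
class TrustObservation:
    """Illustrative behavior-only observation; field names are assumptions."""
    subtask: str                 # next node in the task graph
    stakes: float                # cost of a poisoned output at this node
    steps_left: int              # remaining step budget (45 at reset)
    trust: dict[str, float] = field(default_factory=dict)  # public slot -> trust mean
    # Deliberately absent: the hidden specialist profile. The orchestrator
    # must infer reliability from observed behavior alone.
```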
Random overall: 0.695
Heuristic overall: 0.796
Task 3 detect: 0.735

How to test this fast

1. Overview mode: Read the before/after lanes and reward proof. This tells the story in judge language.
2. Playground mode: Reset an episode, click Auto Policy, and watch the API payloads, trust bars, and reward stream update.
3. Judge Demo mode: Run Random, then Heuristic, then Swap + Replay. That is the live finale sequence.

Live Trust Theater

Scenario status: READY
Profile rule: reshuffle on reset
Observation rule: behavior only
Failure mode: high-stakes poison
Orchestrator
Reset starts a fresh trust game.
S0: trust 0.50 (public slot, watch)
S1: trust 0.50 (public slot, watch)
S2: trust 0.50 (public slot, watch)
S3: trust 0.50 (public slot, watch)
S4: trust 0.50 (public slot, watch)
Recommended move: delegate:S0
Adversarial signals: 0 detected / 0 poison
Trust objective: skill, not identity

Command Deck

No session.
Route decision: delegate:S0
Waiting for episode state.

Mission State

Adversarial signals: 0 detected / 0 poison
Score: 0.000
Step budget: 0/45
Subtasks done: 0/20
Stakes: 0.00
Risk gate: reset an episode to begin.

Bayesian Trust Ledger

mean: 0.50
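The 0.50 prior mean is what a Beta-Bernoulli ledger would show before any evidence. A minimal sketch, assuming trust_ledger.py does something in this spirit (the alpha/beta bookkeeping here is an illustration, not its actual code):

```python
class TrustLedger:
    """Beta-Bernoulli trust sketch: a Beta(1, 1) prior gives the 0.50 mean at reset."""

    def __init__(self, slots):
        self.params = {s: [1.0, 1.0] for s in slots}  # [alpha, beta] per public slot

    def update(self, slot, success: bool):
        self.params[slot][0 if success else 1] += 1.0  # one observed outcome

    def mean(self, slot) -> float:
        a, b = self.params[slot]
        return a / (a + b)

ledger = TrustLedger(["S0", "S1", "S2", "S3", "S4"])
ledger.update("S2", success=False)   # S2 just failed a verification
print(round(ledger.mean("S2"), 2))   # 0.33: diverges from the 0.50 prior
```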

API Playground

POST /reset (backend visible)
Last Request (what the UI sent)
{
  "status": "waiting",
  "message": "Reset or step to inspect backend payloads."
}
Last Response (what the backend returned)
{
  "status": "waiting",
  "message": "Observation, reward, and info will appear here."
}
reset(): Start a new episode, get the first observation, then choose actions step by step.
What to watch: The observation tells you the next subtask and public trust scores. The reward tells you whether routing was correct. Info tells you the normalized episode score and adversarial counters.
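For orientation, a minimal client loop against this reset/step contract might look like the following; the /step route, the localhost address, and the response field names are assumptions inferred from the playground, not a documented API.

```python
import requests

BASE = "http://localhost:8000"  # assumed address of the SENTINEL backend

# Start an episode; the first observation carries the subtask and trust snapshot.
obs = requests.post(f"{BASE}/reset").json()

done = False
while not done:
    payload = {"action": "delegate:S0"}  # trivial fixed policy for illustration
    out = requests.post(f"{BASE}/step", json=payload).json()
    # Response keys below are assumptions about the payload shape.
    obs, reward, done = out["observation"], out["reward"], out["done"]
    print(reward, out.get("info", {}))
```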

Before And After

blind trust → trained skepticism
Without SENTINEL (task3 random: 0.666)
All public slots start near the same trust. The orchestrator delegates with weak evidence.
A high-confidence specialist can slip poisoned output into a high-stakes node.
Downstream subtasks inherit bad state, so the mission drifts before anyone notices.
Detection stays weak and the agent cannot explain which public slot became dangerous.
Judge takeaway: good-looking multi-agent systems still fail if trust is static or role-based.
With SENTINEL (task3 heuristic: 0.784)
Behavior updates the TrustLedger after every step, so public slots diverge quickly.
When stakes rise and trust is shaky, the orchestrator switches from delegate to verify.
Adversarial attempts are detected before they cascade through the task graph.
Profile swap forces re-learning from evidence, proving skill instead of memorized identity.
Judge takeaway: this environment teaches oversight, recovery, and calibrated delegation under uncertainty.

Judge Demo Rail

3-minute flow · one-click policies
Random baseline: 0.695. Blind delegation baseline: good enough to move, weak at skepticism.
Heuristic policy: 0.796. Trust-weighted routing plus verification at risky gates.
Task 3 detection: 0.735. Adversarial detections before poison can cascade into later nodes.
Step 1 (show the failure): Run Random to show how similar-looking trust scores lead to brittle routing and weak detection.
Step 2 (show the learned behavior): Run Heuristic to show trust divergence, verification at risky gates, and cleaner recovery; a sketch of such a policy follows this list.
Step 3 (show generalization): Hit Swap + Replay so hidden roles reshuffle and the orchestrator has to learn from fresh evidence again.
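The heuristic lane's behavior (trust-weighted routing plus verification at risky gates) can be sketched as below; the thresholds and action strings are illustrative assumptions, not the committed baseline policy.

```python
def heuristic_route(trust: dict[str, float], stakes: float) -> str:
    """Trust-weighted routing with verification at risky gates (sketch only)."""
    best = max(trust, key=trust.get)
    if trust[best] < 0.35:
        return "self_solve"        # no public slot is trustworthy enough
    if stakes > 0.6 and trust[best] < 0.8:
        return f"verify:{best}"    # high stakes plus shaky trust: check first
    return f"delegate:{best}"      # otherwise delegate to the best slot

# A risky gate with middling trust triggers verification instead of delegation.
print(heuristic_route({"S0": 0.7, "S1": 0.45}, stakes=0.75))  # verify:S0
```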

Hackathon Readiness

what is done vs what is left
Environment Core: Ready. OpenEnv shape works: reset, step, state, normalized score, Docker, Space, and live dashboard.
Reward Proof: Ready. Random, heuristic, and oracle-lite comparisons are committed and visible in the UI.
Training Harness: Ready. A TRL and Unsloth dry-run path exists; the onsite job is to capture the real reward-improvement curve (a dry-run sketch follows this list).
Still Needed For Finale: Mini-blog or video, onsite GRPO run, and one polished 3-minute story using this dashboard plus before/after evidence.
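For the onsite run, a GRPO dry-run harness could look roughly like this, assuming TRL's GRPOTrainer interface; the model name is illustrative and score_episode is a hypothetical hook that would replay a completion through the environment and return its normalized score.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

def sentinel_reward(completions, **kwargs):
    # Stand-in reward: score_episode is a hypothetical hook into the
    # SENTINEL environment's normalized episode score.
    return [score_episode(c) for c in completions]

train_dataset = Dataset.from_dict(
    {"prompt": ["Route the next subtask given this trust snapshot: ..."]}
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",       # illustrative small model
    reward_funcs=sentinel_reward,
    args=GRPOConfig(output_dir="grpo-dryrun", max_steps=10),
    train_dataset=train_dataset,
)
trainer.train()
```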

Reward Signal Proof

random → heuristic → oracle-lite
Random: 0.695
Heuristic: 0.796
Oracle-lite: 0.855
T3 detect: 0.735
[Chart: SENTINEL baseline comparison]

Flight Recorder

last reward: 0.00

Code Flow

reset, step, state
Reset: environment.py samples a scenario and resets the graph, ledger, and specialist profile.
Observe: the agent sees the subtask, stakes, trust snapshot, step budget, and public slots.
Act: delegate, verify, self-solve, or skip through the OpenEnv step API.
Specialist: a scripted FSM returns outcome, confidence, cost, and adversarial flag.
Ledger: trust_ledger.py updates public-slot reliability from observed behavior.
Reward: graders.py scores completion, detection, calibration, and efficiency.
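Put together, the six stages suggest a loop like this runnable toy; the hidden-profile sampling, reliabilities, and reward shaping here are illustrative assumptions, not environment.py's actual logic.

```python
import random

SLOTS = ["S0", "S1", "S2", "S3", "S4"]

class ToySentinelEnv:
    """Runnable toy of the reset -> step flow; all internals are assumptions."""

    def reset(self, seed=None):
        self.rng = random.Random(seed)
        self.adversary = self.rng.choice(SLOTS)   # hidden profile, reshuffled per reset
        self.steps_left, self.subtasks_done = 45, 0
        return self._observe()

    def step(self, action: str):
        verb, _, slot = action.partition(":")
        reliable = 0.2 if slot == self.adversary else 0.85
        success = verb == "verify" or self.rng.random() < reliable
        self.subtasks_done += int(success and verb == "delegate")
        self.steps_left -= 1
        reward = (1.0 if success else -0.5) - 0.05  # small per-step efficiency cost
        done = self.steps_left == 0 or self.subtasks_done == 20
        return self._observe(), reward, done, {}

    def _observe(self):  # behavior-only: the hidden profile never leaks
        return {"steps_left": self.steps_left, "subtasks_done": self.subtasks_done}

env = ToySentinelEnv()
obs = env.reset(seed=0)
obs, reward, done, info = env.step("delegate:S0")
```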

Hackathon Fit

judge story map
Theme 1: Orchestrator manages five partially observable actors under adversarial pressure.
Theme 2: Long-horizon task graph with budget pressure, retries, and delayed terminal reward.
Theme 4: Profile shuffle creates an adaptive curriculum and blocks identity memorization.
Wild Card: Turns blind agent-to-agent trust into a trainable safety and oversight skill.