sentinel-env / docs /ROLL_OUT.md
XcodeAddy's picture
Add process-aware reward engine reports
b3b9bbd
# SENTINEL Rollout
This file is the execution spine for the project. The rule is simple:
1. Finish one phase.
2. Verify it.
3. Only then move to the next phase.
SENTINEL wins if the repo, Space, README, UI, and pitch all tell the same story:
> Train an orchestrator to decide who to trust, when to verify, and how to recover in long multi-agent tasks when specialists are unreliable or adversarial.
## Current Status
| Area | Status | Notes |
| --- | --- | --- |
| Environment core | Strong | `reset()`, `step()`, `state()`, reward v2, task graph, specialists, trust ledger |
| OpenEnv / deploy | Strong | Space live, Docker passing, validation passing |
| UI clarity | Improving | Trust Mission Control is live, but still needs full judge-demo mode |
| Presentation assets | Partial | Story exists, but diagrams and finale pack need stronger structure |
| Training evidence | Partial | Baselines are refreshed under Reward Engine v2; final onsite GRPO curve still missing |
| Submission completeness | Partial | Mini-blog/video and final finale package still needed |
## What We Borrow From MiroFish
We borrow **presentation discipline**, not product scope.
Use these MiroFish-style strengths:
- one sharp promise at the top
- visible workflow
- screenshot and diagram density
- live demo-first presentation
- clean quick-start and deployment instructions
Do **not** copy these patterns into SENTINEL:
- giant "predict anything" scope
- too many use cases
- vague platform framing
- vision language that is larger than the actual judged artifact
## Phase Rules
- Phase 1 must lock the narrative.
- Phase 2 must lock the diagram system.
- Phase 3 must make the UI explain the backend and the story.
- Phase 4 must make learning evidence obvious.
- Phase 5 must make the submission complete and reproducible.
- Phase 6 must make the final pitch unforgettable.
Do not skip a verification gate just because the feature "looks done."
---
## Phase 1 - Narrative Lock
**Goal**
Create one judge-safe project story and use it everywhere.
**Outputs**
- [Narrative Lock](./presentation/NARRATIVE_LOCK.md)
- final one-line thesis
- final hook
- final problem framing
- final before/after claim
- final "what not to say" guardrails
**Done means**
- README, UI, demo script, and pitch all use the same project sentence
- no outdated numbers or mismatched claims remain in primary docs
- the problem statement is clearly software-first, RL-first, and OpenEnv-first
**Verification**
- README top section matches the narrative lock
- UI top section uses the same thesis
- team can explain SENTINEL in 20 seconds and 2 minutes without changing the core message
**Status**
`In progress`
---
## Phase 2 - Visual System Pack
**Goal**
Turn scattered diagrams into one visual language.
**Outputs**
- [Visual System](./diagrams/VISUAL_SYSTEM.md)
- architecture diagram
- episode lifecycle diagram
- trust / reward dataflow diagram
- before / after failure chain
- theme fit diagram
- training loop diagram
**Done means**
- every diagram uses the same naming and system boundaries
- no diagram contradicts the actual code
- diagrams can be embedded in README, blog, pitch, and UI
**Verification**
- `app.py`, `environment.py`, `specialists.py`, `trust_ledger.py`, `graders.py`, `task_graph.py`, and `inference.py` are all represented correctly
- before/after flow uses real baseline numbers, not aspirational placeholders
**Status**
`In progress`
---
## Phase 3 - Productized Demo UI
**Goal**
Make the frontend explain the backend to judges and first-time users.
**Outputs**
- `Overview` mode
- `Playground` mode
- `Judge Demo` mode
- raw request/response visibility
- guided walkthrough of one episode
- profile swap demo path
**Done means**
- a first-time viewer can answer:
- what is SENTINEL?
- what does the agent observe?
- what action did the UI send?
- what did the backend return?
- why does trust change?
- why is this hard?
**Verification**
- local `/`, `/reset`, `/step`, `/state`, and `/assets/baseline_comparison.png` all behave correctly
- live Space reflects the same experience
- no section feels like internal tooling only
**Status**
`Pending`
---
## Phase 4 - Learning Evidence
**Goal**
Make reward improvement impossible to miss.
**Outputs**
- random vs heuristic vs oracle-lite comparison
- visible completion, detection, calibration, efficiency metrics
- onsite GRPO / Unsloth reward curve
- trained vs untrained comparison block
**Done means**
- judges can see measurable improvement in one screen and one README section
- there is a visible path from baseline -> better policy -> trained model
**Verification**
- `training/evaluate.py` outputs are committed and linked
- onsite curve is committed once available
- numbers shown in UI and README match evaluation artifacts
**Status**
`Pending`
---
## Phase 5 - Submission Pack
**Goal**
Make the project submission-complete.
**Outputs**
- final README with all links
- HF Space link
- Colab / training notebook link
- blog or video link
- screenshots and diagram links
- reproduction commands
**Done means**
- a judge can clone, run, inspect, and understand the project without asking for missing context
**Verification**
- README links are live
- Space is live
- `openenv validate . --json` passes
- Docker build passes
**Status**
`Pending`
---
## Phase 6 - Finale Pack
**Goal**
Package the repo for the room, not just for the validator.
**Outputs**
- 3-minute script
- 5 likely judge questions + answers
- backup screenshots
- fallback demo sequence
- one-click "killer moment" path
**Done means**
- the pitch works even if the live environment is slow
- the trained-vs-baseline story is memorable
- the profile swap moment is rehearsed
**Verification**
- demo path can be run without improvising architecture details
- every claim can be grounded in repo assets
**Status**
`Pending`
---
## Execution Order
```text
Phase 1 -> Phase 2 -> Phase 3 -> Phase 4 -> Phase 5 -> Phase 6
```
## Next Immediate Build Target
Phase 1 and Phase 2 are the current active work.
Once both are fully stable in-repo, Phase 3 starts on top of them.