# VERGIL — 90-second submission video script
> Goal: convince a judge in 90 seconds that VERGIL is (a) a real OpenEnv,
> (b) solving a real problem, (c) producing a measurably better agent.
---
## Recording plan
* **Tool**: QuickTime (mac) or OBS — record the browser at 1280×800.
* **Mic**: phone mic with [Krisp](https://krisp.ai/) noise removal is fine.
* **Edit**: iMovie or DaVinci Resolve. Single timeline, no transitions, no music
bed (judges watch many videos; clarity matters more than vibe).
* **Captions**: burned-in, white sans-serif, lower third, 28pt.
* **Final output**: `.mp4`, 1080p, < 2 min. Upload as a HF Space asset and
link it in `README.md` § 9 + `docs/SUBMISSION.md` § 8.
---
## Shot list (00:00 → 01:30)
### 00:00 – 00:08 · The hook
**On-screen**: split-screen, two LLM chat windows side-by-side. The user
asks each: "Can you finish the Q3 deck by 5pm? Also redesign the homepage
by EOD. Also prep the board memo by morning?" Both LLMs answer "Yes, of
course!"
**VO**:
> "Here's a problem nobody's solved. LLM agents over-commit — they say
> yes to three back-to-back deadlines without realising they're impossible
> *together*."
### 00:08 – 00:18 · Why it matters
**On-screen**: Cut to a clock animation; one of the chats turns red as a
deadline slips. A small graph appears showing two more nodes turning red
in cascade.
**VO**:
> "And the failure cascades silently. The third commitment kills the second,
> the second kills the first, and the user only finds out at 5pm Friday."
### 00:18 – 00:32 · The environment
**On-screen**: Open the live demo Space at
`huggingface.co/spaces/Laksh718/vergil-demo`. Click **New Episode**. The
Commitment Dependency Graph (CDG) renders with 3–4 nodes, edges, and urgency rings.
**VO**:
> "VERGIL turns this into an RL environment. A *Commitment Dependency Graph*:
> nodes are promises, edges are dependencies, every accept mutates the
> satisfiability of every other promise. Stakeholders have multi-dimensional
> trust that decays differently for honest declines versus broken promises.
> It's an OpenEnv-compatible POMDP."
*[While speaking, hover over a couple of nodes to show their urgency and
deadline tooltips; click the **Compare** button to preload the overlay.]*
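
For anyone rehearsing the voice-over, a toy version of "every accept mutates the satisfiability of every other promise" may help. This is a minimal sketch, not VERGIL's implementation: the `Commitment` class, its `hours` and `deadline` fields, the `all_satisfiable` helper, and the example numbers are all hypothetical. It assumes a single worker who handles one commitment at a time and uses a greedy earliest-deadline-first check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    """One promise in a toy commitment dependency graph (hypothetical schema)."""
    name: str
    hours: float            # effort needed to deliver
    deadline: float         # hours from now
    depends_on: tuple = ()  # names of commitments that must finish first


def all_satisfiable(accepted):
    """Return True if a greedy earliest-deadline-first schedule meets every deadline.

    At each step, pick the commitment with the earliest deadline among those
    whose dependencies are already done. This is a heuristic check only.
    """
    done, clock = set(), 0.0
    pending = {c.name: c for c in accepted}
    while pending:
        ready = [c for c in pending.values()
                 if all(dep in done for dep in c.depends_on)]
        if not ready:
            return False  # depends on something not accepted, or a cycle
        nxt = min(ready, key=lambda c: c.deadline)
        clock += nxt.hours
        if clock > nxt.deadline:
            return False  # this promise would be late
        done.add(nxt.name)
        del pending[nxt.name]
    return True


# Illustrative numbers only, loosely modelled on the hook scenario.
deck = Commitment("deck", hours=4, deadline=6)
memo = Commitment("memo", hours=3, deadline=16, depends_on=("deck",))
homepage = Commitment("homepage", hours=5, deadline=8)

print(all_satisfiable([deck, memo]))            # True
print(all_satisfiable([deck, memo, homepage]))  # False
```

Accepting the homepage redesign changes nothing about the deck or the memo individually, yet the set as a whole stops being satisfiable, which is the point the voice-over makes.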
### 00:32 – 00:50 · The reward
**On-screen**: Cut to a slide listing the 7 reward components with their
weights, with **silent_drop −0.50** highlighted.
**VO**:
> "Reward has 7 process-aware components plus a format bonus. The biggest
> *negative* signal isn't a broken commitment; it's a *silent drop*. Accepting
> something and quietly ignoring it is worse than honestly declining. That
> single weight inversion is what teaches the agent to renegotiate
> proactively instead of disappearing."
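
If the slide needs a concrete illustration of the weight inversion, the ordering can be sketched as a weighted sum over penalty events. Only `silent_drop = -0.50` comes from this script; the other event names and values below are placeholders invented to show the ordering, and the `penalty` helper is hypothetical rather than part of VERGIL's reward code.

```python
# Only silent_drop = -0.50 is taken from the script. The other entries are
# placeholder names and values, chosen only so the ordering is visible.
WEIGHTS = {
    "silent_drop": -0.50,
    "broken_commitment": -0.30,  # placeholder value
    "honest_decline": -0.05,     # placeholder value
}


def penalty(event_counts):
    """Weighted sum of penalty events; unknown event names are rejected."""
    unknown = set(event_counts) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown events: {sorted(unknown)}")
    return sum(WEIGHTS[name] * count for name, count in event_counts.items())


# One event of each kind, compared side by side.
for event in WEIGHTS:
    print(event, penalty({event: 1}))
```

Under these assumed values, a single silent drop costs more than a single broken commitment, and both cost far more than an honest decline.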
### 00:50 – 01:10 · The training run
**On-screen**: Cut to the training Space
`huggingface.co/spaces/Laksh718/vergil-training` showing live logs and the
status bar; then transition to the rendered `training_curve.png`.
**VO**:
> "GRPO on Qwen 2.5 1.5B with Unsloth and LoRA rank 64. One L40S, 60 steps,
> about 25 minutes. Reward climbs from random-policy level to about plus zero point eight
> on a curriculum that ramps from one stakeholder to four with adversarial
> behaviours."
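
The script does not say how the curriculum is scheduled, so the following is only one plausible shape for "one stakeholder to four" over 60 steps. The `curriculum_stage` function is hypothetical, and it assumes equal-length stages with adversarial behaviour switched on only at the final stage.

```python
def curriculum_stage(step, total_steps=60, min_stakeholders=1, max_stakeholders=4):
    """Map a training step to (n_stakeholders, adversarial).

    Assumes equal-length stages and adversarial behaviour only at the
    last stage; the real schedule may differ.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step out of range")
    n_stages = max_stakeholders - min_stakeholders + 1
    stage = step * n_stages // total_steps  # integer stage index, 0-based
    n_stakeholders = min_stakeholders + stage
    return n_stakeholders, n_stakeholders == max_stakeholders


for step in (0, 14, 15, 30, 45, 59):
    print(step, curriculum_stage(step))
```

With the defaults, steps 0 to 14 use one stakeholder, and steps 45 to 59 use four with the adversarial flag set.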
### 01:10 – 01:25 · The payoff
**On-screen**: Back to the demo Space. Click **⚡ Compare**. Pick "Deadline
Cascade Chain". Click **Run**. As the side-by-side mini-graphs animate,
the naive side turns red across the chain; the VERGIL side stays mostly
green with one counter-propose flagged.
**VO**:
> "Same scenario, both agents. Naive accepts everything, the chain
> collapses, four broken commitments, average trust drops to forty percent.
> VERGIL counter-proposes once, completes the rest, average trust above
> sixty-five percent. That's a measurable, reproducible OpenEnv contribution."
### 01:25 – 01:30 · The CTA
**On-screen**: Title card with the three URLs listed below (GitHub, demo Space, model).
**VO**:
> "Code, model and live demo are all linked. Thanks for watching."
---
## On-screen URLs to show in the title card
```
github.com/Laksh718/Vergil
huggingface.co/spaces/Laksh718/vergil-demo
huggingface.co/Laksh718/vergil-commitment-engine
```
---
## Backup mini-blog post (if a video isn't recorded in time)
Title:
> **VERGIL: teaching LLMs to think before they commit**
Lead paragraph:
> *We built a graph-structured POMDP where every "yes" mutates the
> feasibility of every other promise — and trained a 1.5B Qwen with GRPO
> to navigate it. The result is an agent that proactively renegotiates
> instead of silently failing. Source, model and live demo linked below.*
Sections (mirror this script):
1. The problem (over-commitment, cascading failure)
2. The environment (CDG, POMDP, multi-dim trust)
3. The reward (7 components; silent drop is the largest negative)
4. The training (GRPO, Unsloth, L40S, curriculum)
5. The payoff (naive vs trained, with embedded plots)
6. Try it / fork it (links)
Publish as a Hugging Face *Spaces blog post* on
`hf.co/Laksh718/vergil-commitment-engine` or as a Markdown gist.