
VERGIL — 90-second submission video script

Goal: convince a judge in 90 seconds that VERGIL is (a) a real OpenEnv, (b) solving a real problem, (c) producing a measurably better agent.


Recording plan

  • Tool: QuickTime (mac) or OBS — record the browser at 1280×800.
  • Mic: phone mic with Krisp noise removal is fine.
  • Edit: iMovie / DaVinci — single timeline, no transitions, no music bed (judges watch many; clarity > vibe).
  • Captions: burned-in, white sans-serif, lower third, 28pt.
  • Final output: .mp4, 1080p, under 2 minutes (target: 90 seconds). Upload as an HF Space asset and link it in README.md § 9 and docs/SUBMISSION.md § 8.

Shot list (00:00 → 01:30)

00:00 – 00:08 · The hook

On-screen: split-screen, two LLM chat windows side-by-side. The user asks each: "Can you finish the Q3 deck by 5pm? Also redesign the homepage by EOD. Also prep the board memo by morning?" Both LLMs answer "Yes, of course!"

VO:

"Here's a problem nobody's solved. LLM agents over-commit — they say yes to three back-to-back deadlines without realising they're impossible together."

00:08 – 00:18 · Why it matters

On-screen: Cut to a clock animation; one of the chats turns red as a deadline slips. A small graph appears showing two more nodes turning red in cascade.

VO:

"And the failure cascades silently. The third commitment kills the second, the second kills the first, and the user only finds out at 5pm Friday."

00:18 – 00:32 · The environment

On-screen: Open the live demo Space at huggingface.co/spaces/Laksh718/vergil-demo. Click New Episode. The CDG renders with 3-4 nodes, edges, urgency rings.

VO:

"VERGIL turns this into an RL environment. A Commitment Dependency Graph: nodes are promises, edges are dependencies, every accept mutates the satisfiability of every other promise. Stakeholders have multi-dimensional trust that decays differently for honest declines versus broken promises. It's an OpenEnv-compatible POMDP."

[While speaking, hover over a couple of nodes to show urgency / deadline hover-info; click the Compare button to preload the overlay.]
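For the blog version, the "every accept changes feasibility" idea can be shown with a minimal sketch. It is not VERGIL's actual implementation: the `Commitment` fields, the single-worker assumption, and the greedy earliest-deadline check are all illustrative assumptions, and the hours and deadlines are made up.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Commitment:
    name: str
    hours: float        # effort required (hypothetical field)
    deadline: float     # hours from now (hypothetical field)
    deps: tuple = ()    # names of commitments that must finish first


def all_deadlines_met(commitments):
    """Greedy heuristic: one worker runs ready tasks in earliest-deadline order."""
    by_name = {c.name: c for c in commitments}
    waiting_on = {c.name: set(c.deps) for c in commitments}
    ready = [(c.deadline, c.name) for c in commitments if not c.deps]
    heapq.heapify(ready)
    clock, done = 0.0, set()
    while ready:
        _, name = heapq.heappop(ready)
        task = by_name[name]
        clock += task.hours
        if clock > task.deadline:
            return False  # this promise is broken under the greedy schedule
        done.add(name)
        # Release any commitment whose last dependency just finished.
        for other, deps in waiting_on.items():
            if name in deps:
                deps.discard(name)
                if not deps:
                    heapq.heappush(ready, (by_name[other].deadline, other))
    # Anything left undone was blocked by a cycle or an unknown dependency.
    return len(done) == len(commitments)


deck = Commitment("q3_deck", hours=3, deadline=5)
homepage = Commitment("homepage", hours=4, deadline=8)
memo = Commitment("board_memo", hours=10, deadline=16, deps=("q3_deck",))

print(all_deadlines_met([deck, homepage]))        # two promises fit
print(all_deadlines_met([deck, homepage, memo]))  # accepting a third breaks the plan
```

The first two promises fit on their own, but accepting the third makes the set unsatisfiable, which is the over-commitment pattern the hook describes.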

00:32 – 00:50 · The reward

On-screen: Cut to a slide listing the 7 reward components with their weights, with silent_drop −0.50 highlighted.

VO:

"Reward has 7 process-aware components plus a format bonus. The biggest negative signal isn't a broken commitment; it's a silent drop. Accepting something and quietly ignoring it is worse than honestly declining. That single weight inversion is what teaches the agent to renegotiate proactively instead of disappearing."
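A weighted-sum sketch can make the weight inversion concrete for the slide or the blog post. Only `silent_drop = -0.50` comes from this script. Every other component name, every other weight, and the format-bonus value below are hypothetical placeholders, not VERGIL's real reward table.

```python
# Hypothetical 7-component weight table; only silent_drop is taken from the script.
WEIGHTS = {
    "completed_on_time": 0.40,        # hypothetical
    "proactive_renegotiation": 0.20,  # hypothetical
    "trust_gain": 0.25,               # hypothetical
    "honest_decline": -0.05,          # hypothetical
    "overload": -0.15,                # hypothetical
    "broken_commitment": -0.30,       # hypothetical
    "silent_drop": -0.50,             # from the script: the largest negative weight
}
FORMAT_BONUS = 0.05  # hypothetical value for the format bonus


def episode_reward(events, well_formatted=False):
    """Weighted sum of per-episode event counts, plus an optional format bonus."""
    total = sum(WEIGHTS[name] * count for name, count in events.items())
    return total + (FORMAT_BONUS if well_formatted else 0.0)


decline = episode_reward({"honest_decline": 1})
broken = episode_reward({"broken_commitment": 1})
silent = episode_reward({"silent_drop": 1})
print(decline, broken, silent)
```

Under these placeholder weights the ordering is silent drop < broken commitment < honest decline, so an agent that cannot deliver is pushed toward declining or renegotiating rather than going quiet.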

00:50 – 01:10 · The training run

On-screen: Cut to the training Space huggingface.co/spaces/Laksh718/vergil-training showing live logs and the status bar; then transition to the rendered training_curve.png.

VO:

"GRPO on Qwen 2.5 1.5B with Unsloth and LoRA rank 64. One L40S, 60 steps, about 25 minutes. Reward climbs from the random-policy level to about plus zero point eight on a curriculum that ramps from one stakeholder to four with adversarial behaviours."
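The curriculum ramp could be sketched as a simple step-to-difficulty mapping. The 60 steps and the one-to-four stakeholder range come from the script. The equal-length stages and the choice to enable adversarial behaviours only in the last stage are assumptions for illustration, and `curriculum_stage` is a hypothetical helper name.

```python
def curriculum_stage(step, total_steps=60, min_stakeholders=1, max_stakeholders=4):
    """Map a training step to (stakeholder count, adversarial flag).

    Assumes equal-length stages; the real schedule may differ.
    """
    stages = max_stakeholders - min_stakeholders + 1
    stage = min(stages - 1, (step * stages) // total_steps)
    stakeholders = min_stakeholders + stage
    adversarial = stakeholders == max_stakeholders  # assumption: hardest stage only
    return stakeholders, adversarial


for step in (0, 15, 30, 45, 59):
    print(step, curriculum_stage(step))
```

With these assumptions the agent sees one stakeholder for steps 0 to 14 and reaches four stakeholders with adversarial behaviour from step 45 onward.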

01:10 – 01:25 · The payoff

On-screen: Back to the demo Space. Click ⚡ Compare. Pick "Deadline Cascade Chain". Click Run. As the side-by-side mini-graphs animate, the naive side turns red across the chain; the VERGIL side stays mostly green with one counter-propose flagged.

VO:

"Same scenario, both agents. Naive accepts everything: the chain collapses, four commitments break, and average trust drops to forty percent. VERGIL counter-proposes once, completes the rest, and keeps average trust above sixty-five percent. That's a measurable, reproducible OpenEnv contribution."

01:25 – 01:30 · The CTA

On-screen: Title card with the three URLs (GitHub repo, demo Space, model).

VO:

"Code, model and live demo are all linked. Thanks for watching."


On-screen URLs to show in the title card

github.com/Laksh718/Vergil
huggingface.co/spaces/Laksh718/vergil-demo
huggingface.co/Laksh718/vergil-commitment-engine

Backup mini-blog post (if a video isn't recorded in time)

Title:

VERGIL: teaching LLMs to think before they commit

Lead paragraph:

We built a graph-structured POMDP where every "yes" mutates the feasibility of every other promise — and trained a 1.5B Qwen with GRPO to navigate it. The result is an agent that proactively renegotiates instead of silently failing. Source, model and live demo linked below.

Sections (mirror this script):

  1. The problem (over-commitment, cascading failure)
  2. The environment (CDG, POMDP, multi-dim trust)
  3. The reward (7 components; silent drop is the largest negative)
  4. The training (GRPO, Unsloth, L40S, curriculum)
  5. The payoff (naive vs trained, with embedded plots)
  6. Try it / fork it (links)

Publish as a Hugging Face Spaces blog post on hf.co/Laksh718/vergil-commitment-engine or as a Markdown gist.