
	# VERGIL — 90-second submission video script

	> Goal: convince a judge in 90 seconds that VERGIL is (a) a real OpenEnv,
	> (b) solving a real problem, (c) producing a measurably better agent.

	---

	## Recording plan

	* Tool: QuickTime (macOS) or OBS; record the browser at 1280×800.
	* Mic: phone mic with [Krisp](https://krisp.ai/) noise removal is fine.
	* Edit: iMovie / DaVinci Resolve, single timeline, no transitions, no music
	bed (judges watch many videos; clarity > vibe).
	* Captions: burned-in, white sans-serif, lower third, 28pt.
	* Final output: `.mp4`, 1080p, < 2 min. Upload as a HF Space asset and
	link it in `README.md` § 9 + `docs/SUBMISSION.md` § 8.

	---

	## Shot list (00:00 → 01:30)

	### 00:00 – 00:08 · The hook
	On-screen: split-screen, two LLM chat windows side-by-side. The user
	asks each: "Can you finish the Q3 deck by 5pm? Also redesign the homepage
	by EOD. Also prep the board memo by morning?" Both LLMs answer "Yes, of
	course!"

	VO:
	> "Here's a problem nobody's solved. LLM agents over-commit — they say
	> yes to three back-to-back deadlines without realising they're impossible
	> together."

	### 00:08 – 00:18 · Why it matters
	On-screen: Cut to a clock animation; one of the chats turns red as a
	deadline slips. A small graph appears showing two more nodes turning red
	in cascade.

	VO:
	> "And the failure cascades silently. The third commitment kills the second,
	> the second kills the first, and the user only finds out at 5pm Friday."

	### 00:18 – 00:32 · The environment
	On-screen: Open the live demo Space at
	`huggingface.co/spaces/Laksh718/vergil-demo`. Click New Episode. The
	Commitment Dependency Graph (CDG) renders with 3–4 nodes, edges, and urgency rings.

	VO:
	> "VERGIL turns this into an RL environment. A Commitment Dependency Graph:
	> nodes are promises, edges are dependencies, every accept mutates the
	> satisfiability of every other promise. Stakeholders have multi-dimensional
	> trust that decays differently for honest declines versus broken promises.
	> It's an OpenEnv-compatible POMDP."

	*[While speaking, hover over a couple of nodes to show urgency / deadline
	hover-info; click the Compare button to preload the overlay.]*
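
For anyone preparing the slide or the blog version, the idea that "every accept mutates the satisfiability of every other promise" can be illustrated with a minimal sketch. It is not VERGIL's implementation: the `Commitment` schema, the `missed` helper, the single-worker earliest-deadline-first rule, and all hours and deadlines are assumptions made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Commitment:
    """One promise in the graph (hypothetical schema, not VERGIL's)."""
    name: str
    hours: float                      # effort needed
    deadline: float                   # hours from now
    depends_on: tuple[str, ...] = ()  # names of prerequisite commitments


def missed(accepted: list[Commitment]) -> set[str]:
    """Names that cannot be met if one worker runs earliest-deadline-first.

    Assumption: dependencies are used only to propagate failure, not to
    reorder the schedule.
    """
    late: set[str] = set()
    clock = 0.0
    for c in sorted(accepted, key=lambda c: c.deadline):
        clock += c.hours
        if clock > c.deadline:
            late.add(c.name)
    # Cascade: anything that depends on a late commitment is late too.
    changed = True
    while changed:
        changed = False
        for c in accepted:
            if c.name not in late and any(d in late for d in c.depends_on):
                late.add(c.name)
                changed = True
    return late


# Illustrative numbers only.
deck = Commitment("deck", hours=4, deadline=5)
homepage = Commitment("homepage", hours=3, deadline=8, depends_on=("deck",))
launch = Commitment("launch", hours=1, deadline=20, depends_on=("homepage",))
memo = Commitment("memo", hours=2, deadline=3)

print(missed([deck, homepage, launch]))        # nothing is late yet
print(missed([deck, homepage, launch, memo]))  # one more "yes" breaks three others
```

In this toy setup the fourth accept is feasible on its own, yet it pushes `deck` and `homepage` past their deadlines, and `launch` fails purely through its dependency edge.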

	### 00:32 – 00:50 · The reward
	On-screen: Cut to a slide listing the 7 reward components with their
	weights, with silent_drop −0.50 highlighted.

	VO:
	> "Reward has 7 process-aware components plus a format bonus. The biggest
	> negative signal isn't a broken commitment; it's a silent drop. Accepting
	> something and quietly ignoring it is worse than honestly declining. That
	> single weight inversion is what teaches the agent to renegotiate
	> proactively instead of disappearing."
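
The weight inversion described in the voice-over can be sketched as a weighted sum. Only `silent_drop = -0.50` comes from this script; every other component name, every other weight, and the format-bonus value below are placeholders invented for illustration, so the real slide should take its numbers from the VERGIL source.

```python
from __future__ import annotations

# Seven components. Only silent_drop = -0.50 is taken from the script;
# all other names and weights are PLACEHOLDERS.
WEIGHTS: dict[str, float] = {
    "on_time_completion": 0.40,       # placeholder
    "proactive_renegotiation": 0.20,  # placeholder
    "trust_preserved": 0.15,          # placeholder
    "honest_decline": 0.05,           # placeholder
    "overcommit": -0.20,              # placeholder
    "broken_commitment": -0.30,       # placeholder
    "silent_drop": -0.50,             # from the script
}
FORMAT_BONUS = 0.05                   # placeholder


def step_reward(events: dict[str, float], well_formatted: bool = False) -> float:
    """Weighted sum of observed event counts, plus an optional format bonus."""
    unknown = set(events) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown reward components: {sorted(unknown)}")
    total = sum(WEIGHTS[name] * count for name, count in events.items())
    return total + (FORMAT_BONUS if well_formatted else 0.0)


# The ordering the voice-over relies on: decline > broken > silent drop.
print(step_reward({"honest_decline": 1}))
print(step_reward({"broken_commitment": 1}))
print(step_reward({"silent_drop": 1}))
```

Whatever the real weights are, the property worth showing on the slide is the ordering: silently dropping a commitment scores below breaking one, which in turn scores below declining honestly.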

	### 00:50 – 01:10 · The training run
	On-screen: Cut to the training Space
	`huggingface.co/spaces/Laksh718/vergil-training` showing live logs and the
	status bar; then transition to the rendered `training_curve.png`.

	VO:
	> "GRPO on Qwen 2.5 1.5B with Unsloth and LoRA rank 64. One L40S, 60 steps,
	> about 25 minutes. Reward climbs from the random-policy baseline to about plus zero point eight
	> on a curriculum that ramps from one stakeholder to four with adversarial
	> behaviours."
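
Two ideas in this shot can be sketched in plain Python: the curriculum ramp and the group-relative advantage that gives GRPO its name. The 60 steps and the 1-to-4 stakeholder range come from the script. The equal-length stages, the function names, and the use of a population standard deviation are assumptions, and no training library is called here.

```python
from __future__ import annotations

import statistics

TOTAL_STEPS = 60      # from the script
MAX_STAKEHOLDERS = 4  # from the script


def stakeholders_at(step: int) -> int:
    """Number of stakeholders at a training step (assumed equal-length stages)."""
    if not 0 <= step < TOTAL_STEPS:
        raise ValueError(f"step must be in [0, {TOTAL_STEPS})")
    stage_len = TOTAL_STEPS // MAX_STAKEHOLDERS
    return min(MAX_STAKEHOLDERS, 1 + step // stage_len)


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: each reward normalised within its sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an implementation choice
    return [(r - mean) / (std + eps) for r in rewards]


print([stakeholders_at(s) for s in (0, 15, 30, 45, 59)])
print(group_advantages([0.0, 1.0]))
```

The second function is why no separate value network is needed: completions for the same prompt are scored against each other, so a completion only gets a positive advantage if it beats its siblings.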

	### 01:10 – 01:25 · The payoff
	On-screen: Back to the demo Space. Click ⚡ Compare. Pick "Deadline
	Cascade Chain". Click Run. As the side-by-side mini-graphs animate,
	the naive side turns red across the chain; the VERGIL side stays mostly
	green with one counter-propose flagged.

	VO:
	> "Same scenario, both agents. Naive accepts everything, the chain
	> collapses, four broken commitments, average trust drops to forty percent.
	> VERGIL counter-proposes once, completes the rest, average trust above
	> sixty-five percent. That's a measurable, reproducible OpenEnv contribution."
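
If a caption needs to explain what "average trust" means, one plausible reading is a mean over stakeholders of the mean over trust dimensions. The sketch below assumes that reading. The two dimensions, the multiplicative decay factors, and the helper names are all placeholders, not values from the VERGIL code or from the demo run.

```python
from __future__ import annotations

from statistics import fmean

# PLACEHOLDER decay factors per event and trust dimension.
DECAY: dict[str, dict[str, float]] = {
    "honest_decline": {"reliability": 0.95, "honesty": 1.00},
    "broken_promise": {"reliability": 0.60, "honesty": 0.80},
}


def apply_event(trust: dict[str, float], event: str) -> dict[str, float]:
    """Return a new trust vector after one event (multiplicative decay)."""
    return {dim: value * DECAY[event][dim] for dim, value in trust.items()}


def average_trust(stakeholders: list[dict[str, float]]) -> float:
    """Mean over stakeholders of the mean over trust dimensions (assumed metric)."""
    return fmean(fmean(t.values()) for t in stakeholders)


start = {"reliability": 1.0, "honesty": 1.0}
declined = apply_event(start, "honest_decline")
broken = apply_event(start, "broken_promise")
print(average_trust([declined]), average_trust([broken]))
```

The point to carry into the video is qualitative: an honest decline costs far less trust than a broken promise, which is why a single counter-proposal can leave the trained agent ahead.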

	### 01:25 – 01:30 · The CTA
	On-screen: Title card with the three URLs listed below (GitHub repo, demo Space, model).

	VO:
	> "Code, model and live demo are all linked. Thanks for watching."

	---

	## On-screen URLs to show in the title card

	```
	github.com/Laksh718/Vergil
	huggingface.co/spaces/Laksh718/vergil-demo
	huggingface.co/Laksh718/vergil-commitment-engine
	```

	---

	## Backup mini-blog post (if a video isn't recorded in time)

	Title:
	> VERGIL: teaching LLMs to think before they commit

	Lead paragraph:
	> *We built a graph-structured POMDP where every "yes" mutates the
	> feasibility of every other promise — and trained a 1.5B Qwen with GRPO
	> to navigate it. The result is an agent that proactively renegotiates
	> instead of silently failing. Source, model and live demo linked below.*

	Sections (mirror this script):
	1. The problem (over-commitment, cascading failure)
	2. The environment (CDG, POMDP, multi-dim trust)
	3. The reward (7 components; silent drop is the largest negative)
	4. The training (GRPO, Unsloth, L40S, curriculum)
	5. The payoff (naive vs trained, with embedded plots)
	6. Try it / fork it (links)

	Publish as a Hugging Face community blog post that links to
	`hf.co/Laksh718/vergil-commitment-engine`, or as a Markdown gist.