# Demo Script

## 30-Second Version

0x960 is an OpenEnv self-improvement environment where an AI learns to *engineer* chess engines, not play chess. It gets a bounded coding workspace, edits engine evaluation code, tests changes with real matches, and is rewarded only when the engine actually gets stronger. We went from a base model that never wrote a single line of code (reward: -2.1) to a distilled student that reliably executes the full engineering loop (reward: +1.0), while our autonomous Codex agent swarm pushed engine strength by **+596.5 Elo** internally and **beat Stockfish 1320 by +221.1 Elo** — reaching competitive strength with Stockfish 1600 in Chess960.

## One-Minute Outline

### 1. Opening (10s)

0x960 is a bounded self-improvement environment for Chess960 engine engineering, built on OpenEnv 0.2.1.

We turned engine engineering into a trainable RL task: inspect code, edit it, test it against a baseline, and get rewarded only when the engine is measurably stronger.

### 2. Why Chess960 (10s)

Chess960 randomizes the starting position across 960 setups. No opening books, no memorized lines. The engine has to *actually understand chess positions* to improve — you can't game the reward by memorizing patterns.

### 3. The Problem We Solved (15s)

When we dropped Qwen 3.5 into this environment, it scored **-2.1 reward** — it never once attempted to write code. It just read files and quit. Raw GRPO RL couldn't fix this because the policy never explored the right actions.

Our breakthrough: **teacher-student distillation first, RL second.**

- GPT-5.4 teacher generates successful bounded-action trajectories via ACP runtime
- Qwen 3.5-0.8B student learns the workflow through SFT (98.76% token accuracy in 5 minutes on H100)
- TRL GRPO refines the student on real match reward
- We also ran Qwen 3.5-9B QLoRA GRPO as a scaling probe on the Northflank H100

After distillation: reward **+1.0**, reliable `write_file → run_match → finish` execution.

### 4. The Codex Agent Swarm (15s)

In parallel, we built an autonomous Codex agent swarm — over a dozen agents across multiple rounds, each specializing in different chess knowledge (king safety, tactics, pawn structure, piece activity, initiative).

Champion/challenger tournament format: every patch gets benchmarked on held-out Chess960 positions. Only verified winners get promoted. 4 eval champions promoted through the gate. The swarm also edits search heuristics directly.

### 5. Results (10s)

- **+596.5 Elo** internal gain (vs search baseline)
- **+221.1 Elo** vs Stockfish 1320 anchor
- **Competitive with Stockfish 1600** in local Chess960 benchmarks
- Engine went from bare negamax to PVS + TT + null-move pruning + LMR + aspiration windows
- Full benchmark suite: eval-vs-eval, engine-vs-engine, UCI anchors, league self-play, static dashboard

All built in ~20 hours at the hackathon. Two parallel self-improvement loops — policy learning and autonomous engine search — feeding the same engine, with every claim backed by held-out match results.