# FORGE-v4 Submission Guide

## Colab-First Commands

1. Benchmark with model policy:

```bash
python train_colab.py --benchmark --policy model --episodes 20
```

2. Compare baseline vs model:

```bash
python train_colab.py --compare --episodes 20
```

3. Top up the authentic DPO dataset to 480 pairs:

```bash
python train_colab.py --benchmark --policy model --episodes 20 --topup-dpo --target-pairs 480
```

4. Verify pair count:

```bash
python -c "import pathlib; p=pathlib.Path('data/dpo_dataset.jsonl'); print(sum(1 for line in p.read_text(encoding='utf-8').splitlines() if line.strip()) if p.exists() else 0)"
```
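
If a one-liner is hard to audit, the same check can be sketched as a small script. This is an illustrative helper, not part of `train_colab.py`: the function name `count_pairs` is hypothetical, and it assumes each non-blank line of `data/dpo_dataset.jsonl` holds one JSON record per pair. It also reports lines that fail to parse, which a plain line count would hide.

```python
import json
from pathlib import Path


def count_pairs(path="data/dpo_dataset.jsonl"):
    """Return (valid, malformed) line counts for a JSONL file; (0, 0) if missing."""
    p = Path(path)
    if not p.exists():
        return 0, 0
    valid = malformed = 0
    with p.open("r", encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # blank lines are not pairs
            try:
                json.loads(line)
                valid += 1
            except json.JSONDecodeError:
                malformed += 1
    return valid, malformed


if __name__ == "__main__":
    valid, malformed = count_pairs()
    print(f"valid pairs: {valid}, malformed lines: {malformed}")
```

A nonzero malformed count suggests the dataset was truncated or written concurrently, so the valid count may fall short of the target even when the raw line count does not.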

## Security Notes

- API keys must be set via environment variables.
- No secrets should be hardcoded in source files.
- The sandbox enforces a timeout, a memory cap (where supported), a block on risky builtins, and cleanup of temporary files.
- For public deployment, add container isolation.
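
For readers who want a feel for what these sandbox guarantees involve, here is a minimal sketch using only the Python standard library. It is not FORGE's actual sandbox: the names `run_untrusted` and `_limit_memory` and the default limits are assumptions made for illustration. It covers the timeout, the memory cap (applied only where the `resource` module and `RLIMIT_AS` are available), and temp-directory cleanup. It does not block builtins and is not a substitute for container isolation.

```python
import subprocess
import sys
import tempfile

try:
    import resource  # Unix only; not available on Windows
except ImportError:
    resource = None


def _limit_memory(max_bytes):
    """Build a hook that caps the child's address space, if the OS allows it."""
    def apply():
        try:
            resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
        except (ValueError, OSError):
            pass  # cap not supported here; continue without it
    return apply


def run_untrusted(code, timeout_s=5.0, max_bytes=512 * 1024 * 1024):
    """Run `code` in a child interpreter; return (returncode, stdout, stderr).

    On timeout the child is killed and (None, "", "timeout") is returned.
    """
    # The working directory and anything written into it is deleted on exit.
    with tempfile.TemporaryDirectory() as workdir:
        hook = _limit_memory(max_bytes) if resource is not None else None
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
                preexec_fn=hook,
            )
        except subprocess.TimeoutExpired:
            return None, "", "timeout"
        return proc.returncode, proc.stdout, proc.stderr
```

Running the code in a separate process, rather than with `exec` in the host interpreter, is what makes the timeout enforceable: a runaway child can be killed without affecting the caller.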

## What Judges Should See

- outputs/reward_curve.png
- outputs/loss_curve.png
- outputs/pass_rate.png
- outputs/final_report.json
- data/dpo_dataset.jsonl containing the target number of pairs (480 in the top-up command above)
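
Before submitting, it may help to confirm that every artifact in the list above exists and is non-empty. The sketch below is a hypothetical helper (the name `missing_artifacts` is not part of the project) that assumes it is run from the repository root, or is given the root explicitly.

```python
from pathlib import Path

# Paths taken from the artifact list above, relative to the repository root.
EXPECTED = [
    "outputs/reward_curve.png",
    "outputs/loss_curve.png",
    "outputs/pass_rate.png",
    "outputs/final_report.json",
    "data/dpo_dataset.jsonl",
]


def missing_artifacts(root="."):
    """Return the expected paths that are absent or empty under `root`."""
    base = Path(root)
    missing = []
    for rel in EXPECTED:
        f = base / rel
        if not f.is_file() or f.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    gaps = missing_artifacts()
    if gaps:
        print("missing or empty: " + ", ".join(gaps))
    else:
        print("all submission artifacts present")
```

This only checks presence and size; it does not validate plot contents or the pair count in the dataset.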