Spaces:
Runtime error
Runtime error
| # Training Data | |
| Seed and generated datasets for SFT warm-start live under `data/`. | |
| ## Files | |
| | File | Purpose | | |
| |------|---------| | |
| | `sft.jsonl` | Seed SFT dataset in ChatML format, including assistant tool calls and tool responses. | | |
| | `tool_info.md` | Reusable tool catalog that can be injected into generated system prompts with `--tool-info`. | | |
| | `synthetic*.jsonl` | Generated synthetic datasets from `openrange synthetic-data` (gitignored). | | |
| ## Seed SFT Format | |
| Each line in `sft.jsonl` is a single solved trajectory: | |
| ```json | |
| { | |
| "messages": [ | |
| {"role": "system", "content": "..."}, | |
| {"role": "user", "content": "..."}, | |
| {"role": "assistant", "content": "...", "tool_calls": [...]}, | |
| {"role": "tool", "tool_call_id": "...", "name": "shell_command", "content": "..."} | |
| ], | |
| "metadata": {"source": "bootstrap", "success": true}, | |
| "ground_truth_flag": "FLAG{...}", | |
| "optimal_steps": 8 | |
| } | |
| ``` | |
| ## Generating Synthetic Data | |
| Use the seed file as bootstrap context and merge newly generated OpenRange traces into a single output: | |
| ```bash | |
| uv run --extra synthetic openrange synthetic-data \ | |
| --manifest manifests/tier1_basic.yaml \ | |
| --output data/synthetic_sft_5.jsonl \ | |
| --num-traces 5 \ | |
| --roles red \ | |
| --teacher-model azure/gpt-5.2-codex \ | |
| --bootstrap-traces data/sft.jsonl \ | |
| --tool-info data/tool_info.md | |
| ``` | |
| The output file keeps the imported bootstrap records intact and appends the generated OpenRange records after them. | |