# Data Layout

Use this layout for the SFT -> RLVR pipeline.

- `raw/imported/`: imported generated scenario examples (E1/M1)
- `raw/synthetic/`: synthetic seed trajectories (for example H1 seeds)
- `processed/verified/`: strict-clean and rejected trajectories plus all-records
- `sft/merged/`: final chat-format SFT training JSONL

## Recommended command

```powershell
uv run python scripts/run_data_pipeline.py --write-legacy-copies
```

This command creates structured outputs and also updates legacy flat paths used by older scripts.
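
To check that a data directory matches the layout above, a small helper along these lines may be useful. This is a minimal sketch, not part of the repository's scripts: the names `LAYOUT_DIRS`, `missing_layout_dirs`, and `ensure_layout` are hypothetical, and the data root is passed in by the caller because this document does not state where it lives.

```python
from pathlib import Path

# Subdirectories named in the layout list above.
LAYOUT_DIRS = (
    "raw/imported",
    "raw/synthetic",
    "processed/verified",
    "sft/merged",
)


def missing_layout_dirs(root):
    """Return the layout subdirectories that do not exist under root."""
    root = Path(root)
    return [d for d in LAYOUT_DIRS if not (root / d).is_dir()]


def ensure_layout(root):
    """Create any missing layout subdirectories under root (hypothetical helper)."""
    root = Path(root)
    for d in LAYOUT_DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
```

Calling `missing_layout_dirs(root)` before a run gives a quick list of any directories that are absent. `ensure_layout(root)` creates them without touching existing contents.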