# Data Layout
Use this layout for the SFT -> RLVR pipeline.
- `raw/imported/`: imported generated scenario examples (E1/M1)
- `raw/synthetic/`: synthetic seed trajectories (for example H1 seeds)
- `processed/verified/`: strict-clean trajectories, rejected trajectories, and the all-records output
- `sft/merged/`: final chat-format SFT training JSONL
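
To confirm that a checkout matches this layout, a small check along the following lines may help. It is a minimal sketch, not part of the repository: the helper `missing_layout_dirs` is hypothetical, and it assumes the directories above live under a `data/` folder relative to where the script is run.

```python
from pathlib import Path

# Directory names are taken from the layout list above.
EXPECTED_DIRS = (
    "raw/imported",
    "raw/synthetic",
    "processed/verified",
    "sft/merged",
)


def missing_layout_dirs(data_root):
    """Return the expected subdirectories that do not exist under data_root.

    Hypothetical helper for illustration; it only checks that the
    directories exist, not what they contain.
    """
    root = Path(data_root)
    return [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumption: the script is run from the repository root and the
    # data directory is ./data.
    missing = missing_layout_dirs("data")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Data layout is complete.")
```

An empty result means all four directories are present; file names inside them are not checked because this README does not specify them.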
## Recommended command
```powershell
uv run python scripts/run_data_pipeline.py --write-legacy-copies
```
This command writes the structured outputs described above and, with `--write-legacy-copies` set, also updates the legacy flat paths used by older scripts.