
Data Layout

Use this layout for the SFT -> RLVR pipeline.

  • raw/imported/: imported generated scenario examples (E1/M1)
  • raw/synthetic/: synthetic seed trajectories (for example H1 seeds)
  • processed/verified/: strict-clean trajectories, rejected trajectories, and the all-records output
  • sft/merged/: final chat-format SFT training JSONL
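
As a quick sanity check, the layout above can be verified with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `missing_layout_dirs` is hypothetical, and it assumes the four directories sit directly under the directory that contains this README.

```python
from pathlib import Path

# Directory names are taken from the layout list above.
EXPECTED_DIRS = (
    "raw/imported",
    "raw/synthetic",
    "processed/verified",
    "sft/merged",
)


def missing_layout_dirs(data_root: Path) -> list[str]:
    """Return the expected subdirectories that do not exist under data_root."""
    return [d for d in EXPECTED_DIRS if not (data_root / d).is_dir()]


if __name__ == "__main__":
    # Assumption: the script is run from the data directory itself.
    missing = missing_layout_dirs(Path("."))
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

An empty result means every directory in the list is present; it says nothing about whether the files inside are valid.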

Recommended command

uv run python scripts/run_data_pipeline.py --write-legacy-copies

This command writes the structured outputs into the layout above. With --write-legacy-copies, it also updates the legacy flat paths used by older scripts.
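
After the pipeline runs, one way to confirm that the SFT output is well-formed JSONL is to parse every line and count the records. The sketch below is illustrative only: `count_jsonl_records` is a hypothetical helper, and it assumes the files in sft/merged/ use a `.jsonl` extension. It checks only that each line is valid JSON and makes no assumption about the chat-format schema.

```python
import json
from pathlib import Path


def count_jsonl_records(directory: Path) -> dict[str, int]:
    """Count parseable JSON records in each *.jsonl file in directory."""
    counts: dict[str, int] = {}
    for path in sorted(directory.glob("*.jsonl")):
        n = 0
        with path.open(encoding="utf-8") as handle:
            for line_no, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                try:
                    json.loads(line)
                except json.JSONDecodeError as exc:
                    # Report the file and line so the bad record is easy to find.
                    raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
                n += 1
        counts[path.name] = n
    return counts


if __name__ == "__main__":
    # Assumption: run from the data directory that contains sft/merged/.
    for name, n in count_jsonl_records(Path("sft/merged")).items():
        print(f"{name}: {n} records")
```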