Spaces:
Running
Running
File size: 419 Bytes
16dc556 | 1 2 3 4 5 6 7 8 9 | """Training-data tooling for the ScrubData planner.
Synthetic-dirtying pipeline: generate CLEAN tables with known schemas, inject
controlled mess, and emit (dirty profile -> ground-truth plan) SFT pairs. Because
we created the mess, the ground-truth plan is known; every example is then
VERIFIED by running scrubdata.executor (dirty + plan should recover the clean
original), so the dataset is guaranteed-correct.
"""
|