Spaces:

build-small-hackathon
/

scrubdata

Running

OpenAI Codex

deploy: add sponsor:openai tag (Best Use of Codex) + Codex-hardened build

16dc556 11 days ago

565 Bytes

	"""Evaluation harness for the ScrubData planner.

	Measures any planner (`callable(dirty_df) -> plan dict`) against a held-out gold set:
	- JSON-schema validity of the plan
	- operation-level micro-F1 vs the gold plan
	- canonicalization mapping micro-F1 (the fuzzy skill rules can't do)
	- end-to-end cell-recovery (executor(dirty, plan) vs known-clean reference)

	Two reference systems frame every run:
	- HEURISTIC (`scrubdata.mock_plan`) = the baseline a fine-tuned model must beat.
	- ORACLE (the gold plan itself) = the goalpost ceiling (~100% by construction).
	"""