Spaces:

hjerpe
/

sql_env

Sleeping

File size: 5,460 Bytes

5dd1bb4

# Feature Demo: F004 — Question Dataset Expansion

> **Generated:** 2026-03-24T21:07:31Z
> **Context source:** spec + discovery only (implementation not read)
> **Feature entry:** [FEATURES.json #F004](./FEATURES.json)

---

## What This Feature Does

Before this feature, training data came from a single database and could overfit to one schema. F004 expands that into a curated multi-database dataset so training and evaluation reflect more realistic SQL variety.

From a user perspective, this feels like a repeatable CLI workflow: generate enriched train/eval JSON once, then validate it quickly before downstream training. You get precomputed gold answers, answer types, difficulty labels, and deterministic splits.

---

## What Is Already Proven

### Verified in This Demo Run

- Ran full curation pipeline locally and observed generated outputs: 473 train + 203 eval (676 total).
- Ran `--validate` mode locally and observed successful validation for all 676 records.
- Verified split ratio and database coverage from generated artifacts (`train_ratio=0.6997`, `eval_ratio=0.3003`, `db_count=10`).
- Ran an invalid CLI input case (`--db-list` missing path) and captured the real failure output.
- Ran repository smoke tests (`21 passed`).

### Previously Verified Evidence

- `specs/FEATURES.json` (`verification_evidence` for F004): verifier approved, `uv run pytest tests/ -v`, 21/21 passed at `2026-03-24T21:04:54Z`.
- `specs/F004-IMPLEMENTATION_SPEC.md` (Step 2.3): prior validation evidence recorded for 676 curated records and ~70/30 split.

---

## What Still Needs User Verification

None for local CLI proof.  
Optional product check: decide whether current MVP difficulty skew warnings are acceptable for your training goals.

---

## Quickstart / Verification Steps

> Run these commands to see the feature in action:

```bash
uv run python scripts/curate_questions.py
uv run python scripts/curate_questions.py --validate
```

Requires local Python/uv environment and access to existing project data directories.

---

## Live Local Proof

### Generate the Curated Train/Eval Datasets

This runs the user-facing curation pipeline end-to-end.

```bash
uv run python scripts/curate_questions.py
```

```
WARNING: Difficulty distribution off target: easy=91.72% (target 40%)
WARNING: Difficulty distribution off target: medium=7.40% (target 40%)
WARNING: Difficulty distribution off target: hard=0.89% (target 20%)
Prepared 10 databases in data/databases
Loaded 676 Spider questions
Curated 676 questions (skipped 0)
Validation passed
Wrote 473 train records to data/questions/questions_train.json
Wrote 203 eval records to data/questions/questions_eval.json
```

Notice the pipeline completes successfully and writes both split files.

### Validate Existing Curated Outputs

This is the fast re-check path users can run before training.

```bash
uv run python scripts/curate_questions.py --validate
```

```
WARNING: Difficulty distribution off target: easy=91.72% (target 40%)
WARNING: Difficulty distribution off target: medium=7.40% (target 40%)
WARNING: Difficulty distribution off target: hard=0.89% (target 20%)
Validation passed for 676 curated records
```

Notice validation passes while surfacing non-blocking MVP warnings.

---

## Existing Evidence

- F004 `verification_evidence` in `specs/FEATURES.json`: 21/21 smoke tests passed, verifier status `approved`.
- `specs/F004-IMPLEMENTATION_SPEC.md` Step 2.3: prior recorded split metrics (`473/203`) and validation pass.

---

## Manual Verification Checklist

1. Run full curation command and confirm both JSON files are written.
2. Run `--validate` and confirm exit succeeds with `Validation passed` message.
3. Confirm split counts are close to 70/30.
4. Confirm warnings (if any) match your accepted MVP quality bar.

---

## Edge Cases Exercised

### Boundary Check: Split Ratio and DB Coverage

```bash
uv run python -c "import json; from pathlib import Path; train=json.loads(Path('data/questions/questions_train.json').read_text()); eval_=json.loads(Path('data/questions/questions_eval.json').read_text()); total=len(train)+len(eval_); dbs=sorted({q['database_name'] for q in train+eval_}); print(f'train={len(train)} eval={len(eval_)} total={total} train_ratio={len(train)/total:.4f} eval_ratio={len(eval_)/total:.4f} db_count={len(dbs)}')"
```

```
train=473 eval=203 total=676 train_ratio=0.6997 eval_ratio=0.3003 db_count=10
```

This confirms the split target and multi-database coverage from actual artifacts.

### Error Case: Missing `--db-list` Path

```bash
uv run python scripts/curate_questions.py --db-list data/questions/does_not_exist.json
```

```
Traceback (most recent call last):
  ...
FileNotFoundError: [Errno 2] No such file or directory: 'data/questions/does_not_exist.json'
```

This shows current behavior for invalid input path (real failure output captured).

---

## Test Evidence (Optional)

> Supplementary proof that the repository remains healthy.

| Test Suite | Tests | Status |
|---|---|---|
| `uv run pytest tests/ -v` | 21 | All passed |

Representative command run:

```bash
uv run pytest tests/ -v
```

Result summary: `============================== 21 passed in 8.48s ==============================`

---

## Feature Links

- Implementation spec: `specs/F004-IMPLEMENTATION_SPEC.md`
- Verification spec: `specs/F004-VERIFICATION_SPEC.md`

---

*Demo generated by `feature-demo` agent. Re-run with `/feature-demo F004` to refresh.*