File size: 5,460 Bytes
5dd1bb4
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
# Feature Demo: F004 — Question Dataset Expansion

> **Generated:** 2026-03-24T21:07:31Z
> **Context source:** spec + discovery only (implementation not read)
> **Feature entry:** [FEATURES.json #F004](./FEATURES.json)

---

## What This Feature Does

Before this feature, training data came from a single database and could overfit to one schema. F004 expands that into a curated multi-database dataset so training and evaluation reflect more realistic SQL variety.

From a user perspective, this feels like a repeatable CLI workflow: generate enriched train/eval JSON once, then validate it quickly before downstream training. You get precomputed gold answers, answer types, difficulty labels, and deterministic splits.

---

## What Is Already Proven

### Verified in This Demo Run

- Ran full curation pipeline locally and observed generated outputs: 473 train + 203 eval (676 total).
- Ran `--validate` mode locally and observed successful validation for all 676 records.
- Verified split ratio and database coverage from generated artifacts (`train_ratio=0.6997`, `eval_ratio=0.3003`, `db_count=10`).
- Ran an invalid CLI input case (`--db-list` missing path) and captured the real failure output.
- Ran repository smoke tests (`21 passed`).

### Previously Verified Evidence

- `specs/FEATURES.json` (`verification_evidence` for F004): verifier approved, `uv run pytest tests/ -v`, 21/21 passed at `2026-03-24T21:04:54Z`.
- `specs/F004-IMPLEMENTATION_SPEC.md` (Step 2.3): prior validation evidence recorded for 676 curated records and ~70/30 split.

---

## What Still Needs User Verification

None for local CLI proof.  
Optional product check: decide whether current MVP difficulty skew warnings are acceptable for your training goals.

---

## Quickstart / Verification Steps

> Run these commands to see the feature in action:

```bash
uv run python scripts/curate_questions.py
uv run python scripts/curate_questions.py --validate
```

Requires local Python/uv environment and access to existing project data directories.

---

## Live Local Proof

### Generate the Curated Train/Eval Datasets

This runs the user-facing curation pipeline end-to-end.

```bash
uv run python scripts/curate_questions.py
```

```
WARNING: Difficulty distribution off target: easy=91.72% (target 40%)
WARNING: Difficulty distribution off target: medium=7.40% (target 40%)
WARNING: Difficulty distribution off target: hard=0.89% (target 20%)
Prepared 10 databases in data/databases
Loaded 676 Spider questions
Curated 676 questions (skipped 0)
Validation passed
Wrote 473 train records to data/questions/questions_train.json
Wrote 203 eval records to data/questions/questions_eval.json
```

Notice the pipeline completes successfully and writes both split files.

### Validate Existing Curated Outputs

This is the fast re-check path users can run before training.

```bash
uv run python scripts/curate_questions.py --validate
```

```
WARNING: Difficulty distribution off target: easy=91.72% (target 40%)
WARNING: Difficulty distribution off target: medium=7.40% (target 40%)
WARNING: Difficulty distribution off target: hard=0.89% (target 20%)
Validation passed for 676 curated records
```

Notice validation passes while surfacing non-blocking MVP warnings.

---

## Existing Evidence

- F004 `verification_evidence` in `specs/FEATURES.json`: 21/21 smoke tests passed, verifier status `approved`.
- `specs/F004-IMPLEMENTATION_SPEC.md` Step 2.3: prior recorded split metrics (`473/203`) and validation pass.

---

## Manual Verification Checklist

1. Run full curation command and confirm both JSON files are written.
2. Run `--validate` and confirm exit succeeds with `Validation passed` message.
3. Confirm split counts are close to 70/30.
4. Confirm warnings (if any) match your accepted MVP quality bar.

---

## Edge Cases Exercised

### Boundary Check: Split Ratio and DB Coverage

```bash
uv run python -c "import json; from pathlib import Path; train=json.loads(Path('data/questions/questions_train.json').read_text()); eval_=json.loads(Path('data/questions/questions_eval.json').read_text()); total=len(train)+len(eval_); dbs=sorted({q['database_name'] for q in train+eval_}); print(f'train={len(train)} eval={len(eval_)} total={total} train_ratio={len(train)/total:.4f} eval_ratio={len(eval_)/total:.4f} db_count={len(dbs)}')"
```

```
train=473 eval=203 total=676 train_ratio=0.6997 eval_ratio=0.3003 db_count=10
```

This confirms the split target and multi-database coverage from actual artifacts.

### Error Case: Missing `--db-list` Path

```bash
uv run python scripts/curate_questions.py --db-list data/questions/does_not_exist.json
```

```
Traceback (most recent call last):
  ...
FileNotFoundError: [Errno 2] No such file or directory: 'data/questions/does_not_exist.json'
```

This shows current behavior for invalid input path (real failure output captured).

---

## Test Evidence (Optional)

> Supplementary proof that the repository remains healthy.

| Test Suite | Tests | Status |
|---|---|---|
| `uv run pytest tests/ -v` | 21 | All passed |

Representative command run:

```bash
uv run pytest tests/ -v
```

Result summary: `============================== 21 passed in 8.48s ==============================`

---

## Feature Links

- Implementation spec: `specs/F004-IMPLEMENTATION_SPEC.md`
- Verification spec: `specs/F004-VERIFICATION_SPEC.md`

---

*Demo generated by `feature-demo` agent. Re-run with `/feature-demo F004` to refresh.*