Eishaan commited on
Commit
71fa486
·
1 Parent(s): b08a347

feat: expand to 7 tasks (2E/3M/2H) + engine hardening

Browse files

- Add 4 new tasks: soft-delete-restoration, schema-version-merge,
multi-entity-extraction, dual-source-consolidation
- Harden grader: PRAGMA bypass fix, sqlite_% table filtering
- Harden environment: multi-statement error handling, schema pollution
- Fix inference context window: target DDL baked into system prompt
- Per-task step budgets: Easy=10, Medium=15, Hard=20
- Update app.py, README.md for 7-task architecture
- All tests passing

README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
  title: SQL Migration Agent
3
- emoji: 🗄️
4
  colorFrom: blue
5
  colorTo: purple
6
  sdk: docker
@@ -11,10 +11,9 @@ tags:
11
 
12
  # SQL Schema Migration Agent
13
 
14
-
15
  > **An OpenEnv environment for benchmarking autonomous database migration agents.**
16
  >
17
- > Built for the Meta × Hugging Face OpenEnv Hackathon.
18
 
19
  ---
20
 
@@ -22,7 +21,7 @@ tags:
22
 
23
  Database schema migrations are among the most error-prone, high-stakes tasks in software engineering. Every production system faces them as application models evolve, yet they are extremely difficult to automate safely because data must be perfectly preserved.
24
 
25
- This environment trains AI agents to autonomously reconcile schema drift the exact way a real CI/CD pipeline would given a flawed current state and an ideal target state, the agent must compute and safely execute the transformation sequence using raw SQL.
26
 
27
  **Real-world analogues:** `Flyway`, `Liquibase`, Django `makemigrations`, `Terraform` state transitions. This environment models that exact problem, reduced to an agentic RL core.
28
 
@@ -32,19 +31,24 @@ This environment trains AI agents to autonomously reconcile schema drift the exa
32
 
33
  Unlike simplistic environments that merely string-match SQL schemas, this environment uses a **deep structural reconciliation grader** built specifically to prevent LLM gamification:
34
 
35
- 1. **Zero-Sum Exploit Protection:** Naive agents will often execute `DROP TABLE x; CREATE TABLE x (...)` to easily match the target schema, silently destroying all data. Our grader actively runs `SELECT COUNT(*)` and data-integrity hashing. If a table's schema matches but the data is gone, the score is brutally clamped to `0.01`.
36
- 2. **Granular Partial Credit:** Multi-step migrations (like Task 3's 4-table cascade) require 15+ steps. Binary pass/fail rewards provide zero learning signal. Our grader assigns fractional weights to individual FK constraints, data type coercions, and orphaned record audit logs, providing continuous RL reward gradients.
37
- 3. **Deterministic Adversarial Seeds:** Our injected data isn't generic. It includes edge cases that break naive SQL (e.g. `O'Brien` testing quote-escaping parametrization) and orphaned foreign keys testing `CASCADE` knowledge.
 
38
 
39
  ---
40
 
41
- ## Tasks
42
 
43
- | # | Name | Difficulty | Description |
44
- |---|------|-----------|-------------|
45
- | 1 | `column-restructure` | Easy | Merge `first_name` + `last_name` `full_name` without data loss. Adversarial: apostrophes (`O'Brien`), mid-caps (`McDonald`) |
46
- | 2 | `table-normalization` | Medium | Decompose flat `purchases` into `customers` + `orders` with FK. Adversarial: duplicate emails (`alice@` ×3), commas in item names |
47
- | 3 | `cascade-migration` | Hard | 4-table FK cascade: type coercion (`$90000` TEXT `90000` INTEGER), orphan audit logging, NULL salary removal, full FK chain enforcement |
 
 
 
 
48
 
49
  ---
50
 
@@ -55,8 +59,8 @@ Unlike simplistic environments that merely string-match SQL schemas, this enviro
55
  | `current_schema_sql` | `str` | Current database DDL extracted from `sqlite_master` |
56
  | `target_schema_sql` | `str` | Target DDL the agent must reach |
57
  | `last_execution_result` | `str` | Result of last SQL execution, or error message |
58
- | `step_number` | `int` | Current step count (0–20) |
59
- | `migration_progress` | `float` | Current grader score [0.0–1.0] |
60
  | `task_name` | `str` | Name of the active task |
61
  | `done` | `bool` | Whether the episode has terminated |
62
  | `reward` | `float` | Step reward: score delta from previous step (can be negative) |
@@ -73,26 +77,11 @@ Unlike simplistic environments that merely string-match SQL schemas, this enviro
73
 
74
  ## Reward Function
75
 
76
- - **Step reward**: Delta between current and previous migration score. Strongly negative for destructive actions (e.g., wrong DROP TABLE -0.4).
77
- - **Episode score**: Clamped to [0.0, 1.0]. Final state wins regressions hurt.
78
- - **Exploit protection**: If schema matches target, but tables are empty (agent deleted data), score is capped at 0.1.
79
- - **Auto-termination**: Episode ends immediately when score reaches 1.0, preventing post-success regression.
80
-
81
- ### Task 3 Scoring Breakdown
82
-
83
- | Check | Weight | Description |
84
- |-------|--------|-------------|
85
- | `audit_log` exists | 0.10 | Orphan audit table created |
86
- | `audit_log` row count ≥ 3 | 0.10 | All orphaned/invalid records logged |
87
- | Correct audit entries | 0.20 | Right `(source_table, reason)` pairs |
88
- | FK: `departments→companies` | 0.05 | FK chain step 1 |
89
- | FK: `employees→departments` | 0.05 | FK chain step 2 |
90
- | FK: `assets→employees` | 0.05 | FK chain step 3 |
91
- | `companies.name` NOT NULL | 0.05 | Constraint enforcement |
92
- | Employee count = 4 | 0.05 | Hal Patel (NULL salary) removed |
93
- | Salary coercion correct | 0.15 | All `$90000` → `90000` INTEGER |
94
- | No orphaned assets | 0.10 | All `asset.employee_id` valid |
95
- | `PRAGMA integrity_check` | 0.10 | Full DB integrity passes |
96
 
97
  ---
98
 
@@ -102,7 +91,7 @@ Unlike simplistic environments that merely string-match SQL schemas, this enviro
102
  # Install dependencies
103
  pip install -r requirements.txt
104
 
105
- # Run baseline inference
106
  export HF_TOKEN=your_token_here
107
  export API_BASE_URL=https://router.huggingface.co/v1
108
  export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
@@ -110,6 +99,7 @@ python inference.py
110
 
111
  # Run validation tests
112
  python test_smoke.py
 
113
 
114
  # Start environment server locally
115
  uvicorn server.app:app --host 0.0.0.0 --port 7860
@@ -125,7 +115,7 @@ uvicorn server.app:app --host 0.0.0.0 --port 7860
125
  | `/reset` | POST | Reset environment, returns initial observation |
126
  | `/step` | POST | Execute action, returns observation + reward |
127
  | `/state` | GET | Current environment state |
128
- | `/tasks` | GET | List all 3 tasks with descriptions |
129
  | `/grader` | POST | Run grader on all tasks, return scores |
130
  | `/schema` | GET | OpenEnv schema (action/observation types) |
131
  | `/ws` | WS | WebSocket for real-time interaction |
@@ -152,10 +142,13 @@ docker run -p 7860:7860 \
152
 
153
  | Task | Score | Steps | Model |
154
  |------|-------|-------|-------|
155
- | `column-restructure` | 1.00 | 4 | qwen/qwen3-32b |
156
- | `table-normalization` | 1.00 | 5-8 | qwen/qwen3-32b |
157
- | `cascade-migration` | 0.30–0.65 | 15-20 | qwen/qwen3-32b |
158
- | **Average** | **0.77** | | |
 
 
 
159
 
160
  ---
161
 
@@ -163,9 +156,10 @@ docker run -p 7860:7860 \
163
 
164
  - [x] `docker build` succeeds
165
  - [x] `curl /health` returns 200
166
- - [x] `curl /tasks` returns 3 tasks
167
  - [x] `curl -X POST /reset` returns valid observation
168
  - [x] `openenv validate` passes
169
- - [x] Baseline script completes all 3 tasks without crashing
170
- - [x] Grader scores in [0.0, 1.0] range
171
  - [x] Exploit protection: empty-table shortcuts penalized
 
 
1
  ---
2
  title: SQL Migration Agent
3
+ emoji: "\U0001F5C4\uFE0F"
4
  colorFrom: blue
5
  colorTo: purple
6
  sdk: docker
 
11
 
12
  # SQL Schema Migration Agent
13
 
 
14
  > **An OpenEnv environment for benchmarking autonomous database migration agents.**
15
  >
16
+ > Built for the Meta x Hugging Face OpenEnv Hackathon.
17
 
18
  ---
19
 
 
21
 
22
  Database schema migrations are among the most error-prone, high-stakes tasks in software engineering. Every production system faces them as application models evolve, yet they are extremely difficult to automate safely because data must be perfectly preserved.
23
 
24
+ This environment trains AI agents to autonomously reconcile schema drift the exact way a real CI/CD pipeline would -- given a flawed current state and an ideal target state, the agent must compute and safely execute the transformation sequence using raw SQL.
25
 
26
  **Real-world analogues:** `Flyway`, `Liquibase`, Django `makemigrations`, `Terraform` state transitions. This environment models that exact problem, reduced to an agentic RL core.
27
 
 
31
 
32
  Unlike simplistic environments that merely string-match SQL schemas, this environment uses a **deep structural reconciliation grader** built specifically to prevent LLM gamification:
33
 
34
+ 1. **Zero-Sum Exploit Protection:** Naive agents will often execute `DROP TABLE x; CREATE TABLE x (...)` to easily match the target schema, silently destroying all data. Our grader actively runs `SELECT COUNT(*)`, `SUM(id)`, and data-integrity fingerprinting. If a table's schema matches but the data is gone, the score is brutally clamped to `0.01`.
35
+ 2. **PRAGMA Bypass Prevention:** The grader re-asserts `PRAGMA foreign_keys = ON` before every scoring pass, preventing agents from disabling FK constraints to cheat.
36
+ 3. **Granular Partial Credit:** Multi-step migrations (like Task 7's 6-to-4 table consolidation) require 18+ steps. Binary pass/fail rewards provide zero learning signal. Our grader assigns fractional weights to individual FK constraints, data type coercions, and orphaned record audit logs, providing continuous RL reward gradients.
37
+ 4. **Deterministic Adversarial Seeds:** Our injected data includes edge cases that break naive SQL: `O'Brien` (apostrophes), `$1,234.56` (comma+dollar coercion), orphaned foreign keys, NULL emails, and leading whitespace in emails.
38
 
39
  ---
40
 
41
+ ## Tasks (2 Easy / 3 Medium / 2 Hard)
42
 
43
+ | # | Name | Difficulty | Steps | Description |
44
+ |---|------|-----------|-------|-------------|
45
+ | 1 | `column-restructure` | Easy | 10 | Merge `first_name` + `last_name` into `full_name` without data loss. Adversarial: apostrophes (`O'Brien`), mid-caps (`McDonald`) |
46
+ | 2 | `soft-delete-restoration` | Easy | 10 | Restore deleted products from `deletion_log`, add `is_deleted`/`deleted_at` columns. Adversarial: `stock=0` must not be confused with `is_deleted=1` |
47
+ | 3 | `table-normalization` | Medium | 15 | Decompose flat `purchases` into `customers` + `orders` with FK. Adversarial: duplicate emails (x3), commas in item names |
48
+ | 4 | `schema-version-merge` | Medium | 15 | Merge overlapping `products_v1` (TEXT prices) and `products_v2` (REAL prices) with conflict resolution and `source` tracking. Adversarial: `$XX.XX` coercion, NULL category, high ID=101 |
49
+ | 5 | `multi-entity-extraction` | Medium | 15 | Decompose `sales_records` god-table into 3NF (5 tables) with 3 FKs and invalid data routing. Adversarial: leading whitespace email, empty email, comma in SKU |
50
+ | 6 | `cascade-migration` | Hard | 20 | 4-table FK cascade: type coercion (`$90000` TEXT to `90000` INTEGER), orphan audit logging, NULL salary removal, full FK chain enforcement |
51
+ | 7 | `dual-source-consolidation` | Hard | 20 | Merge 6 tables from two incompatible systems (Legacy CRM + Modern SaaS) into 4 unified tables with cross-system email dedup, currency coercion, orphan detection |
52
 
53
  ---
54
 
 
59
  | `current_schema_sql` | `str` | Current database DDL extracted from `sqlite_master` |
60
  | `target_schema_sql` | `str` | Target DDL the agent must reach |
61
  | `last_execution_result` | `str` | Result of last SQL execution, or error message |
62
+ | `step_number` | `int` | Current step count |
63
+ | `migration_progress` | `float` | Current grader score [0.01-0.99] |
64
  | `task_name` | `str` | Name of the active task |
65
  | `done` | `bool` | Whether the episode has terminated |
66
  | `reward` | `float` | Step reward: score delta from previous step (can be negative) |
 
77
 
78
  ## Reward Function
79
 
80
+ - **Step reward**: Delta between current and previous migration score. Strongly negative for destructive actions (e.g., wrong DROP TABLE leads to -0.4).
81
+ - **Episode score**: Clamped to (0.01, 0.99). Final state wins -- regressions hurt.
82
+ - **Exploit protection**: If schema matches target but tables are empty (agent deleted data), score is capped at 0.01.
83
+ - **PRAGMA protection**: `PRAGMA foreign_keys = ON` is re-asserted before every grading pass.
84
+ - **Auto-termination**: Episode ends immediately when score reaches 0.99, preventing post-success regression.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
85
 
86
  ---
87
 
 
91
  # Install dependencies
92
  pip install -r requirements.txt
93
 
94
+ # Run baseline inference (requires HF_TOKEN)
95
  export HF_TOKEN=your_token_here
96
  export API_BASE_URL=https://router.huggingface.co/v1
97
  export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
 
99
 
100
  # Run validation tests
101
  python test_smoke.py
102
+ python test_all_tasks.py
103
 
104
  # Start environment server locally
105
  uvicorn server.app:app --host 0.0.0.0 --port 7860
 
115
  | `/reset` | POST | Reset environment, returns initial observation |
116
  | `/step` | POST | Execute action, returns observation + reward |
117
  | `/state` | GET | Current environment state |
118
+ | `/tasks` | GET | List all 7 tasks with descriptions |
119
  | `/grader` | POST | Run grader on all tasks, return scores |
120
  | `/schema` | GET | OpenEnv schema (action/observation types) |
121
  | `/ws` | WS | WebSocket for real-time interaction |
 
142
 
143
  | Task | Score | Steps | Model |
144
  |------|-------|-------|-------|
145
+ | `column-restructure` | 0.99 | 4 | Qwen/Qwen2.5-72B-Instruct |
146
+ | `soft-delete-restoration` | 0.99 | 5-7 | Qwen/Qwen2.5-72B-Instruct |
147
+ | `table-normalization` | 0.99 | 5-8 | Qwen/Qwen2.5-72B-Instruct |
148
+ | `schema-version-merge` | 0.60-0.85 | 8-12 | Qwen/Qwen2.5-72B-Instruct |
149
+ | `multi-entity-extraction` | 0.40-0.70 | 12-15 | Qwen/Qwen2.5-72B-Instruct |
150
+ | `cascade-migration` | 0.30-0.65 | 15-20 | Qwen/Qwen2.5-72B-Instruct |
151
+ | `dual-source-consolidation` | 0.20-0.50 | 18-20 | Qwen/Qwen2.5-72B-Instruct |
152
 
153
  ---
154
 
 
156
 
157
  - [x] `docker build` succeeds
158
  - [x] `curl /health` returns 200
159
+ - [x] `curl /tasks` returns 7 tasks
160
  - [x] `curl -X POST /reset` returns valid observation
161
  - [x] `openenv validate` passes
162
+ - [x] Baseline script completes all 7 tasks without crashing
163
+ - [x] Grader scores in (0.01, 0.99) range
164
  - [x] Exploit protection: empty-table shortcuts penalized
165
+ - [x] PRAGMA bypass protection enforced
__pycache__/seeds.cpython-312.pyc CHANGED
Binary files a/__pycache__/seeds.cpython-312.pyc and b/__pycache__/seeds.cpython-312.pyc differ
 
inference.py CHANGED
@@ -30,7 +30,7 @@ HF_TOKEN = os.getenv("HF_TOKEN") # No default — must be set by user
30
  # Also support OPENAI_API_KEY as primary (per spec) and API_KEY as alias
31
  API_KEY = os.getenv("OPENAI_API_KEY") or HF_TOKEN or os.getenv("API_KEY")
32
 
33
- SYSTEM_PROMPT = """You are an autonomous SQLite database migration engine. You receive the current schema and a target schema. Write SQL to transform the current state to the target state without losing row data.
34
 
35
  CRITICAL — SQLite-specific rules (violations cause immediate errors):
36
  1. SQLite does NOT support ALTER TABLE ADD CONSTRAINT — never use it.
@@ -45,11 +45,22 @@ CRITICAL — SQLite-specific rules (violations cause immediate errors):
45
  10. Execute exactly ONE SQL statement per step.
46
  11. When migration is complete (schemas match, data preserved), set submit_final to true IMMEDIATELY.
47
 
48
- Respond ONLY with valid JSON no markdown, no code blocks, no text outside the object:
49
- {"sql_command": "your SQL here", "reasoning": "why", "submit_final": false}"""
50
 
51
- ALL_TASKS = ["column-restructure", "table-normalization", "cascade-migration"]
52
- MAX_STEPS = 20 # 20 gives Task 3 enough budget for 4-table cascade + audit
 
 
 
 
 
 
 
 
 
 
 
53
  MAX_PARSE_ERRORS = 5 # Higher tolerance for thinking models (Qwen3, DeepSeek-R1)
54
 
55
  # Auto-submit threshold: if migration_progress >= this, force submit_final
@@ -139,18 +150,24 @@ def run_task_local(task_name: str) -> dict:
139
  sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
140
  from server.environment import DbMigrationEnvironment
141
  from models import MigrationAction
 
142
 
143
  env = DbMigrationEnvironment(task_name=task_name)
144
 
 
 
 
145
  print(f"[START] task={task_name} env=sql-migration-agent model={MODEL_NAME}", flush=True)
146
 
147
  obs = env.reset()
148
- history = [{"role": "system", "content": SYSTEM_PROMPT}]
149
 
150
- # Initial observation message
 
 
 
 
151
  initial_msg = (
152
  f"CURRENT DATABASE SCHEMA:\n{obs.current_schema_sql}\n\n"
153
- f"TARGET SCHEMA:\n{obs.target_schema_sql}\n\n"
154
  f"Status: {obs.last_execution_result}\n"
155
  f"Migration progress: {obs.migration_progress:.2f}\n\n"
156
  f"Write your first SQL command to begin the migration."
@@ -164,7 +181,7 @@ def run_task_local(task_name: str) -> dict:
164
  done = False
165
  peak_score = 0.0 # Track the highest score we've reached
166
 
167
- for step in range(MAX_STEPS):
168
  if done:
169
  break
170
 
@@ -253,10 +270,11 @@ def run_task_local(task_name: str) -> dict:
253
  # Add to conversation history
254
  history.append({"role": "assistant", "content": json.dumps(action_dict)})
255
 
 
256
  feedback_msg = (
257
  f"EXECUTION RESULT: {obs.last_execution_result}\n\n"
258
  f"CURRENT SCHEMA:\n{obs.current_schema_sql}\n\n"
259
- f"Migration progress: {obs.migration_progress:.2f}"
260
  )
261
  if done:
262
  feedback_msg += "\n\nEpisode complete."
@@ -309,11 +327,9 @@ def main():
309
  # Summary
310
  scores = list(results.values())
311
  avg = sum(scores) / len(scores) if scores else 0.0
 
312
  print(
313
- f"[SUMMARY] task1={results.get('column-restructure', 0):.2f} "
314
- f"task2={results.get('table-normalization', 0):.2f} "
315
- f"task3={results.get('cascade-migration', 0):.2f} "
316
- f"avg={avg:.2f}",
317
  flush=True,
318
  )
319
 
 
30
  # Also support OPENAI_API_KEY as primary (per spec) and API_KEY as alias
31
  API_KEY = os.getenv("OPENAI_API_KEY") or HF_TOKEN or os.getenv("API_KEY")
32
 
33
+ SYSTEM_PROMPT_TEMPLATE = """You are an autonomous SQLite database migration engine. You receive the current schema and a target schema. Write SQL to transform the current state to the target state without losing row data.
34
 
35
  CRITICAL — SQLite-specific rules (violations cause immediate errors):
36
  1. SQLite does NOT support ALTER TABLE ADD CONSTRAINT — never use it.
 
45
  10. Execute exactly ONE SQL statement per step.
46
  11. When migration is complete (schemas match, data preserved), set submit_final to true IMMEDIATELY.
47
 
48
+ TARGET SCHEMA (fixedachieve this exactly):
49
+ {target_ddl}
50
 
51
+ Respond ONLY with valid JSON — no markdown, no code blocks, no text outside the object:
52
+ {{"sql_command": "your SQL here", "reasoning": "why", "submit_final": false}}"""
53
+
54
+ ALL_TASKS = [
55
+ "column-restructure",
56
+ "soft-delete-restoration",
57
+ "table-normalization",
58
+ "schema-version-merge",
59
+ "multi-entity-extraction",
60
+ "cascade-migration",
61
+ "dual-source-consolidation",
62
+ ]
63
+ MAX_STEPS = 20 # Global fallback; per-task limits override this
64
  MAX_PARSE_ERRORS = 5 # Higher tolerance for thinking models (Qwen3, DeepSeek-R1)
65
 
66
  # Auto-submit threshold: if migration_progress >= this, force submit_final
 
150
  sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
151
  from server.environment import DbMigrationEnvironment
152
  from models import MigrationAction
153
+ import seeds
154
 
155
  env = DbMigrationEnvironment(task_name=task_name)
156
 
157
+ # Use task-specific step budget (defaults to global MAX_STEPS)
158
+ task_max_steps = seeds.TASKS.get(task_name, {}).get("max_steps", MAX_STEPS)
159
+
160
  print(f"[START] task={task_name} env=sql-migration-agent model={MODEL_NAME}", flush=True)
161
 
162
  obs = env.reset()
 
163
 
164
+ # Build task-specific system prompt with target DDL baked in (sent ONCE)
165
+ task_system_prompt = SYSTEM_PROMPT_TEMPLATE.format(target_ddl=obs.target_schema_sql)
166
+ history = [{"role": "system", "content": task_system_prompt}]
167
+
168
+ # Initial observation — only current schema (target already in system prompt)
169
  initial_msg = (
170
  f"CURRENT DATABASE SCHEMA:\n{obs.current_schema_sql}\n\n"
 
171
  f"Status: {obs.last_execution_result}\n"
172
  f"Migration progress: {obs.migration_progress:.2f}\n\n"
173
  f"Write your first SQL command to begin the migration."
 
181
  done = False
182
  peak_score = 0.0 # Track the highest score we've reached
183
 
184
+ for step in range(task_max_steps):
185
  if done:
186
  break
187
 
 
270
  # Add to conversation history
271
  history.append({"role": "assistant", "content": json.dumps(action_dict)})
272
 
273
+ # Lean feedback — target is already in the system prompt, no need to repeat
274
  feedback_msg = (
275
  f"EXECUTION RESULT: {obs.last_execution_result}\n\n"
276
  f"CURRENT SCHEMA:\n{obs.current_schema_sql}\n\n"
277
+ f"Progress: {obs.migration_progress:.2f}"
278
  )
279
  if done:
280
  feedback_msg += "\n\nEpisode complete."
 
327
  # Summary
328
  scores = list(results.values())
329
  avg = sum(scores) / len(scores) if scores else 0.0
330
+ scores_str = " ".join(f"{t}={s:.2f}" for t, s in results.items())
331
  print(
332
+ f"[SUMMARY] {scores_str} avg={avg:.2f}",
 
 
 
333
  flush=True,
334
  )
335
 
seeds.py CHANGED
@@ -255,6 +255,429 @@ def seed_task3(conn: sqlite3.Connection) -> None:
255
  conn.commit()
256
 
257
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
258
  # =============================================================================
259
  # Task Registry
260
  # =============================================================================
@@ -265,17 +688,49 @@ TASKS = {
265
  "target_ddl": TASK1_TARGET_DDL,
266
  "description": "Merge first_name and last_name into a single full_name column without data loss",
267
  "difficulty": "easy",
 
 
 
 
 
 
 
 
268
  },
269
  "table-normalization": {
270
  "seed_fn": seed_task2,
271
  "target_ddl": TASK2_TARGET_DDL,
272
  "description": "Decompose a flat purchases table into normalized customers and orders tables with FK",
273
  "difficulty": "medium",
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
274
  },
275
  "cascade-migration": {
276
  "seed_fn": seed_task3,
277
  "target_ddl": TASK3_TARGET_DDL,
278
  "description": "Multi-table FK cascade with type coercion, NULL handling, and orphan audit logging",
279
  "difficulty": "hard",
 
 
 
 
 
 
 
 
280
  },
281
  }
 
 
255
  conn.commit()
256
 
257
 
258
+ # =============================================================================
259
+ # TASK 4: Soft-Delete Restoration (Easy)
260
+ # =============================================================================
261
+ # Agent must restore deleted products from a deletion_log, add is_deleted/deleted_at columns.
262
+ # Adversarial: "O'Brien Desk" (apostrophe), stock=0 on Webcam (must NOT confuse with is_deleted).
263
+
264
+ TASK4_SOURCE_DDL = """
265
+ CREATE TABLE products (
266
+ id INTEGER PRIMARY KEY,
267
+ name TEXT NOT NULL,
268
+ price REAL NOT NULL,
269
+ stock INTEGER NOT NULL
270
+ );
271
+
272
+ CREATE TABLE deletion_log (
273
+ id INTEGER PRIMARY KEY,
274
+ product_id INTEGER NOT NULL,
275
+ product_name TEXT NOT NULL,
276
+ product_price REAL NOT NULL,
277
+ product_stock INTEGER NOT NULL,
278
+ deleted_at TEXT NOT NULL
279
+ );
280
+ """
281
+
282
+ TASK4_PRODUCTS_DATA = [
283
+ (1, "Laptop", 999.99, 15),
284
+ (2, "O'Brien Desk", 249.99, 8),
285
+ (3, "Monitor", 399.99, 23),
286
+ (4, "Keyboard", 89.99, 45),
287
+ (5, "Mouse", 29.99, 102),
288
+ ]
289
+
290
+ TASK4_DELETION_LOG_DATA = [
291
+ (1, 6, "Headphones", 149.99, 30, "2024-01-15"),
292
+ (2, 7, "Webcam", 79.99, 0, "2024-02-20"), # stock=0 but NOT is_deleted=1
293
+ (3, 8, "USB-C Hub", 49.99, 12, "2024-03-10"),
294
+ ]
295
+
296
+ TASK4_TARGET_DDL = """CREATE TABLE products (
297
+ id INTEGER PRIMARY KEY,
298
+ name TEXT NOT NULL,
299
+ price REAL NOT NULL,
300
+ stock INTEGER NOT NULL,
301
+ is_deleted INTEGER NOT NULL DEFAULT 0,
302
+ deleted_at TEXT
303
+ );"""
304
+
305
+ TASK4_EXPECTED_ROW_COUNT = 8
306
+ TASK4_EXPECTED_ID_SUM = 36 # 1+2+3+4+5+6+7+8
307
+ TASK4_EXPECTED_DELETED_COUNT = 3 # ids 6,7,8
308
+ TASK4_EXPECTED_ACTIVE_COUNT = 5 # ids 1-5
309
+
310
+
311
+ def seed_task4(conn: sqlite3.Connection) -> None:
312
+ """Seed the database for Task 4: Soft-Delete Restoration."""
313
+ conn.executescript(TASK4_SOURCE_DDL)
314
+ conn.executemany(
315
+ "INSERT INTO products (id, name, price, stock) VALUES (?, ?, ?, ?)",
316
+ TASK4_PRODUCTS_DATA,
317
+ )
318
+ conn.executemany(
319
+ "INSERT INTO deletion_log (id, product_id, product_name, product_price, product_stock, deleted_at) "
320
+ "VALUES (?, ?, ?, ?, ?, ?)",
321
+ TASK4_DELETION_LOG_DATA,
322
+ )
323
+ conn.commit()
324
+
325
+
326
+ # =============================================================================
327
+ # TASK 5: Schema Version Merge (Medium)
328
+ # =============================================================================
329
+ # Agent must merge products_v1 (price as "$XX.XX" TEXT) and products_v2 (price as REAL)
330
+ # into a single products table. v2 wins on ID conflicts. Must add source column.
331
+ # Adversarial: id=101 high ID, NULL category, "$" price coercion, conflicting rows.
332
+
333
+ TASK5_SOURCE_DDL = """
334
+ CREATE TABLE products_v1 (
335
+ id INTEGER PRIMARY KEY,
336
+ name TEXT NOT NULL,
337
+ price TEXT NOT NULL,
338
+ category TEXT,
339
+ supplier TEXT
340
+ );
341
+
342
+ CREATE TABLE products_v2 (
343
+ id INTEGER PRIMARY KEY,
344
+ name TEXT NOT NULL,
345
+ unit_cost REAL NOT NULL,
346
+ category TEXT NOT NULL,
347
+ brand TEXT,
348
+ sku TEXT
349
+ );
350
+ """
351
+
352
+ TASK5_V1_DATA = [
353
+ (1, "Widget A", "$12.50", "Electronics", "AcmeCo"),
354
+ (2, "Widget B", "$8.99", "Electronics", "AcmeCo"),
355
+ (3, "Gadget X", "$45.00", None, "TechCorp"),
356
+ (4, "Gadget Y", "$32.50", "Tools", "TechCorp"),
357
+ (5, "Doohickey", "$5.99", "Office", "SupplyPro"),
358
+ (101, "Legacy Item", "$99.99", "Electronics", "OldCo"),
359
+ ]
360
+
361
+ TASK5_V2_DATA = [
362
+ (1, "Widget A", 12.50, "Electronics", "AcmeCo", "SKU-001"),
363
+ (2, "Widget B Updated", 9.99, "Electronics", "AcmeCo", "SKU-002"),
364
+ (6, "New Product F", 67.00, "Tools", "NewCorp", "SKU-006"),
365
+ (7, "New Product G", 23.50, "Office", "NewCorp", "SKU-007"),
366
+ (8, "New Product H", 11.00, "Electronics", "ImportCo","SKU-008"),
367
+ ]
368
+
369
+ TASK5_TARGET_DDL = """CREATE TABLE products (
370
+ id INTEGER PRIMARY KEY,
371
+ name TEXT NOT NULL,
372
+ price REAL NOT NULL,
373
+ category TEXT,
374
+ supplier TEXT,
375
+ brand TEXT,
376
+ sku TEXT,
377
+ source TEXT NOT NULL
378
+ );"""
379
+
380
+ TASK5_EXPECTED_ROW_COUNT = 9
381
+ TASK5_EXPECTED_PRICE_SUM = round(12.50 + 9.99 + 45.00 + 32.50 + 5.99 + 99.99 + 67.00 + 23.50 + 11.00, 2)
382
+ TASK5_EXPECTED_BOTH_COUNT = 2 # ids 1 and 2
383
+
384
+
385
+ def seed_task5(conn: sqlite3.Connection) -> None:
386
+ """Seed the database for Task 5: Schema Version Merge."""
387
+ conn.executescript(TASK5_SOURCE_DDL)
388
+ conn.executemany(
389
+ "INSERT INTO products_v1 (id, name, price, category, supplier) VALUES (?, ?, ?, ?, ?)",
390
+ TASK5_V1_DATA,
391
+ )
392
+ conn.executemany(
393
+ "INSERT INTO products_v2 (id, name, unit_cost, category, brand, sku) VALUES (?, ?, ?, ?, ?, ?)",
394
+ TASK5_V2_DATA,
395
+ )
396
+ conn.commit()
397
+
398
+
399
+ # =============================================================================
400
+ # TASK 6: Multi-Entity Extraction (Medium — Hard End)
401
+ # =============================================================================
402
+ # Agent must decompose a sales_records god-table into 3NF (5 tables).
403
+ # Adversarial: leading whitespace email, empty customer email, comma in SKU.
404
+
405
+ TASK6_SOURCE_DDL = """
406
+ CREATE TABLE sales_records (
407
+ id INTEGER PRIMARY KEY,
408
+ rep_name TEXT NOT NULL,
409
+ rep_email TEXT NOT NULL,
410
+ rep_region TEXT NOT NULL,
411
+ customer_name TEXT NOT NULL,
412
+ customer_email TEXT NOT NULL,
413
+ customer_tier TEXT NOT NULL,
414
+ product_name TEXT NOT NULL,
415
+ product_sku TEXT NOT NULL,
416
+ product_category TEXT NOT NULL,
417
+ quantity INTEGER NOT NULL,
418
+ unit_price REAL NOT NULL,
419
+ discount_pct INTEGER NOT NULL DEFAULT 0,
420
+ sale_date TEXT NOT NULL
421
+ );
422
+ """
423
+
424
+ TASK6_SOURCE_DATA = [
425
+ (1, "Alice Chen", " alice@company.com", "North", "Globex Corp", "globex@corp.com", "enterprise", "Widget Pro", "WIDGET-001", "Electronics", 5, 299.99, 10, "2024-01-10"),
426
+ (2, "Alice Chen", "alice@company.com", "North", "Initech LLC", "info@initech.com", "basic", "Widget Pro", "WIDGET-001", "Electronics", 2, 299.99, 0, "2024-01-15"),
427
+ (3, "Bob Martinez", "bob@company.com", "South", "Globex Corp", "globex@corp.com", "enterprise", "Gadget X", "GADGET-X01", "Hardware", 10, 89.99, 5, "2024-01-20"),
428
+ (4, "Bob Martinez", "bob@company.com", "South", "Umbrella Inc", "sales@umbrella.co", "premium", "Gadget X", "GADGET-X01", "Hardware", 3, 89.99, 0, "2024-02-01"),
429
+ (5, "Carol White", "carol@company.com", "East", "Initech LLC", "info@initech.com", "basic", "Tool Kit", "TOOLS,001", "Hardware", 1, 199.99, 0, "2024-02-05"),
430
+ (6, "Alice Chen", "alice@company.com", "North", "Pendant Corp", "", "free", "Widget Pro", "WIDGET-001", "Electronics", 7, 299.99, 15, "2024-02-10"),
431
+ (7, "Carol White", "carol@company.com", "East", "Globex Corp", "globex@corp.com", "enterprise", "Nano Device", "NANO-D01", "Electronics", 2, 549.99, 20, "2024-02-15"),
432
+ (8, "Bob Martinez", "bob@company.com", "South", "Umbrella Inc", "sales@umbrella.co", "premium", "Tool Kit", "TOOLS,001", "Hardware", 4, 199.99, 10, "2024-03-01"),
433
+ (9, "Alice Chen", "alice@company.com", "North", "Initech LLC", "info@initech.com", "basic", "Nano Device", "NANO-D01", "Electronics", 1, 549.99, 0, "2024-03-05"),
434
+ (10, "Carol White", "carol@company.com", "East", "Umbrella Inc", "sales@umbrella.co", "premium", "Cable Bundle","CABLE-5PK", "Accessories", 20, 14.99, 0, "2024-03-10"),
435
+ (11, "Bob Martinez", "bob@company.com", "South", "Globex Corp", "globex@corp.com", "enterprise", "Cable Bundle","CABLE-5PK", "Accessories", 15, 14.99, 5, "2024-03-15"),
436
+ (12, "Carol White", "carol@company.com", "East", "Pendant Corp", "orders@pendant.io", "free", "Gadget X", "GADGET-X01", "Hardware", 6, 89.99, 0, "2024-03-20"),
437
+ ]
438
+
439
+ TASK6_TARGET_DDL = """CREATE TABLE salespersons (
440
+ id INTEGER PRIMARY KEY,
441
+ name TEXT NOT NULL,
442
+ email TEXT NOT NULL UNIQUE,
443
+ region TEXT NOT NULL
444
+ );
445
+
446
+ CREATE TABLE customers (
447
+ id INTEGER PRIMARY KEY,
448
+ name TEXT NOT NULL,
449
+ email TEXT NOT NULL UNIQUE,
450
+ tier TEXT NOT NULL
451
+ );
452
+
453
+ CREATE TABLE products (
454
+ id INTEGER PRIMARY KEY,
455
+ name TEXT NOT NULL,
456
+ sku TEXT NOT NULL UNIQUE,
457
+ category TEXT NOT NULL
458
+ );
459
+
460
+ CREATE TABLE sales (
461
+ id INTEGER PRIMARY KEY,
462
+ salesperson_id INTEGER NOT NULL,
463
+ customer_id INTEGER NOT NULL,
464
+ product_id INTEGER NOT NULL,
465
+ quantity INTEGER NOT NULL,
466
+ unit_price REAL NOT NULL,
467
+ discount_pct INTEGER NOT NULL DEFAULT 0,
468
+ sale_date TEXT NOT NULL,
469
+ FOREIGN KEY (salesperson_id) REFERENCES salespersons(id),
470
+ FOREIGN KEY (customer_id) REFERENCES customers(id),
471
+ FOREIGN KEY (product_id) REFERENCES products(id)
472
+ );
473
+
474
+ CREATE TABLE data_issues (
475
+ id INTEGER PRIMARY KEY,
476
+ source_table TEXT NOT NULL,
477
+ source_row_id INTEGER NOT NULL,
478
+ issue_type TEXT NOT NULL,
479
+ issue_detail TEXT NOT NULL
480
+ );"""
481
+
482
+ TASK6_EXPECTED_SALESPERSON_COUNT = 3
483
+ TASK6_EXPECTED_CUSTOMER_COUNT = 3 # Pendant Corp row 6 excluded (empty email)
484
+ TASK6_EXPECTED_PRODUCT_COUNT = 5
485
+ TASK6_EXPECTED_SALES_COUNT = 11 # row 6 excluded
486
+ TASK6_EXPECTED_DATA_ISSUES_COUNT = 1
487
+
488
+
489
+ def seed_task6(conn: sqlite3.Connection) -> None:
490
+ """Seed the database for Task 6: Multi-Entity Extraction."""
491
+ conn.executescript(TASK6_SOURCE_DDL)
492
+ conn.executemany(
493
+ "INSERT INTO sales_records (id, rep_name, rep_email, rep_region, "
494
+ "customer_name, customer_email, customer_tier, product_name, product_sku, "
495
+ "product_category, quantity, unit_price, discount_pct, sale_date) "
496
+ "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
497
+ TASK6_SOURCE_DATA,
498
+ )
499
+ conn.commit()
500
+
501
+
502
+ # =============================================================================
503
+ # TASK 7: Dual-Source Consolidation (Hard)
504
+ # =============================================================================
505
+ # Agent must merge 6 source tables from two incompatible systems (Legacy CRM + Modern SaaS)
506
+ # into 4 unified target tables. Cross-system email dedup, currency coercion, orphan detection.
507
+
508
+ TASK7_LEGACY_CUSTOMERS_DDL = """
509
+ CREATE TABLE legacy_customers (
510
+ id INTEGER PRIMARY KEY,
511
+ full_name TEXT,
512
+ contact_email TEXT,
513
+ phone TEXT,
514
+ account_type TEXT,
515
+ join_date TEXT
516
+ );
517
+ """
518
+
519
+ TASK7_LEGACY_ORDERS_DDL = """
520
+ CREATE TABLE legacy_orders (
521
+ id INTEGER PRIMARY KEY,
522
+ customer_id INTEGER,
523
+ product_code TEXT,
524
+ total_amount TEXT,
525
+ order_status TEXT,
526
+ order_date TEXT
527
+ );
528
+ """
529
+
530
+ TASK7_LEGACY_PRODUCTS_DDL = """
531
+ CREATE TABLE legacy_products (
532
+ code TEXT PRIMARY KEY,
533
+ description TEXT,
534
+ unit_price TEXT
535
+ );
536
+ """
537
+
538
+ TASK7_MODERN_USERS_DDL = """
539
+ CREATE TABLE modern_users (
540
+ uuid TEXT PRIMARY KEY,
541
+ display_name TEXT,
542
+ email_address TEXT,
543
+ subscription_tier INTEGER,
544
+ created_at TEXT
545
+ );
546
+ """
547
+
548
+ TASK7_MODERN_TRANSACTIONS_DDL = """
549
+ CREATE TABLE modern_transactions (
550
+ id INTEGER PRIMARY KEY,
551
+ user_uuid TEXT,
552
+ item_sku TEXT,
553
+ amount REAL,
554
+ currency TEXT,
555
+ tx_status INTEGER,
556
+ created_at TEXT
557
+ );
558
+ """
559
+
560
+ TASK7_MODERN_CATALOG_DDL = """
561
+ CREATE TABLE modern_catalog (
562
+ sku TEXT PRIMARY KEY,
563
+ title TEXT,
564
+ base_price REAL
565
+ );
566
+ """
567
+
568
+ TASK7_LEGACY_CUSTOMERS_DATA = [
569
+ (1, "Alice Johnson", "alice@example.com", "+1-555-0101", "premium", "2021-03-15"),
570
+ (2, "Bob Chen", "bob@example.com", "+1-555-0102", "basic", "2022-07-01"),
571
+ (3, "Carol Davis", None, "+1-555-0103", "free", "2023-01-10"),
572
+ (4, "Dave Wilson", "dave@example.com", "+1-555-0104", "premium", "2021-11-20"),
573
+ (5, "Eve Martinez", "eve@example.com", "+1-555-0105", "free", "2023-06-05"),
574
+ ]
575
+
576
+ TASK7_MODERN_USERS_DATA = [
577
+ ("uuid-A1", "Alice J.", "alice@example.com", 3, "2021-03-15"),
578
+ ("uuid-B2", "R. Bob Chen", "bob@example.com", 2, "2022-07-01"),
579
+ ("uuid-F6", "Frank Lee", "frank@example.com", 4, "2022-09-30"),
580
+ ("uuid-G7", "Grace Kim", "grace@example.com", 1, "2024-01-15"),
581
+ ]
582
+
583
+ TASK7_LEGACY_ORDERS_DATA = [
584
+ (1, 1, "PROD-A", "$1,234.56", "delivered", "2022-01-10"),
585
+ (2, 2, "PROD-B", "$89.99", "shipped", "2022-03-15"),
586
+ (3, 4, "PROD-A", "$2,500.00", "delivered", "2022-05-20"),
587
+ (4, 3, "PROD-C", "$45.00", "pending", "2023-02-01"),
588
+ ]
589
+
590
+ TASK7_LEGACY_PRODUCTS_DATA = [
591
+ ("PROD-A", "Enterprise Widget", "$1,234.56"),
592
+ ("PROD-B", "Basic Gadget", "$89.99"),
593
+ ("PROD-C", "Starter Kit", "$45.00"),
594
+ ]
595
+
596
+ TASK7_MODERN_TRANSACTIONS_DATA = [
597
+ (1, "uuid-A1", "SKU-001", 299.99, "USD", 3, "2023-06-01"),
598
+ (2, "uuid-B2", "SKU-002", 89.99, None, 2, "2023-07-15"),
599
+ (3, "uuid-F6", "SKU-001", 299.99, None, 3, "2023-08-20"),
600
+ (4, "uuid-DEAD", "SKU-003", 15.99, None, 1, "2023-09-01"), # orphan
601
+ (5, "uuid-G7", "SKU-002", 89.99, "USD", 4, "2023-10-10"),
602
+ (6, "uuid-A1", "SKU-003", 15.99, "EUR", 5, "2023-11-01"),
603
+ ]
604
+
605
+ TASK7_MODERN_CATALOG_DATA = [
606
+ ("SKU-001", "Pro Widget", 299.99),
607
+ ("SKU-002", "Smart Gadget", 89.99),
608
+ ("SKU-003", "Mini Accessory", 15.99),
609
+ ]
610
+
611
+ TASK7_TARGET_DDL = """CREATE TABLE unified_customers (
612
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
613
+ legacy_id INTEGER,
614
+ modern_uuid TEXT,
615
+ name TEXT,
616
+ email TEXT,
617
+ phone TEXT,
618
+ tier TEXT NOT NULL DEFAULT 'free',
619
+ source TEXT NOT NULL,
620
+ created_at TEXT
621
+ );
622
+
623
+ CREATE TABLE unified_products (
624
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
625
+ code TEXT NOT NULL UNIQUE,
626
+ title TEXT NOT NULL,
627
+ price REAL NOT NULL,
628
+ source TEXT NOT NULL
629
+ );
630
+
631
+ CREATE TABLE unified_orders (
632
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
633
+ customer_id INTEGER NOT NULL,
634
+ product_id INTEGER,
635
+ amount REAL NOT NULL,
636
+ currency TEXT NOT NULL DEFAULT 'USD',
637
+ status TEXT NOT NULL,
638
+ order_date TEXT,
639
+ source TEXT NOT NULL,
640
+ FOREIGN KEY (customer_id) REFERENCES unified_customers(id)
641
+ );
642
+
643
+ CREATE TABLE migration_issues (
644
+ id INTEGER PRIMARY KEY,
645
+ source_system TEXT NOT NULL,
646
+ source_table TEXT NOT NULL,
647
+ source_id TEXT NOT NULL,
648
+ issue_type TEXT NOT NULL,
649
+ resolution TEXT NOT NULL
650
+ );"""
651
+
652
+ TASK7_EXPECTED_UNIFIED_CUSTOMERS = 7
653
+ TASK7_EXPECTED_BOTH_SOURCE_COUNT = 2
654
+ TASK7_EXPECTED_UNIFIED_ORDERS = 9
655
+ TASK7_EXPECTED_MIGRATION_ISSUES = 2
656
+
657
+ # Tier mapping: 1→'free', 2→'basic', 3→'premium', 4→'enterprise'
658
+ TASK7_TIER_MAP = {1: "free", 2: "basic", 3: "premium", 4: "enterprise"}
659
+ # Status mapping: 1→'pending', 2→'processing', 3→'complete', 4→'failed', 5→'refunded'
660
+ TASK7_STATUS_MAP = {1: "pending", 2: "processing", 3: "complete", 4: "failed", 5: "refunded"}
661
+
662
+
663
+ def seed_task7(conn: sqlite3.Connection) -> None:
664
+ """Seed the database for Task 7: Dual-Source Consolidation."""
665
+ conn.executescript(TASK7_LEGACY_CUSTOMERS_DDL)
666
+ conn.executescript(TASK7_LEGACY_ORDERS_DDL)
667
+ conn.executescript(TASK7_LEGACY_PRODUCTS_DDL)
668
+ conn.executescript(TASK7_MODERN_USERS_DDL)
669
+ conn.executescript(TASK7_MODERN_TRANSACTIONS_DDL)
670
+ conn.executescript(TASK7_MODERN_CATALOG_DDL)
671
+
672
+ conn.executemany("INSERT INTO legacy_customers VALUES (?, ?, ?, ?, ?, ?)", TASK7_LEGACY_CUSTOMERS_DATA)
673
+ conn.executemany("INSERT INTO legacy_orders VALUES (?, ?, ?, ?, ?, ?)", TASK7_LEGACY_ORDERS_DATA)
674
+ conn.executemany("INSERT INTO legacy_products VALUES (?, ?, ?)", TASK7_LEGACY_PRODUCTS_DATA)
675
+ conn.executemany("INSERT INTO modern_users VALUES (?, ?, ?, ?, ?)", TASK7_MODERN_USERS_DATA)
676
+ conn.executemany("INSERT INTO modern_transactions VALUES (?, ?, ?, ?, ?, ?, ?)", TASK7_MODERN_TRANSACTIONS_DATA)
677
+ conn.executemany("INSERT INTO modern_catalog VALUES (?, ?, ?)", TASK7_MODERN_CATALOG_DATA)
678
+ conn.commit()
679
+
680
+
681
  # =============================================================================
682
  # Task Registry
683
  # =============================================================================
 
688
  "target_ddl": TASK1_TARGET_DDL,
689
  "description": "Merge first_name and last_name into a single full_name column without data loss",
690
  "difficulty": "easy",
691
+ "max_steps": 10,
692
+ },
693
+ "soft-delete-restoration": {
694
+ "seed_fn": seed_task4,
695
+ "target_ddl": TASK4_TARGET_DDL,
696
+ "description": "Restore deleted products from deletion_log, add is_deleted/deleted_at columns",
697
+ "difficulty": "easy",
698
+ "max_steps": 10,
699
  },
700
  "table-normalization": {
701
  "seed_fn": seed_task2,
702
  "target_ddl": TASK2_TARGET_DDL,
703
  "description": "Decompose a flat purchases table into normalized customers and orders tables with FK",
704
  "difficulty": "medium",
705
+ "max_steps": 15,
706
+ },
707
+ "schema-version-merge": {
708
+ "seed_fn": seed_task5,
709
+ "target_ddl": TASK5_TARGET_DDL,
710
+ "description": "Merge overlapping v1/v2 product tables with price coercion and conflict resolution",
711
+ "difficulty": "medium",
712
+ "max_steps": 15,
713
+ },
714
+ "multi-entity-extraction": {
715
+ "seed_fn": seed_task6,
716
+ "target_ddl": TASK6_TARGET_DDL,
717
+ "description": "Decompose a sales god-table into 3NF with 3 FKs and invalid data routing",
718
+ "difficulty": "medium",
719
+ "max_steps": 15,
720
  },
721
  "cascade-migration": {
722
  "seed_fn": seed_task3,
723
  "target_ddl": TASK3_TARGET_DDL,
724
  "description": "Multi-table FK cascade with type coercion, NULL handling, and orphan audit logging",
725
  "difficulty": "hard",
726
+ "max_steps": 20,
727
+ },
728
+ "dual-source-consolidation": {
729
+ "seed_fn": seed_task7,
730
+ "target_ddl": TASK7_TARGET_DDL,
731
+ "description": "Merge 6 tables from two incompatible systems into 4 unified tables with cross-system dedup",
732
+ "difficulty": "hard",
733
+ "max_steps": 20,
734
  },
735
  }
736
+
server/__pycache__/environment.cpython-312.pyc CHANGED
Binary files a/server/__pycache__/environment.cpython-312.pyc and b/server/__pycache__/environment.cpython-312.pyc differ
 
server/__pycache__/grader.cpython-312.pyc CHANGED
Binary files a/server/__pycache__/grader.cpython-312.pyc and b/server/__pycache__/grader.cpython-312.pyc differ
 
server/app.py CHANGED
@@ -61,35 +61,40 @@ async def root():
61
  return """<!DOCTYPE html>
62
  <html>
63
  <head>
64
- <title>SQL Migration Agent OpenEnv</title>
65
  <style>
66
  body { font-family: monospace; background: #0d1117; color: #e6edf3; padding: 40px; }
67
  h1 { color: #58a6ff; } h2 { color: #79c0ff; }
68
  .ok { color: #3fb950; } .endpoint { color: #d2a8ff; }
69
  pre { background: #161b22; padding: 12px; border-radius: 6px; }
70
  a { color: #58a6ff; }
 
71
  </style>
72
  </head>
73
  <body>
74
- <h1>🗄️ SQL Schema Migration Agent</h1>
75
- <p class="ok">Server running OpenEnv hackathon environment</p>
76
  <h2>API Endpoints</h2>
77
  <pre>
78
- <span class="endpoint">POST /reset</span> Start a new migration episode
79
- <span class="endpoint">POST /step</span> Execute a SQL action
80
- <span class="endpoint">GET /state</span> Current environment state
81
- <span class="endpoint">GET /tasks</span> List all 3 tasks
82
- <span class="endpoint">POST /grader</span> Run grader on all tasks
83
- <span class="endpoint">GET /health</span> Health check
84
- <span class="endpoint">GET /docs</span> Interactive API documentation
85
  </pre>
86
- <h2>Tasks</h2>
87
  <pre>
88
- 1. column-restructure (Easy) Merge first_name + last_name full_name
89
- 2. table-normalization (Medium) Normalize purchases customers + orders + FK
90
- 3. cascade-migration (Hard) 4-table FK cascade, type coercion, orphan audit
 
 
 
 
91
  </pre>
92
- <p><a href="/docs">📖 Open API Docs</a> | <a href="/tasks">📋 View Tasks</a> | <a href="/health">💚 Health Check</a></p>
93
  </body>
94
  </html>"""
95
 
@@ -101,31 +106,27 @@ async def list_tasks() -> Dict[str, Any]:
101
 
102
  Returns JSON with task definitions and action schema for automated validation.
103
  """
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
104
  return {
105
- "tasks": [
106
- {
107
- "name": "column-restructure",
108
- "description": "Merge first_name and last_name into a single full_name column without data loss",
109
- "difficulty": "easy",
110
- "max_steps": 20,
111
- },
112
- {
113
- "name": "table-normalization",
114
- "description": "Decompose a flat purchases table into normalized customers and orders tables with FK",
115
- "difficulty": "medium",
116
- "max_steps": 20,
117
- },
118
- {
119
- "name": "cascade-migration",
120
- "description": "Multi-table FK cascade with type coercion, NULL handling, and orphan audit logging",
121
- "difficulty": "hard",
122
- "max_steps": 20,
123
- },
124
- ],
125
  "action_schema": {
126
- "sql_command": "string The SQL statement to execute",
127
- "reasoning": "string Explanation of the action (optional)",
128
- "submit_final": "boolean Set true when migration is complete (default: false)",
129
  },
130
  }
131
 
@@ -141,25 +142,29 @@ async def grade_task(
141
  Returns per-task grader scores after running the environment's internal scorer.
142
  """
143
  task_name = body.get("task_name", None)
144
- tasks_to_grade = [task_name] if task_name else ["column-restructure", "table-normalization", "cascade-migration"]
 
 
 
 
 
 
145
 
146
  results = {}
147
  for t in tasks_to_grade:
148
  try:
149
  env = DbMigrationEnvironment(task_name=t)
150
  obs = env.reset()
151
- # Return the initial score (before any agent action)
152
- # This proves the grader works and returns values in [0.0, 1.0]
153
  results[t] = {
154
- "initial_score": max(0.0, min(1.0, obs.migration_progress)),
155
  "grader_functional": True,
156
- "reward_range": [0.0, 1.0],
157
- "max_steps": 20,
158
  }
159
  env.close()
160
  except Exception as e:
161
  results[t] = {
162
- "initial_score": 0.0,
163
  "grader_functional": False,
164
  "error": str(e),
165
  }
 
61
  return """<!DOCTYPE html>
62
  <html>
63
  <head>
64
+ <title>SQL Migration Agent -- OpenEnv</title>
65
  <style>
66
  body { font-family: monospace; background: #0d1117; color: #e6edf3; padding: 40px; }
67
  h1 { color: #58a6ff; } h2 { color: #79c0ff; }
68
  .ok { color: #3fb950; } .endpoint { color: #d2a8ff; }
69
  pre { background: #161b22; padding: 12px; border-radius: 6px; }
70
  a { color: #58a6ff; }
71
+ .easy { color: #3fb950; } .medium { color: #d29922; } .hard { color: #f85149; }
72
  </style>
73
  </head>
74
  <body>
75
+ <h1>SQL Schema Migration Agent</h1>
76
+ <p class="ok">Server running -- OpenEnv hackathon environment (7 tasks)</p>
77
  <h2>API Endpoints</h2>
78
  <pre>
79
+ <span class="endpoint">POST /reset</span> -- Start a new migration episode
80
+ <span class="endpoint">POST /step</span> -- Execute a SQL action
81
+ <span class="endpoint">GET /state</span> -- Current environment state
82
+ <span class="endpoint">GET /tasks</span> -- List all 7 tasks
83
+ <span class="endpoint">POST /grader</span> -- Run grader on all tasks
84
+ <span class="endpoint">GET /health</span> -- Health check
85
+ <span class="endpoint">GET /docs</span> -- Interactive API documentation
86
  </pre>
87
+ <h2>Tasks (2 Easy / 3 Medium / 2 Hard)</h2>
88
  <pre>
89
+ <span class="easy">1. column-restructure (Easy) -- Merge first_name + last_name -> full_name</span>
90
+ <span class="easy">2. soft-delete-restoration (Easy) -- Restore deleted products from deletion_log</span>
91
+ <span class="medium">3. table-normalization (Medium) -- Normalize purchases -> customers + orders + FK</span>
92
+ <span class="medium">4. schema-version-merge (Medium) -- Merge v1/v2 product tables with coercion</span>
93
+ <span class="medium">5. multi-entity-extraction (Medium) -- 3NF decomposition with invalid data routing</span>
94
+ <span class="hard">6. cascade-migration (Hard) -- 4-table FK cascade, type coercion, orphan audit</span>
95
+ <span class="hard">7. dual-source-consolidation(Hard) -- 6->4 table merge, cross-system email dedup</span>
96
  </pre>
97
+ <p><a href="/docs">Open API Docs</a> | <a href="/tasks">View Tasks</a> | <a href="/health">Health Check</a></p>
98
  </body>
99
  </html>"""
100
 
 
106
 
107
  Returns JSON with task definitions and action schema for automated validation.
108
  """
109
+ # Import seeds to dynamically build task list
110
+ try:
111
+ from .. import seeds as _seeds
112
+ except ImportError:
113
+ import seeds as _seeds
114
+
115
+ task_list = []
116
+ for name, cfg in _seeds.TASKS.items():
117
+ task_list.append({
118
+ "name": name,
119
+ "description": cfg["description"],
120
+ "difficulty": cfg["difficulty"],
121
+ "max_steps": cfg.get("max_steps", 20),
122
+ })
123
+
124
  return {
125
+ "tasks": task_list,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
126
  "action_schema": {
127
+ "sql_command": "string -- The SQL statement to execute",
128
+ "reasoning": "string -- Explanation of the action (optional)",
129
+ "submit_final": "boolean -- Set true when migration is complete (default: false)",
130
  },
131
  }
132
 
 
142
  Returns per-task grader scores after running the environment's internal scorer.
143
  """
144
  task_name = body.get("task_name", None)
145
+
146
+ try:
147
+ from .. import seeds as _seeds
148
+ except ImportError:
149
+ import seeds as _seeds
150
+
151
+ tasks_to_grade = [task_name] if task_name else list(_seeds.TASKS.keys())
152
 
153
  results = {}
154
  for t in tasks_to_grade:
155
  try:
156
  env = DbMigrationEnvironment(task_name=t)
157
  obs = env.reset()
 
 
158
  results[t] = {
159
+ "initial_score": obs.migration_progress,
160
  "grader_functional": True,
161
+ "reward_range": [0.01, 0.99],
162
+ "max_steps": _seeds.TASKS[t].get("max_steps", 20),
163
  }
164
  env.close()
165
  except Exception as e:
166
  results[t] = {
167
+ "initial_score": 0.01,
168
  "grader_functional": False,
169
  "error": str(e),
170
  }
server/environment.py CHANGED
@@ -71,7 +71,8 @@ class DbMigrationEnvironment(Environment):
71
  return ""
72
  try:
73
  cursor = self._conn.execute(
74
- "SELECT sql FROM sqlite_master WHERE type='table' AND sql IS NOT NULL ORDER BY name"
 
75
  )
76
  schemas = [row[0] for row in cursor.fetchall()]
77
  return ";\n\n".join(schemas) + ";" if schemas else ""
@@ -186,6 +187,17 @@ class DbMigrationEnvironment(Environment):
186
  self._conn.commit()
187
  rows_affected = cursor.rowcount
188
  execution_result = f"Success: {rows_affected} rows affected"
 
 
 
 
 
 
 
 
 
 
 
189
  except Exception as e:
190
  # Never crash — feed the error back to the agent
191
  execution_result = str(e)
@@ -199,8 +211,9 @@ class DbMigrationEnvironment(Environment):
199
  # Compute scores
200
  current_score, step_reward = self._reconciler.compute_step_reward(self._conn)
201
 
202
- # Episode termination: submit_final, max steps (20), OR perfect score
203
- done = action.submit_final or self._step_count >= 20 or current_score >= 0.99
 
204
 
205
  # Update state
206
  self._state.step_count = self._step_count
 
71
  return ""
72
  try:
73
  cursor = self._conn.execute(
74
+ "SELECT sql FROM sqlite_master WHERE type='table' "
75
+ "AND sql IS NOT NULL AND name NOT LIKE 'sqlite_%' ORDER BY name"
76
  )
77
  schemas = [row[0] for row in cursor.fetchall()]
78
  return ";\n\n".join(schemas) + ";" if schemas else ""
 
187
  self._conn.commit()
188
  rows_affected = cursor.rowcount
189
  execution_result = f"Success: {rows_affected} rows affected"
190
+ except sqlite3.Warning as e:
191
+ # Multi-statement attempt — agent tried to combine statements
192
+ execution_result = (
193
+ f"Error: SQLite requires one statement per step. "
194
+ f"Split your commands into separate steps. Original error: {e}"
195
+ )
196
+ action_error = "multi_statement"
197
+ try:
198
+ self._conn.rollback()
199
+ except Exception:
200
+ pass
201
  except Exception as e:
202
  # Never crash — feed the error back to the agent
203
  execution_result = str(e)
 
211
  # Compute scores
212
  current_score, step_reward = self._reconciler.compute_step_reward(self._conn)
213
 
214
+ # Episode termination: submit_final, max steps, OR perfect score
215
+ task_max = self._task_config.get("max_steps", 20)
216
+ done = action.submit_final or self._step_count >= task_max or current_score >= 0.99
217
 
218
  # Update state
219
  self._state.step_count = self._step_count
server/grader.py CHANGED
@@ -29,6 +29,22 @@ from seeds import (
29
  TASK3_EXPECTED_AUDIT_ENTRIES,
30
  TASK3_EXPECTED_EMPLOYEE_COUNT,
31
  TASK3_EXPECTED_SALARIES,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
32
  )
33
 
34
 
@@ -36,7 +52,8 @@ def _get_table_names(conn: sqlite3.Connection) -> Set[str]:
36
  """Get all table names in the database."""
37
  try:
38
  cursor = conn.execute(
39
- "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
 
40
  )
41
  return {row[0] for row in cursor.fetchall()}
42
  except Exception:
@@ -98,10 +115,18 @@ class StateReconciler:
98
  return self._score_task2(conn)
99
  elif self.task_name == "cascade-migration":
100
  return self._score_task3(conn)
 
 
 
 
 
 
 
 
101
  else:
102
- return 0.0
103
  except Exception:
104
- return 0.0
105
 
106
  def compute_step_reward(self, conn: sqlite3.Connection) -> Tuple[float, float]:
107
  """
@@ -173,6 +198,11 @@ class StateReconciler:
173
  # order_count=0.2, no_null_ids=0.1, integrity=0.2
174
 
175
  def _score_task2(self, conn: sqlite3.Connection) -> float:
 
 
 
 
 
176
  score = 0.0
177
  tables = _get_table_names(conn)
178
 
@@ -242,6 +272,11 @@ class StateReconciler:
242
  # Total max = 0.90 for all grader checks + 0.10 integrity = 1.00
243
 
244
  def _score_task3(self, conn: sqlite3.Connection) -> float:
 
 
 
 
 
245
  score = 0.0
246
  tables = _get_table_names(conn)
247
 
@@ -351,3 +386,371 @@ class StateReconciler:
351
  score = min(score, 0.1)
352
 
353
  return max(0.01, min(0.99, score))
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
29
  TASK3_EXPECTED_AUDIT_ENTRIES,
30
  TASK3_EXPECTED_EMPLOYEE_COUNT,
31
  TASK3_EXPECTED_SALARIES,
32
+ TASK4_EXPECTED_ROW_COUNT,
33
+ TASK4_EXPECTED_ID_SUM,
34
+ TASK4_EXPECTED_DELETED_COUNT,
35
+ TASK4_EXPECTED_ACTIVE_COUNT,
36
+ TASK5_EXPECTED_ROW_COUNT,
37
+ TASK5_EXPECTED_PRICE_SUM,
38
+ TASK5_EXPECTED_BOTH_COUNT,
39
+ TASK6_EXPECTED_SALESPERSON_COUNT,
40
+ TASK6_EXPECTED_CUSTOMER_COUNT,
41
+ TASK6_EXPECTED_PRODUCT_COUNT,
42
+ TASK6_EXPECTED_SALES_COUNT,
43
+ TASK6_EXPECTED_DATA_ISSUES_COUNT,
44
+ TASK7_EXPECTED_UNIFIED_CUSTOMERS,
45
+ TASK7_EXPECTED_BOTH_SOURCE_COUNT,
46
+ TASK7_EXPECTED_UNIFIED_ORDERS,
47
+ TASK7_EXPECTED_MIGRATION_ISSUES,
48
  )
49
 
50
 
 
52
  """Get all table names in the database."""
53
  try:
54
  cursor = conn.execute(
55
+ "SELECT name FROM sqlite_master WHERE type='table' "
56
+ "AND name NOT LIKE 'sqlite_%' ORDER BY name"
57
  )
58
  return {row[0] for row in cursor.fetchall()}
59
  except Exception:
 
115
  return self._score_task2(conn)
116
  elif self.task_name == "cascade-migration":
117
  return self._score_task3(conn)
118
+ elif self.task_name == "soft-delete-restoration":
119
+ return self._score_task4(conn)
120
+ elif self.task_name == "schema-version-merge":
121
+ return self._score_task5(conn)
122
+ elif self.task_name == "multi-entity-extraction":
123
+ return self._score_task6(conn)
124
+ elif self.task_name == "dual-source-consolidation":
125
+ return self._score_task7(conn)
126
  else:
127
+ return 0.01
128
  except Exception:
129
+ return 0.01
130
 
131
  def compute_step_reward(self, conn: sqlite3.Connection) -> Tuple[float, float]:
132
  """
 
198
  # order_count=0.2, no_null_ids=0.1, integrity=0.2
199
 
200
  def _score_task2(self, conn: sqlite3.Connection) -> float:
201
+ # Re-assert FK enforcement to prevent PRAGMA bypass exploit
202
+ try:
203
+ conn.execute("PRAGMA foreign_keys = ON")
204
+ except Exception:
205
+ pass
206
  score = 0.0
207
  tables = _get_table_names(conn)
208
 
 
272
  # Total max = 0.90 for all grader checks + 0.10 integrity = 1.00
273
 
274
  def _score_task3(self, conn: sqlite3.Connection) -> float:
275
+ # Re-assert FK enforcement to prevent PRAGMA bypass exploit
276
+ try:
277
+ conn.execute("PRAGMA foreign_keys = ON")
278
+ except Exception:
279
+ pass
280
  score = 0.0
281
  tables = _get_table_names(conn)
282
 
 
386
  score = min(score, 0.1)
387
 
388
  return max(0.01, min(0.99, score))
389
+
390
+ # =========================================================================
391
+ # Task 4: Soft-Delete Restoration (Easy)
392
+ # =========================================================================
393
+
394
+ def _score_task4(self, conn: sqlite3.Connection) -> float:
395
+ score = 0.0
396
+ tables = _get_table_names(conn)
397
+
398
+ if "products" not in tables:
399
+ return 0.01
400
+
401
+ cols = _get_column_names(conn, "products")
402
+
403
+ # is_deleted column exists (+0.15)
404
+ if "is_deleted" in cols:
405
+ score += 0.15
406
+
407
+ # deleted_at column exists (+0.10)
408
+ if "deleted_at" in cols:
409
+ score += 0.10
410
+
411
+ # Row count = 8 (+0.20)
412
+ row_count = _get_row_count(conn, "products")
413
+ if row_count == TASK4_EXPECTED_ROW_COUNT:
414
+ score += 0.20
415
+
416
+ # Active products: is_deleted=0, deleted_at IS NULL (+0.25)
417
+ if "is_deleted" in cols:
418
+ try:
419
+ cursor = conn.execute(
420
+ "SELECT COUNT(*) FROM products WHERE is_deleted = 0 AND deleted_at IS NULL"
421
+ )
422
+ active = cursor.fetchone()[0]
423
+ if active == TASK4_EXPECTED_ACTIVE_COUNT:
424
+ score += 0.25
425
+ except Exception:
426
+ pass
427
+
428
+ # Restored products: is_deleted=1, deleted_at IS NOT NULL (+0.20)
429
+ if "is_deleted" in cols:
430
+ try:
431
+ cursor = conn.execute(
432
+ "SELECT COUNT(*) FROM products WHERE is_deleted = 1 AND deleted_at IS NOT NULL"
433
+ )
434
+ restored = cursor.fetchone()[0]
435
+ if restored == TASK4_EXPECTED_DELETED_COUNT:
436
+ score += 0.20
437
+ except Exception:
438
+ pass
439
+
440
+ # SUM(id) fingerprint = 36 — no phantom rows (+0.10)
441
+ try:
442
+ cursor = conn.execute("SELECT SUM(id) FROM products")
443
+ id_sum = cursor.fetchone()[0]
444
+ if id_sum == TASK4_EXPECTED_ID_SUM:
445
+ score += 0.10
446
+ except Exception:
447
+ pass
448
+
449
+ # Exploit check
450
+ if row_count == 0:
451
+ score = min(score, 0.1)
452
+
453
+ return max(0.01, min(0.99, score))
454
+
455
+ # =========================================================================
456
+ # Task 5: Schema Version Merge (Medium)
457
+ # =========================================================================
458
+
459
+ def _score_task5(self, conn: sqlite3.Connection) -> float:
460
+ # Re-assert FK enforcement
461
+ try:
462
+ conn.execute("PRAGMA foreign_keys = ON")
463
+ except Exception:
464
+ pass
465
+ score = 0.0
466
+ tables = _get_table_names(conn)
467
+
468
+ if "products" not in tables:
469
+ return 0.01
470
+
471
+ cols = _get_column_names(conn, "products")
472
+
473
+ # Schema completeness: all 8 columns (+0.10)
474
+ expected_cols = {"id", "name", "price", "category", "supplier", "brand", "sku", "source"}
475
+ if expected_cols.issubset(cols):
476
+ score += 0.10
477
+
478
+ # Row count = 9 (+0.15)
479
+ row_count = _get_row_count(conn, "products")
480
+ if row_count == TASK5_EXPECTED_ROW_COUNT:
481
+ score += 0.15
482
+
483
+ # PRICE_SUM fingerprint (+0.20)
484
+ try:
485
+ cursor = conn.execute("SELECT ROUND(SUM(price), 2) FROM products")
486
+ price_sum = cursor.fetchone()[0]
487
+ if price_sum is not None and abs(price_sum - TASK5_EXPECTED_PRICE_SUM) < 0.02:
488
+ score += 0.20
489
+ except Exception:
490
+ pass
491
+
492
+ # source='both' for conflicted ids 1,2 (+0.15)
493
+ if "source" in cols:
494
+ try:
495
+ cursor = conn.execute(
496
+ "SELECT COUNT(*) FROM products WHERE source = 'both'"
497
+ )
498
+ both_count = cursor.fetchone()[0]
499
+ if both_count == TASK5_EXPECTED_BOTH_COUNT:
500
+ score += 0.15
501
+ except Exception:
502
+ pass
503
+
504
+ # v2 name wins for conflicted rows (+0.15)
505
+ try:
506
+ cursor = conn.execute("SELECT name FROM products WHERE id = 2")
507
+ row = cursor.fetchone()
508
+ if row and "Updated" in row[0]:
509
+ score += 0.15
510
+ except Exception:
511
+ pass
512
+
513
+ # No NULL prices (+0.10)
514
+ try:
515
+ cursor = conn.execute("SELECT COUNT(*) FROM products WHERE price IS NULL")
516
+ null_count = cursor.fetchone()[0]
517
+ if null_count == 0:
518
+ score += 0.10
519
+ except Exception:
520
+ pass
521
+
522
+ # PRAGMA integrity_check (+0.15)
523
+ try:
524
+ cursor = conn.execute("PRAGMA integrity_check")
525
+ result = cursor.fetchone()[0]
526
+ if result == "ok":
527
+ score += 0.15
528
+ except Exception:
529
+ pass
530
+
531
+ # Exploit check
532
+ if row_count == 0:
533
+ score = min(score, 0.1)
534
+
535
+ return max(0.01, min(0.99, score))
536
+
537
+ # =========================================================================
538
+ # Task 6: Multi-Entity Extraction (Medium — Hard End)
539
+ # =========================================================================
540
+
541
+ def _score_task6(self, conn: sqlite3.Connection) -> float:
542
+ # Re-assert FK enforcement
543
+ try:
544
+ conn.execute("PRAGMA foreign_keys = ON")
545
+ except Exception:
546
+ pass
547
+ score = 0.0
548
+ tables = _get_table_names(conn)
549
+
550
+ # All 5 tables exist (+0.10)
551
+ required = {"salespersons", "customers", "products", "sales", "data_issues"}
552
+ if required.issubset(tables):
553
+ score += 0.10
554
+
555
+ # salesperson count = 3 (+0.10)
556
+ if "salespersons" in tables:
557
+ count = _get_row_count(conn, "salespersons")
558
+ if count == TASK6_EXPECTED_SALESPERSON_COUNT:
559
+ score += 0.10
560
+
561
+ # customer count = 3 (invalid excluded) (+0.12)
562
+ if "customers" in tables:
563
+ count = _get_row_count(conn, "customers")
564
+ if count == TASK6_EXPECTED_CUSTOMER_COUNT:
565
+ score += 0.12
566
+
567
+ # product count = 5 (+0.10)
568
+ if "products" in tables:
569
+ count = _get_row_count(conn, "products")
570
+ if count == TASK6_EXPECTED_PRODUCT_COUNT:
571
+ score += 0.10
572
+
573
+ # sales count = 11 (bad row excluded) (+0.12)
574
+ if "sales" in tables:
575
+ count = _get_row_count(conn, "sales")
576
+ if count == TASK6_EXPECTED_SALES_COUNT:
577
+ score += 0.12
578
+
579
+ # All 3 FKs present in sales (+0.15)
580
+ if "sales" in tables:
581
+ fk_count = 0
582
+ if _has_foreign_key(conn, "sales", "salespersons"): fk_count += 1
583
+ if _has_foreign_key(conn, "sales", "customers"): fk_count += 1
584
+ if _has_foreign_key(conn, "sales", "products"): fk_count += 1
585
+ score += 0.05 * fk_count # 0.15 total for all 3
586
+
587
+ # data_issues count = 1, for row 6 (+0.11)
588
+ if "data_issues" in tables:
589
+ count = _get_row_count(conn, "data_issues")
590
+ if count == TASK6_EXPECTED_DATA_ISSUES_COUNT:
591
+ score += 0.11
592
+
593
+ # alice email is trimmed (+0.10)
594
+ if "salespersons" in tables:
595
+ try:
596
+ cursor = conn.execute(
597
+ "SELECT email FROM salespersons WHERE name LIKE '%Alice%'"
598
+ )
599
+ row = cursor.fetchone()
600
+ if row and row[0] == "alice@company.com":
601
+ score += 0.10
602
+ except Exception:
603
+ pass
604
+
605
+ # PRAGMA integrity_check (+0.10)
606
+ try:
607
+ cursor = conn.execute("PRAGMA integrity_check")
608
+ result = cursor.fetchone()[0]
609
+ if result == "ok":
610
+ score += 0.10
611
+ except Exception:
612
+ pass
613
+
614
+ # Exploit check
615
+ sales_count = _get_row_count(conn, "sales") if "sales" in tables else 0
616
+ if sales_count == 0 and "sales" in tables:
617
+ score = min(score, 0.1)
618
+
619
+ return max(0.01, min(0.99, score))
620
+
621
+ # =========================================================================
622
+ # Task 7: Dual-Source Consolidation (Hard)
623
+ # =========================================================================
624
+
625
+ def _score_task7(self, conn: sqlite3.Connection) -> float:
626
+ # Re-assert FK enforcement
627
+ try:
628
+ conn.execute("PRAGMA foreign_keys = ON")
629
+ except Exception:
630
+ pass
631
+ score = 0.0
632
+ tables = _get_table_names(conn)
633
+
634
+ # All 4 tables exist (+0.05)
635
+ required = {"unified_customers", "unified_products", "unified_orders", "migration_issues"}
636
+ if required.issubset(tables):
637
+ score += 0.05
638
+
639
+ # unified_customers count = 7 (+0.08)
640
+ if "unified_customers" in tables:
641
+ count = _get_row_count(conn, "unified_customers")
642
+ if count == TASK7_EXPECTED_UNIFIED_CUSTOMERS:
643
+ score += 0.08
644
+
645
+ # source='both' for email-matched records (+0.08)
646
+ if "unified_customers" in tables:
647
+ try:
648
+ cursor = conn.execute(
649
+ "SELECT COUNT(*) FROM unified_customers WHERE source = 'both'"
650
+ )
651
+ both = cursor.fetchone()[0]
652
+ if both == TASK7_EXPECTED_BOTH_SOURCE_COUNT:
653
+ score += 0.08
654
+ except Exception:
655
+ pass
656
+
657
+ # Legacy amount coercion — check unified_orders has REAL amounts (+0.10)
658
+ if "unified_orders" in tables:
659
+ try:
660
+ cursor = conn.execute(
661
+ "SELECT COUNT(*) FROM unified_orders WHERE typeof(amount) = 'real' OR typeof(amount) = 'integer'"
662
+ )
663
+ real_count = cursor.fetchone()[0]
664
+ order_count = _get_row_count(conn, "unified_orders")
665
+ if real_count == order_count and order_count > 0:
666
+ score += 0.10
667
+ except Exception:
668
+ pass
669
+
670
+ # NULL currency → 'USD' fill (+0.07)
671
+ if "unified_orders" in tables:
672
+ try:
673
+ cursor = conn.execute(
674
+ "SELECT COUNT(*) FROM unified_orders WHERE currency IS NULL"
675
+ )
676
+ null_curr = cursor.fetchone()[0]
677
+ if null_curr == 0:
678
+ score += 0.07
679
+ except Exception:
680
+ pass
681
+
682
+ # tx_status mapped to strings (+0.10)
683
+ if "unified_orders" in tables:
684
+ try:
685
+ cursor = conn.execute(
686
+ "SELECT COUNT(*) FROM unified_orders WHERE typeof(status) = 'text'"
687
+ )
688
+ text_count = cursor.fetchone()[0]
689
+ order_count = _get_row_count(conn, "unified_orders")
690
+ if text_count == order_count and order_count > 0:
691
+ score += 0.10
692
+ except Exception:
693
+ pass
694
+
695
+ # subscription_tier mapped to strings (+0.08)
696
+ if "unified_customers" in tables:
697
+ try:
698
+ cursor = conn.execute(
699
+ "SELECT COUNT(*) FROM unified_customers WHERE typeof(tier) = 'text'"
700
+ )
701
+ text_count = cursor.fetchone()[0]
702
+ cust_count = _get_row_count(conn, "unified_customers")
703
+ if text_count == cust_count and cust_count > 0:
704
+ score += 0.08
705
+ except Exception:
706
+ pass
707
+
708
+ # migration_issues count = 2 (+0.08)
709
+ if "migration_issues" in tables:
710
+ count = _get_row_count(conn, "migration_issues")
711
+ if count == TASK7_EXPECTED_MIGRATION_ISSUES:
712
+ score += 0.08
713
+
714
+ # Orphaned transaction in issues (+0.07)
715
+ if "migration_issues" in tables:
716
+ try:
717
+ cursor = conn.execute(
718
+ "SELECT COUNT(*) FROM migration_issues WHERE issue_type = 'orphaned_record'"
719
+ )
720
+ orphan_issues = cursor.fetchone()[0]
721
+ if orphan_issues >= 1:
722
+ score += 0.07
723
+ except Exception:
724
+ pass
725
+
726
+ # NULL email customer in issues (+0.07)
727
+ if "migration_issues" in tables:
728
+ try:
729
+ cursor = conn.execute(
730
+ "SELECT COUNT(*) FROM migration_issues WHERE issue_type = 'null_email'"
731
+ )
732
+ null_issues = cursor.fetchone()[0]
733
+ if null_issues >= 1:
734
+ score += 0.07
735
+ except Exception:
736
+ pass
737
+
738
+ # FK integrity on unified_orders (+0.10)
739
+ if "unified_orders" in tables:
740
+ if _has_foreign_key(conn, "unified_orders", "unified_customers"):
741
+ score += 0.10
742
+
743
+ # PRAGMA integrity_check (+0.10)
744
+ try:
745
+ cursor = conn.execute("PRAGMA integrity_check")
746
+ result = cursor.fetchone()[0]
747
+ if result == "ok":
748
+ score += 0.10
749
+ except Exception:
750
+ pass
751
+
752
+ # Exploit check
753
+ if "unified_orders" in tables and _get_row_count(conn, "unified_orders") == 0:
754
+ score = min(score, 0.1)
755
+
756
+ return max(0.01, min(0.99, score))
test_all_tasks.py ADDED
@@ -0,0 +1,49 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """Quick validation of all 7 tasks: seeds + graders."""
2
+ import sqlite3
3
+ import sys
4
+ import os
5
+
6
+ sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
7
+
8
+ from seeds import TASKS
9
+ from server.grader import StateReconciler
10
+
11
+ print(f"Tasks registered: {len(TASKS)}")
12
+ assert len(TASKS) == 7, f"Expected 7 tasks, got {len(TASKS)}"
13
+ print(f" Names: {list(TASKS.keys())}")
14
+
15
+ for name, cfg in TASKS.items():
16
+ # Seed
17
+ conn = sqlite3.connect(":memory:")
18
+ conn.execute("PRAGMA foreign_keys = ON")
19
+ cfg["seed_fn"](conn)
20
+
21
+ cursor = conn.execute(
22
+ "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'"
23
+ )
24
+ tables = [r[0] for r in cursor.fetchall()]
25
+ print(f"\n[{name}] ({cfg['difficulty']}, max_steps={cfg.get('max_steps', 20)})")
26
+ print(f" Tables: {tables}")
27
+
28
+ # Grade
29
+ reconciler = StateReconciler(name)
30
+ score = reconciler.score(conn)
31
+ assert 0.01 <= score <= 0.99, f"Score {score} out of [0.01, 0.99]!"
32
+ print(f" Initial score: {score:.2f} OK")
33
+
34
+ conn.close()
35
+
36
+ # Also test environment resets for each task
37
+ from server.environment import DbMigrationEnvironment
38
+
39
+ for name in TASKS:
40
+ env = DbMigrationEnvironment(task_name=name)
41
+ obs = env.reset()
42
+ assert obs.done == False
43
+ assert obs.step_number == 0
44
+ print(f" [{name}] Environment reset OK")
45
+ env.close()
46
+
47
+ print("\n" + "=" * 50)
48
+ print("ALL 7 TASKS VALIDATED SUCCESSFULLY!")
49
+ print("=" * 50)