Space: Eishaan/sql-migration-env

Eishaan committed on Apr 9

Commit 71fa486 (1 parent: b08a347)

feat: expand to 7 tasks (2E/3M/2H) + engine hardening

- Add 4 new tasks: soft-delete-restoration, schema-version-merge,
multi-entity-extraction, dual-source-consolidation
- Harden grader: PRAGMA bypass fix, sqlite_% table filtering
- Harden environment: multi-statement error handling, schema pollution
- Fix inference context window: target DDL baked into system prompt
- Per-task step budgets: Easy=10, Medium=15, Hard=20
- Update app.py, README.md for 7-task architecture
- All tests passing
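
The two grader-hardening bullets above can be sketched as follows. This is a minimal illustration, assuming the grader works on a plain `sqlite3.Connection`; the helper names `prepare_for_grading` and `user_tables` are hypothetical and are not taken from `server/grader.py`.

```python
import sqlite3


def prepare_for_grading(conn: sqlite3.Connection) -> None:
    """Re-assert FK enforcement in case the agent switched it off (hypothetical helper)."""
    # PRAGMA foreign_keys is a no-op inside an open transaction, so commit first.
    conn.commit()
    conn.execute("PRAGMA foreign_keys = ON")


def user_tables(conn: sqlite3.Connection) -> list[str]:
    """List tables, skipping SQLite's internal sqlite_% bookkeeping tables."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
# AUTOINCREMENT makes SQLite create the internal sqlite_sequence table.
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, v TEXT)")
conn.execute("INSERT INTO t (v) VALUES ('x')")
conn.execute("PRAGMA foreign_keys = OFF")  # simulated bypass attempt by the agent

prepare_for_grading(conn)
fk_on = conn.execute("PRAGMA foreign_keys").fetchone()[0]
tables = user_tables(conn)
```

Filtering on the name prefix keeps internal tables such as `sqlite_sequence` from being counted as schema pollution when the grader compares the current schema against the target.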

Files changed (10)

README.md +38 -44
__pycache__/seeds.cpython-312.pyc +0 -0
inference.py +30 -14
seeds.py +455 -0
server/__pycache__/environment.cpython-312.pyc +0 -0
server/__pycache__/grader.cpython-312.pyc +0 -0
server/app.py +50 -45
server/environment.py +16 -3
server/grader.py +406 -3
test_all_tasks.py +49 -0

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 title: SQL Migration Agent
-emoji: 🗄️
 colorFrom: blue
 colorTo: purple
 sdk: docker
@@ -11,10 +11,9 @@ tags:
 # SQL Schema Migration Agent
 > **An OpenEnv environment for benchmarking autonomous database migration agents.**
 >
-> Built for the Meta × Hugging Face OpenEnv Hackathon.
 ---
@@ -22,7 +21,7 @@ tags:
 Database schema migrations are among the most error-prone, high-stakes tasks in software engineering. Every production system faces them as application models evolve, yet they are extremely difficult to automate safely because data must be perfectly preserved.
-This environment trains AI agents to autonomously reconcile schema drift the exact way a real CI/CD pipeline would — given a flawed current state and an ideal target state, the agent must compute and safely execute the transformation sequence using raw SQL.
 **Real-world analogues:** `Flyway`, `Liquibase`, Django `makemigrations`, `Terraform` state transitions. This environment models that exact problem, reduced to an agentic RL core.
@@ -32,19 +31,24 @@ This environment trains AI agents to autonomously reconcile schema drift the exa
 Unlike simplistic environments that merely string-match SQL schemas, this environment uses a **deep structural reconciliation grader** built specifically to prevent LLM gamification:
-1. **Zero-Sum Exploit Protection:** Naive agents will often execute `DROP TABLE x; CREATE TABLE x (...)` to easily match the target schema, silently destroying all data. Our grader actively runs `SELECT COUNT(*)` and data-integrity hashing. If a table's schema matches but the data is gone, the score is brutally clamped to `0.01`.
-2. **Granular Partial Credit:** Multi-step migrations (like Task 3's 4-table cascade) require 15+ steps. Binary pass/fail rewards provide zero learning signal. Our grader assigns fractional weights to individual FK constraints, data type coercions, and orphaned record audit logs, providing continuous RL reward gradients.
-3. **Deterministic Adversarial Seeds:** Our injected data isn't generic. It includes edge cases that break naive SQL (e.g. `O'Brien` testing quote-escaping parametrization) and orphaned foreign keys testing `CASCADE` knowledge.
 ---
-## Tasks
-| # | Name | Difficulty | Description |
-|---|------|-----------|-------------|
-| 1 | `column-restructure` | Easy | Merge `first_name` + `last_name` → `full_name` without data loss. Adversarial: apostrophes (`O'Brien`), mid-caps (`McDonald`) |
-| 2 | `table-normalization` | Medium | Decompose flat `purchases` into `customers` + `orders` with FK. Adversarial: duplicate emails (`alice@` ×3), commas in item names |
-| 3 | `cascade-migration` | Hard | 4-table FK cascade: type coercion (`$90000` TEXT → `90000` INTEGER), orphan audit logging, NULL salary removal, full FK chain enforcement |
 ---
@@ -55,8 +59,8 @@ Unlike simplistic environments that merely string-match SQL schemas, this enviro
 | `current_schema_sql` | `str` | Current database DDL extracted from `sqlite_master` |
 | `target_schema_sql` | `str` | Target DDL the agent must reach |
 | `last_execution_result` | `str` | Result of last SQL execution, or error message |
-| `step_number` | `int` | Current step count (0–20) |
-| `migration_progress` | `float` | Current grader score [0.0–1.0] |
 | `task_name` | `str` | Name of the active task |
 | `done` | `bool` | Whether the episode has terminated |
 | `reward` | `float` | Step reward: score delta from previous step (can be negative) |
@@ -73,26 +77,11 @@ Unlike simplistic environments that merely string-match SQL schemas, this enviro
 ## Reward Function
-- **Step reward**: Delta between current and previous migration score. Strongly negative for destructive actions (e.g., wrong DROP TABLE → -0.4).
-- **Episode score**: Clamped to [0.0, 1.0]. Final state wins — regressions hurt.
-- **Exploit protection**: If schema matches target, but tables are empty (agent deleted data), score is capped at 0.1.
-- **Auto-termination**: Episode ends immediately when score reaches 1.0, preventing post-success regression.
-### Task 3 Scoring Breakdown
-| Check | Weight | Description |
-|-------|--------|-------------|
-| `audit_log` exists | 0.10 | Orphan audit table created |
-| `audit_log` row count ≥ 3 | 0.10 | All orphaned/invalid records logged |
-| Correct audit entries | 0.20 | Right `(source_table, reason)` pairs |
-| FK: `departments→companies` | 0.05 | FK chain step 1 |
-| FK: `employees→departments` | 0.05 | FK chain step 2 |
-| FK: `assets→employees` | 0.05 | FK chain step 3 |
-| `companies.name` NOT NULL | 0.05 | Constraint enforcement |
-| Employee count = 4 | 0.05 | Hal Patel (NULL salary) removed |
-| Salary coercion correct | 0.15 | All `$90000` → `90000` INTEGER |
-| No orphaned assets | 0.10 | All `asset.employee_id` valid |
-| `PRAGMA integrity_check` | 0.10 | Full DB integrity passes |
 ---
@@ -102,7 +91,7 @@ Unlike simplistic environments that merely string-match SQL schemas, this enviro
 # Install dependencies
 pip install -r requirements.txt
-# Run baseline inference
 export HF_TOKEN=your_token_here
 export API_BASE_URL=https://router.huggingface.co/v1
 export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
@@ -110,6 +99,7 @@ python inference.py
 # Run validation tests
 python test_smoke.py
 # Start environment server locally
 uvicorn server.app:app --host 0.0.0.0 --port 7860
@@ -125,7 +115,7 @@ uvicorn server.app:app --host 0.0.0.0 --port 7860
 | `/reset` | POST | Reset environment, returns initial observation |
 | `/step` | POST | Execute action, returns observation + reward |
 | `/state` | GET | Current environment state |
-| `/tasks` | GET | List all 3 tasks with descriptions |
 | `/grader` | POST | Run grader on all tasks, return scores |
 | `/schema` | GET | OpenEnv schema (action/observation types) |
 | `/ws` | WS | WebSocket for real-time interaction |
@@ -152,10 +142,13 @@ docker run -p 7860:7860 \
 | Task | Score | Steps | Model |
 |------|-------|-------|-------|
-| `column-restructure` | 1.00 | 4 | qwen/qwen3-32b |
-| `table-normalization` | 1.00 | 5-8 | qwen/qwen3-32b |
-| `cascade-migration` | 0.30–0.65 | 15-20 | qwen/qwen3-32b |
-| **Average** | **0.77** | — | — |
 ---
@@ -163,9 +156,10 @@ docker run -p 7860:7860 \
 - [x] `docker build` succeeds
 - [x] `curl /health` returns 200
-- [x] `curl /tasks` returns 3 tasks
 - [x] `curl -X POST /reset` returns valid observation
 - [x] `openenv validate` passes
-- [x] Baseline script completes all 3 tasks without crashing
-- [x] Grader scores in [0.0, 1.0] range
 - [x] Exploit protection: empty-table shortcuts penalized

 ---
 title: SQL Migration Agent
+emoji: "\U0001F5C4\uFE0F"
 colorFrom: blue
 colorTo: purple
 sdk: docker
 # SQL Schema Migration Agent
 > **An OpenEnv environment for benchmarking autonomous database migration agents.**
 >
+> Built for the Meta x Hugging Face OpenEnv Hackathon.
 ---
 Database schema migrations are among the most error-prone, high-stakes tasks in software engineering. Every production system faces them as application models evolve, yet they are extremely difficult to automate safely because data must be perfectly preserved.
+This environment trains AI agents to autonomously reconcile schema drift the exact way a real CI/CD pipeline would -- given a flawed current state and an ideal target state, the agent must compute and safely execute the transformation sequence using raw SQL.
 **Real-world analogues:** `Flyway`, `Liquibase`, Django `makemigrations`, `Terraform` state transitions. This environment models that exact problem, reduced to an agentic RL core.
 Unlike simplistic environments that merely string-match SQL schemas, this environment uses a **deep structural reconciliation grader** built specifically to prevent LLM gamification:
+1. **Zero-Sum Exploit Protection:** Naive agents will often execute `DROP TABLE x; CREATE TABLE x (...)` to easily match the target schema, silently destroying all data. Our grader actively runs `SELECT COUNT(*)`, `SUM(id)`, and data-integrity fingerprinting. If a table's schema matches but the data is gone, the score is brutally clamped to `0.01`.
+2. **PRAGMA Bypass Prevention:** The grader re-asserts `PRAGMA foreign_keys = ON` before every scoring pass, preventing agents from disabling FK constraints to cheat.
+3. **Granular Partial Credit:** Multi-step migrations (like Task 7's 6-to-4 table consolidation) require 18+ steps. Binary pass/fail rewards provide zero learning signal. Our grader assigns fractional weights to individual FK constraints, data type coercions, and orphaned record audit logs, providing continuous RL reward gradients.
+4. **Deterministic Adversarial Seeds:** Our injected data includes edge cases that break naive SQL: `O'Brien` (apostrophes), `$1,234.56` (comma+dollar coercion), orphaned foreign keys, NULL emails, and leading whitespace in emails.
 ---
+## Tasks (2 Easy / 3 Medium / 2 Hard)
+| # | Name | Difficulty | Steps | Description |
+|---|------|-----------|-------|-------------|
+| 1 | `column-restructure` | Easy | 10 | Merge `first_name` + `last_name` into `full_name` without data loss. Adversarial: apostrophes (`O'Brien`), mid-caps (`McDonald`) |
+| 2 | `soft-delete-restoration` | Easy | 10 | Restore deleted products from `deletion_log`, add `is_deleted`/`deleted_at` columns. Adversarial: `stock=0` must not be confused with `is_deleted=1` |
+| 3 | `table-normalization` | Medium | 15 | Decompose flat `purchases` into `customers` + `orders` with FK. Adversarial: duplicate emails (x3), commas in item names |
+| 4 | `schema-version-merge` | Medium | 15 | Merge overlapping `products_v1` (TEXT prices) and `products_v2` (REAL prices) with conflict resolution and `source` tracking. Adversarial: `$XX.XX` coercion, NULL category, high ID=101 |
+| 5 | `multi-entity-extraction` | Medium | 15 | Decompose `sales_records` god-table into 3NF (5 tables) with 3 FKs and invalid data routing. Adversarial: leading whitespace email, empty email, comma in SKU |
+| 6 | `cascade-migration` | Hard | 20 | 4-table FK cascade: type coercion (`$90000` TEXT to `90000` INTEGER), orphan audit logging, NULL salary removal, full FK chain enforcement |
+| 7 | `dual-source-consolidation` | Hard | 20 | Merge 6 tables from two incompatible systems (Legacy CRM + Modern SaaS) into 4 unified tables with cross-system email dedup, currency coercion, orphan detection |
 ---
 | `current_schema_sql` | `str` | Current database DDL extracted from `sqlite_master` |
 | `target_schema_sql` | `str` | Target DDL the agent must reach |
 | `last_execution_result` | `str` | Result of last SQL execution, or error message |
+| `step_number` | `int` | Current step count |
+| `migration_progress` | `float` | Current grader score [0.01, 0.99] |
 | `task_name` | `str` | Name of the active task |
 | `done` | `bool` | Whether the episode has terminated |
 | `reward` | `float` | Step reward: score delta from previous step (can be negative) |
 ## Reward Function
+- **Step reward**: Delta between current and previous migration score. Strongly negative for destructive actions (e.g., wrong DROP TABLE leads to -0.4).
+- **Episode score**: Clamped to [0.01, 0.99]. Final state wins -- regressions hurt.
+- **Exploit protection**: If schema matches target but tables are empty (agent deleted data), score is capped at 0.01.
+- **PRAGMA protection**: `PRAGMA foreign_keys = ON` is re-asserted before every grading pass.
+- **Auto-termination**: Episode ends immediately when score reaches 0.99, preventing post-success regression.
 ---
 # Install dependencies
 pip install -r requirements.txt
+# Run baseline inference (requires HF_TOKEN)
 export HF_TOKEN=your_token_here
 export API_BASE_URL=https://router.huggingface.co/v1
 export MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
 # Run validation tests
 python test_smoke.py
+python test_all_tasks.py
 # Start environment server locally
 uvicorn server.app:app --host 0.0.0.0 --port 7860
 | `/reset` | POST | Reset environment, returns initial observation |
 | `/step` | POST | Execute action, returns observation + reward |
 | `/state` | GET | Current environment state |
+| `/tasks` | GET | List all 7 tasks with descriptions |
 | `/grader` | POST | Run grader on all tasks, return scores |
 | `/schema` | GET | OpenEnv schema (action/observation types) |
 | `/ws` | WS | WebSocket for real-time interaction |
 | Task | Score | Steps | Model |
 |------|-------|-------|-------|
+| `column-restructure` | 0.99 | 4 | Qwen/Qwen2.5-72B-Instruct |
+| `soft-delete-restoration` | 0.99 | 5-7 | Qwen/Qwen2.5-72B-Instruct |
+| `table-normalization` | 0.99 | 5-8 | Qwen/Qwen2.5-72B-Instruct |
+| `schema-version-merge` | 0.60-0.85 | 8-12 | Qwen/Qwen2.5-72B-Instruct |
+| `multi-entity-extraction` | 0.40-0.70 | 12-15 | Qwen/Qwen2.5-72B-Instruct |
+| `cascade-migration` | 0.30-0.65 | 15-20 | Qwen/Qwen2.5-72B-Instruct |
+| `dual-source-consolidation` | 0.20-0.50 | 18-20 | Qwen/Qwen2.5-72B-Instruct |
 ---
 - [x] `docker build` succeeds
 - [x] `curl /health` returns 200
+- [x] `curl /tasks` returns 7 tasks
 - [x] `curl -X POST /reset` returns valid observation
 - [x] `openenv validate` passes
+- [x] Baseline script completes all 7 tasks without crashing
+- [x] Grader scores in [0.01, 0.99] range
 - [x] Exploit protection: empty-table shortcuts penalized
+- [x] PRAGMA bypass protection enforced
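
The Reward Function bullets in the updated README can be read as the following minimal sketch. The function and constant names are hypothetical, and the exact logic in `server/grader.py` and `server/environment.py` may differ; only the bounds 0.01 and 0.99, the delta-based step reward, the empty-table cap, and termination at 0.99 come from the README text.

```python
SCORE_MIN, SCORE_MAX = 0.01, 0.99  # bounds stated in the README


def clamp_score(raw: float) -> float:
    """Clamp a raw grader score into the documented range."""
    return max(SCORE_MIN, min(SCORE_MAX, raw))


def episode_score(raw: float, schema_matches: bool, tables_empty: bool) -> float:
    """Apply exploit protection: a matching schema with no data is capped at the floor."""
    if schema_matches and tables_empty:
        return SCORE_MIN
    return clamp_score(raw)


def step_reward(prev_score: float, raw_score: float) -> tuple[float, float, bool]:
    """Return (new_score, reward, done); reward is the score delta and may be negative."""
    score = clamp_score(raw_score)
    reward = score - prev_score
    done = score >= SCORE_MAX  # auto-termination once the ceiling is reached
    return score, reward, done


# A destructive step that drops the score from 0.5 to 0.1 yields a reward near -0.4.
score, reward, done = step_reward(0.5, 0.1)
```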

__pycache__/seeds.cpython-312.pyc CHANGED

Binary files a/__pycache__/seeds.cpython-312.pyc and b/__pycache__/seeds.cpython-312.pyc differ

inference.py CHANGED

@@ -30,7 +30,7 @@ HF_TOKEN = os.getenv("HF_TOKEN")  # No default — must be set by user
 # Also support OPENAI_API_KEY as primary (per spec) and API_KEY as alias
 API_KEY = os.getenv("OPENAI_API_KEY") or HF_TOKEN or os.getenv("API_KEY")
-SYSTEM_PROMPT = """You are an autonomous SQLite database migration engine. You receive the current schema and a target schema. Write SQL to transform the current state to the target state without losing row data.
 CRITICAL — SQLite-specific rules (violations cause immediate errors):
 1. SQLite does NOT support ALTER TABLE ADD CONSTRAINT — never use it.
@@ -45,11 +45,22 @@ CRITICAL — SQLite-specific rules (violations cause immediate errors):
 10. Execute exactly ONE SQL statement per step.
 11. When migration is complete (schemas match, data preserved), set submit_final to true IMMEDIATELY.
-Respond ONLY with valid JSON — no markdown, no code blocks, no text outside the object:
-{"sql_command": "your SQL here", "reasoning": "why", "submit_final": false}"""
-ALL_TASKS = ["column-restructure", "table-normalization", "cascade-migration"]
-MAX_STEPS = 20  # 20 gives Task 3 enough budget for 4-table cascade + audit
 MAX_PARSE_ERRORS = 5  # Higher tolerance for thinking models (Qwen3, DeepSeek-R1)
 # Auto-submit threshold: if migration_progress >= this, force submit_final
@@ -139,18 +150,24 @@ def run_task_local(task_name: str) -> dict:
     sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
     from server.environment import DbMigrationEnvironment
     from models import MigrationAction
     env = DbMigrationEnvironment(task_name=task_name)
     print(f"[START] task={task_name} env=sql-migration-agent model={MODEL_NAME}", flush=True)
     obs = env.reset()
-    history = [{"role": "system", "content": SYSTEM_PROMPT}]
-    # Initial observation message
     initial_msg = (
         f"CURRENT DATABASE SCHEMA:\n{obs.current_schema_sql}\n\n"
-        f"TARGET SCHEMA:\n{obs.target_schema_sql}\n\n"
         f"Status: {obs.last_execution_result}\n"
         f"Migration progress: {obs.migration_progress:.2f}\n\n"
         f"Write your first SQL command to begin the migration."
@@ -164,7 +181,7 @@ def run_task_local(task_name: str) -> dict:
     done = False
     peak_score = 0.0  # Track the highest score we've reached
-    for step in range(MAX_STEPS):
         if done:
             break
@@ -253,10 +270,11 @@ def run_task_local(task_name: str) -> dict:
         # Add to conversation history
         history.append({"role": "assistant", "content": json.dumps(action_dict)})
         feedback_msg = (
             f"EXECUTION RESULT: {obs.last_execution_result}\n\n"
             f"CURRENT SCHEMA:\n{obs.current_schema_sql}\n\n"
-            f"Migration progress: {obs.migration_progress:.2f}"
         )
         if done:
             feedback_msg += "\n\nEpisode complete."
@@ -309,11 +327,9 @@ def main():
     # Summary
     scores = list(results.values())
     avg = sum(scores) / len(scores) if scores else 0.0
     print(
-        f"[SUMMARY] task1={results.get('column-restructure', 0):.2f} "
-        f"task2={results.get('table-normalization', 0):.2f} "
-        f"task3={results.get('cascade-migration', 0):.2f} "
-        f"avg={avg:.2f}",
         flush=True,
     )

 # Also support OPENAI_API_KEY as primary (per spec) and API_KEY as alias
 API_KEY = os.getenv("OPENAI_API_KEY") or HF_TOKEN or os.getenv("API_KEY")
+SYSTEM_PROMPT_TEMPLATE = """You are an autonomous SQLite database migration engine. You receive the current schema and a target schema. Write SQL to transform the current state to the target state without losing row data.
 CRITICAL — SQLite-specific rules (violations cause immediate errors):
 1. SQLite does NOT support ALTER TABLE ADD CONSTRAINT — never use it.
 10. Execute exactly ONE SQL statement per step.
 11. When migration is complete (schemas match, data preserved), set submit_final to true IMMEDIATELY.
+TARGET SCHEMA (fixed — achieve this exactly):
+{target_ddl}
+Respond ONLY with valid JSON — no markdown, no code blocks, no text outside the object:
+{{"sql_command": "your SQL here", "reasoning": "why", "submit_final": false}}"""
+ALL_TASKS = [
+    "column-restructure",
+    "soft-delete-restoration",
+    "table-normalization",
+    "schema-version-merge",
+    "multi-entity-extraction",
+    "cascade-migration",
+    "dual-source-consolidation",
+]
+MAX_STEPS = 20  # Global fallback; per-task limits override this
 MAX_PARSE_ERRORS = 5  # Higher tolerance for thinking models (Qwen3, DeepSeek-R1)
 # Auto-submit threshold: if migration_progress >= this, force submit_final
     sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
     from server.environment import DbMigrationEnvironment
     from models import MigrationAction
+    import seeds
     env = DbMigrationEnvironment(task_name=task_name)
+    # Use task-specific step budget (defaults to global MAX_STEPS)
+    task_max_steps = seeds.TASKS.get(task_name, {}).get("max_steps", MAX_STEPS)
     print(f"[START] task={task_name} env=sql-migration-agent model={MODEL_NAME}", flush=True)
     obs = env.reset()
+    # Build task-specific system prompt with target DDL baked in (sent ONCE)
+    task_system_prompt = SYSTEM_PROMPT_TEMPLATE.format(target_ddl=obs.target_schema_sql)
+    history = [{"role": "system", "content": task_system_prompt}]
+    # Initial observation — only current schema (target already in system prompt)
     initial_msg = (
         f"CURRENT DATABASE SCHEMA:\n{obs.current_schema_sql}\n\n"
         f"Status: {obs.last_execution_result}\n"
         f"Migration progress: {obs.migration_progress:.2f}\n\n"
         f"Write your first SQL command to begin the migration."
     done = False
     peak_score = 0.0  # Track the highest score we've reached
+    for step in range(task_max_steps):
         if done:
             break
         # Add to conversation history
         history.append({"role": "assistant", "content": json.dumps(action_dict)})
+        # Lean feedback — target is already in the system prompt, no need to repeat
         feedback_msg = (
             f"EXECUTION RESULT: {obs.last_execution_result}\n\n"
             f"CURRENT SCHEMA:\n{obs.current_schema_sql}\n\n"
+            f"Progress: {obs.migration_progress:.2f}"
         )
         if done:
             feedback_msg += "\n\nEpisode complete."
     # Summary
     scores = list(results.values())
     avg = sum(scores) / len(scores) if scores else 0.0
+    scores_str = " ".join(f"{t}={s:.2f}" for t, s in results.items())
     print(
+        f"[SUMMARY] {scores_str} avg={avg:.2f}",
         flush=True,
     )
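
Two details of the `inference.py` change are easy to miss: the literal JSON braces in the prompt had to be doubled once the prompt became a `str.format` template, and the step budget is looked up per task with a global fallback. The sketch below shows both in isolation. The shortened template text, the sample DDL, and the contents of the `TASKS` dictionary are illustrative assumptions; the `max_steps` values follow the commit message (Easy=10, Medium=15, Hard=20).

```python
# Shortened stand-in for SYSTEM_PROMPT_TEMPLATE (illustrative text only).
TEMPLATE = """TARGET SCHEMA (fixed):
{target_ddl}
Respond ONLY with valid JSON:
{{"sql_command": "your SQL here", "reasoning": "why", "submit_final": false}}"""

# Hypothetical target DDL; the real value comes from obs.target_schema_sql.
ddl = "CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);"

# {target_ddl} is substituted; {{ and }} collapse to single literal braces.
prompt = TEMPLATE.format(target_ddl=ddl)

# Assumed shape of the registry in seeds.py; only "max_steps" matters here.
TASKS = {
    "column-restructure": {"difficulty": "easy", "max_steps": 10},
    "table-normalization": {"difficulty": "medium", "max_steps": 15},
    "cascade-migration": {"difficulty": "hard", "max_steps": 20},
}
MAX_STEPS = 20  # global fallback


def budget_for(task_name: str) -> int:
    """Per-task step budget, falling back to MAX_STEPS for unknown tasks."""
    return TASKS.get(task_name, {}).get("max_steps", MAX_STEPS)
```

Without the doubled braces, `str.format` would try to interpret `"sql_command"` as a replacement field and fail before the first request is sent.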

seeds.py CHANGED

@@ -255,6 +255,429 @@ def seed_task3(conn: sqlite3.Connection) -> None:
     conn.commit()
 # =============================================================================
 # Task Registry
 # =============================================================================
@@ -265,17 +688,49 @@ TASKS = {
         "target_ddl": TASK1_TARGET_DDL,
         "description": "Merge first_name and last_name into a single full_name column without data loss",
         "difficulty": "easy",
     },
     "table-normalization": {
         "seed_fn": seed_task2,
         "target_ddl": TASK2_TARGET_DDL,
         "description": "Decompose a flat purchases table into normalized customers and orders tables with FK",
         "difficulty": "medium",
     },
     "cascade-migration": {
         "seed_fn": seed_task3,
         "target_ddl": TASK3_TARGET_DDL,
         "description": "Multi-table FK cascade with type coercion, NULL handling, and orphan audit logging",
         "difficulty": "hard",
     },
 }

     conn.commit()
+# =============================================================================
+# TASK 4: Soft-Delete Restoration (Easy)
+# =============================================================================
+# Agent must restore deleted products from a deletion_log, add is_deleted/deleted_at columns.
+# Adversarial: "O'Brien Desk" (apostrophe), stock=0 on Webcam (must NOT confuse with is_deleted).
+TASK4_SOURCE_DDL = """
+CREATE TABLE products (
+    id INTEGER PRIMARY KEY,
+    name TEXT NOT NULL,
+    price REAL NOT NULL,
+    stock INTEGER NOT NULL
+);
+CREATE TABLE deletion_log (
+    id INTEGER PRIMARY KEY,
+    product_id INTEGER NOT NULL,
+    product_name TEXT NOT NULL,
+    product_price REAL NOT NULL,
+    product_stock INTEGER NOT NULL,
+    deleted_at TEXT NOT NULL
+);
+"""
+TASK4_PRODUCTS_DATA = [
+    (1, "Laptop",       999.99, 15),
+    (2, "O'Brien Desk", 249.99, 8),
+    (3, "Monitor",      399.99, 23),
+    (4, "Keyboard",     89.99,  45),
+    (5, "Mouse",        29.99,  102),
+]
+TASK4_DELETION_LOG_DATA = [
+    (1, 6, "Headphones", 149.99, 30, "2024-01-15"),
+    (2, 7, "Webcam",      79.99,  0, "2024-02-20"),   # restored with is_deleted=1; stock=0 alone must not imply deletion
+    (3, 8, "USB-C Hub",   49.99, 12, "2024-03-10"),
+]
+TASK4_TARGET_DDL = """CREATE TABLE products (
+    id INTEGER PRIMARY KEY,
+    name TEXT NOT NULL,
+    price REAL NOT NULL,
+    stock INTEGER NOT NULL,
+    is_deleted INTEGER NOT NULL DEFAULT 0,
+    deleted_at TEXT
+);"""
+TASK4_EXPECTED_ROW_COUNT = 8
+TASK4_EXPECTED_ID_SUM = 36           # 1+2+3+4+5+6+7+8
+TASK4_EXPECTED_DELETED_COUNT = 3     # ids 6,7,8
+TASK4_EXPECTED_ACTIVE_COUNT = 5      # ids 1-5
+def seed_task4(conn: sqlite3.Connection) -> None:
+    """Seed the database for Task 4: Soft-Delete Restoration."""
+    conn.executescript(TASK4_SOURCE_DDL)
+    conn.executemany(
+        "INSERT INTO products (id, name, price, stock) VALUES (?, ?, ?, ?)",
+        TASK4_PRODUCTS_DATA,
+    )
+    conn.executemany(
+        "INSERT INTO deletion_log (id, product_id, product_name, product_price, product_stock, deleted_at) "
+        "VALUES (?, ?, ?, ?, ?, ?)",
+        TASK4_DELETION_LOG_DATA,
+    )
+    conn.commit()
+# =============================================================================
+# TASK 5: Schema Version Merge (Medium)
+# =============================================================================
+# Agent must merge products_v1 (price as "$XX.XX" TEXT) and products_v2 (price as REAL)
+# into a single products table. v2 wins on ID conflicts. Must add source column.
+# Adversarial: id=101 high ID, NULL category, "$" price coercion, conflicting rows.
+TASK5_SOURCE_DDL = """
+CREATE TABLE products_v1 (
+    id INTEGER PRIMARY KEY,
+    name TEXT NOT NULL,
+    price TEXT NOT NULL,
+    category TEXT,
+    supplier TEXT
+);
+CREATE TABLE products_v2 (
+    id INTEGER PRIMARY KEY,
+    name TEXT NOT NULL,
+    unit_cost REAL NOT NULL,
+    category TEXT NOT NULL,
+    brand TEXT,
+    sku TEXT
+);
+"""
+TASK5_V1_DATA = [
+    (1,   "Widget A",    "$12.50", "Electronics", "AcmeCo"),
+    (2,   "Widget B",    "$8.99",  "Electronics", "AcmeCo"),
+    (3,   "Gadget X",    "$45.00", None,          "TechCorp"),
+    (4,   "Gadget Y",    "$32.50", "Tools",       "TechCorp"),
+    (5,   "Doohickey",   "$5.99",  "Office",      "SupplyPro"),
+    (101, "Legacy Item", "$99.99", "Electronics", "OldCo"),
+]
+TASK5_V2_DATA = [
+    (1, "Widget A",          12.50, "Electronics", "AcmeCo",  "SKU-001"),
+    (2, "Widget B Updated",   9.99, "Electronics", "AcmeCo",  "SKU-002"),
+    (6, "New Product F",     67.00, "Tools",       "NewCorp", "SKU-006"),
+    (7, "New Product G",     23.50, "Office",      "NewCorp", "SKU-007"),
+    (8, "New Product H",     11.00, "Electronics", "ImportCo","SKU-008"),
+]
+TASK5_TARGET_DDL = """CREATE TABLE products (
+    id INTEGER PRIMARY KEY,
+    name TEXT NOT NULL,
+    price REAL NOT NULL,
+    category TEXT,
+    supplier TEXT,
+    brand TEXT,
+    sku TEXT,
+    source TEXT NOT NULL
+);"""
+TASK5_EXPECTED_ROW_COUNT = 9
+TASK5_EXPECTED_PRICE_SUM = round(12.50 + 9.99 + 45.00 + 32.50 + 5.99 + 99.99 + 67.00 + 23.50 + 11.00, 2)
+TASK5_EXPECTED_BOTH_COUNT = 2      # ids 1 and 2
+def seed_task5(conn: sqlite3.Connection) -> None:
+    """Seed the database for Task 5: Schema Version Merge."""
+    conn.executescript(TASK5_SOURCE_DDL)
+    conn.executemany(
+        "INSERT INTO products_v1 (id, name, price, category, supplier) VALUES (?, ?, ?, ?, ?)",
+        TASK5_V1_DATA,
+    )
+    conn.executemany(
+        "INSERT INTO products_v2 (id, name, unit_cost, category, brand, sku) VALUES (?, ?, ?, ?, ?, ?)",
+        TASK5_V2_DATA,
+    )
+    conn.commit()
+# =============================================================================
+# TASK 6: Multi-Entity Extraction (Medium — Hard End)
+# =============================================================================
+# Agent must decompose a sales_records god-table into 3NF (5 tables).
+# Adversarial: leading whitespace email, empty customer email, comma in SKU.
+TASK6_SOURCE_DDL = """
+CREATE TABLE sales_records (
+    id INTEGER PRIMARY KEY,
+    rep_name TEXT NOT NULL,
+    rep_email TEXT NOT NULL,
+    rep_region TEXT NOT NULL,
+    customer_name TEXT NOT NULL,
+    customer_email TEXT NOT NULL,
+    customer_tier TEXT NOT NULL,
+    product_name TEXT NOT NULL,
+    product_sku TEXT NOT NULL,
+    product_category TEXT NOT NULL,
+    quantity INTEGER NOT NULL,
+    unit_price REAL NOT NULL,
+    discount_pct INTEGER NOT NULL DEFAULT 0,
+    sale_date TEXT NOT NULL
+);
+"""
+TASK6_SOURCE_DATA = [
+    (1,  "Alice Chen",   " alice@company.com", "North", "Globex Corp",    "globex@corp.com",   "enterprise", "Widget Pro",  "WIDGET-001", "Electronics", 5,  299.99, 10, "2024-01-10"),
+    (2,  "Alice Chen",   "alice@company.com",  "North", "Initech LLC",    "info@initech.com",  "basic",      "Widget Pro",  "WIDGET-001", "Electronics", 2,  299.99, 0,  "2024-01-15"),
+    (3,  "Bob Martinez", "bob@company.com",    "South", "Globex Corp",    "globex@corp.com",   "enterprise", "Gadget X",    "GADGET-X01", "Hardware",    10, 89.99,  5,  "2024-01-20"),
+    (4,  "Bob Martinez", "bob@company.com",    "South", "Umbrella Inc",   "sales@umbrella.co", "premium",    "Gadget X",    "GADGET-X01", "Hardware",    3,  89.99,  0,  "2024-02-01"),
+    (5,  "Carol White",  "carol@company.com",  "East",  "Initech LLC",    "info@initech.com",  "basic",      "Tool Kit",    "TOOLS,001",  "Hardware",    1,  199.99, 0,  "2024-02-05"),
+    (6,  "Alice Chen",   "alice@company.com",  "North", "Pendant Corp",   "",                  "free",       "Widget Pro",  "WIDGET-001", "Electronics", 7,  299.99, 15, "2024-02-10"),
+    (7,  "Carol White",  "carol@company.com",  "East",  "Globex Corp",    "globex@corp.com",   "enterprise", "Nano Device", "NANO-D01",   "Electronics", 2,  549.99, 20, "2024-02-15"),
+    (8,  "Bob Martinez", "bob@company.com",    "South", "Umbrella Inc",   "sales@umbrella.co", "premium",    "Tool Kit",    "TOOLS,001",  "Hardware",    4,  199.99, 10, "2024-03-01"),
+    (9,  "Alice Chen",   "alice@company.com",  "North", "Initech LLC",    "info@initech.com",  "basic",      "Nano Device", "NANO-D01",   "Electronics", 1,  549.99, 0,  "2024-03-05"),
+    (10, "Carol White",  "carol@company.com",  "East",  "Umbrella Inc",   "sales@umbrella.co", "premium",    "Cable Bundle","CABLE-5PK",  "Accessories", 20, 14.99,  0,  "2024-03-10"),
+    (11, "Bob Martinez", "bob@company.com",    "South", "Globex Corp",    "globex@corp.com",   "enterprise", "Cable Bundle","CABLE-5PK",  "Accessories", 15, 14.99,  5,  "2024-03-15"),
+    (12, "Carol White",  "carol@company.com",  "East",  "Pendant Corp",   "orders@pendant.io", "free",       "Gadget X",    "GADGET-X01", "Hardware",    6,  89.99,  0,  "2024-03-20"),
+]
+TASK6_TARGET_DDL = """CREATE TABLE salespersons (
+    id INTEGER PRIMARY KEY,
+    name TEXT NOT NULL,
+    email TEXT NOT NULL UNIQUE,
+    region TEXT NOT NULL
+);
+CREATE TABLE customers (
+    id INTEGER PRIMARY KEY,
+    name TEXT NOT NULL,
+    email TEXT NOT NULL UNIQUE,
+    tier TEXT NOT NULL
+);
+CREATE TABLE products (
+    id INTEGER PRIMARY KEY,
+    name TEXT NOT NULL,
+    sku TEXT NOT NULL UNIQUE,
+    category TEXT NOT NULL
+);
+CREATE TABLE sales (
+    id INTEGER PRIMARY KEY,
+    salesperson_id INTEGER NOT NULL,
+    customer_id INTEGER NOT NULL,
+    product_id INTEGER NOT NULL,
+    quantity INTEGER NOT NULL,
+    unit_price REAL NOT NULL,
+    discount_pct INTEGER NOT NULL DEFAULT 0,
+    sale_date TEXT NOT NULL,
+    FOREIGN KEY (salesperson_id) REFERENCES salespersons(id),
+    FOREIGN KEY (customer_id) REFERENCES customers(id),
+    FOREIGN KEY (product_id) REFERENCES products(id)
+);
+CREATE TABLE data_issues (
+    id INTEGER PRIMARY KEY,
+    source_table TEXT NOT NULL,
+    source_row_id INTEGER NOT NULL,
+    issue_type TEXT NOT NULL,
+    issue_detail TEXT NOT NULL
+);"""
+TASK6_EXPECTED_SALESPERSON_COUNT = 3
+TASK6_EXPECTED_CUSTOMER_COUNT = 4   # Globex, Initech, Umbrella, Pendant (row 12); row 6 excluded (empty email)
+TASK6_EXPECTED_PRODUCT_COUNT = 5
+TASK6_EXPECTED_SALES_COUNT = 11     # row 6 excluded
+TASK6_EXPECTED_DATA_ISSUES_COUNT = 1
+def seed_task6(conn: sqlite3.Connection) -> None:
+    """Seed the database for Task 6: Multi-Entity Extraction."""
+    conn.executescript(TASK6_SOURCE_DDL)
+    conn.executemany(
+        "INSERT INTO sales_records (id, rep_name, rep_email, rep_region, "
+        "customer_name, customer_email, customer_tier, product_name, product_sku, "
+        "product_category, quantity, unit_price, discount_pct, sale_date) "
+        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
+        TASK6_SOURCE_DATA,
+    )
+    conn.commit()
+# =============================================================================
+# TASK 7: Dual-Source Consolidation (Hard)
+# =============================================================================
+# Agent must merge 6 source tables from two incompatible systems (Legacy CRM + Modern SaaS)
+# into 4 unified target tables. Cross-system email dedup, currency coercion, orphan detection.
+TASK7_LEGACY_CUSTOMERS_DDL = """
+CREATE TABLE legacy_customers (
+    id INTEGER PRIMARY KEY,
+    full_name TEXT,
+    contact_email TEXT,
+    phone TEXT,
+    account_type TEXT,
+    join_date TEXT
+);
+"""
+TASK7_LEGACY_ORDERS_DDL = """
+CREATE TABLE legacy_orders (
+    id INTEGER PRIMARY KEY,
+    customer_id INTEGER,
+    product_code TEXT,
+    total_amount TEXT,
+    order_status TEXT,
+    order_date TEXT
+);
+"""
+TASK7_LEGACY_PRODUCTS_DDL = """
+CREATE TABLE legacy_products (
+    code TEXT PRIMARY KEY,
+    description TEXT,
+    unit_price TEXT
+);
+"""
+TASK7_MODERN_USERS_DDL = """
+CREATE TABLE modern_users (
+    uuid TEXT PRIMARY KEY,
+    display_name TEXT,
+    email_address TEXT,
+    subscription_tier INTEGER,
+    created_at TEXT
+);
+"""
+TASK7_MODERN_TRANSACTIONS_DDL = """
+CREATE TABLE modern_transactions (
+    id INTEGER PRIMARY KEY,
+    user_uuid TEXT,
+    item_sku TEXT,
+    amount REAL,
+    currency TEXT,
+    tx_status INTEGER,
+    created_at TEXT
+);
+"""
+TASK7_MODERN_CATALOG_DDL = """
+CREATE TABLE modern_catalog (
+    sku TEXT PRIMARY KEY,
+    title TEXT,
+    base_price REAL
+);
+"""
+TASK7_LEGACY_CUSTOMERS_DATA = [
+    (1, "Alice Johnson", "alice@example.com", "+1-555-0101", "premium", "2021-03-15"),
+    (2, "Bob Chen",      "bob@example.com",   "+1-555-0102", "basic",   "2022-07-01"),
+    (3, "Carol Davis",   None,                "+1-555-0103", "free",    "2023-01-10"),
+    (4, "Dave Wilson",   "dave@example.com",  "+1-555-0104", "premium", "2021-11-20"),
+    (5, "Eve Martinez",  "eve@example.com",   "+1-555-0105", "free",    "2023-06-05"),
+]
+TASK7_MODERN_USERS_DATA = [
+    ("uuid-A1", "Alice J.",    "alice@example.com", 3, "2021-03-15"),
+    ("uuid-B2", "R. Bob Chen", "bob@example.com",   2, "2022-07-01"),
+    ("uuid-F6", "Frank Lee",   "frank@example.com", 4, "2022-09-30"),
+    ("uuid-G7", "Grace Kim",   "grace@example.com", 1, "2024-01-15"),
+]
+TASK7_LEGACY_ORDERS_DATA = [
+    (1, 1, "PROD-A", "$1,234.56", "delivered", "2022-01-10"),
+    (2, 2, "PROD-B", "$89.99",    "shipped",   "2022-03-15"),
+    (3, 4, "PROD-A", "$2,500.00", "delivered", "2022-05-20"),
+    (4, 3, "PROD-C", "$45.00",    "pending",   "2023-02-01"),
+]
+TASK7_LEGACY_PRODUCTS_DATA = [
+    ("PROD-A", "Enterprise Widget",   "$1,234.56"),
+    ("PROD-B", "Basic Gadget",        "$89.99"),
+    ("PROD-C", "Starter Kit",         "$45.00"),
+]
+TASK7_MODERN_TRANSACTIONS_DATA = [
+    (1, "uuid-A1",   "SKU-001", 299.99, "USD",  3, "2023-06-01"),
+    (2, "uuid-B2",   "SKU-002", 89.99,  None,   2, "2023-07-15"),
+    (3, "uuid-F6",   "SKU-001", 299.99, None,   3, "2023-08-20"),
+    (4, "uuid-DEAD", "SKU-003", 15.99,  None,   1, "2023-09-01"),   # orphan
+    (5, "uuid-G7",   "SKU-002", 89.99,  "USD",  4, "2023-10-10"),
+    (6, "uuid-A1",   "SKU-003", 15.99,  "EUR",  5, "2023-11-01"),
+]
+TASK7_MODERN_CATALOG_DATA = [
+    ("SKU-001", "Pro Widget",    299.99),
+    ("SKU-002", "Smart Gadget",  89.99),
+    ("SKU-003", "Mini Accessory", 15.99),
+]
+TASK7_TARGET_DDL = """CREATE TABLE unified_customers (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    legacy_id INTEGER,
+    modern_uuid TEXT,
+    name TEXT,
+    email TEXT,
+    phone TEXT,
+    tier TEXT NOT NULL DEFAULT 'free',
+    source TEXT NOT NULL,
+    created_at TEXT
+);
+CREATE TABLE unified_products (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    code TEXT NOT NULL UNIQUE,
+    title TEXT NOT NULL,
+    price REAL NOT NULL,
+    source TEXT NOT NULL
+);
+CREATE TABLE unified_orders (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    customer_id INTEGER NOT NULL,
+    product_id INTEGER,
+    amount REAL NOT NULL,
+    currency TEXT NOT NULL DEFAULT 'USD',
+    status TEXT NOT NULL,
+    order_date TEXT,
+    source TEXT NOT NULL,
+    FOREIGN KEY (customer_id) REFERENCES unified_customers(id)
+);
+CREATE TABLE migration_issues (
+    id INTEGER PRIMARY KEY,
+    source_system TEXT NOT NULL,
+    source_table TEXT NOT NULL,
+    source_id TEXT NOT NULL,
+    issue_type TEXT NOT NULL,
+    resolution TEXT NOT NULL
+);"""
+TASK7_EXPECTED_UNIFIED_CUSTOMERS = 7
+TASK7_EXPECTED_BOTH_SOURCE_COUNT = 2
+TASK7_EXPECTED_UNIFIED_ORDERS = 9
+TASK7_EXPECTED_MIGRATION_ISSUES = 2
+# Tier mapping: 1→'free', 2→'basic', 3→'premium', 4→'enterprise'
+TASK7_TIER_MAP = {1: "free", 2: "basic", 3: "premium", 4: "enterprise"}
+# Status mapping: 1→'pending', 2→'processing', 3→'complete', 4→'failed', 5→'refunded'
+TASK7_STATUS_MAP = {1: "pending", 2: "processing", 3: "complete", 4: "failed", 5: "refunded"}
+def seed_task7(conn: sqlite3.Connection) -> None:
+    """Seed the database for Task 7: Dual-Source Consolidation."""
+    conn.executescript(TASK7_LEGACY_CUSTOMERS_DDL)
+    conn.executescript(TASK7_LEGACY_ORDERS_DDL)
+    conn.executescript(TASK7_LEGACY_PRODUCTS_DDL)
+    conn.executescript(TASK7_MODERN_USERS_DDL)
+    conn.executescript(TASK7_MODERN_TRANSACTIONS_DDL)
+    conn.executescript(TASK7_MODERN_CATALOG_DDL)
+    conn.executemany("INSERT INTO legacy_customers VALUES (?, ?, ?, ?, ?, ?)", TASK7_LEGACY_CUSTOMERS_DATA)
+    conn.executemany("INSERT INTO legacy_orders VALUES (?, ?, ?, ?, ?, ?)", TASK7_LEGACY_ORDERS_DATA)
+    conn.executemany("INSERT INTO legacy_products VALUES (?, ?, ?)", TASK7_LEGACY_PRODUCTS_DATA)
+    conn.executemany("INSERT INTO modern_users VALUES (?, ?, ?, ?, ?)", TASK7_MODERN_USERS_DATA)
+    conn.executemany("INSERT INTO modern_transactions VALUES (?, ?, ?, ?, ?, ?, ?)", TASK7_MODERN_TRANSACTIONS_DATA)
+    conn.executemany("INSERT INTO modern_catalog VALUES (?, ?, ?)", TASK7_MODERN_CATALOG_DATA)
+    conn.commit()
 # =============================================================================
 # Task Registry
 # =============================================================================
         "target_ddl": TASK1_TARGET_DDL,
         "description": "Merge first_name and last_name into a single full_name column without data loss",
         "difficulty": "easy",
+        "max_steps": 10,
+    },
+    "soft-delete-restoration": {
+        "seed_fn": seed_task4,
+        "target_ddl": TASK4_TARGET_DDL,
+        "description": "Restore deleted products from deletion_log, add is_deleted/deleted_at columns",
+        "difficulty": "easy",
+        "max_steps": 10,
     },
     "table-normalization": {
         "seed_fn": seed_task2,
         "target_ddl": TASK2_TARGET_DDL,
         "description": "Decompose a flat purchases table into normalized customers and orders tables with FK",
         "difficulty": "medium",
+        "max_steps": 15,
+    },
+    "schema-version-merge": {
+        "seed_fn": seed_task5,
+        "target_ddl": TASK5_TARGET_DDL,
+        "description": "Merge overlapping v1/v2 product tables with price coercion and conflict resolution",
+        "difficulty": "medium",
+        "max_steps": 15,
+    },
+    "multi-entity-extraction": {
+        "seed_fn": seed_task6,
+        "target_ddl": TASK6_TARGET_DDL,
+        "description": "Decompose a sales god-table into 3NF with 3 FKs and invalid data routing",
+        "difficulty": "medium",
+        "max_steps": 15,
     },
     "cascade-migration": {
         "seed_fn": seed_task3,
         "target_ddl": TASK3_TARGET_DDL,
         "description": "Multi-table FK cascade with type coercion, NULL handling, and orphan audit logging",
         "difficulty": "hard",
+        "max_steps": 20,
+    },
+    "dual-source-consolidation": {
+        "seed_fn": seed_task7,
+        "target_ddl": TASK7_TARGET_DDL,
+        "description": "Merge 6 tables from two incompatible systems into 4 unified tables with cross-system dedup",
+        "difficulty": "hard",
+        "max_steps": 20,
     },
 }
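The Task 7 expected counts in the seeds.py diff (7 unified customers, 2 with source 'both', 9 unified orders) can be reproduced from the seed rows with a few queries. The following is a minimal sketch, not the grader's or a reference agent's implementation. It assumes that customers are matched on exact email equality, that the orphaned transaction is excluded from `unified_orders`, and that legacy amounts are coerced by stripping `$` and `,`. The tables are trimmed to the columns the sketch needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE legacy_customers (id INTEGER PRIMARY KEY, contact_email TEXT);
CREATE TABLE modern_users (uuid TEXT PRIMARY KEY, email_address TEXT);
CREATE TABLE legacy_orders (id INTEGER PRIMARY KEY, total_amount TEXT);
CREATE TABLE modern_transactions (id INTEGER PRIMARY KEY, user_uuid TEXT);
""")
conn.executemany(
    "INSERT INTO legacy_customers VALUES (?, ?)",
    [(1, "alice@example.com"), (2, "bob@example.com"), (3, None),
     (4, "dave@example.com"), (5, "eve@example.com")],
)
conn.executemany(
    "INSERT INTO modern_users VALUES (?, ?)",
    [("uuid-A1", "alice@example.com"), ("uuid-B2", "bob@example.com"),
     ("uuid-F6", "frank@example.com"), ("uuid-G7", "grace@example.com")],
)
conn.executemany(
    "INSERT INTO legacy_orders VALUES (?, ?)",
    [(1, "$1,234.56"), (2, "$89.99"), (3, "$2,500.00"), (4, "$45.00")],
)
conn.executemany(
    "INSERT INTO modern_transactions VALUES (?, ?)",
    [(1, "uuid-A1"), (2, "uuid-B2"), (3, "uuid-F6"),
     (4, "uuid-DEAD"), (5, "uuid-G7"), (6, "uuid-A1")],
)

# Currency coercion: strip '$' and ',' and cast the remaining text to REAL.
amounts = [row[0] for row in conn.execute(
    "SELECT CAST(REPLACE(REPLACE(total_amount, '$', ''), ',', '') AS REAL) "
    "FROM legacy_orders ORDER BY id"
)]

# Cross-system dedup (assumed rule: exact email equality).
# A NULL email never equals anything, so legacy customer 3 stays unmatched.
both_count = conn.execute(
    "SELECT COUNT(*) FROM legacy_customers l "
    "JOIN modern_users m ON l.contact_email = m.email_address"
).fetchone()[0]
legacy_total = conn.execute("SELECT COUNT(*) FROM legacy_customers").fetchone()[0]
modern_total = conn.execute("SELECT COUNT(*) FROM modern_users").fetchone()[0]
unified_count = legacy_total + modern_total - both_count

# Orphan detection: transactions whose user_uuid has no modern_users row.
orphans = [row[0] for row in conn.execute(
    "SELECT user_uuid FROM modern_transactions "
    "WHERE user_uuid NOT IN (SELECT uuid FROM modern_users)"
)]
legacy_order_total = conn.execute("SELECT COUNT(*) FROM legacy_orders").fetchone()[0]
tx_total = conn.execute("SELECT COUNT(*) FROM modern_transactions").fetchone()[0]
unified_orders = legacy_order_total + tx_total - len(orphans)
```

Under these assumptions the arithmetic is 5 + 4 - 2 = 7 customers and 4 + 6 - 1 = 9 orders, which agrees with `TASK7_EXPECTED_UNIFIED_CUSTOMERS`, `TASK7_EXPECTED_BOTH_SOURCE_COUNT`, and `TASK7_EXPECTED_UNIFIED_ORDERS`.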

server/__pycache__/environment.cpython-312.pyc CHANGED

Binary files a/server/__pycache__/environment.cpython-312.pyc and b/server/__pycache__/environment.cpython-312.pyc differ

server/__pycache__/grader.cpython-312.pyc CHANGED

Binary files a/server/__pycache__/grader.cpython-312.pyc and b/server/__pycache__/grader.cpython-312.pyc differ

server/app.py CHANGED

@@ -61,35 +61,40 @@ async def root():
     return """<!DOCTYPE html>
 <html>
 <head>
-    <title>SQL Migration Agent — OpenEnv</title>
     <style>
         body { font-family: monospace; background: #0d1117; color: #e6edf3; padding: 40px; }
         h1 { color: #58a6ff; } h2 { color: #79c0ff; }
         .ok { color: #3fb950; } .endpoint { color: #d2a8ff; }
         pre { background: #161b22; padding: 12px; border-radius: 6px; }
         a { color: #58a6ff; }
     </style>
 </head>
 <body>
-    <h1>🗄️ SQL Schema Migration Agent</h1>
-    <p class="ok">✅ Server running — OpenEnv hackathon environment</p>
     <h2>API Endpoints</h2>
     <pre>
-<span class="endpoint">POST /reset</span>   — Start a new migration episode
-<span class="endpoint">POST /step</span>    — Execute a SQL action
-<span class="endpoint">GET  /state</span>   — Current environment state
-<span class="endpoint">GET  /tasks</span>   — List all 3 tasks
-<span class="endpoint">POST /grader</span>  — Run grader on all tasks
-<span class="endpoint">GET  /health</span>  — Health check
-<span class="endpoint">GET  /docs</span>    — Interactive API documentation
     </pre>
-    <h2>Tasks</h2>
     <pre>
-1. column-restructure   (Easy)   — Merge first_name + last_name → full_name
-2. table-normalization  (Medium) — Normalize purchases → customers + orders + FK
-3. cascade-migration    (Hard)   — 4-table FK cascade, type coercion, orphan audit
     </pre>
-    <p><a href="/docs">📖 Open API Docs</a> | <a href="/tasks">📋 View Tasks</a> | <a href="/health">💚 Health Check</a></p>
 </body>
 </html>"""
@@ -101,31 +106,27 @@ async def list_tasks() -> Dict[str, Any]:
     Returns JSON with task definitions and action schema for automated validation.
     """
     return {
-        "tasks": [
-            {
-                "name": "column-restructure",
-                "description": "Merge first_name and last_name into a single full_name column without data loss",
-                "difficulty": "easy",
-                "max_steps": 20,
-            },
-            {
-                "name": "table-normalization",
-                "description": "Decompose a flat purchases table into normalized customers and orders tables with FK",
-                "difficulty": "medium",
-                "max_steps": 20,
-            },
-            {
-                "name": "cascade-migration",
-                "description": "Multi-table FK cascade with type coercion, NULL handling, and orphan audit logging",
-                "difficulty": "hard",
-                "max_steps": 20,
-            },
-        ],
         "action_schema": {
-            "sql_command": "string — The SQL statement to execute",
-            "reasoning": "string — Explanation of the action (optional)",
-            "submit_final": "boolean — Set true when migration is complete (default: false)",
         },
     }
@@ -141,25 +142,29 @@ async def grade_task(
     Returns per-task grader scores after running the environment's internal scorer.
     """
     task_name = body.get("task_name", None)
-    tasks_to_grade = [task_name] if task_name else ["column-restructure", "table-normalization", "cascade-migration"]
     results = {}
     for t in tasks_to_grade:
         try:
             env = DbMigrationEnvironment(task_name=t)
             obs = env.reset()
-            # Return the initial score (before any agent action)
-            # This proves the grader works and returns values in [0.0, 1.0]
             results[t] = {
-                "initial_score": max(0.0, min(1.0, obs.migration_progress)),
                 "grader_functional": True,
-                "reward_range": [0.0, 1.0],
-                "max_steps": 20,
             }
             env.close()
         except Exception as e:
             results[t] = {
-                "initial_score": 0.0,
                 "grader_functional": False,
                 "error": str(e),
             }

     return """<!DOCTYPE html>
 <html>
 <head>
+    <title>SQL Migration Agent -- OpenEnv</title>
     <style>
         body { font-family: monospace; background: #0d1117; color: #e6edf3; padding: 40px; }
         h1 { color: #58a6ff; } h2 { color: #79c0ff; }
         .ok { color: #3fb950; } .endpoint { color: #d2a8ff; }
         pre { background: #161b22; padding: 12px; border-radius: 6px; }
         a { color: #58a6ff; }
+        .easy { color: #3fb950; } .medium { color: #d29922; } .hard { color: #f85149; }
     </style>
 </head>
 <body>
+    <h1>SQL Schema Migration Agent</h1>
+    <p class="ok">Server running -- OpenEnv hackathon environment (7 tasks)</p>
     <h2>API Endpoints</h2>
     <pre>
+<span class="endpoint">POST /reset</span>   -- Start a new migration episode
+<span class="endpoint">POST /step</span>    -- Execute a SQL action
+<span class="endpoint">GET  /state</span>   -- Current environment state
+<span class="endpoint">GET  /tasks</span>   -- List all 7 tasks
+<span class="endpoint">POST /grader</span>  -- Run grader on all tasks
+<span class="endpoint">GET  /health</span>  -- Health check
+<span class="endpoint">GET  /docs</span>    -- Interactive API documentation
     </pre>
+    <h2>Tasks (2 Easy / 3 Medium / 2 Hard)</h2>
     <pre>
+<span class="easy">1. column-restructure        (Easy)   -- Merge first_name + last_name -> full_name</span>
+<span class="easy">2. soft-delete-restoration   (Easy)   -- Restore deleted products from deletion_log</span>
+<span class="medium">3. table-normalization       (Medium) -- Normalize purchases -> customers + orders + FK</span>
+<span class="medium">4. schema-version-merge      (Medium) -- Merge v1/v2 product tables with coercion</span>
+<span class="medium">5. multi-entity-extraction   (Medium) -- 3NF decomposition with invalid data routing</span>
+<span class="hard">6. cascade-migration         (Hard)   -- 4-table FK cascade, type coercion, orphan audit</span>
+<span class="hard">7. dual-source-consolidation (Hard)   -- 6->4 table merge, cross-system email dedup</span>
     </pre>
+    <p><a href="/docs">Open API Docs</a> | <a href="/tasks">View Tasks</a> | <a href="/health">Health Check</a></p>
 </body>
 </html>"""
     Returns JSON with task definitions and action schema for automated validation.
     """
+    # Import seeds to dynamically build task list
+    try:
+        from .. import seeds as _seeds
+    except ImportError:
+        import seeds as _seeds
+    task_list = []
+    for name, cfg in _seeds.TASKS.items():
+        task_list.append({
+            "name": name,
+            "description": cfg["description"],
+            "difficulty": cfg["difficulty"],
+            "max_steps": cfg.get("max_steps", 20),
+        })
     return {
+        "tasks": task_list,
         "action_schema": {
+            "sql_command": "string -- The SQL statement to execute",
+            "reasoning": "string -- Explanation of the action (optional)",
+            "submit_final": "boolean -- Set true when migration is complete (default: false)",
         },
     }
     Returns per-task grader scores after running the environment's internal scorer.
     """
     task_name = body.get("task_name", None)
+    try:
+        from .. import seeds as _seeds
+    except ImportError:
+        import seeds as _seeds
+    tasks_to_grade = [task_name] if task_name else list(_seeds.TASKS.keys())
     results = {}
     for t in tasks_to_grade:
         try:
             env = DbMigrationEnvironment(task_name=t)
             obs = env.reset()
             results[t] = {
+                "initial_score": obs.migration_progress,
                 "grader_functional": True,
+                "reward_range": [0.01, 0.99],
+                "max_steps": _seeds.TASKS[t].get("max_steps", 20),
             }
             env.close()
         except Exception as e:
             results[t] = {
+                "initial_score": 0.01,
                 "grader_functional": False,
                 "error": str(e),
             }
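The app.py change replaces the hard-coded three-task list with one derived from the registry, falling back to 20 steps when an entry has no `max_steps`. A minimal sketch of that pattern follows. The `TASKS` dict here is a stand-in for `seeds.TASKS`, and the `legacy-task` entry and the `build_task_list` helper are hypothetical, added only to exercise the fallback.

```python
from typing import Any, Dict, List

DEFAULT_MAX_STEPS = 20  # same fallback value the diff passes to cfg.get()

# Stand-in for seeds.TASKS. "legacy-task" is invented for this sketch and
# deliberately omits max_steps.
TASKS: Dict[str, Dict[str, Any]] = {
    "column-restructure": {
        "description": "Merge first_name and last_name into a single full_name column without data loss",
        "difficulty": "easy",
        "max_steps": 10,
    },
    "table-normalization": {
        "description": "Decompose a flat purchases table into normalized customers and orders tables with FK",
        "difficulty": "medium",
        "max_steps": 15,
    },
    "legacy-task": {
        "description": "Entry without an explicit step budget",
        "difficulty": "hard",
    },
}


def build_task_list(tasks: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten the registry into the list-of-dicts shape returned by GET /tasks."""
    return [
        {
            "name": name,
            "description": cfg["description"],
            "difficulty": cfg["difficulty"],
            "max_steps": cfg.get("max_steps", DEFAULT_MAX_STEPS),
        }
        for name, cfg in tasks.items()
    ]


task_list = build_task_list(TASKS)
```

Because the list is built from the registry at request time, adding an eighth task to the registry would surface in `/tasks` and `/grader` without touching app.py.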

server/environment.py CHANGED

@@ -71,7 +71,8 @@ class DbMigrationEnvironment(Environment):
             return ""
         try:
             cursor = self._conn.execute(
-                "SELECT sql FROM sqlite_master WHERE type='table' AND sql IS NOT NULL ORDER BY name"
             )
             schemas = [row[0] for row in cursor.fetchall()]
             return ";\n\n".join(schemas) + ";" if schemas else ""
@@ -186,6 +187,17 @@ class DbMigrationEnvironment(Environment):
             self._conn.commit()
             rows_affected = cursor.rowcount
             execution_result = f"Success: {rows_affected} rows affected"
         except Exception as e:
             # Never crash — feed the error back to the agent
             execution_result = str(e)
@@ -199,8 +211,9 @@ class DbMigrationEnvironment(Environment):
         # Compute scores
         current_score, step_reward = self._reconciler.compute_step_reward(self._conn)
-        # Episode termination: submit_final, max steps (20), OR perfect score
-        done = action.submit_final or self._step_count >= 20 or current_score >= 0.99
         # Update state
         self._state.step_count = self._step_count

             return ""
         try:
             cursor = self._conn.execute(
+                "SELECT sql FROM sqlite_master WHERE type='table' "
+                "AND sql IS NOT NULL AND name NOT LIKE 'sqlite_%' ORDER BY name"
             )
             schemas = [row[0] for row in cursor.fetchall()]
             return ";\n\n".join(schemas) + ";" if schemas else ""
             self._conn.commit()
             rows_affected = cursor.rowcount
             execution_result = f"Success: {rows_affected} rows affected"
+        except sqlite3.Warning as e:
+            # Multi-statement attempt — agent tried to combine statements
+            execution_result = (
+                "Error: SQLite requires one statement per step. "
+                f"Split your commands into separate steps. Original error: {e}"
+            )
+            action_error = "multi_statement"
+            try:
+                self._conn.rollback()
+            except Exception:
+                pass
         except Exception as e:
             # Never crash — feed the error back to the agent
             execution_result = str(e)
         # Compute scores
         current_score, step_reward = self._reconciler.compute_step_reward(self._conn)
+        # Episode termination: submit_final, max steps, OR perfect score
+        task_max = self._task_config.get("max_steps", 20)
+        done = action.submit_final or self._step_count >= task_max or current_score >= 0.99
         # Update state
         self._state.step_count = self._step_count
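The multi-statement branch added to environment.py can be exercised in isolation. The sketch below is a hypothetical helper, not the environment's `step()` method. It assumes that the exception class raised by `execute()` for multi-statement input may be either `sqlite3.Warning` or `sqlite3.ProgrammingError` depending on the Python version, so it catches both and checks the message text before tagging the error.

```python
import sqlite3
from typing import Optional, Tuple


def run_single_statement(conn: sqlite3.Connection, sql: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: execute one statement, return (result_text, error_tag)."""
    try:
        cursor = conn.execute(sql)
        conn.commit()
        return f"Success: {cursor.rowcount} rows affected", None
    except (sqlite3.Warning, sqlite3.ProgrammingError) as e:
        conn.rollback()
        # Assumption: the multi-statement error message contains "one statement".
        if "one statement" in str(e):
            return (
                "Error: one statement per step. "
                f"Split your commands into separate steps. Original error: {e}",
                "multi_statement",
            )
        return str(e), "sql_error"
    except Exception as e:
        # Never crash: feed any other error back to the caller as text.
        conn.rollback()
        return str(e), "sql_error"


conn = sqlite3.connect(":memory:")
multi_result, multi_tag = run_single_statement(
    conn, "CREATE TABLE a (x INTEGER); CREATE TABLE b (y INTEGER)"
)
single_result, single_tag = run_single_statement(conn, "CREATE TABLE c (z INTEGER)")
```

Checking the message rather than the class alone keeps unrelated `ProgrammingError` cases from being reported to the agent as multi-statement attempts.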

server/grader.py CHANGED

@@ -29,6 +29,22 @@ from seeds import (
     TASK3_EXPECTED_AUDIT_ENTRIES,
     TASK3_EXPECTED_EMPLOYEE_COUNT,
     TASK3_EXPECTED_SALARIES,
 )
@@ -36,7 +52,8 @@ def _get_table_names(conn: sqlite3.Connection) -> Set[str]:
     """Get all table names in the database."""
     try:
         cursor = conn.execute(
-            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
         )
         return {row[0] for row in cursor.fetchall()}
     except Exception:
@@ -98,10 +115,18 @@ class StateReconciler:
                 return self._score_task2(conn)
             elif self.task_name == "cascade-migration":
                 return self._score_task3(conn)
             else:
-                return 0.0
         except Exception:
-            return 0.0
     def compute_step_reward(self, conn: sqlite3.Connection) -> Tuple[float, float]:
         """
@@ -173,6 +198,11 @@ class StateReconciler:
     #          order_count=0.2, no_null_ids=0.1, integrity=0.2
     def _score_task2(self, conn: sqlite3.Connection) -> float:
         score = 0.0
         tables = _get_table_names(conn)
@@ -242,6 +272,11 @@ class StateReconciler:
     # Total max = 0.90 for all grader checks + 0.10 integrity = 1.00
     def _score_task3(self, conn: sqlite3.Connection) -> float:
         score = 0.0
         tables = _get_table_names(conn)
@@ -351,3 +386,371 @@ class StateReconciler:
             score = min(score, 0.1)
         return max(0.01, min(0.99, score))

     TASK3_EXPECTED_AUDIT_ENTRIES,
     TASK3_EXPECTED_EMPLOYEE_COUNT,
     TASK3_EXPECTED_SALARIES,
+    TASK4_EXPECTED_ROW_COUNT,
+    TASK4_EXPECTED_ID_SUM,
+    TASK4_EXPECTED_DELETED_COUNT,
+    TASK4_EXPECTED_ACTIVE_COUNT,
+    TASK5_EXPECTED_ROW_COUNT,
+    TASK5_EXPECTED_PRICE_SUM,
+    TASK5_EXPECTED_BOTH_COUNT,
+    TASK6_EXPECTED_SALESPERSON_COUNT,
+    TASK6_EXPECTED_CUSTOMER_COUNT,
+    TASK6_EXPECTED_PRODUCT_COUNT,
+    TASK6_EXPECTED_SALES_COUNT,
+    TASK6_EXPECTED_DATA_ISSUES_COUNT,
+    TASK7_EXPECTED_UNIFIED_CUSTOMERS,
+    TASK7_EXPECTED_BOTH_SOURCE_COUNT,
+    TASK7_EXPECTED_UNIFIED_ORDERS,
+    TASK7_EXPECTED_MIGRATION_ISSUES,
 )
     """Get all table names in the database."""
     try:
         cursor = conn.execute(
+            "SELECT name FROM sqlite_master WHERE type='table' "
+            "AND name NOT LIKE 'sqlite_%' ORDER BY name"
         )
         return {row[0] for row in cursor.fetchall()}
     except Exception:
                 return self._score_task2(conn)
             elif self.task_name == "cascade-migration":
                 return self._score_task3(conn)
+            elif self.task_name == "soft-delete-restoration":
+                return self._score_task4(conn)
+            elif self.task_name == "schema-version-merge":
+                return self._score_task5(conn)
+            elif self.task_name == "multi-entity-extraction":
+                return self._score_task6(conn)
+            elif self.task_name == "dual-source-consolidation":
+                return self._score_task7(conn)
             else:
+                return 0.01
         except Exception:
+            return 0.01
     def compute_step_reward(self, conn: sqlite3.Connection) -> Tuple[float, float]:
         """
     #          order_count=0.2, no_null_ids=0.1, integrity=0.2
     def _score_task2(self, conn: sqlite3.Connection) -> float:
+        # Re-assert FK enforcement to prevent PRAGMA bypass exploit
+        try:
+            conn.execute("PRAGMA foreign_keys = ON")
+        except Exception:
+            pass
         score = 0.0
         tables = _get_table_names(conn)
     # Total max = 0.90 for all grader checks + 0.10 integrity = 1.00
     def _score_task3(self, conn: sqlite3.Connection) -> float:
+        # Re-assert FK enforcement to prevent PRAGMA bypass exploit
+        try:
+            conn.execute("PRAGMA foreign_keys = ON")
+        except Exception:
+            pass
         score = 0.0
         tables = _get_table_names(conn)
             score = min(score, 0.1)
         return max(0.01, min(0.99, score))
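The "PRAGMA bypass" that the scorers guard against can be illustrated with a small standalone sketch. It assumes an agent has issued `PRAGMA foreign_keys = OFF` during its steps. The table names are illustrative, and the connection uses autocommit so the PRAGMA is not issued inside an open transaction (where it would have no effect). Whether the grader also runs `PRAGMA foreign_key_check` is not visible in this diff.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so each
# PRAGMA below takes effect immediately.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id))"
)

# The bypass: with enforcement off, an orphan row is accepted silently.
conn.execute("PRAGMA foreign_keys = OFF")
conn.execute("INSERT INTO orders VALUES (1, 999)")

# Re-assert enforcement, as the scorers do before inspecting the database.
conn.execute("PRAGMA foreign_keys = ON")
fk_enabled = conn.execute("PRAGMA foreign_keys").fetchone()[0]

# New violations are rejected again...
try:
    conn.execute("INSERT INTO orders VALUES (2, 999)")
    second_orphan_rejected = False
except sqlite3.IntegrityError:
    second_orphan_rejected = True

# ...but the orphan written during the bypass is still there.
# foreign_key_check reports one row per existing violation.
violations = conn.execute("PRAGMA foreign_key_check").fetchall()
```

Re-enabling the pragma restores enforcement for later statements but does not validate rows already written, so a separate check such as `PRAGMA foreign_key_check` is what reveals orphans inserted while enforcement was off.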
+    # =========================================================================
+    # Task 4: Soft-Delete Restoration (Easy)
+    # =========================================================================
+    def _score_task4(self, conn: sqlite3.Connection) -> float:
+        score = 0.0
+        tables = _get_table_names(conn)
+        if "products" not in tables:
+            return 0.01
+        cols = _get_column_names(conn, "products")
+        # is_deleted column exists (+0.15)
+        if "is_deleted" in cols:
+            score += 0.15
+        # deleted_at column exists (+0.10)
+        if "deleted_at" in cols:
+            score += 0.10
+        # Row count = 8 (+0.20)
+        row_count = _get_row_count(conn, "products")
+        if row_count == TASK4_EXPECTED_ROW_COUNT:
+            score += 0.20
+        # Active products: is_deleted=0, deleted_at IS NULL (+0.25)
+        if "is_deleted" in cols:
+            try:
+                cursor = conn.execute(
+                    "SELECT COUNT(*) FROM products WHERE is_deleted = 0 AND deleted_at IS NULL"
+                )
+                active = cursor.fetchone()[0]
+                if active == TASK4_EXPECTED_ACTIVE_COUNT:
+                    score += 0.25
+            except Exception:
+                pass
+        # Restored products: is_deleted=1, deleted_at IS NOT NULL (+0.20)
+        if "is_deleted" in cols:
+            try:
+                cursor = conn.execute(
+                    "SELECT COUNT(*) FROM products WHERE is_deleted = 1 AND deleted_at IS NOT NULL"
+                )
+                restored = cursor.fetchone()[0]
+                if restored == TASK4_EXPECTED_DELETED_COUNT:
+                    score += 0.20
+            except Exception:
+                pass
+        # SUM(id) fingerprint = 36 — no phantom rows (+0.10)
+        try:
+            cursor = conn.execute("SELECT SUM(id) FROM products")
+            id_sum = cursor.fetchone()[0]
+            if id_sum == TASK4_EXPECTED_ID_SUM:
+                score += 0.10
+        except Exception:
+            pass
+        # Exploit check
+        if row_count == 0:
+            score = min(score, 0.1)
+        return max(0.01, min(0.99, score))
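Each new scorer ends with the same two steps: cap the score when the key table is empty, then clamp into [0.01, 0.99]. The helper below is a hypothetical refactoring sketch of that tail, written only to make the behaviour easy to check. The grader in this diff inlines the logic in each `_score_taskN` method.

```python
def finalize_score(raw_score: float, row_count: int) -> float:
    """Hypothetical helper mirroring the exploit check and clamp in the scorers."""
    # Exploit check: schema-only credit is capped when no data survived.
    if row_count == 0:
        raw_score = min(raw_score, 0.1)
    # Clamp so the reported score never reaches exactly 0.0 or 1.0.
    return max(0.01, min(0.99, raw_score))


# Task 4 with all checks passing: weights sum to about 1.0, reported as 0.99.
full_marks = finalize_score(0.15 + 0.10 + 0.20 + 0.25 + 0.20 + 0.10, row_count=8)

# Task 4 with both columns added but every row deleted: 0.25 is capped to 0.1.
emptied_table = finalize_score(0.15 + 0.10, row_count=0)

# Nothing achieved at all: the floor applies.
nothing_done = finalize_score(0.0, row_count=0)
```

The 0.99 ceiling lines up with the termination test in environment.py, where an episode ends once `current_score >= 0.99`.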
+    # =========================================================================
+    # Task 5: Schema Version Merge (Medium)
+    # =========================================================================
+    def _score_task5(self, conn: sqlite3.Connection) -> float:
+        # Re-assert FK enforcement
+        try:
+            conn.execute("PRAGMA foreign_keys = ON")
+        except Exception:
+            pass
+        score = 0.0
+        tables = _get_table_names(conn)
+        if "products" not in tables:
+            return 0.01
+        cols = _get_column_names(conn, "products")
+        # Schema completeness: all 8 columns (+0.10)
+        expected_cols = {"id", "name", "price", "category", "supplier", "brand", "sku", "source"}
+        if expected_cols.issubset(cols):
+            score += 0.10
+        # Row count = 9 (+0.15)
+        row_count = _get_row_count(conn, "products")
+        if row_count == TASK5_EXPECTED_ROW_COUNT:
+            score += 0.15
+        # PRICE_SUM fingerprint (+0.20)
+        try:
+            cursor = conn.execute("SELECT ROUND(SUM(price), 2) FROM products")
+            price_sum = cursor.fetchone()[0]
+            if price_sum is not None and abs(price_sum - TASK5_EXPECTED_PRICE_SUM) < 0.02:
+                score += 0.20
+        except Exception:
+            pass
+        # source='both' for conflicted ids 1,2 (+0.15)
+        if "source" in cols:
+            try:
+                cursor = conn.execute(
+                    "SELECT COUNT(*) FROM products WHERE source = 'both'"
+                )
+                both_count = cursor.fetchone()[0]
+                if both_count == TASK5_EXPECTED_BOTH_COUNT:
+                    score += 0.15
+            except Exception:
+                pass
+        # v2 name wins for conflicted rows (+0.15)
+        try:
+            cursor = conn.execute("SELECT name FROM products WHERE id = 2")
+            row = cursor.fetchone()
+            if row and "Updated" in row[0]:
+                score += 0.15
+        except Exception:
+            pass
+        # No NULL prices (+0.10)
+        try:
+            cursor = conn.execute("SELECT COUNT(*) FROM products WHERE price IS NULL")
+            null_count = cursor.fetchone()[0]
+            if null_count == 0:
+                score += 0.10
+        except Exception:
+            pass
+        # PRAGMA integrity_check (+0.15)
+        try:
+            cursor = conn.execute("PRAGMA integrity_check")
+            result = cursor.fetchone()[0]
+            if result == "ok":
+                score += 0.15
+        except Exception:
+            pass
+        # Exploit check
+        if row_count == 0:
+            score = min(score, 0.1)
+        return max(0.01, min(0.99, score))
+    # =========================================================================
+    # Task 6: Multi-Entity Extraction (Medium — Hard End)
+    # =========================================================================
+    def _score_task6(self, conn: sqlite3.Connection) -> float:
+        # Re-assert FK enforcement
+        try:
+            conn.execute("PRAGMA foreign_keys = ON")
+        except Exception:
+            pass
+        score = 0.0
+        tables = _get_table_names(conn)
+        # All 5 tables exist (+0.10)
+        required = {"salespersons", "customers", "products", "sales", "data_issues"}
+        if required.issubset(tables):
+            score += 0.10
+        # salesperson count = 3 (+0.10)
+        if "salespersons" in tables:
+            count = _get_row_count(conn, "salespersons")
+            if count == TASK6_EXPECTED_SALESPERSON_COUNT:
+                score += 0.10
+        # customer count = 3 (invalid excluded) (+0.12)
+        if "customers" in tables:
+            count = _get_row_count(conn, "customers")
+            if count == TASK6_EXPECTED_CUSTOMER_COUNT:
+                score += 0.12
+        # product count = 5 (+0.10)
+        if "products" in tables:
+            count = _get_row_count(conn, "products")
+            if count == TASK6_EXPECTED_PRODUCT_COUNT:
+                score += 0.10
+        # sales count = 11 (bad row excluded) (+0.12)
+        if "sales" in tables:
+            count = _get_row_count(conn, "sales")
+            if count == TASK6_EXPECTED_SALES_COUNT:
+                score += 0.12
+        # All 3 FKs present in sales (+0.15)
+        if "sales" in tables:
+            fk_count = 0
+            if _has_foreign_key(conn, "sales", "salespersons"): fk_count += 1
+            if _has_foreign_key(conn, "sales", "customers"): fk_count += 1
+            if _has_foreign_key(conn, "sales", "products"): fk_count += 1
+            score += 0.05 * fk_count  # 0.15 total for all 3
+        # data_issues count = 1, for row 6 (+0.11)
+        if "data_issues" in tables:
+            count = _get_row_count(conn, "data_issues")
+            if count == TASK6_EXPECTED_DATA_ISSUES_COUNT:
+                score += 0.11
+        # alice email is trimmed (+0.10)
+        if "salespersons" in tables:
+            try:
+                cursor = conn.execute(
+                    "SELECT email FROM salespersons WHERE name LIKE '%Alice%'"
+                )
+                row = cursor.fetchone()
+                if row and row[0] == "alice@company.com":
+                    score += 0.10
+            except Exception:
+                pass
+        # PRAGMA integrity_check (+0.10)
+        try:
+            cursor = conn.execute("PRAGMA integrity_check")
+            result = cursor.fetchone()[0]
+            if result == "ok":
+                score += 0.10
+        except Exception:
+            pass
+        # Exploit check
+        sales_count = _get_row_count(conn, "sales") if "sales" in tables else 0
+        if sales_count == 0 and "sales" in tables:
+            score = min(score, 0.1)
+        return max(0.01, min(0.99, score))
+    # =========================================================================
+    # Task 7: Dual-Source Consolidation (Hard)
+    # =========================================================================
+    def _score_task7(self, conn: sqlite3.Connection) -> float:
+        # Re-assert FK enforcement
+        try:
+            conn.execute("PRAGMA foreign_keys = ON")
+        except Exception:
+            pass
+        score = 0.0
+        tables = _get_table_names(conn)
+        # All 4 tables exist (+0.05)
+        required = {"unified_customers", "unified_products", "unified_orders", "migration_issues"}
+        if required.issubset(tables):
+            score += 0.05
+        # unified_customers count = 7 (+0.08)
+        if "unified_customers" in tables:
+            count = _get_row_count(conn, "unified_customers")
+            if count == TASK7_EXPECTED_UNIFIED_CUSTOMERS:
+                score += 0.08
+        # source='both' for email-matched records (+0.08)
+        if "unified_customers" in tables:
+            try:
+                cursor = conn.execute(
+                    "SELECT COUNT(*) FROM unified_customers WHERE source = 'both'"
+                )
+                both = cursor.fetchone()[0]
+                if both == TASK7_EXPECTED_BOTH_SOURCE_COUNT:
+                    score += 0.08
+            except Exception:
+                pass
+        # Legacy amount coercion: every unified_orders.amount must be numeric, REAL or INTEGER (+0.10)
+        if "unified_orders" in tables:
+            try:
+                cursor = conn.execute(
+                    "SELECT COUNT(*) FROM unified_orders WHERE typeof(amount) IN ('real', 'integer')"
+                )
+                real_count = cursor.fetchone()[0]
+                order_count = _get_row_count(conn, "unified_orders")
+                if real_count == order_count and order_count > 0:
+                    score += 0.10
+            except Exception:
+                pass
+        # No NULL currency remains; the expected fill is 'USD', but only NULL absence is checked (+0.07)
+        if "unified_orders" in tables:
+            try:
+                cursor = conn.execute(
+                    "SELECT COUNT(*) FROM unified_orders WHERE currency IS NULL"
+                )
+                null_curr = cursor.fetchone()[0]
+                if null_curr == 0:
+                    score += 0.07
+            except Exception:
+                pass
+        # tx_status mapped to strings (+0.10)
+        if "unified_orders" in tables:
+            try:
+                cursor = conn.execute(
+                    "SELECT COUNT(*) FROM unified_orders WHERE typeof(status) = 'text'"
+                )
+                text_count = cursor.fetchone()[0]
+                order_count = _get_row_count(conn, "unified_orders")
+                if text_count == order_count and order_count > 0:
+                    score += 0.10
+            except Exception:
+                pass
+        # subscription_tier mapped to strings (+0.08)
+        if "unified_customers" in tables:
+            try:
+                cursor = conn.execute(
+                    "SELECT COUNT(*) FROM unified_customers WHERE typeof(tier) = 'text'"
+                )
+                text_count = cursor.fetchone()[0]
+                cust_count = _get_row_count(conn, "unified_customers")
+                if text_count == cust_count and cust_count > 0:
+                    score += 0.08
+            except Exception:
+                pass
+        # migration_issues count = 2 (+0.08)
+        if "migration_issues" in tables:
+            count = _get_row_count(conn, "migration_issues")
+            if count == TASK7_EXPECTED_MIGRATION_ISSUES:
+                score += 0.08
+        # Orphaned transaction in issues (+0.07)
+        if "migration_issues" in tables:
+            try:
+                cursor = conn.execute(
+                    "SELECT COUNT(*) FROM migration_issues WHERE issue_type = 'orphaned_record'"
+                )
+                orphan_issues = cursor.fetchone()[0]
+                if orphan_issues >= 1:
+                    score += 0.07
+            except Exception:
+                pass
+        # NULL email customer in issues (+0.07)
+        if "migration_issues" in tables:
+            try:
+                cursor = conn.execute(
+                    "SELECT COUNT(*) FROM migration_issues WHERE issue_type = 'null_email'"
+                )
+                null_issues = cursor.fetchone()[0]
+                if null_issues >= 1:
+                    score += 0.07
+            except Exception:
+                pass
+        # FK integrity on unified_orders (+0.10)
+        if "unified_orders" in tables:
+            if _has_foreign_key(conn, "unified_orders", "unified_customers"):
+                score += 0.10
+        # PRAGMA integrity_check (+0.10)
+        try:
+            cursor = conn.execute("PRAGMA integrity_check")
+            result = cursor.fetchone()[0]
+            if result == "ok":
+                score += 0.10
+        except Exception:
+            pass
+        # Exploit check
+        if "unified_orders" in tables and _get_row_count(conn, "unified_orders") == 0:
+            score = min(score, 0.1)
+        return max(0.01, min(0.99, score))
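
Several checks in `_score_task7` share one shape: count the rows that satisfy a predicate, compare that to the table's total row count, and award the weight only when the table is non-empty and every row matches. A minimal sketch of that pattern as a standalone helper follows. The name `all_rows_match` and the demo table are hypothetical and not part of this commit; the helper interpolates the table name and predicate directly into SQL, so it assumes trusted, grader-authored strings.

```python
import sqlite3


def all_rows_match(conn: sqlite3.Connection, table: str, predicate: str) -> bool:
    """Return True when `table` is non-empty and every row satisfies `predicate`.

    Hypothetical helper: `table` and `predicate` are interpolated into the SQL
    text, so they must come from trusted grader code, never from agent input.
    """
    try:
        total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        matching = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {predicate}"
        ).fetchone()[0]
    except sqlite3.Error:
        # A missing table or column counts as a failed check, not a crash.
        return False
    return total > 0 and matching == total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Illustrative schema only; the real unified_orders DDL lives in seeds.py.
    conn.execute(
        "CREATE TABLE unified_orders (id INTEGER PRIMARY KEY, amount REAL, status TEXT)"
    )
    conn.execute("INSERT INTO unified_orders VALUES (1, 19.99, 'completed')")
    numeric = "typeof(amount) IN ('real', 'integer')"
    print(all_rows_match(conn, "unified_orders", numeric))
    conn.close()
```

Catching `sqlite3.Error` rather than bare `Exception` keeps the "failed check scores zero" behaviour while letting unrelated programming errors surface.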

test_all_tasks.py ADDED

	@@ -0,0 +1,49 @@

+"""Quick validation of all 7 tasks: seeds + graders."""
+import sqlite3
+import sys
+import os
+sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
+from seeds import TASKS
+from server.grader import StateReconciler
+print(f"Tasks registered: {len(TASKS)}")
+assert len(TASKS) == 7, f"Expected 7 tasks, got {len(TASKS)}"
+print(f"  Names: {list(TASKS.keys())}")
+for name, cfg in TASKS.items():
+    # Seed
+    conn = sqlite3.connect(":memory:")
+    conn.execute("PRAGMA foreign_keys = ON")
+    cfg["seed_fn"](conn)
+    cursor = conn.execute(
+        "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'"
+    )
+    tables = [r[0] for r in cursor.fetchall()]
+    print(f"\n[{name}] ({cfg['difficulty']}, max_steps={cfg.get('max_steps', 20)})")
+    print(f"  Tables: {tables}")
+    # Grade
+    reconciler = StateReconciler(name)
+    score = reconciler.score(conn)
+    assert 0.01 <= score <= 0.99, f"Score {score} out of [0.01, 0.99]!"
+    print(f"  Initial score: {score:.2f} OK")
+    conn.close()
+# Also test environment resets for each task
+from server.environment import DbMigrationEnvironment
+for name in TASKS:
+    env = DbMigrationEnvironment(task_name=name)
+    obs = env.reset()
+    assert not obs.done
+    assert obs.step_number == 0
+    print(f"  [{name}] Environment reset OK")
+    env.close()
+print("\n" + "=" * 50)
+print("ALL 7 TASKS VALIDATED SUCCESSFULLY!")
+print("=" * 50)
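
The range assertion in the script above (`0.01 <= score <= 0.99`) relies on the final step of each scorer: an optional cap at 0.1 when the guarded table exists but is empty, followed by a clamp to [0.01, 0.99]. A minimal sketch of that step, assuming a hypothetical `finalize_score` function and weight labels that do not appear in the commit; the weight values are transcribed from the `+0.xx` comments in `_score_task7`.

```python
from typing import Optional

# Weights transcribed from the "+0.xx" comments in _score_task7.
# The dictionary keys are illustrative labels, not identifiers from the repo.
TASK7_WEIGHTS = {
    "all_tables_exist": 0.05,
    "unified_customers_count": 0.08,
    "source_both_count": 0.08,
    "numeric_amounts": 0.10,
    "no_null_currency": 0.07,
    "text_status": 0.10,
    "text_tier": 0.08,
    "migration_issues_count": 0.08,
    "orphaned_record_issue": 0.07,
    "null_email_issue": 0.07,
    "orders_foreign_key": 0.10,
    "integrity_check": 0.10,
}


def finalize_score(raw_score: float, guarded_row_count: Optional[int]) -> float:
    """Apply the exploit cap, then clamp to [0.01, 0.99].

    `guarded_row_count` is the row count of the table the exploit check
    protects, or None when that table does not exist (no cap is applied).
    """
    if guarded_row_count == 0:
        # An emptied table would trivially pass "no bad rows" checks.
        raw_score = min(raw_score, 0.1)
    return max(0.01, min(0.99, raw_score))


if __name__ == "__main__":
    best_case = sum(TASK7_WEIGHTS.values())
    print(f"max raw score: {best_case:.2f}")
    print(f"after emptying unified_orders: {finalize_score(best_case, 0):.2f}")
```

Under these weights the rubric tops out at 0.98 before clamping, so a perfect task 7 migration stays inside the upper bound without needing the clamp.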