Spaces:
Configuration error
Configuration error
| # DB Schema Migration β OpenEnv Environment | |
| [](https://huggingface.co/spaces?tag=openenv) | |
| ## Motivation | |
| Database schema migrations are one of the most error-prone, high-stakes tasks in software engineering. A wrong migration can corrupt production data, break foreign key constraints, or leave systems in inconsistent states. Yet there is no RL benchmark that trains agents to perform this task. | |
| This environment simulates a real engineering workflow: an agent receives a legacy schema riddled with bad names, wrong types, and no normalization β and must transform it into a clean, production-grade design through a sequence of atomic migration operations. The environment is deterministic, programmatically gradeable, and scales from trivial renaming tasks to full third-normal-form (3NF) normalization. | |
| **Who benefits:** Teams building AI coding assistants, database tools, or automated refactoring systems can use this benchmark to evaluate and train models on structured reasoning over relational data. | |
| --- | |
| ## Action Space | |
| Actions are Pydantic models sent as JSON to `POST /step`. | |
| | Field | Type | Description | | |
| |---|---|---| | |
| | `operation` | enum | One of: `rename_table`, `rename_column`, `add_column`, `drop_column`, `change_type`, `add_foreign_key`, `normalize_table`, `done` | | |
| | `table` | str | Current name of the target table | | |
| | `column` | str? | Current column name (for column ops) | | |
| | `new_name` | str? | New name (for rename/normalize ops) | | |
| | `data_type` | str? | SQL type string (e.g. `DECIMAL(10,2)`, `TIMESTAMP`) | | |
| | `reference_table` | str? | For FK: referenced table name | | |
| | `reference_column` | str? | For FK: referenced column name | | |
| | `reason` | str? | Free-text explanation (ignored by grader, useful for chain-of-thought) | | |
| **Example:** | |
| ```json | |
| { | |
| "operation": "rename_column", | |
| "table": "tbl_usr", | |
| "column": "usr_nm", | |
| "new_name": "username", | |
| "reason": "usr_nm is an abbreviation; username is the clean convention" | |
| } | |
| ``` | |
| --- | |
| ## Observation Space | |
| Observations are returned from `POST /reset` and `POST /step`. | |
| | Field | Type | Description | | |
| |---|---|---| | |
| | `current_schema` | List[TableInfo] | All tables with columns, types, PKs, FKs | | |
| | `target_requirements` | List[str] | Natural language list of what must be achieved | | |
| | `steps_taken` | List[dict] | Last 10 actions with rewards | | |
| | `violations` | List[str] | Current FK or integrity violations | | |
| | `hints` | List[str] | Optional hints for the current task | | |
| | `step_count` | int | Steps used so far | | |
| | `max_steps` | int | Steps allowed before forced termination | | |
| --- | |
| ## Tasks | |
| ### Easy β Column Renaming | |
| **Difficulty:** Easy | **Max steps:** 10 | |
| A single `tbl_usr` table with aggressively abbreviated column names (`usr_id`, `usr_nm`, `usr_eml`, `dt_crt`, `stat_cd`). The agent must rename the table to `users` and all 5 columns to human-readable names following snake_case convention. | |
| All required operations are `rename_table` or `rename_column`. An optimal agent completes this in exactly 6 steps. | |
| **Grader:** Exact match on table name and all column names. Efficiency penalty (-0.02/step) for steps beyond the minimum 6. | |
| --- | |
| ### Medium β Multi-table Fix | |
| **Difficulty:** Medium | **Max steps:** 20 | |
| Two tables (`order_tbl`, `prod_tbl`) with abbreviated names, wrong data types (money stored as `VARCHAR`, dates as `VARCHAR`), and missing columns (`stock_quantity`, `status`). Agent must rename both tables and all columns, fix types to `DECIMAL(10,2)` and `TIMESTAMP`, and add the missing columns. | |
| **Grader:** Partial credit per table. Full score requires all 14 requirements satisfied. Efficiency penalty for steps > 14. | |
| --- | |
| ### Hard β Full 3NF Normalization | |
| **Difficulty:** Hard | **Max steps:** 30 | |
| A single `everything` table with 9 columns mixing customer, product, and order data (classic 1NF violation). Agent must: | |
| 1. Extract 4 normalized tables: `customers`, `products`, `orders`, `order_items` | |
| 2. Set correct column types throughout | |
| 3. Establish 3 foreign key relationships | |
| 4. Remove the original `everything` table | |
| **Grader:** Weighted β 25% tables present, 50% correct columns/types, 25% correct foreign keys. Penalty if `everything` still exists at done. | |
| --- | |
| ## Reward Function | |
| | Event | Reward | | |
| |---|---| | |
| | Correct rename/type/add operation | +0.10 to +0.15 | | |
| | Correct normalize_table extraction | +0.20 | | |
| | Correct foreign key | +0.15 | | |
| | Wrong/unexpected operation | -0.05 | | |
| | Referencing non-existent table | -0.10 to -0.15 | | |
| | Dropping a primary key | -0.20 | | |
| | Exceeding max steps | -0.20 | | |
| | `done` action | Final grader score (0.0 β 1.0) | | |
| Rewards are dense: the agent receives a signal after every step, not just at the end. | |
| --- | |
| ## Setup & Running | |
| ### 1. Build the Docker image | |
| ```bash | |
| docker build -t db-schema-migration . | |
| docker run -p 7860:7860 db-schema-migration | |
| ``` | |
| Verify it's live: | |
| ```bash | |
| curl -X POST http://localhost:7860/reset -H "Content-Type: application/json" -d '{"task":"easy"}' | |
| ``` | |
| ### 2. Environment variables for inference.py | |
| ```bash | |
| export ENV_URL=http://localhost:7860 # or your HF Space URL | |
| export API_BASE_URL=https://api-inference.huggingface.co/v1 | |
| export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct | |
| export HF_TOKEN=hf_your_token_here | |
| export TASK=easy # easy | medium | hard | |
| ``` | |
| ### 3. Run the inference script | |
| ```bash | |
| python inference.py easy | |
| python inference.py medium | |
| python inference.py hard | |
| ``` | |
| ### 4. Run the OpenEnv validation | |
| ```bash | |
| openenv validate | |
| ``` | |
| --- | |
| ## Baseline Scores | |
| Scores from `meta-llama/Llama-3.1-8B-Instruct` via HF inference: | |
| | Task | Score | Steps | Notes | | |
| |---|---|---|---| | |
| | Easy | 0.92 | 7 | Near-optimal, one extra step | | |
| | Medium | 0.74 | 18 | Misses one type change | | |
| | Hard | 0.48 | 28 | Tables correct, FKs partially missed | | |
| --- | |
| ## API Reference | |
| | Endpoint | Method | Body | Returns | | |
| |---|---|---|---| | |
| | `/reset` | POST | `{"task": "easy"}` | `ResetResult` | | |
| | `/step` | POST | `Action` JSON | `StepResult` | | |
| | `/state` | GET | β | `StateResult` | | |
| | `/tasks` | GET | β | Task descriptions | | |
| | `/health` | GET | β | `{"status": "ok"}` | | |
| --- | |
| ## Project Structure | |
| ``` | |
| . | |
| βββ server/ | |
| β βββ __init__.py | |
| β βββ main.py # FastAPI app | |
| β βββ env.py # Environment logic + graders | |
| β βββ schemas.py # Pydantic models | |
| βββ Dockerfile | |
| βββ requirements.txt | |
| βββ openenv.yaml | |
| βββ inference.py | |
| βββ README.md | |
| ``` | |