Metadata-Version: 2.4
Name: sql-data-engineer-env
Version: 0.1.0
Summary: A real-world SQL data engineering environment for agent evaluation.
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: openenv-core
Requires-Dist: fastapi
Requires-Dist: uvicorn
Requires-Dist: pydantic
Requires-Dist: openai
Requires-Dist: pandas
|
|
---
title: SQL Data Engineer Environment
emoji: 🗄️
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---

# SQL Data Engineer Environment
|
|
Welcome to the **SQL Data Engineer Environment**: a robust, fully compliant baseline environment built for the Meta OpenEnv Hackathon. This project tests an AI agent's ability to interact natively with a live SQL engine to perform pragmatic data extraction, cleansing, and complex schema normalization tasks.
|
|
## Motivation

The industry needs reliable agents that can act as backend developers, database administrators, and data engineers. While many environments focus on web browsing or gaming, manipulating relational databases is a high-value, real-world task. This environment simulates authentic obstacles developers face:

- Analyzing undocumented database schemas.
- Cleansing noisy string data into strict scalar types.
- Restructuring and normalizing flat tables into relational architectures while rigorously preserving foreign-key constraints.
|
|
It offers an excellent benchmark for gauging an LLM's structured reasoning and precise SQL generation capabilities.
|
|
---

## Environment Specification

The environment adheres strictly to the OpenEnv Pydantic specification, enabling seamless API integration.
|
|
### Observation Space

The observation space is tailored to provide dense context while remaining token-efficient:

- `goal` (string): The explicit task prompt dictating what the agent must achieve.
- `schema_dump` (string | null): The current DDL for all tables and views in the database; included when the schema changes, otherwise `null`.
- `result` (string): The standard output of the previously executed query (capped at 10 rows for SELECTs) or a clear `rowcount` confirmation for INSERTs/UPDATEs.
- `last_action_error` (boolean): Flag indicating whether the previous SQL action raised a syntax or engine error.
- `step` (integer): The current episode step count.
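As an illustration, a single observation deserializes into a plain dictionary with the fields above. All concrete values in this sketch are invented for the example:

```python
import json

# A hypothetical observation payload; field names follow the spec above,
# but the values here are invented for illustration.
raw = json.dumps({
    "goal": "Create a high_value_customers view.",
    "schema_dump": "CREATE TABLE customers (id INTEGER, name TEXT, metric REAL);",
    "result": "rowcount: 1",
    "last_action_error": False,
    "step": 3,
})
obs = json.loads(raw)
print(obs["step"], obs["last_action_error"])  # 3 False
```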
|
|
### Action Space

- `action_str` (string): A JSON dictionary containing a single, syntactically correct SQLite query to be executed against the backend state.
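For example, a well-formed action could look like the payload below. Only the `action_str` field comes from the spec above; the round-trip and the in-memory database are illustrative:

```python
import json
import sqlite3

# A minimal, hypothetical action payload: one well-formed SQLite query.
action_json = json.dumps(
    {"action_str": "SELECT name FROM sqlite_master WHERE type = 'table';"}
)

# Round-trip and execute against an empty in-memory DB to confirm it parses.
query = json.loads(action_json)["action_str"]
rows = sqlite3.connect(":memory:").execute(query).fetchall()
print(rows)  # []  (a fresh database has no tables)
```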
|
|
---

## Tasks & Grading

Each episode challenges the agent with one of three tasks, each scored by a deterministic OpenEnv grader on a scale from `0.0` to `1.0`.
|
|
### Task 1: High-Value Customer View

- **Goal**: Read a `customers` table, select the customers whose metric exceeds `1000.0`, and construct a targeted `high_value_customers` SQL view.
- **Difficulty**: Easy. Tests basic SELECT syntax and DDL proficiency.
- **Grader**: Verifying that the view exists in `sqlite_master` awards `0.5` points; exact row and content matching grants the remaining `0.5`.
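A minimal sketch of a passing solution, tested against an in-memory SQLite database. The `metric` column name and the sample rows are assumptions, since the real schema is only revealed via `schema_dump`:

```python
import sqlite3

# Build a toy `customers` table; `metric` and the rows are invented stand-ins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, metric REAL)")
conn.executemany(
    "INSERT INTO customers (name, metric) VALUES (?, ?)",
    [("Ada", 1500.0), ("Bob", 400.0)],
)

# The agent's DDL action: a view over the high-value subset.
conn.execute(
    "CREATE VIEW high_value_customers AS "
    "SELECT * FROM customers WHERE metric > 1000.0"
)

names = [r[0] for r in conn.execute("SELECT name FROM high_value_customers")]
print(names)  # ['Ada']
```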
|
|
### Task 2: Data Cleansing

- **Goal**: Clean up a messy `products` table. The agent must standardize categorical strings (e.g., converting `'ELEC'` to `'ELECTRONICS'`) and extract numeric floats from dirty price strings (e.g., `'$85.00'` -> `85.0`) into a newly created float column.
- **Difficulty**: Medium. Tests native string pattern matching and targeted UPDATE pipelines.
- **Grader**: Adding the column yields `0.3` points, correct categorical string mapping grants up to `0.3`, and correctly extracted float prices yield the remaining `0.4`.
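The price-extraction step can be done natively in SQLite with `REPLACE` and `CAST`. This sketch assumes hypothetical column names (`price_raw`, `category`), which may differ in the real table:

```python
import sqlite3

# Toy `products` table; the column names here are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, price_raw TEXT)")
conn.execute("INSERT INTO products (category, price_raw) VALUES ('ELEC', '$85.00')")

# Add the new float column, then strip the '$' and cast to REAL.
conn.execute("ALTER TABLE products ADD COLUMN price REAL")
conn.execute("UPDATE products SET price = CAST(REPLACE(price_raw, '$', '') AS REAL)")

# Standardize the categorical value.
conn.execute("UPDATE products SET category = 'ELECTRONICS' WHERE category = 'ELEC'")

row = conn.execute("SELECT category, price FROM products").fetchone()
print(row)  # ('ELECTRONICS', 85.0)
```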
|
|
### Task 3: Schema Normalization

- **Goal**: Normalize a completely flat `hospital_records` table into a structured three-table layout (`patients`, `doctors`, `appointments`). All data must be migrated and bound by primary/foreign-key constraints.
- **Difficulty**: Hard. Tests multi-step schema design reasoning and safe data-migration pipelines.
- **Grader**: Validating each table signature awards `0.1` per table, correct row counts yield another `0.1` each, and if a relational JOIN across the new tables perfectly rebuilds the original flat table, the final `0.3` is awarded.
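A sketch of one possible target schema with enforced foreign keys; the column names are assumptions, and the real flat table carries more fields. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` is set on the connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite disables FK enforcement by default

# Hypothetical normalized layout; the real column names may differ.
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE doctors  (doctor_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE appointments (
        appointment_id INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patients(patient_id),
        doctor_id  INTEGER NOT NULL REFERENCES doctors(doctor_id),
        visit_date TEXT
    );
""")

# An orphaned insert is rejected, confirming the constraints actually bind.
rejected = False
try:
    conn.execute("INSERT INTO appointments (patient_id, doctor_id) VALUES (99, 99)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```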
|
|
### Reward Signal

Rewards are dense over the episode lifecycle.
At each `step(action)` call, the grader executes, and the reward is the score delta:

`Reward = Current_Score - Previous_Score`

*Note: an additional `-0.05` penalty is applied whenever `last_action_error` triggers, strongly discouraging hallucinated or malformed SQL loops.*
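The rule above can be sketched as a small helper. The function name and signature are illustrative, not the environment's actual API:

```python
# Illustrative only: the environment computes this internally.
def compute_reward(current_score: float, previous_score: float,
                   last_action_error: bool) -> float:
    reward = current_score - previous_score
    if last_action_error:
        reward -= 0.05  # flat penalty for malformed SQL
    return reward

# Progress from 0.3 to 0.5 with no error yields a positive delta.
print(round(compute_reward(0.5, 0.3, False), 2))  # 0.2
```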
|
|
---

## Local Setup

To validate the OpenEnv schema, install the framework, and run the OpenAI-compatible baseline script locally:
|
|
```bash
# Clone the repository
git clone <your-repo-url>
cd OpenEnv-SQL-Data-Engineer

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install openenv openenv-core openai pydantic fastapi uvicorn requests

# Validate the OpenEnv schema
openenv validate

# Run the baseline agent
export OPENAI_API_KEY="your-api-key"
export MODEL_NAME="gpt-4o"
python inference.py
```
|
|
---

## Docker Deployment

The environment ships with a Hugging Face-ready `Dockerfile` configured to launch on port 7860 as an unprivileged user.
|
|
```bash
docker build -t openenv-sql .
docker run -p 7860:7860 openenv-sql
```
|
|
### Deploying to Hugging Face Spaces

To finalize your Hackathon deployment and spin up the live inference API:

1. Create a new minimal **Docker Space** on Hugging Face.
2. Add `HF_TOKEN` to your Space repository secrets.
3. Push this directory to the Space via git:
   ```bash
   git remote add space https://huggingface.co/spaces/<your-username>/<your-space-name>
   git push space main
   ```
4. Once the Space builds, the environment URL will respond to ping checks and `/reset` calls on the HF endpoint.
|
|
|
|
## Baseline Results

- **Easy Task:** 1.0 (passed)
- **Medium Task:** 0.62 (partial success; struggled with complex string casting)
- **Hard Task:** 0.2 (challenging; requires stronger reasoning and longer context)
|
|