| # OpenEnv Challenge: Deliverables & Status |
|
|
| ## Competition |
|
|
| **OpenEnv Challenge: SOTA Environments to drive general intelligence** |
|
|
| Sponsors: PyTorch team at Meta, HuggingFace, Unsloth |
|
|
| Prizes: |
| - $10K in HuggingFace credits |
| - Invitation to publish on PyTorch.org blog |
|
|
| ## Judging Criteria |
|
|
| Submissions are evaluated primarily on the blog post. The judging panel grades on: |
|
|
| 1. Creative and robust use of OpenEnv |
| 2. Technical excellence |
| 3. Storytelling |
| 4. Open-source demo |
| 5. Green Agent wrapper for the environment |
|
|
| ## Required Deliverables |
|
|
| ### 1. HuggingFace Space |
|
|
| Environment on the HF Hub. Judges interact with the action space |
| (DESCRIBE, SAMPLE, QUERY, ANSWER) against real Spider databases. |
|
|
| Live at: https://huggingface.co/spaces/hjerpe/sql_env |
| Docker image: `registry.hf.space/hjerpe-sql_env:latest` |
| Published via `uv run openenv push` on 2026-03-29 (see `specs/F007-DEMO.md`). |
|
|
| **Status:** Live. Endpoints `/health`, `/docs`, `/web`, `/reset`, `/step`, `/ws` |
| exposed by the FastAPI server in `envs/sql_env/server/`. Python client: |
| `SQLEnv(base_url="https://hjerpe-sql-env.hf.space")`. |
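
A quick way to confirm the Space is reachable is to request the GET-able endpoints and record their HTTP status codes. The sketch below uses only the standard library; `endpoint_url`, `smoke_test`, and the choice of which paths answer plain GET requests are illustrative assumptions, not part of the repo's client.

```python
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "https://hjerpe-sql-env.hf.space"  # base URL from the status note

# Assumption: these three paths respond to a plain GET request.
GET_PATHS = ("/health", "/docs", "/web")


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and an endpoint path without doubling slashes."""
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


def smoke_test(base_url: str = BASE_URL, fetch=urlopen, timeout: float = 10.0) -> dict:
    """Return {path: HTTP status code} for each GET-able endpoint."""
    statuses = {}
    for path in GET_PATHS:
        try:
            with fetch(endpoint_url(base_url, path), timeout=timeout) as response:
                statuses[path] = response.status
        except HTTPError as err:
            # urlopen raises on 4xx/5xx; keep the code instead of aborting.
            statuses[path] = err.code
    return statuses
```

Calling `smoke_test()` with no arguments hits the live Space; the `fetch` parameter exists so the function can be exercised offline with a stub.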
|
|
| ### 2. Training notebooks/scripts (GitHub) |
|
|
| Colab-ready notebooks: |
| - `notebooks/train_grpo.ipynb`: Full SFT + GRPO pipeline, Colab L4, ~7h |
| - `notebooks/compare_methods.ipynb`: Base vs GRPO evaluation (zero-shot, 1-shot, 3-shot, GRPO v1, v2) |
| - `notebooks/showcase_sqlenv.ipynb`: Interactive environment demo with Random and Oracle baselines |
|
|
| **Status:** Complete |
|
|
| ### 3. Blog post (HuggingFace) |
|
|
| Analyst exploration framing, reward architecture with theory, |
| training results (0% to ~30%), failure analysis, lessons learned. |
|
|
| Draft: `docs/blog-post-v1.md` |
|
|
| **Status:** Draft v1 complete, not yet published |
|
|
| ## Additional Deliverables |
|
|
| ### 4. GitHub repo |
|
|
| Clean codebase: zero ruff errors, typed Pydantic models, 280 passing |
| tests, architecture docs, training artifacts. |
|
|
| **Status:** Complete (F016 quality sweep done) |
|
|
| ### 5. Trained checkpoints (HuggingFace Hub) |
|
|
| - `hjerpe/sqlenv-qwen3-0.6b-grpo` (v1) |
| - `hjerpe/sqlenv-qwen3-0.6b-grpo-v2` (v2) |
|
|
| **Status:** Uploaded |
|
|
| ### 6. Green Agent wrapper |
|
|
| Follows the OpenEnv evaluation wrapper pattern: a `Policy` protocol plus an |
| `evaluate(env, policy, n_episodes, seed)` function that reports success rate, |
| average reward, and average steps. Includes `RandomPolicy` and |
| `OraclePolicy` baselines for standardized comparison. |
|
|
| Implementation: `evaluation/policies.py`, `evaluation/oracle_policy.py` |
| Tests: `tests/test_evaluation.py` (17 tests, all passing) |
| Used by: `notebooks/showcase_sqlenv.ipynb`, `notebooks/compare_methods.ipynb` |
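
The evaluation pattern can be sketched with a toy environment. Everything below is a hypothetical stand-in rather than the code in `evaluation/policies.py`: the `act` method name, the `EvalResult` fields, the `GuessEnv` toy task, and the rule that an episode succeeds when its total reward is positive are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Protocol


class Policy(Protocol):
    def act(self, observation: str) -> str: ...


@dataclass
class EvalResult:
    success_rate: float
    avg_reward: float
    avg_steps: float


class GuessEnv:
    """Toy stand-in for the SQL environment: guess a hidden digit in 5 steps."""

    max_steps = 5

    def reset(self, seed: int) -> str:
        self._target = random.Random(seed).randrange(10)
        self._steps = 0
        return "guess a digit"

    def step(self, action: str) -> tuple[str, float, bool]:
        self._steps += 1
        correct = action == str(self._target)
        done = correct or self._steps >= self.max_steps
        return ("correct" if correct else "wrong"), (1.0 if correct else 0.0), done


class RandomPolicy:
    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def act(self, observation: str) -> str:
        return str(self._rng.randrange(10))


class OraclePolicy:
    """Upper-bound baseline: peeks at the hidden target."""

    def __init__(self, env: GuessEnv) -> None:
        self._env = env

    def act(self, observation: str) -> str:
        return str(self._env._target)


def evaluate(env, policy: Policy, n_episodes: int, seed: int = 0) -> EvalResult:
    """Run n_episodes with per-episode seeds and aggregate the three metrics."""
    if n_episodes <= 0:
        raise ValueError("n_episodes must be positive")
    successes, total_reward, total_steps = 0, 0.0, 0
    for episode in range(n_episodes):
        obs = env.reset(seed=seed + episode)
        done, episode_reward = False, 0.0
        while not done:
            obs, reward, done = env.step(policy.act(obs))
            episode_reward += reward
            total_steps += 1
        successes += episode_reward > 0  # assumption: positive reward = success
        total_reward += episode_reward
    return EvalResult(
        successes / n_episodes, total_reward / n_episodes, total_steps / n_episodes
    )
```

Running `evaluate(env, RandomPolicy(), 20)` and `evaluate(env, OraclePolicy(env), 20)` on the same environment and seed gives a floor and a ceiling against which a trained policy can be compared.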
|
|
| **Status:** Complete |
|
|
| ### 7. TRL `environment_factory` adapter |
| |
| HuggingFace TRL's native OpenEnv integration: pass a class with |
| `reset()` + named tool methods as `environment_factory=` and `GRPOTrainer` |
| runs the multi-turn tool-calling loop automatically (no custom |
| `rollout_func`). |
|
|
| Implementation: `training/trl_adapter.py`, class `SQLEnvTRL` exposing |
| `describe()`, `sample()`, `query()`, `answer()` as tool methods plus |
| `sql_env_reward_func`. Used by `notebooks/train_grpo.ipynb` (cell 16: |
| `environment_factory=SQLEnvTRL`). |
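
The shape of such a class can be sketched against an in-memory SQLite database. This is not the code in `training/trl_adapter.py`: `ToySQLEnvTRL`, `toy_reward_func`, the method arguments, the return strings, and the reward function receiving an `environments` argument are all assumptions chosen to mirror the description above. Wiring the class into `GRPOTrainer` is not shown.

```python
import sqlite3


class ToySQLEnvTRL:
    """Hypothetical stand-in: reset() plus one public method per tool."""

    def __init__(self) -> None:
        self.reward = 0.0
        self._conn = None
        self._gold = ""

    def reset(self, **kwargs) -> str:
        """Start a fresh episode on a tiny in-memory database."""
        self._conn = sqlite3.connect(":memory:")
        self._conn.executescript(
            "CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);"
            "INSERT INTO singer (name, age) VALUES ('Ana', 31), ('Bo', 24), ('Cy', 45);"
        )
        self.reward, self._gold = 0.0, "3"
        return "How many singers are there?"

    def _has_table(self, table_name: str) -> bool:
        row = self._conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table_name,),
        ).fetchone()
        return row is not None

    def describe(self, table_name: str) -> str:
        """Return column names and types for a table."""
        if not self._has_table(table_name):
            return "Error: unknown table"
        rows = self._conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (table_name,)
        ).fetchall()
        return "\n".join(f"{name} {ctype}" for name, ctype in rows)

    def sample(self, table_name: str, limit: int = 3) -> str:
        """Return the first few rows of a table."""
        if not self._has_table(table_name):
            return "Error: unknown table"
        # Safe to interpolate: the name was just matched against sqlite_master.
        rows = self._conn.execute(
            f'SELECT * FROM "{table_name}" LIMIT ?', (limit,)
        ).fetchall()
        return "\n".join(str(row) for row in rows)

    def query(self, sql: str) -> str:
        """Run a read-only SELECT and return the rows as text."""
        if not sql.lstrip().lower().startswith("select"):
            return "Error: only SELECT statements are allowed"
        try:
            rows = self._conn.execute(sql).fetchall()
        except sqlite3.Error as err:
            return f"Error: {err}"
        return "\n".join(str(row) for row in rows)

    def answer(self, value: str) -> str:
        """Submit a final answer; sets the episode reward."""
        self.reward = 1.0 if value.strip() == self._gold else 0.0
        return "Answer submitted."


def toy_reward_func(environments, **kwargs) -> list[float]:
    """Read the reward each environment instance stored during its episode."""
    return [env.reward for env in environments]
```

Keeping the reward on the instance lets the reward function stay a one-liner: each generation gets its own environment object, so there is no shared state to untangle after the rollout.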
|
|
| Note: the adapter instantiates a **local** in-process `SQLEnvironment`, |
| not a WebSocket client to the hosted HF Space. This is intentional: training |
| needs N parallel sessions (one per generation), and a local environment is |
| faster and avoids the Space's default 1-session concurrency limit. |
|
|
| **Status:** Complete |
|
|
| ## Our Position |
|
|
| No other interactive SQL exploration environment exists. SQL Repair |
| (WALKMAN303) is single-turn fix-it. Calendar Gym (Turing) is |
| real-world but not SQL. We are the only multi-turn |
| strategy-discovery environment for database exploration. |
|
|
| Key narrative: "The environment is the product." The trained agent |
| demonstrates that the environment works, but the contribution is |
| the action space, reward architecture, and episode structure. |
|
|
| ## Open Items |
|
|
| - [x] Deploy HuggingFace Space (live at https://huggingface.co/spaces/hjerpe/sql_env, 2026-03-29) |
| - [ ] Publish blog post on HuggingFace (planned 2026-04-12) |
| - [ ] Final review of blog-post-v1.md |
| - [ ] Verify notebooks run clean on fresh Colab |
| - [ ] Post-launch: enable `SUPPORTS_CONCURRENT_SESSIONS=True` + `max_concurrent_envs=64` on the Space for external users who want to retrain against the hosted endpoint |
| |
| ## Resources |
| |
| - OpenEnv tutorial: https://colab.research.google.com/github/meta-pytorch/OpenEnv/blob/main/examples/OpenEnv_Tutorial.ipynb |
| - OpenEnv GitHub: https://github.com/meta-pytorch/OpenEnv |
| - OpenEnv docs: https://meta-pytorch.org/OpenEnv/ |
| - Environment hub: https://huggingface.co/openenv |
| - Discord: https://discord.com/invite/YsTYBh6PD9 |
|
|