# OpenEnv Challenge — Deliverables & Status

## Competition

**OpenEnv Challenge: SOTA Environments to drive general intelligence**

Sponsors: PyTorch team at Meta, HuggingFace, Unsloth

Prizes:

- $10K in HuggingFace credits
- Invitation to publish on PyTorch.org blog

## Judging Criteria

Evaluated primarily on the submission blog. Judging panel grades on:

1. Creative and robust use of OpenEnv
2. Technical excellence
3. Storytelling
4. Open-source demo
5. Green Agent wrapper for the environment

## Required Deliverables

### 1. HuggingFace Space

Environment on the HF Hub. Judges interact with the action space (DESCRIBE, SAMPLE, QUERY, ANSWER) against real Spider databases.

Live at: https://huggingface.co/spaces/hjerpe/sql_env

Docker image: `registry.hf.space/hjerpe-sql_env:latest`

Published via `uv run openenv push` on 2026-03-29 (see `specs/F007-DEMO.md`).

**Status:** Live. Endpoints `/health`, `/docs`, `/web`, `/reset`, `/step`, `/ws` exposed by the FastAPI server in `envs/sql_env/server/`. Python client: `SQLEnv(base_url="https://hjerpe-sql-env.hf.space")`.

### 2. Training notebooks/scripts (GitHub)

Colab-ready notebooks:

- `notebooks/train_grpo.ipynb` — Full SFT + GRPO pipeline, Colab L4, ~7h
- `notebooks/compare_methods.ipynb` — Base vs GRPO evaluation (zero-shot, 1-shot, 3-shot, GRPO v1, v2)
- `notebooks/showcase_sqlenv.ipynb` — Interactive environment demo with Random and Oracle baselines

**Status:** Complete

### 3. Blog post (HuggingFace)

Analyst exploration framing, reward architecture with theory, training results (0% to ~30%), failure analysis, lessons learned.

Draft: `docs/blog-post-v1.md`

**Status:** Draft v1 complete, not yet published

## Additional Deliverables

### 4. GitHub repo

Clean codebase: zero ruff errors, typed Pydantic models, 280 passing tests, architecture docs, training artifacts.

**Status:** Complete (F016 quality sweep done)

### 5. Trained checkpoints (HuggingFace Hub)

- `hjerpe/sqlenv-qwen3-0.6b-grpo` (v1)
- `hjerpe/sqlenv-qwen3-0.6b-grpo-v2` (v2)

**Status:** Uploaded

### 6. Green Agent wrapper

OpenEnv evaluation wrapper pattern. A `Policy` protocol with `evaluate(env, policy, n_episodes, seed)` that reports success rate, average reward, and average steps. Includes `RandomPolicy` and `OraclePolicy` baselines for standardized comparison.

Implementation: `evaluation/policies.py`, `evaluation/oracle_policy.py`

Tests: `tests/test_evaluation.py` (17 tests, all passing)

Used by: `notebooks/showcase_sqlenv.ipynb`, `notebooks/compare_methods.ipynb`

**Status:** Complete

### 7. TRL `environment_factory` adapter

HuggingFace TRL's native OpenEnv integration: pass a class with `reset()` + named tool methods as `environment_factory=` and `GRPOTrainer` runs the multi-turn tool-calling loop automatically (no custom `rollout_func`).

Implementation: `training/trl_adapter.py` — class `SQLEnvTRL` exposing `describe()`, `sample()`, `query()`, `answer()` as tool methods plus `sql_env_reward_func`. Used by `notebooks/train_grpo.ipynb` (cell 16: `environment_factory=SQLEnvTRL`).

Note: the adapter instantiates a **local** in-process `SQLEnvironment`, not a WebSocket client to the hosted HF Space. Intentional — training needs N parallel sessions (one per generation), and local is faster and avoids the Space's default 1-session concurrency limit.

**Status:** Complete

## Our Position

No interactive SQL exploration environment exists. SQL Repair (WALKMAN303) is single-turn fix-it. Calendar Gym (Turing) is real-world but not SQL. We are the only multi-turn strategy-discovery environment for database exploration.

Key narrative: "The environment is the product." The trained agent demonstrates that the environment works, but the contribution is the action space, reward architecture, and episode structure.
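
To make the episode structure and the Green Agent evaluation pattern from deliverable 6 more concrete, here is a minimal, self-contained sketch of an evaluation loop. Only the `Policy` name and the `evaluate(env, policy, n_episodes, seed)` signature come from this document. Everything else is an assumption for illustration: `ToyEnv`, `StepResult`, `ScriptedPolicy`, the `act` method, the string action format, and the rule that a positive episode reward counts as a success. It is not the code in `evaluation/policies.py`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class StepResult:
    """Hypothetical step result; the real observation type may differ."""

    observation: str
    reward: float
    done: bool


class ToyEnv:
    """Stand-in environment (assumption): succeeds when the agent answers 42."""

    max_steps = 5

    def reset(self, seed: int | None = None) -> str:
        self._steps = 0
        return "question: how many rows are in table t?"

    def step(self, action: str) -> StepResult:
        self._steps += 1
        if action.startswith("ANSWER"):
            # ANSWER always ends the episode; only the correct answer is rewarded.
            reward = 1.0 if action == "ANSWER 42" else 0.0
            return StepResult("episode over", reward, True)
        # Any exploration action reveals the schema and uses up one step.
        return StepResult("schema: t(id INTEGER), 42 rows", 0.0,
                          self._steps >= self.max_steps)


class Policy(Protocol):
    """Anything with an act(observation) -> action method (hypothetical shape)."""

    def act(self, observation: str) -> str: ...


class ScriptedPolicy:
    """Explore once with DESCRIBE, then ANSWER after seeing the schema."""

    def act(self, observation: str) -> str:
        if observation.startswith("schema"):
            return "ANSWER 42"
        return "DESCRIBE t"


def evaluate(env, policy: Policy, n_episodes: int, seed: int) -> dict[str, float]:
    """Run n_episodes and report success rate, average reward, average steps."""
    successes = 0
    total_reward = 0.0
    total_steps = 0
    for episode in range(n_episodes):
        # Offset the seed per episode so runs are reproducible but distinct.
        observation = env.reset(seed=seed + episode)
        episode_reward = 0.0
        done = False
        while not done:
            result = env.step(policy.act(observation))
            observation, done = result.observation, result.done
            episode_reward += result.reward
            total_steps += 1
        if episode_reward > 0:  # assumption: positive reward means success
            successes += 1
        total_reward += episode_reward
    return {
        "success_rate": successes / n_episodes,
        "avg_reward": total_reward / n_episodes,
        "avg_steps": total_steps / n_episodes,
    }


if __name__ == "__main__":
    print(evaluate(ToyEnv(), ScriptedPolicy(), n_episodes=4, seed=0))
```

Keeping the policy behind a small protocol means a random baseline, an oracle, and a trained model can all be scored by the same loop, which is what makes the reported numbers comparable.
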
## Open Items

- [x] Deploy HuggingFace Space (live at https://huggingface.co/spaces/hjerpe/sql_env, 2026-03-29)
- [ ] Publish blog post on HuggingFace (planned 2026-04-12)
- [ ] Final review of blog-post-v1.md
- [ ] Verify notebooks run clean on fresh Colab
- [ ] Post-launch: enable `SUPPORTS_CONCURRENT_SESSIONS=True` + `max_concurrent_envs=64` on the Space for external users who want to retrain against the hosted endpoint

## Resources

- OpenEnv tutorial: https://colab.research.google.com/github/meta-pytorch/OpenEnv/blob/main/examples/OpenEnv_Tutorial.ipynb
- OpenEnv GitHub: https://github.com/meta-pytorch/OpenEnv
- OpenEnv docs: https://meta-pytorch.org/OpenEnv/
- Environment hub: https://huggingface.co/openenv
- Discord: https://discord.com/invite/YsTYBh6PD9