# OpenEnv Challenge: Deliverables & Status
## Competition
**OpenEnv Challenge: SOTA Environments to drive general intelligence**
Sponsors: PyTorch team at Meta, HuggingFace, Unsloth
Prizes:
- $10K in HuggingFace credits
- Invitation to publish on PyTorch.org blog
## Judging Criteria
Submissions are evaluated primarily on the submission blog. The judging panel grades on:
1. Creative and robust use of OpenEnv
2. Technical excellence
3. Storytelling
4. Open-source demo
5. Green Agent wrapper for the environment
## Required Deliverables
### 1. HuggingFace Space
The environment is hosted on the HF Hub, where judges can exercise the action space
(DESCRIBE, SAMPLE, QUERY, ANSWER) against real Spider databases.
Live at: https://huggingface.co/spaces/hjerpe/sql_env
Docker image: `registry.hf.space/hjerpe-sql_env:latest`
Published via `uv run openenv push` on 2026-03-29 (see `specs/F007-DEMO.md`).
**Status:** Live. Endpoints `/health`, `/docs`, `/web`, `/reset`, `/step`, `/ws`
exposed by the FastAPI server in `envs/sql_env/server/`. Python client:
`SQLEnv(base_url="https://hjerpe-sql-env.hf.space")`.
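
A minimal sketch of how a script might confirm the Space is reachable before connecting the client. It assumes `/health` answers a plain GET with HTTP 200 when the server is ready; the response body is not specified here, so it is ignored. The helpers `endpoint_url` and `is_healthy` are illustrative names, not part of the `sql_env` package.

```python
from urllib.request import urlopen

# Base URL taken from the Python client example above.
BASE_URL = "https://hjerpe-sql-env.hf.space"


def endpoint_url(base_url: str, path: str) -> str:
    """Join a base URL and a route without doubling or dropping slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def is_healthy(base_url: str, opener=urlopen, timeout: float = 10.0) -> bool:
    """Return True if GET /health responds with HTTP 200 (assumed contract).

    `opener` is injectable so the check can be exercised without network access.
    """
    try:
        with opener(endpoint_url(base_url, "/health"), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError and HTTPError both derive from OSError.
        return False


# Example usage (performs a real network request):
#     print(is_healthy(BASE_URL))
```

A Space that has gone to sleep may take a while to wake, so a caller would probably want to retry a few times before giving up.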
### 2. Training notebooks/scripts (GitHub)
Colab-ready notebooks:
- `notebooks/train_grpo.ipynb`: Full SFT + GRPO pipeline, Colab L4, ~7h
- `notebooks/compare_methods.ipynb`: Base vs GRPO evaluation (zero-shot, 1-shot, 3-shot, GRPO v1, v2)
- `notebooks/showcase_sqlenv.ipynb`: Interactive environment demo with Random and Oracle baselines
**Status:** Complete
### 3. Blog post (HuggingFace)
Analyst exploration framing, reward architecture with theory,
training results (0% to ~30%), failure analysis, lessons learned.
Draft: `docs/blog-post-v1.md`
**Status:** Draft v1 complete, not yet published
## Additional Deliverables
### 4. GitHub repo
Clean codebase: zero ruff errors, typed Pydantic models, 280 passing
tests, architecture docs, training artifacts.
**Status:** Complete (F016 quality sweep done)
### 5. Trained checkpoints (HuggingFace Hub)
- `hjerpe/sqlenv-qwen3-0.6b-grpo` (v1)
- `hjerpe/sqlenv-qwen3-0.6b-grpo-v2` (v2)
**Status:** Uploaded
### 6. Green Agent wrapper
OpenEnv evaluation wrapper pattern. A `Policy` protocol with
`evaluate(env, policy, n_episodes, seed)` that reports success rate,
average reward, and average steps. Includes `RandomPolicy` and
`OraclePolicy` baselines for standardized comparison.
Implementation: `evaluation/policies.py`, `evaluation/oracle_policy.py`
Tests: `tests/test_evaluation.py` (17 tests, all passing)
Used by: `notebooks/showcase_sqlenv.ipynb`, `notebooks/compare_methods.ipynb`
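
The evaluation pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `evaluation/policies.py`: the `select_action` method name, the `(observation, reward, done)` step tuple, the "success means positive episode reward" rule, and the toy classes `ToyEnv`, `AlwaysAnswer`, and `NeverAnswer` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Protocol


class Policy(Protocol):
    """Anything that maps an observation to an action (assumed interface)."""

    def select_action(self, observation: Any) -> Any: ...


@dataclass
class EvalResult:
    success_rate: float
    avg_reward: float
    avg_steps: float


def evaluate(env, policy: Policy, n_episodes: int, seed: int) -> EvalResult:
    """Run n_episodes and aggregate success rate, reward, and step count."""
    if n_episodes <= 0:
        raise ValueError("n_episodes must be positive")
    successes, total_reward, total_steps = 0, 0.0, 0
    for episode in range(n_episodes):
        # Derive a per-episode seed so runs are reproducible.
        obs = env.reset(seed=seed + episode)
        done, episode_reward = False, 0.0
        while not done:
            obs, reward, done = env.step(policy.select_action(obs))
            episode_reward += reward
            total_steps += 1
        total_reward += episode_reward
        if episode_reward > 0:  # assumed success criterion
            successes += 1
    return EvalResult(
        success_rate=successes / n_episodes,
        avg_reward=total_reward / n_episodes,
        avg_steps=total_steps / n_episodes,
    )


class ToyEnv:
    """Hypothetical stand-in env: ANSWER ends the episode with reward 1."""

    max_steps = 5

    def reset(self, seed=None):
        self.steps = 0
        return "start"

    def step(self, action):
        self.steps += 1
        if action == "ANSWER":
            return "done", 1.0, True
        return "continue", 0.0, self.steps >= self.max_steps


class AlwaysAnswer:
    def select_action(self, observation):
        return "ANSWER"


class NeverAnswer:
    def select_action(self, observation):
        return "QUERY"


print(evaluate(ToyEnv(), AlwaysAnswer(), n_episodes=4, seed=0))
print(evaluate(ToyEnv(), NeverAnswer(), n_episodes=4, seed=0))
```

Because the policy is a structural protocol, a trained model, a random baseline, and an oracle can all be scored by the same loop, which is what makes the comparison standardized.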
**Status:** Complete
### 7. TRL `environment_factory` adapter
HuggingFace TRL's native OpenEnv integration: pass a class with
`reset()` + named tool methods as `environment_factory=` and `GRPOTrainer`
runs the multi-turn tool-calling loop automatically (no custom
`rollout_func`).
Implementation: `training/trl_adapter.py`, class `SQLEnvTRL`, exposing
`describe()`, `sample()`, `query()`, `answer()` as tool methods plus
`sql_env_reward_func`. Used by `notebooks/train_grpo.ipynb` (cell 16:
`environment_factory=SQLEnvTRL`).
Note: the adapter instantiates a **local** in-process `SQLEnvironment`,
not a WebSocket client to the hosted HF Space. This is intentional: training
needs N parallel sessions (one per generation), and running locally is faster and
avoids the Space's default 1-session concurrency limit.
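
To make the shape of such an adapter concrete, here is a toy sketch of a class with `reset()` plus one method per action, each instance owning its own in-process database. It is not the code in `training/trl_adapter.py`: the class name `ToySQLTools`, the two-row `singer` table, the hard-coded question and answer, and the `reward` attribute are all hypothetical, and no TRL call is shown.

```python
import sqlite3


class ToySQLTools:
    """Hypothetical stand-in for an adapter like SQLEnvTRL."""

    def __init__(self):
        self.conn = None
        self.reward = 0.0

    def reset(self, **kwargs) -> str:
        """Start a fresh episode on a private in-memory database."""
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE singer (id INTEGER, name TEXT)")
        self.conn.executemany(
            "INSERT INTO singer VALUES (?, ?)", [(1, "Ana"), (2, "Bo")]
        )
        self.reward = 0.0
        return "How many singers are there?"

    def _require_table(self, table_name: str) -> None:
        # Only accept identifiers that exist, since they are interpolated below.
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        if table_name not in {r[0] for r in rows}:
            raise ValueError(f"unknown table: {table_name}")

    def describe(self, table_name: str) -> str:
        """Return column names and types for a table.

        Args:
            table_name: Name of the table to describe.
        """
        self._require_table(table_name)
        rows = self.conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (table_name,)
        ).fetchall()
        return ", ".join(f"{name} {col_type}" for name, col_type in rows)

    def sample(self, table_name: str) -> str:
        """Return up to three rows from a table.

        Args:
            table_name: Name of the table to sample.
        """
        self._require_table(table_name)
        rows = self.conn.execute(f"SELECT * FROM {table_name} LIMIT 3")
        return str(rows.fetchall())

    def query(self, sql: str) -> str:
        """Run a read-only SELECT statement and return its rows.

        Args:
            sql: The SELECT statement to execute.
        """
        if not sql.lstrip().upper().startswith("SELECT"):
            return "Error: only SELECT statements are allowed"
        try:
            return str(self.conn.execute(sql).fetchall())
        except sqlite3.Error as exc:
            return f"Error: {exc}"

    def answer(self, value: str) -> str:
        """Submit a final answer and end the episode.

        Args:
            value: The proposed answer.
        """
        self.reward = 1.0 if value.strip() == "2" else 0.0
        return "Episode finished"


# N independent sessions are just N instances, with no shared server state.
sessions = [ToySQLTools() for _ in range(4)]
for session in sessions:
    session.reset()
```

Since every instance holds its own connection, parallel generations never contend for a single hosted session, which is the trade-off the note above describes.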
**Status:** Complete
## Our Position
No interactive SQL exploration environment exists. SQL Repair
(WALKMAN303) is single-turn fix-it. Calendar Gym (Turing) is
real-world but not SQL. We are the only multi-turn
strategy-discovery environment for database exploration.
Key narrative: "The environment is the product." The trained agent
demonstrates that the environment works, but the contribution is
the action space, reward architecture, and episode structure.
## Open Items
- [x] Deploy HuggingFace Space (live at https://huggingface.co/spaces/hjerpe/sql_env, 2026-03-29)
- [ ] Publish blog post on HuggingFace (planned 2026-04-12)
- [ ] Final review of blog-post-v1.md
- [ ] Verify notebooks run clean on fresh Colab
- [ ] Post-launch: enable `SUPPORTS_CONCURRENT_SESSIONS=True` + `max_concurrent_envs=64` on the Space for external users who want to retrain against the hosted endpoint
## Resources
- OpenEnv tutorial: https://colab.research.google.com/github/meta-pytorch/OpenEnv/blob/main/examples/OpenEnv_Tutorial.ipynb
- OpenEnv GitHub: https://github.com/meta-pytorch/OpenEnv
- OpenEnv docs: https://meta-pytorch.org/OpenEnv/
- Environment hub: https://huggingface.co/openenv
- Discord: https://discord.com/invite/YsTYBh6PD9