	# OpenEnv Challenge — Deliverables & Status

	## Competition

	OpenEnv Challenge: SOTA Environments to drive general intelligence

	Sponsors: PyTorch team at Meta, HuggingFace, Unsloth

	Prizes:
	- $10K in HuggingFace credits
	- Invitation to publish on PyTorch.org blog

	## Judging Criteria

	Submissions are evaluated primarily on the blog post. The judging panel grades on:

	1. Creative and robust use of OpenEnv
	2. Technical excellence
	3. Storytelling
	4. Open-source demo
	5. Green Agent wrapper for the environment

	## Required Deliverables

	### 1. HuggingFace Space

	The environment is hosted on the HF Hub. Judges interact with it through
	four actions (DESCRIBE, SAMPLE, QUERY, ANSWER) against real Spider databases.

	Live at: https://huggingface.co/spaces/hjerpe/sql_env
	Docker image: `registry.hf.space/hjerpe-sql_env:latest`
	Published via `uv run openenv push` on 2026-03-29 (see `specs/F007-DEMO.md`).

	Status: Live. The FastAPI server in `envs/sql_env/server/` exposes the
	endpoints `/health`, `/docs`, `/web`, `/reset`, `/step`, and `/ws`. Python client:
	`SQLEnv(base_url="https://hjerpe-sql-env.hf.space")`.
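The Space page URL and the client `base_url` above differ in form (`hjerpe/sql_env` versus `hjerpe-sql-env.hf.space`). The sketch below shows one way to derive the direct URL and the full endpoint addresses. The helper names `space_base_url` and `endpoint_urls` are hypothetical, and two things are assumptions rather than statements from this document: the `<owner>-<name>.hf.space` naming convention with underscores turned into hyphens, and the idea that `/ws` is reached over the `wss://` scheme.

```python
# Endpoint paths listed in the status note above.
ENDPOINTS = ("/health", "/docs", "/web", "/reset", "/step", "/ws")


def space_base_url(owner: str, name: str) -> str:
    """Build the direct URL of a Space (assumed naming convention)."""
    # Assumption: the subdomain is "<owner>-<name>", lowercased,
    # with underscores replaced by hyphens.
    slug = f"{owner}-{name}".lower().replace("_", "-")
    return f"https://{slug}.hf.space"


def endpoint_urls(base_url: str) -> dict[str, str]:
    """Map each endpoint path to its full URL."""
    root = base_url.rstrip("/")
    urls = {}
    for path in ENDPOINTS:
        url = root + path
        if path == "/ws":
            # Assumption: the /ws endpoint is a WebSocket, so use wss://.
            url = url.replace("https://", "wss://", 1)
        urls[path] = url
    return urls


base = space_base_url("hjerpe", "sql_env")
urls = endpoint_urls(base)
```

With these assumptions, `base` matches the `base_url` passed to `SQLEnv` above. Request payloads for `/reset` and `/step` are not described here, so the sketch stops at building URLs.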

	### 2. Training notebooks/scripts (GitHub)

	Colab-ready notebooks:
	- `notebooks/train_grpo.ipynb` — Full SFT + GRPO pipeline, Colab L4, ~7h
	- `notebooks/compare_methods.ipynb` — Base vs GRPO evaluation (zero-shot, 1-shot, 3-shot, GRPO v1, v2)
	- `notebooks/showcase_sqlenv.ipynb` — Interactive environment demo with Random and Oracle baselines

	Status: Complete

	### 3. Blog post (HuggingFace)

	Covers the analyst-exploration framing, the reward architecture and its
	theory, training results (0% to ~30%), failure analysis, and lessons learned.

	Draft: `docs/blog-post-v1.md`

	Status: Draft v1 complete, not yet published

	## Additional Deliverables

	### 4. GitHub repo

	Clean codebase: zero ruff errors, typed Pydantic models, 280 passing
	tests, architecture docs, training artifacts.

	Status: Complete (F016 quality sweep done)

	### 5. Trained checkpoints (HuggingFace Hub)

	- `hjerpe/sqlenv-qwen3-0.6b-grpo` (v1)
	- `hjerpe/sqlenv-qwen3-0.6b-grpo-v2` (v2)

	Status: Uploaded

	### 6. Green Agent wrapper

	Follows the OpenEnv evaluation wrapper pattern: a `Policy` protocol plus
	an `evaluate(env, policy, n_episodes, seed)` function that reports success
	rate, average reward, and average steps. Includes `RandomPolicy` and
	`OraclePolicy` baselines for standardized comparison.

	Implementation: `evaluation/policies.py`, `evaluation/oracle_policy.py`
	Tests: `tests/test_evaluation.py` (17 tests, all passing)
	Used by: `notebooks/showcase_sqlenv.ipynb`, `notebooks/compare_methods.ipynb`

	Status: Complete

	### 7. TRL `environment_factory` adapter

	HuggingFace TRL's native OpenEnv integration: pass a class with
	`reset()` + named tool methods as `environment_factory=` and `GRPOTrainer`
	runs the multi-turn tool-calling loop automatically (no custom
	`rollout_func`).

	Implementation: `training/trl_adapter.py` — class `SQLEnvTRL` exposing
	`describe()`, `sample()`, `query()`, `answer()` as tool methods plus
	`sql_env_reward_func`. Used by `notebooks/train_grpo.ipynb` (cell 16:
	`environment_factory=SQLEnvTRL`).
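To make the "class with `reset()` plus named tool methods" shape concrete, here is a minimal sketch over an in-memory SQLite database. It is not the `SQLEnvTRL` implementation: the `ToySQLTools` class, its schema, its question, its reward rule, and the `collect_rewards` helper are hypothetical, and the sketch does not call TRL at all. How a trainer discovers the tool methods or reads rewards is an assumption left out of scope.

```python
import sqlite3


class ToySQLTools:
    """Hypothetical environment class: reset() plus four named tool methods."""

    def __init__(self) -> None:
        self._conn: sqlite3.Connection | None = None
        self.reward = 0.0
        self.done = False

    def reset(self, **kwargs) -> str:
        """Start a fresh episode and return the question to answer."""
        if self._conn is not None:
            self._conn.close()
        self._conn = sqlite3.connect(":memory:")
        self._conn.executescript(
            """
            CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
            INSERT INTO singer (name, age) VALUES ('Ana', 31), ('Bo', 45), ('Cy', 27);
            """
        )
        self._gold = "3"  # toy gold answer for the question below
        self.reward, self.done = 0.0, False
        return "How many singers are there?"

    def _tables(self) -> set[str]:
        rows = self._conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        return {name for (name,) in rows}

    def describe(self, table_name: str) -> str:
        """Return the column names and types of a table."""
        if table_name not in self._tables():
            return f"Error: unknown table {table_name!r}"
        rows = self._conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (table_name,)
        ).fetchall()
        return ", ".join(f"{name} {col_type}" for name, col_type in rows)

    def sample(self, table_name: str, limit: int = 2) -> str:
        """Return the first few rows of a table."""
        if table_name not in self._tables():
            return f"Error: unknown table {table_name!r}"
        # The table name was validated above, so quoting it here is safe.
        rows = self._conn.execute(
            f'SELECT * FROM "{table_name}" LIMIT ?', (limit,)
        ).fetchall()
        return str(rows)

    def query(self, sql: str) -> str:
        """Run a read-only SELECT and return up to five rows."""
        if not sql.lstrip().lower().startswith("select"):
            return "Error: only SELECT statements are allowed"
        try:
            return str(self._conn.execute(sql).fetchmany(5))
        except sqlite3.Error as exc:
            return f"Error: {exc}"

    def answer(self, value: str) -> str:
        """Submit the final answer; this ends the episode."""
        self.done = True
        self.reward = 1.0 if value.strip() == self._gold else 0.0
        return "correct" if self.reward else "incorrect"


def collect_rewards(envs: list[ToySQLTools]) -> list[float]:
    """Hypothetical helper: read the final reward off each environment."""
    return [env.reward for env in envs]


env = ToySQLTools()
question = env.reset()
schema = env.describe("singer")
count_rows = env.query("SELECT COUNT(*) FROM singer")
verdict = env.answer("3")
```

Each instance owns its own database connection, so creating one instance per generation gives independent parallel sessions without any shared state.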

	Note: the adapter instantiates a local in-process `SQLEnvironment`,
	not a WebSocket client to the hosted HF Space. This is intentional:
	training needs N parallel sessions (one per generation), and a local
	environment is faster and avoids the Space's default 1-session
	concurrency limit.

	Status: Complete

	## Our Position

	No other interactive SQL exploration environment exists. SQL Repair
	(WALKMAN303) is a single-turn fix-it task. Calendar Gym (Turing) is
	real-world but not SQL. Ours is the only multi-turn
	strategy-discovery environment for database exploration.

	Key narrative: "The environment is the product." The trained agent
	demonstrates that the environment works, but the contribution is
	the action space, reward architecture, and episode structure.

	## Open Items

	- [x] Deploy HuggingFace Space (live at https://huggingface.co/spaces/hjerpe/sql_env, 2026-03-29)
	- [ ] Publish blog post on HuggingFace (planned 2026-04-12)
	- [ ] Final review of blog-post-v1.md
	- [ ] Verify notebooks run clean on fresh Colab
	- [ ] Post-launch: enable `SUPPORTS_CONCURRENT_SESSIONS=True` + `max_concurrent_envs=64` on the Space for external users who want to retrain against the hosted endpoint

	## Resources

	- OpenEnv tutorial: https://colab.research.google.com/github/meta-pytorch/OpenEnv/blob/main/examples/OpenEnv_Tutorial.ipynb
	- OpenEnv GitHub: https://github.com/meta-pytorch/OpenEnv
	- OpenEnv docs: https://meta-pytorch.org/OpenEnv/
	- Environment hub: https://huggingface.co/openenv
	- Discord: https://discord.com/invite/YsTYBh6PD9