# OpenEnv Challenge — Deliverables & Status

## Competition

**OpenEnv Challenge: SOTA Environments to drive general intelligence**

Sponsors: PyTorch team at Meta, HuggingFace, Unsloth

Prizes:
- $10K in HuggingFace credits
- Invitation to publish on PyTorch.org blog

## Judging Criteria

Evaluated primarily on the submission blog. Judging panel grades on:

1. Creative and robust use of OpenEnv
2. Technical excellence
3. Storytelling
4. Open-source demo
5. Green Agent wrapper for the environment

## Required Deliverables

### 1. HuggingFace Space

The environment is hosted on the HF Hub. Judges can exercise the four
actions (DESCRIBE, SAMPLE, QUERY, ANSWER) against real Spider databases.

- Live at: https://huggingface.co/spaces/hjerpe/sql_env
- Docker image: `registry.hf.space/hjerpe-sql_env:latest`
- Published via `uv run openenv push` on 2026-03-29 (see `specs/F007-DEMO.md`)

**Status:** Live. The FastAPI server in `envs/sql_env/server/` exposes the
endpoints `/health`, `/docs`, `/web`, `/reset`, `/step`, and `/ws`. Python client:
`SQLEnv(base_url="https://hjerpe-sql-env.hf.space")`.
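
The endpoints listed above can be probed without the project's client. The following is a minimal standard-library sketch: the helper names `endpoint_urls` and `is_healthy` are illustrative, and both the `wss://` scheme for `/ws` and an HTTP 200 response from `/health` are assumptions rather than documented behavior of the Space.

```python
import urllib.request

# Base URL quoted in the status note; endpoint paths are the ones listed there.
BASE_URL = "https://hjerpe-sql-env.hf.space"
ENDPOINTS = ("/health", "/docs", "/web", "/reset", "/step", "/ws")


def endpoint_urls(base_url: str = BASE_URL) -> dict[str, str]:
    """Map each endpoint path to an absolute URL."""
    base = base_url.rstrip("/")
    urls = {path: base + path for path in ENDPOINTS}
    # Assumption: the WebSocket endpoint lives on the same host,
    # using wss:// for https hosts and ws:// otherwise.
    scheme, rest = base.split("://", 1)
    ws_scheme = "wss" if scheme == "https" else "ws"
    urls["/ws"] = f"{ws_scheme}://{rest}/ws"
    return urls


def is_healthy(base_url: str = BASE_URL, timeout: float = 10.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (needs network access)."""
    url = endpoint_urls(base_url)["/health"]
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


if __name__ == "__main__":
    for path, url in endpoint_urls().items():
        print(f"{path:8s} {url}")
    print("healthy:", is_healthy())
```

A sleeping Space may need a moment to wake up, so a single `False` from `is_healthy` is not necessarily a failure.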

### 2. Training notebooks/scripts (GitHub)

Colab-ready notebooks:
- `notebooks/train_grpo.ipynb` — Full SFT + GRPO pipeline, Colab L4, ~7h
- `notebooks/compare_methods.ipynb` — Base vs GRPO evaluation (zero-shot, 1-shot, 3-shot, GRPO v1, v2)
- `notebooks/showcase_sqlenv.ipynb` — Interactive environment demo with Random and Oracle baselines

**Status:** Complete

### 3. Blog post (HuggingFace)

Analyst exploration framing, reward architecture with theory,
training results (0% to ~30%), failure analysis, lessons learned.

Draft: `docs/blog-post-v1.md`

**Status:** Draft v1 complete, not yet published

## Additional Deliverables

### 4. GitHub repo

Clean codebase: zero ruff errors, typed Pydantic models, 280 passing
tests, architecture docs, training artifacts.

**Status:** Complete (F016 quality sweep done)

### 5. Trained checkpoints (HuggingFace Hub)

- `hjerpe/sqlenv-qwen3-0.6b-grpo` (v1)
- `hjerpe/sqlenv-qwen3-0.6b-grpo-v2` (v2)

**Status:** Uploaded

### 6. Green Agent wrapper

OpenEnv evaluation wrapper pattern. A `Policy` protocol with
`evaluate(env, policy, n_episodes, seed)` that reports success rate,
average reward, and average steps. Includes `RandomPolicy` and
`OraclePolicy` baselines for standardized comparison.
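
As an illustration of the pattern, here is a minimal self-contained sketch of an `evaluate(env, policy, n_episodes, seed)` loop that reports the three metrics. It is not the code in `evaluation/policies.py`: `ToyEnv`, `RandomGuessPolicy`, `PeekingPolicy`, `EvalResult`, the `select_action` method name, and the `(observation, reward, done)` step tuple are all assumptions made for the example.

```python
import random
from dataclasses import dataclass
from typing import Any, Protocol


class Policy(Protocol):
    """Anything with select_action(observation) counts as a policy (assumed name)."""

    def select_action(self, observation: Any) -> Any: ...


@dataclass
class EvalResult:
    success_rate: float
    avg_reward: float
    avg_steps: float


class ToyEnv:
    """Hypothetical stand-in environment: guess a hidden digit within 5 steps."""

    max_steps = 5

    def reset(self, seed=None):
        self._target = random.Random(seed).randrange(10)
        self._steps = 0
        return {"hint": None}

    def step(self, action):
        self._steps += 1
        correct = action == self._target
        done = correct or self._steps >= self.max_steps
        return {"hint": "correct" if correct else "wrong"}, float(correct), done


class RandomGuessPolicy:
    """Baseline that guesses uniformly at random."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def select_action(self, observation):
        return self._rng.randrange(10)


class PeekingPolicy:
    """Oracle-style baseline that reads the hidden target from the toy env."""

    def __init__(self, env):
        self._env = env

    def select_action(self, observation):
        return self._env._target


def evaluate(env, policy: Policy, n_episodes: int, seed: int) -> EvalResult:
    """Run n_episodes with per-episode seeds and aggregate the three metrics."""
    successes, total_reward, total_steps = 0, 0.0, 0
    for episode in range(n_episodes):
        obs = env.reset(seed=seed + episode)  # deterministic per-episode seed
        done, episode_reward = False, 0.0
        while not done:
            obs, reward, done = env.step(policy.select_action(obs))
            episode_reward += reward
            total_steps += 1
        successes += int(episode_reward > 0)
        total_reward += episode_reward
    return EvalResult(
        success_rate=successes / n_episodes,
        avg_reward=total_reward / n_episodes,
        avg_steps=total_steps / n_episodes,
    )


if __name__ == "__main__":
    env = ToyEnv()
    print("random:", evaluate(env, RandomGuessPolicy(seed=1), n_episodes=50, seed=0))
    print("oracle:", evaluate(env, PeekingPolicy(env), n_episodes=50, seed=0))
```

Seeding each episode from `seed + episode` keeps the episode set identical across policies, which is what makes the random and oracle numbers comparable.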

- Implementation: `evaluation/policies.py`, `evaluation/oracle_policy.py`
- Tests: `tests/test_evaluation.py` (17 tests, all passing)
- Used by: `notebooks/showcase_sqlenv.ipynb`, `notebooks/compare_methods.ipynb`

**Status:** Complete

### 7. TRL `environment_factory` adapter

HuggingFace TRL's native OpenEnv integration: pass a class with
`reset()` + named tool methods as `environment_factory=` and `GRPOTrainer`
runs the multi-turn tool-calling loop automatically (no custom
`rollout_func`).

Implementation: `training/trl_adapter.py` — class `SQLEnvTRL` exposing
`describe()`, `sample()`, `query()`, `answer()` as tool methods plus
`sql_env_reward_func`. Used by `notebooks/train_grpo.ipynb` (cell 16:
`environment_factory=SQLEnvTRL`).

Note: the adapter instantiates a **local** in-process `SQLEnvironment`,
not a WebSocket client to the hosted HF Space. This is intentional:
training needs N parallel sessions (one per generation), and a local
environment is faster and avoids the Space's default 1-session concurrency limit.
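
To make the shape of such an adapter concrete, the sketch below shows a class with `reset()` plus one public method per tool, backed by a local in-memory SQLite database. It is a hypothetical stand-in, not `SQLEnvTRL`: the class name `ToySQLEnvAdapter`, the toy `singer` table, the string return values, and the `reward`/`done` attributes are all assumptions, and the wiring into `GRPOTrainer` is deliberately left out.

```python
import sqlite3


class ToySQLEnvAdapter:
    """Hypothetical adapter: reset() plus describe/sample/query/answer tools."""

    def __init__(self):
        self._conn = None
        self._gold = ""
        self.reward = 0.0
        self.done = False

    def reset(self, **kwargs) -> str:
        """Start a fresh episode on a new in-process database; return the question."""
        if self._conn is not None:
            self._conn.close()
        self._conn = sqlite3.connect(":memory:")
        self._conn.executescript(
            "CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);"
            "INSERT INTO singer (name, age) VALUES ('Ana', 31), ('Bo', 45), ('Cy', 27);"
        )
        self._gold = "3"  # toy gold answer for the toy question below
        self.reward, self.done = 0.0, False
        return "How many singers are there?"

    def _tables(self) -> set:
        rows = self._conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        return {name for (name,) in rows}

    def describe(self, table: str) -> str:
        """Return the column names and types of a table."""
        rows = self._conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (table,)
        ).fetchall()
        if not rows:
            return f"error: unknown table {table!r}"
        return ", ".join(f"{name} {ctype}" for name, ctype in rows)

    def sample(self, table: str, limit: int = 3) -> str:
        """Return a few rows from a table."""
        if table not in self._tables():  # whitelist before interpolating the name
            return f"error: unknown table {table!r}"
        rows = self._conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,)).fetchall()
        return "\n".join(str(row) for row in rows)

    def query(self, sql: str) -> str:
        """Run a read-only SELECT statement and return the rows as text."""
        if not sql.lstrip().lower().startswith("select"):
            return "error: only SELECT statements are allowed"
        try:
            rows = self._conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            return f"error: {exc}"
        return "\n".join(str(row) for row in rows)

    def answer(self, value: str) -> str:
        """Submit a final answer; ends the episode and records the reward."""
        self.done = True
        self.reward = 1.0 if value.strip() == self._gold else 0.0
        return "correct" if self.reward else "incorrect"


if __name__ == "__main__":
    env = ToySQLEnvAdapter()
    print(env.reset())
    print(env.describe("singer"))
    print(env.query("SELECT COUNT(*) FROM singer"))
    print(env.answer("3"), env.reward)
```

Because every instance owns its own in-memory database, N instances can run side by side without sharing state, which is the property the note above relies on.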

**Status:** Complete

## Our Position

No other interactive SQL exploration environment exists. SQL Repair
(WALKMAN303) is a single-turn fix-it task. Calendar Gym (Turing) is
real-world but not SQL. We are the only multi-turn
strategy-discovery environment for database exploration.

Key narrative: "The environment is the product." The trained agent
demonstrates that the environment works, but the contribution is
the action space, reward architecture, and episode structure.

## Open Items

- [x] Deploy HuggingFace Space (live at https://huggingface.co/spaces/hjerpe/sql_env, 2026-03-29)
- [ ] Publish blog post on HuggingFace (planned 2026-04-12)
- [ ] Final review of blog-post-v1.md
- [ ] Verify notebooks run clean on fresh Colab
- [ ] Post-launch: enable `SUPPORTS_CONCURRENT_SESSIONS=True` + `max_concurrent_envs=64` on the Space for external users who want to retrain against the hosted endpoint

## Resources

- OpenEnv tutorial: https://colab.research.google.com/github/meta-pytorch/OpenEnv/blob/main/examples/OpenEnv_Tutorial.ipynb
- OpenEnv GitHub: https://github.com/meta-pytorch/OpenEnv
- OpenEnv docs: https://meta-pytorch.org/OpenEnv/
- Environment hub: https://huggingface.co/openenv
- Discord: https://discord.com/invite/YsTYBh6PD9